. As an example, Figure 1a shows the waveform of the spoken word ‘brother’ captured by the REAL system (sensing from the speaker’s mask), together with a microphone waveform for comparison. Because of its simple construction, the REAL system costs much less than a conventional LDV system and can be readily miniaturized (see the Supporting Information for a comparison of REAL and LDV).
As a robotic ear, the REAL system must continuously capture a specific voice source. In Figure 1b, the system is mounted on a motorized gimbal, and a camera detects and tracks the throat or the mask of the speaking person. The detected target position is fed into the gimbal’s control loop, which keeps the laser pointed at the target as the target moves. A microphone attached to the REAL system collects the audio signal for comparison and further augments the REAL signal by fusing the two independent modalities. Figure 1c shows a cocktail-party scenario in which the gimbaled REAL system ‘hears’ a specific person remotely without interference from the acoustic channel.
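The camera-to-gimbal tracking loop can be illustrated with a minimal single-axis (pan) sketch. The controller type and all parameters here are assumptions, not the authors' implementation: the text does not specify its control law. The sketch assumes a pinhole camera rigidly co-mounted with the laser so that the laser boresight coincides with the image center, a gimbal that accepts angular-rate commands, and a proportional-integral (PI) controller. The focal length `focal_px`, frame period `dt` and gains `kp`, `ki` are hypothetical values. The pixel offset of the detected target is converted into an angular pointing error, and the PI output is integrated into the gimbal angle.

```python
import math
from dataclasses import dataclass


@dataclass
class PIController:
    """Hypothetical PI controller producing an angular-rate command (rad/s)."""
    kp: float
    ki: float
    integral: float = 0.0

    def update(self, error: float, dt: float) -> float:
        # Accumulate the error so a steadily moving target is tracked without lag
        self.integral += error * dt
        return self.kp * error + self.ki * self.integral


def pixel_to_angle(pixel_offset: float, focal_px: float) -> float:
    """Pinhole model: horizontal pixel offset from image center -> bearing error (rad)."""
    return math.atan2(pixel_offset, focal_px)


def track(target_bearings, focal_px=800.0, dt=0.01, kp=10.0, ki=25.0):
    """Simulate the pan axis following a target; returns pointing error (rad) per frame.

    target_bearings: true target bearing (rad) at each camera frame (assumed input).
    """
    ctrl = PIController(kp, ki)
    gimbal = 0.0  # current pan angle of the gimbal (and laser), rad
    errors = []
    for target in target_bearings:
        # The camera rides on the gimbal, so the target appears offset in the
        # image by the difference between its bearing and the pan angle
        pixel_offset = focal_px * math.tan(target - gimbal)
        err = pixel_to_angle(pixel_offset, focal_px)
        errors.append(err)
        # Integrate the rate command over one frame to move the gimbal
        gimbal += ctrl.update(err, dt) * dt
    return errors
```

For example, `track([0.1] * 300)` starts with a 0.1 rad pointing error that decays toward zero within a few seconds of simulated time, and a target moving at a constant angular speed is also followed with vanishing steady-state error. That zero-lag tracking of a walking speaker is the reason for including the integral term rather than using a purely proportional loop. A real system would add a second (tilt) axis and account for camera latency.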