Figure 1a illustrates the working principle of the REAL system. A low-power collimated laser beam is directed at the remote vibrating surface. A telescoping lens system aligned with the laser spot collects the back-scattered photons and focuses them onto an avalanche photodetector (APD), whose AC signal is amplified and processed as the REAL audio (see Methods for REAL signal processing). This signal arises from the change in back-scattered light intensity as the surface vibrates. If we assume the scattering surface is Lambertian,[22] the collected back-scattered laser power decays with the square of the distance, \({P_c} \propto {P_0}/{r^2}\), where \(P_0\) is the scattered laser power exiting normal to the surface and \(r\) is the distance between the surface spot and the collecting lens. As the surface vibrates, this distance changes (\(r = {r_0} + \Delta r\)), and the displacement \(\Delta r\) changes the collected laser power by \(\Delta {P_c} \sim {P_0}(1/{r_0^2} - 1/{({r_0} + \Delta r)^2})\). As an example, the waveform of the pronounced word ‘brother’ gathered by the REAL system (detecting from the speaker’s mask) is shown in Figure 1a. A microphone waveform is also shown for comparison. Because of its simple construction, the REAL system costs much less than a conventional LDV system and can be readily miniaturized (see Supporting Information for a comparison of REAL and LDV).
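
The inverse-square relation above can be checked numerically. The following is a minimal sketch, not the authors' signal-processing code: it sets the proportionality constant in \({P_c} \propto {P_0}/{r^2}\) to 1, and the stand-off distance `r0` and displacement `dr` are hypothetical values chosen only for illustration. It compares the exact power change with the first-order expansion \(\Delta {P_c} \approx 2{P_0}\Delta r/r_0^3\), which holds when \(\Delta r \ll {r_0}\).

```python
def collected_power(p0, r):
    """Collected power under the inverse-square model (proportionality constant assumed to be 1)."""
    return p0 / r**2


def power_change(p0, r0, dr):
    """Exact change in collected power when the surface moves from r0 to r0 + dr."""
    return collected_power(p0, r0) - collected_power(p0, r0 + dr)


def power_change_linear(p0, r0, dr):
    """First-order approximation of the same change, valid for dr << r0."""
    return 2.0 * p0 * dr / r0**3


# Hypothetical illustration values, not taken from the experiment.
p0 = 1.0    # scattered power, arbitrary units
r0 = 5.0    # mean distance between surface spot and collecting lens, metres
dr = 1e-6   # vibration displacement, metres

exact = power_change(p0, r0, dr)
approx = power_change_linear(p0, r0, dr)
relative_modulation = exact / collected_power(p0, r0)

print(f"exact change:        {exact:.6e}")
print(f"linearized change:   {approx:.6e}")
print(f"relative modulation: {relative_modulation:.6e}")
```

Under these assumptions the fractional intensity modulation is approximately \(2\Delta r/{r_0}\), so in this simplified model the signal is linear in the surface displacement and weakens as the stand-off distance grows.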