Web Based Video Conferencing for Larger Rooms (The Big Room Quandary): Acoustic Echo Canceling

It is always a fun experience to stand on a large hillside or at the edge of a canyon, yell a single word at the top of your voice, and hear it come back to you a second later. We call this an echo, but what is really happening? Sound travels, or propagates, through air at about 340 meters per second (1115 feet per second). If you yell toward a surface that is about 500 feet away and hard enough to reflect the sound, the sound has to cover roughly 1,000 feet out and back, so you hear it a little under a second after you yelled. This concept helps explain why you might hear an echo of yourself on a conference call. If the sound of your voice comes out of the speaker in the far-end room and then goes back into a microphone in that room, it is sent back to you and you hear an echo of yourself. The delay of the echo depends on the distance between the microphone and the speaker and on latency (the time it takes to encode the audio and video, send it to the far end, and decode it). If the echo is delayed enough (anything over 30 ms) or loud enough, it can be extremely distracting for the person talking and for the other participants, and it makes it difficult to hold a conversation.
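The arithmetic behind these delays can be sketched in a few lines of Python. This is only an illustration: the function names, the 10-foot speaker-to-microphone path, and the 100 ms one-way latency are assumptions chosen for the example, not figures from any particular system.

```python
# Speed of sound in air, using the figure quoted in the article.
SPEED_OF_SOUND_FT_PER_S = 1115.0


def reflection_delay_s(distance_ft: float) -> float:
    """Seconds until a yell returns from a surface distance_ft away (out and back)."""
    return 2.0 * distance_ft / SPEED_OF_SOUND_FT_PER_S


def call_echo_delay_ms(speaker_to_mic_ft: float, one_way_latency_ms: float) -> float:
    """Rough delay before a talker hears their own echo on a call.

    Assumes the voice crosses the network once in each direction and travels
    once through the air from the far-end speaker to the far-end microphone.
    """
    acoustic_ms = 1000.0 * speaker_to_mic_ft / SPEED_OF_SOUND_FT_PER_S
    return 2.0 * one_way_latency_ms + acoustic_ms


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to illustrate the scale of each term.
    print(f"Canyon wall at 500 ft: {reflection_delay_s(500):.2f} s")
    print(f"Call echo, 10 ft path, 100 ms each way: {call_echo_delay_ms(10, 100):.0f} ms")
```

With these assumed numbers the canyon echo comes back in about 0.90 seconds and the call echo in about 209 ms. On a call, latency usually dominates the delay, while the distance across the room adds only a few milliseconds.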

Technology that mitigates echo in conferencing has been around for many years. It is known as Acoustic Echo Canceling (AEC). The AEC working on your end of the conference is for the benefit of the people on the far end. If you hear yourself echo, it is someone else’s system that is responsible.

There is a lot going on under the hood, but here is a basic description of how AEC works. The audio from your microphone passes through an AEC signal processor before it is sent to the other participants at the far end. The audio from the far end, which comes out of your speakers, serves as a reference signal for the AEC processor. The processor compares this reference with the signal coming from your microphone and cancels anything that has the same general waveform, thus removing the echo that would otherwise go back to the far-end presenter.
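The compare-and-cancel idea can be illustrated with a small adaptive filter. The sketch below assumes a normalized least mean squares (NLMS) filter, a common textbook approach. The article does not say which algorithm any given product uses, and real AEC processors do considerably more. The function names, tap count, step size, and the simulated room are all hypothetical.

```python
import random


def nlms_echo_cancel(far_end, mic, taps=16, step=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the far-end echo from the mic signal.

    far_end: reference samples (what is played out of the local speakers)
    mic:     microphone samples (local sound plus echo of far_end)
    Returns the mic signal with the estimated echo removed.
    """
    weights = [0.0] * taps   # current model of the speaker-to-mic path
    history = [0.0] * taps   # most recent reference samples, newest first
    cleaned = []
    for ref_sample, mic_sample in zip(far_end, mic):
        history = [ref_sample] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = mic_sample - echo_estimate            # what is left after canceling
        power = sum(h * h for h in history) + eps     # normalizes the update size
        scale = step * error / power
        weights = [w + scale * h for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned


def simulate_room(far_end, delay=5, gain=0.6):
    """Hypothetical room: the mic hears a delayed, quieter copy of the speaker."""
    return [gain * far_end[n - delay] if n >= delay else 0.0
            for n in range(len(far_end))]


if __name__ == "__main__":
    rng = random.Random(0)
    far_end = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
    mic = simulate_room(far_end)          # echo only, nobody talking locally
    cleaned = nlms_echo_cancel(far_end, mic)
    before = sum(s * s for s in mic[-500:])
    after = sum(s * s for s in cleaned[-500:])
    print(f"echo energy before: {before:.3f}, after: {after:.3e}")
```

In this toy setup the filter learns the delay and gain of the simulated room, so the echo energy left in the outgoing signal drops to a tiny fraction of what the microphone picked up. A real room has many reflections, a much longer echo tail, and people talking at both ends at once, which is part of why the processing is more demanding in practice.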

Most soft codecs that you would run on your laptop or phone have a degree of AEC built in, but it does not have to work very hard: the speaker on a laptop or phone is a few inches away from the microphone, there is a single mic, and the output of the local speaker is not very loud. In larger room systems, there could be several feet between the speakers and the microphone. The sound reinforcement system in a larger room also raises the level of the audio that travels from the speakers back into the microphones. To complicate matters even more, there could be several microphones located at different distances from the speakers. All of this is very difficult for the simple AEC processor in a soft codec to handle. In this case, the AEC should be applied to each microphone channel individually, which realistically can only be done in a separate digital signal processor (DSP).

Depending on the DSP and the capability of the AEC used, there is a point beyond which the processing cannot compensate for bad microphone placement or for what is happening in the room acoustically. Other issues, such as doubling up on the AEC processing or improper configuration, can also result in bad audio. Due to service provider certification requirements, not all hardware will disable the AEC within the soft codec, which adds to the problem. Proper microphone selection and positioning, along with proper hardware selection and setup, are critical to having good audio in conferencing.

NEXT INSTALLMENT: Advanced Display/Projector Presentation Requirements