Speech data collected over the telephone, such as the SWITCHBOARD
conversational speech data corpus, contains
echo caused by the process of converting a two-wire signal used
in the local loop to a four-wire signal used in network transmission.
See the system overview for more details.
A speech recognizer could use this echo to gather important cues
about the speaker's identity or the channel conditions, making its
job that much easier. To eliminate this problem, we need an
efficient echo cancellation technology.
We have developed an FIR echo-canceller for this purpose.
A detailed description of this technology can be found in the following
reference:
- Messerschmitt, David; Hedberg, David; Cole, Christopher;
Haoui, Amine; Winship, Peter. "Digital Voice Echo Canceller
with a TMS32020," in Digital Signal Processing Applications
with the TMS320 Family, pp. 415-437, Texas Instruments, Inc., 1986.
This document can be retrieved from the Texas Instruments web site.
A copy can be found on this web site as well.
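To make the basic structure concrete, here is a minimal sketch of an adaptive FIR echo canceller. It uses the normalized LMS (NLMS) update, a common variant of LMS, and is not taken from our source distribution; the class name `NlmsEchoCanceller`, the step size `mu`, and the regularizer `eps` are illustrative assumptions. The filter models the echo path as an FIR response applied to the far-end (reference) signal, subtracts that echo estimate from the near-end (return) signal, and uses the resulting error to update the coefficients.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of an adaptive FIR echo canceller (NLMS update).
class NlmsEchoCanceller {
public:
    // taps: FIR length; mu: normalized step size (0 < mu < 2 for stability);
    // eps: small regularizer that avoids division by zero during silence.
    NlmsEchoCanceller(std::size_t taps, double mu, double eps = 1e-6)
        : w_(taps, 0.0), x_(taps, 0.0), mu_(mu), eps_(eps) {
        assert(taps > 0 && mu > 0.0 && mu < 2.0);
    }

    // Processes one sample pair and returns the echo-cancelled sample.
    // farSample: reference (far-end) signal; nearSample: return signal with echo.
    // Pass adapt = false to freeze the coefficients, e.g. during double talk.
    double process(double farSample, double nearSample, bool adapt = true) {
        // Shift the delay line so x_[k] holds the far-end sample from k steps ago.
        for (std::size_t k = x_.size() - 1; k > 0; --k) x_[k] = x_[k - 1];
        x_[0] = farSample;

        // Echo estimate y = w . x and input energy ||x||^2 in one pass.
        double echoEstimate = 0.0;
        double energy = 0.0;
        for (std::size_t k = 0; k < w_.size(); ++k) {
            echoEstimate += w_[k] * x_[k];
            energy += x_[k] * x_[k];
        }
        const double error = nearSample - echoEstimate;

        // NLMS update: w += mu * e * x / (eps + ||x||^2).
        if (adapt) {
            const double gain = mu_ * error / (eps_ + energy);
            for (std::size_t k = 0; k < w_.size(); ++k) w_[k] += gain * x_[k];
        }
        return error;
    }

    const std::vector<double>& weights() const { return w_; }

private:
    std::vector<double> w_;  // FIR coefficients (echo path estimate)
    std::vector<double> x_;  // far-end delay line
    double mu_;
    double eps_;
};
```

A caller constructs the filter once and calls `process(farSample, nearSample)` for every sample pair; the return value is the echo-cancelled signal. The `adapt` flag is the hook where a double-talk detector would suspend adaptation.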
In places, we have deviated from the standard implementation of an
LMS echo-canceller to address certain problems we faced. The main
problems we encountered while developing the system are:
- Double talk:
This is the condition in which both speakers talk simultaneously.
If we adapt the FIR filter coefficients during double talk, the
filter diverges, causing "blips" in the output. This can be
avoided with an efficient voice activity detector (VAD): when the
VAD detects near-end speech, adaptation is suspended, so the
filter does not diverge.
- Complex echo:
The echo-canceller performs poorly in some cases of double
talk, where it fails to cancel the far-end speech effectively. We
attribute this to the possible presence of complex echo patterns.
- Residual Error Suppression:
Because of nonlinearities in the echo path of the telephone
network, the maximum achievable suppression is limited to about
40 dB. The usual recommendation is therefore to zero the output
whenever the return signal power falls below a threshold derived
from the reference signal power. However, this makes the
background sound choppy. To make the background more uniform, we
instead set the output to a scaled version of the reference signal
when no near-end signal is present.
- Length of the filter:
Unfortunately, the length of the FIR filter has to depend on the
maximum echo delay in the data set being processed. For
international telephone calls, the round-trip delay is typically
an order of magnitude longer than for domestic calls. We would
like the system to choose the filter length automatically from
the maximum round-trip delay the user specifies. Another open
question is the relationship between the adaptation rate constant
and the filter length. Our experiments suggest an inverse
relationship between the two quantities.
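The sketch below illustrates, under stated assumptions, how the issues above might be handled in code. None of it is taken from ec.exe; all names (`tapsForDelay`, `stepSizeForTaps`, `GeigelDetector`, `suppressResidual`) and default values are hypothetical. The double-talk detector uses the Geigel rule, one common choice rather than necessarily ours: near-end speech is declared when the near-end magnitude exceeds a fraction (here 0.5, i.e. 6 dB) of the largest far-end magnitude in a recent window, and the decision is held for a hangover period. Residual suppression substitutes a scaled reference sample instead of zero, and how the return and reference powers are estimated is left to the caller. The step-size helper assumes the form alpha / L. This is consistent with the inverse relationship we observed: for plain LMS, a standard sufficient stability condition is 0 < mu < 2 / (L * sigma^2), where sigma^2 is the reference signal power, so the usable step size shrinks as the filter grows.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <deque>

// Hypothetical: number of FIR taps needed to span a maximum round-trip delay.
std::size_t tapsForDelay(double maxDelayMs, double sampleRateHz) {
    assert(maxDelayMs > 0.0 && sampleRateHz > 0.0);
    return static_cast<std::size_t>(std::ceil(maxDelayMs * sampleRateHz / 1000.0));
}

// Hypothetical: step size that shrinks as the filter grows (assumed alpha / taps).
double stepSizeForTaps(double alpha, std::size_t taps) {
    assert(alpha > 0.0 && taps > 0);
    return alpha / static_cast<double>(taps);
}

// Geigel-style double-talk detector (an assumed choice of VAD).
// window is typically the echo path length in samples.
class GeigelDetector {
public:
    GeigelDetector(std::size_t window, double threshold = 0.5, std::size_t hangover = 240)
        : window_(window), threshold_(threshold), hangover_(hangover) {
        assert(window > 0 && hangover > 0);
    }

    // Returns true while near-end speech is considered active,
    // i.e. while adaptation should be suspended.
    bool update(double farSample, double nearSample) {
        farHistory_.push_back(std::abs(farSample));
        if (farHistory_.size() > window_) farHistory_.pop_front();
        const double farMax = *std::max_element(farHistory_.begin(), farHistory_.end());
        if (std::abs(nearSample) > threshold_ * farMax) {
            hold_ = hangover_;  // near-end speech detected: restart hangover
        } else if (hold_ > 0) {
            --hold_;            // count down before re-enabling adaptation
        }
        return hold_ > 0;
    }

private:
    std::size_t window_;
    double threshold_;
    std::size_t hangover_;
    std::size_t hold_ = 0;
    std::deque<double> farHistory_;  // recent far-end magnitudes
};

// Hypothetical residual suppression: when no near-end speech is present and
// the return power is more than suppressionDb below the reference power,
// output a scaled reference sample instead of a hard zero.
double suppressResidual(double residual, double reference, double returnPower,
                        double referencePower, bool nearEndActive,
                        double suppressionDb, double scale) {
    const double threshold = referencePower * std::pow(10.0, -suppressionDb / 10.0);
    if (!nearEndActive && returnPower < threshold) {
        return scale * reference;  // uniform background instead of silence
    }
    return residual;
}
```

For example, at the 8 kHz sampling rate typical of telephone speech, a 64 ms maximum round-trip delay gives `tapsForDelay(64.0, 8000.0) == 512` taps.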
This program is easy to use. After you download, compile and link
the code, simply type:
ec.exe < input_file > output_file
The input must be 16-bit interleaved stereo data.
The output is written in the same format.
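Reading and writing this format takes only a few lines. The sketch below is an illustration under assumptions, not the ec.exe source: it reads interleaved 16-bit frames with `std::fread`, lets a callback modify each (left, right) pair, saturates the results back to 16-bit PCM, and writes them out. It assumes the samples are stored in the host's byte order; the function names `processInterleaved` and `toPcm16` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical: rounds a sample to the nearest integer and saturates it
// to the 16-bit PCM range.
std::int16_t toPcm16(double x) {
    if (x > 32767.0) return 32767;
    if (x < -32768.0) return -32768;
    return static_cast<std::int16_t>(x < 0.0 ? x - 0.5 : x + 0.5);
}

// Hypothetical: reads interleaved 16-bit stereo frames (host byte order
// assumed) from `in`, lets `fn` modify each (left, right) pair, and writes
// the frames to `out`. Returns the number of frames written.
template <typename Fn>
std::size_t processInterleaved(std::FILE* in, std::FILE* out, Fn fn) {
    assert(in != nullptr && out != nullptr);
    std::int16_t frame[2];
    std::size_t frames = 0;
    while (std::fread(frame, sizeof(std::int16_t), 2, in) == 2) {
        double left = frame[0];
        double right = frame[1];
        fn(left, right);  // e.g. replace one channel with the canceller output
        frame[0] = toPcm16(left);
        frame[1] = toPcm16(right);
        if (std::fwrite(frame, sizeof(std::int16_t), 2, out) != 2) break;
        ++frames;
    }
    return frames;
}
```

A caller would pass `stdin` and `stdout` together with a lambda that replaces the near-end sample with the echo canceller's output; which channel carries the near-end signal is an assumption this sketch leaves to the caller. On Windows, `stdin` and `stdout` are opened in text mode by default and must be switched to binary mode before reading PCM data.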
You can download the following from our site:
- Tar File:
a gzip-compressed tar archive of the C++ implementation.
- Source Code:
view the C++ source code distribution.
- Example Data:
some example data to verify your implementation.
- System Overview:
a system overview in PDF format.
- TI DSP Application Note:
an excellent application note describing the theory and implementation
of an LMS echo canceller. The implementation included here is based
on this application note and references it heavily.