This thesis develops adaptive filter structures, adaptation algorithms, and control algorithms that can operate in harsh environments, and applies them to the problem of echo cancellation in telecommunications systems. The specific environments considered are moderate-to-high levels of background noise, doubletalk conditions, and echo cancellation in Voice-over-IP (VoIP) networks. A secondary focus is maintaining low complexity in the resulting structures and algorithms.
The problem of doubletalk detector calibration is addressed through statistical modeling, applied to the normalized cross-correlation-based doubletalk detector. Detection probability is a function of the parameter estimation window, the echo-to-noise ratio (SNR), and the near-end-speech-to-echo ratio (NER). Methods for compensating background noise bias do not eliminate the detection statistic’s SNR dependency. Signal-adaptive algorithms are presented for constructing thresholds based on statistical criteria, and these are shown to be capable of increasing detection rates.
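As a rough illustration of the kind of detection statistic involved, the sketch below computes a simplified normalized cross-correlation between an echo estimate and the microphone signal over one estimation window. It is not the thesis's calibrated detector: the function names, the toy sinusoidal signals, the assumption of an exact echo estimate, and the fixed threshold of 0.9 are all hypothetical choices made for illustration.

```python
import math

def ncc_statistic(echo_est, mic):
    """Simplified NCC statistic over one window: sqrt(sum(y_hat*d) / sum(d*d))."""
    cross = sum(y * d for y, d in zip(echo_est, mic))
    power = sum(d * d for d in mic)
    if power == 0.0:
        return 1.0  # silent window: treat as "no doubletalk"
    return math.sqrt(max(cross, 0.0) / power)

def is_doubletalk(echo_est, mic, threshold=0.9):
    """Flag doubletalk when the statistic drops below a (hypothetical) fixed threshold."""
    return ncc_statistic(echo_est, mic) < threshold

# Deterministic toy signals (assumed, for illustration only).
N = 400
echo = [math.sin(0.1 * n) for n in range(N)]             # echo estimate, assumed exact
near_end = [0.8 * math.cos(0.37 * n) for n in range(N)]  # stand-in for near-end speech

mic_single = echo                                        # far-end talker only, no noise
mic_double = [y + v for y, v in zip(echo, near_end)]     # echo plus near-end speech

xi_single = ncc_statistic(echo, mic_single)
xi_double = ncc_statistic(echo, mic_double)
```

In this noise-free toy case the statistic sits at one when only echo is present and drops when near-end speech is added. With uncorrelated background noise in the microphone signal the statistic would fall below one even without near-end speech, which is consistent with the SNR dependency noted above and suggests why a fixed threshold is a weak choice.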
Psychoacoustic limits of echo canceller performance in the presence of noise are quantified using a perceptual model of hearing, and it is shown that average-power-based performance measures may under- or over-estimate the amount of audible echo removed by an echo canceller. Two algorithms are proposed for estimating the audible echo signal reduction provided by an echo canceller, and verified using informal listening tests.
Affine Projection (AP) and normalized cross-correlation-based doubletalk detection algorithms are derived for echo cancellers employing critically sampled subband adaptive filters. Subband AP, even with only 2 to 4 subbands, can improve the rate of convergence over fullband AP employing the same projection order. Because background noise is not spectrally flat, per-subband adaptive detection thresholds can be constructed that improve detection rates over fullband doubletalk detectors.
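For orientation, a minimal sketch of the standard fullband AP update, w ← w + μ X (XᵀX + δI)⁻¹ e, is given below. It illustrates only the fullband baseline, not the subband derivation of this thesis; the function names, the step size, the regularization constant, and the 4-tap toy echo path are assumptions chosen for illustration.

```python
import random

def solve(a, b):
    """Solve a z = b for a small square system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z

def affine_projection(x, d, num_taps, order, mu=0.5, delta=1e-6):
    """Fullband AP adaptation; returns the final weight vector."""
    w = [0.0] * num_taps
    for n in range(num_taps + order - 2, len(x)):
        # cols[k] is the regressor delayed by k samples: x[n-k], ..., x[n-k-num_taps+1]
        cols = [[x[n - k - i] for i in range(num_taps)] for k in range(order)]
        # A-priori errors for the `order` most recent samples
        err = [d[n - k] - sum(wi * ci for wi, ci in zip(w, cols[k]))
               for k in range(order)]
        # Regularized Gram matrix X^T X + delta * I
        gram = [[sum(a * b for a, b in zip(cols[p], cols[q])) + (delta if p == q else 0.0)
                 for q in range(order)] for p in range(order)]
        z = solve(gram, err)
        for i in range(num_taps):
            w[i] += mu * sum(cols[k][i] * z[k] for k in range(order))
    return w

# Toy noise-free system identification (hypothetical 4-tap echo path).
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(600)]
h = [0.5, -0.3, 0.2, 0.1]
d = [sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0) for n in range(len(x))]
w = affine_projection(x, d, num_taps=len(h), order=2)
misalignment = max(abs(wi - hi) for wi, hi in zip(w, h))
```

A subband variant would presumably run an update of this form on each decimated subband signal with a shorter filter per band, which is where the complexity and convergence trade-offs discussed above arise.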
Adaptation and control algorithms utilizing linear-prediction-based speech parameters are proposed for echo cancellers deployed in VoIP networks. Decorrelated adaptation and doubletalk detection algorithms are presented that avoid the cost of constructing decorrelation filter coefficients. A power spectrum estimation algorithm is proposed for residual echo from nonlinear vocoder distortion. When incorporated into a frequency-domain post-filter, near-end speech spectral distortion is improved by 0.98 dB, with a 0.4 increase in estimated mean opinion score.