Single-ended non-intrusive speech quality monitoring in VoIP



Ding, Lijing




Voice over Internet Protocol (VoIP) is a promising technology that is expected to replace traditional telephone networks within the next few years. However, speech quality in VoIP is not guaranteed because of various new impairments introduced by the Internet and by Internet Protocol (IP) terminals. Evaluating VoIP speech quality non-intrusively is challenging because no reference signal is available.

The thesis develops a single-ended, non-intrusive speech quality classification and assessment algorithm for VoIP that reflects end users’ actual quality of experience. A novel assessment structure is proposed that uses a three-step strategy: impairment detection, individual effect modeling and overall assessment modeling. The algorithm combines the strengths of the voice payload and IP header analysis approaches. Several major VoIP impairments are investigated, including temporal clipping, echo, packet loss and noise.
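As an illustration of the IP-header side of impairment detection, packet loss can be inferred from gaps in RTP sequence numbers without decoding the voice payload. The following is a minimal Python sketch under stated assumptions, not the thesis's detector: it assumes packets are delivered in order with no duplicates, and the function names `lost_sequence_runs` and `loss_statistics` are hypothetical.

```python
from typing import Iterable, List, Tuple


def lost_sequence_runs(seq_numbers: Iterable[int]) -> List[int]:
    """Return the length of each run of consecutive lost packets.

    Assumption: packets arrive in order with no duplicates, and the
    16-bit RTP sequence number wraps from 65535 back to 0.
    """
    runs: List[int] = []
    prev = None
    for seq in seq_numbers:
        if prev is not None:
            gap = (seq - prev) % 65536  # modulo handles 16-bit wraparound
            if gap > 1:
                runs.append(gap - 1)  # gap - 1 packets were never received
        prev = seq
    return runs


def loss_statistics(seq_numbers: List[int]) -> Tuple[float, float]:
    """Return (loss percentage, mean number of packets per loss burst)."""
    runs = lost_sequence_runs(seq_numbers)
    lost = sum(runs)
    expected = len(seq_numbers) + lost  # received plus missing packets
    ppl = 100.0 * lost / expected if expected else 0.0
    mean_burst = lost / len(runs) if runs else 0.0
    return ppl, mean_burst
```

For example, `loss_statistics([65533, 65534, 1, 2, 5, 6])` treats packets 65535, 0, 3 and 4 as lost and returns `(40.0, 2.0)`: a 40 % loss rate in bursts of two packets on average. Reordered or late packets would need a jitter-buffer view of the stream, which this sketch does not model.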

To model the effects of temporal clipping on speech quality, an algorithm based on clipping statistics is developed, with different weighting factors assigned to different clipping locations. For the effects of packet loss, a scheme is proposed that first classifies each lost packet as silence, unvoiced or voiced; an algorithm that uses loss-localization information is then developed. To reflect the effects of loss burstiness, a new codec-dependent parameter is introduced. Two prevailing VoIP codecs, ITU-T Rec. G.711 and G.729A, are examined. Echo is detected by measuring the echo path delay and echo path loss, and two algorithms are developed for VoIP scenarios in which the echo delay is excessive and the echo path is nonlinear. For the overall model, the individual models for temporal clipping and packet loss are first combined and then integrated with the noise and echo perception models of the E-model to form an overall assessment model. In particular, a two-step noise power estimation method is adopted. The noise additivity assumption in the E-model is examined, and a correction curve is suggested.
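The overall model builds on the ITU-T G.107 E-model. The baseline E-model relations involved can be sketched as follows: the effective equipment impairment Ie,eff with its packet-loss and burst-ratio terms, the transmission rating R = Ro - Is - Id - Ie,eff + A, and the standard R-to-MOS conversion. This is only the standard E-model, not the thesis's extended model: the clipping model, the new codec-dependent burstiness parameter and the noise-additivity correction curve are not reproduced. All numeric inputs in the usage block are hypothetical placeholders, not values from the thesis or from ITU-T tables.

```python
def effective_equipment_impairment(ie: float, ppl: float, bpl: float,
                                   burst_r: float = 1.0) -> float:
    """E-model (ITU-T G.107) effective equipment impairment Ie,eff.

    ie      -- codec equipment impairment factor
    ppl     -- packet-loss probability in percent
    bpl     -- codec packet-loss robustness factor (must be > 0)
    burst_r -- burst ratio; 1.0 corresponds to random (independent) loss
    """
    return ie + (95.0 - ie) * ppl / (ppl / burst_r + bpl)


def r_factor(ro: float, i_s: float, i_d: float, ie_eff: float,
             a: float = 0.0) -> float:
    """Transmission rating R = Ro - Is - Id - Ie,eff + A."""
    return ro - i_s - i_d - ie_eff + a


def r_to_mos(r: float) -> float:
    """Map the R factor to an estimated MOS on the 1 to 4.5 scale."""
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to exercise the formulas.
    ie_eff = effective_equipment_impairment(ie=11.0, ppl=2.0, bpl=19.0)
    r = r_factor(ro=94.77, i_s=1.41, i_d=0.15, ie_eff=ie_eff)
    print(f"Ie,eff = {ie_eff:.2f}, R = {r:.2f}, MOS = {r_to_mos(r):.2f}")
```

In a single-ended monitor, `ppl` and the burst statistics would come from the impairment-detection stage, while Is and Id would be driven by the noise and echo measurements; the thesis adds its own clipping and burstiness terms on top of this baseline.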

In the thesis, ITU-T Rec. P.862.1 is used to measure speech quality objectively. Simulation results demonstrate the accuracy and effectiveness of the proposed algorithm: the correlation between predicted and measured scores is 0.90, and the standard error is 0.27 on the Mean Opinion Score (MOS) scale. A subjective MOS test covering several key scenarios is also conducted, and the ratings are analyzed to verify the novel concepts and to calibrate the proposed models. Finally, the performance limitations of several leading objective measures are identified.
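The measurement step and the two reported figures of merit can be sketched as below. `pesq_to_mos_lqo` applies the published P.862.1 mapping from a raw P.862 score to MOS-LQO; the raw score itself must come from a P.862 implementation, which is not shown. The abstract does not define its "standard error", so this sketch assumes a root-mean-square error between predicted and measured MOS; the thesis's exact definition may differ. All function names are hypothetical.

```python
import math
from typing import Sequence


def pesq_to_mos_lqo(raw_pesq: float) -> float:
    """ITU-T P.862.1 mapping from a raw P.862 score to MOS-LQO."""
    return 0.999 + (4.999 - 0.999) / (1.0 + math.exp(-1.4945 * raw_pesq + 4.6607))


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rms_error(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Root-mean-square difference in MOS units (assumed 'standard error')."""
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))
```

In use, `measured` would hold the P.862.1 MOS-LQO values and `predicted` the outputs of the non-intrusive model; the thesis reports 0.90 for the correlation and 0.27 MOS for the error.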


Speech processing systems.
Internet telephony.
Computer network protocols.




Carleton University

Thesis Degree Name: 

Doctor of Philosophy

Thesis Degree Level: 


Thesis Degree Discipline: 

Engineering, Electrical

Parent Collection: 

Theses and Dissertations

Items in CURVE are protected by copyright, with all rights reserved, unless otherwise indicated. They are made available with permission from the author(s).