Multimodal talker localization in video conferencing systems

Creator: 

Lo, Charn Leung

Date: 

2005

Abstract: 

In a video conferencing environment, it is desirable to isolate the active talker. Traditionally, talker localization is performed acoustically using a beamforming microphone array or videographically using image processing techniques. Because each of these approaches relies on only the audio or only the video data, they are often prone to errors. In this thesis, a new modular multimodal architecture is designed. Data from each localization modality are kept separate from the outset, and localization is performed on each data stream independently. To study the effectiveness of this architecture, this thesis combines audio, visual and infrared cues to locate talkers in the video conferencing environment. Special-purpose acoustic, video and thermo localizers are developed to perform the localization. Individual results from the localizers are then combined using data fusion techniques to give the final estimate of the talker’s location. Two common fusion methods, the summing voter and the Bayesian network, are studied. The effectiveness of two further, novel fusion methods, the talker occupancy grid assisted summing voter and the talker occupancy grid assisted Bayesian network, is also investigated. A unique algorithm that uses correlation lags to detect acoustic reflections is also developed. Based on the results of experiments and computer simulations, the proposed multimodal localization method outperforms single-modal methods that rely only on audio, video, or infrared data, in terms of both accuracy and robustness.
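
As a rough illustration of the summing-voter idea named in the abstract, the sketch below fuses per-sector scores from three independent localizers by normalizing each and adding them up. It is not taken from the thesis: the division of the room into four candidate sectors, the function names (`normalize`, `summing_voter`, `locate_talker`), the optional per-modality weights, and the example scores are all assumptions made for illustration only.

```python
from typing import Dict, List, Optional


def normalize(scores: List[float]) -> List[float]:
    """Scale non-negative scores so they sum to 1 (all zeros stay zeros)."""
    total = sum(scores)
    if total <= 0:
        return [0.0 for _ in scores]
    return [s / total for s in scores]


def summing_voter(
    scores_by_modality: Dict[str, List[float]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Return the weighted sum of normalized per-sector scores.

    Each localizer votes independently; the votes are only combined here.
    The weights are a hypothetical extension (default: every modality = 1).
    """
    lengths = {len(s) for s in scores_by_modality.values()}
    if len(lengths) != 1:
        raise ValueError("all localizers must score the same set of sectors")
    fused = [0.0] * lengths.pop()
    for name, scores in scores_by_modality.items():
        w = 1.0 if weights is None else weights.get(name, 1.0)
        for i, s in enumerate(normalize(scores)):
            fused[i] += w * s
    return fused


def locate_talker(fused: List[float]) -> int:
    """Pick the index of the sector with the highest fused score."""
    return max(range(len(fused)), key=fused.__getitem__)


# Made-up scores over four candidate sectors (not data from the thesis).
example = {
    "audio": [0.1, 0.6, 0.2, 0.1],     # acoustic localizer favors sector 1
    "video": [0.0, 0.3, 0.7, 0.0],     # video localizer favors sector 2
    "infrared": [0.1, 0.5, 0.3, 0.1],  # infrared localizer favors sector 1
}
fused = summing_voter(example)
best_sector = locate_talker(fused)
```

With these made-up inputs, the video localizer alone would pick sector 2, but the summed votes favor sector 1, which is the kind of single-modality error that fusion is meant to absorb.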

Subject: 

Videoconferencing.
Interactive multimedia.
Digital communications.

Language: 

English

Publisher: 

Carleton University

Thesis Degree Name: 

Doctor of Philosophy (Ph.D.)

Thesis Degree Level: 

Doctoral

Thesis Degree Discipline: 

Engineering, Electrical

Parent Collection: 

Theses and Dissertations

Items in CURVE are protected by copyright, with all rights reserved, unless otherwise indicated. They are made available with permission from the author(s).