Multimodal talker localization in video conferencing systems
Public Deposited
- Resource Type
- Creator
- Abstract
In a video conferencing environment, it is desirable to isolate the active talker. Traditionally, talker localization is performed either acoustically, using a beamforming microphone array, or visually, using image processing techniques. Because these approaches rely on only the audio or only the video data, they are often prone to errors. In this thesis, a new modular multimodal architecture is designed. Data from each localization modality are separated at the outset, and localization is performed on each data stream independently. To study the effectiveness of this architecture, the thesis combines audio, visual and infrared cues to locate talkers in the video conferencing environment. Special-purpose acoustic, video and thermo localizers are developed to perform the localization. The individual results from the localizers are then combined using data fusion techniques to give the final estimate of the talker’s location. Two common fusion methods, the summing voter and the Bayesian network, are studied. The effectiveness of two novel fusion methods, the talker occupancy grid assisted summing voter and the talker occupancy grid assisted Bayesian network, is also investigated. A unique algorithm that uses correlation lags to detect acoustic reflections is also developed. Based on the results of experiments and computer simulations, the proposed multimodal localization method outperforms single-modal methods that rely only on audio, video, or infrared data, in terms of both accuracy and robustness.
- Subject
- Language
- Publisher
- Thesis Degree Level
- Thesis Degree Name
- Thesis Degree Discipline
- Identifier
- Rights Notes
Copyright © 2005 the author(s). Theses may be used for non-commercial research, educational, or related academic purposes only. Such uses include personal study, research, scholarship, and teaching. Theses may only be shared by linking to Carleton University Institutional Repository and no part may be used without proper attribution to the author. No part may be used for commercial purposes directly or indirectly via a for-profit platform; no adaptation or derivative works are permitted without consent from the copyright owner.
- Date Created
- 2005
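As a reading aid for the fusion step named in the abstract, the following is a minimal sketch of a generic summing voter. It is not the thesis's implementation: the function name `summing_voter`, the modality keys, and the idea that each localizer reports one score per candidate location are all assumptions made for illustration, and the scores are made-up numbers.

```python
from typing import Dict, List


def summing_voter(scores: Dict[str, List[float]]) -> int:
    """Return the index of the candidate location with the largest summed score.

    `scores` maps a (hypothetical) localizer name to one score per candidate
    location. All localizers are assumed to score the same candidate grid.
    """
    lengths = {len(v) for v in scores.values()}
    if len(lengths) != 1:
        raise ValueError("all localizers must score the same candidate grid")
    n = lengths.pop()

    # Sum the per-location scores across every modality.
    totals = [0.0] * n
    for modality_scores in scores.values():
        for i, s in enumerate(modality_scores):
            totals[i] += s

    # The fused estimate is the location with the highest total.
    return max(range(n), key=totals.__getitem__)


# Illustrative scores only (not data from the thesis).
example = {
    "audio": [0.1, 0.6, 0.3],
    "video": [0.2, 0.3, 0.5],
    "infrared": [0.1, 0.5, 0.4],
}
fused_index = summing_voter(example)
```

In this made-up example the video scores alone would favor the third location, while the summed scores favor the second, which illustrates how combining modalities can override an error in a single one.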
Relations
- In Collection:
Items
| Thumbnail | Title | Date Uploaded | Visibility | Actions |
|---|---|---|---|---|
| | lo-multimodaltalkerlocalizationinvideoconferencing.pdf | 2023-05-03 | Public | Download |