Monaural Music Separation via Supervised Non-Negative Matrix Factor with Side-Information

Abstract
  • In this dissertation, a supervised source template nonnegative matrix factorization (NMF) algorithm is proposed to solve the monaural music source separation problem. Unlike previous state-of-the-art algorithms, the proposed algorithm treats the spectrogram of an audio mixture as a linear combination of note templates. Given prior knowledge of the note templates for each source, we can estimate the activity of each template in a recording and build a mask for each source. Through the masks, the audio of the target tracks can be reconstructed.

    We reviewed previous research on monaural music source separation and compared these works with our proposed algorithm, not only in mathematical formulation but also in separation performance. First, the prior knowledge of note templates is obtained from a musical instrument audio dataset. The spectrograms of these instruments are factored into a source resonance character matrix and a source impulse excitation matrix, under the assumption that the spectra of the different notes are formed by resonance effects acting on an impulse excitation. Second, using the prior-informed note templates, their onset-offset-like features are estimated with the multiplicative update rule and supervised by the proposed pitch-checking algorithm, which removes misleading estimates. Finally, the supervised note onset-offset-like features in turn become a constraint that helps the proposed model evolve its prior-informed note templates toward the forms given by the recorded instruments.

    We employed the TRIOS and Bach-10 datasets for our multi-source separation performance tests. We compared the proposed supervised source template NMF with state-of-the-art algorithms, including the sound-prism and Oracle-toolbox methods. Furthermore, we added white Gaussian noise to the audio mixture to simulate a noisy background and test the robustness of each algorithm to noise. The experimental results, measured by SDR (signal-to-distortion ratio), SIR (signal-to-interference ratio), and SAR (signal-to-artifacts ratio), indicate that, with note templates from side-information, the proposed supervised source template NMF algorithm achieves equivalent or higher performance in two-source separation and better performance under noise.
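
The core idea in the abstract (fixed note templates, activities estimated with a multiplicative update, then one mask per source) can be illustrated with a minimal sketch. It is not the dissertation's implementation: it assumes a Euclidean cost with the standard Lee-Seung update, keeps the templates fixed throughout, and omits the pitch-checking and template-adaptation stages. The function names, the toy data, and the `source_of_template` labelling are all hypothetical.

```python
import numpy as np

def estimate_activations(V, W, n_iter=200, eps=1e-12, seed=0):
    """Estimate H >= 0 such that V ~= W @ H while the templates W stay fixed.

    Assumption: Euclidean cost with the Lee-Seung multiplicative update.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    WtV = W.T @ V   # constant across iterations because W is fixed
    WtW = W.T @ W
    for _ in range(n_iter):
        H *= WtV / (WtW @ H + eps)
    return H

def source_masks(W, H, source_of_template, eps=1e-12):
    """Soft masks: each source's share of the modelled magnitude W @ H."""
    total = W @ H + eps
    n_sources = int(source_of_template.max()) + 1
    masks = []
    for s in range(n_sources):
        idx = np.flatnonzero(source_of_template == s)
        masks.append((W[:, idx] @ H[idx, :]) / total)
    return masks

# Toy data (hypothetical): 6 frequency bins, 8 frames, 2 templates per source.
rng = np.random.default_rng(1)
W = rng.random((6, 4)) + 0.1                 # stand-in for note templates
source_of_template = np.array([0, 0, 1, 1])  # which source owns each template
H_true = rng.random((4, 8))
V = W @ H_true                               # synthetic mixture spectrogram

H = estimate_activations(V, W)
masks = source_masks(W, H, source_of_template)
estimates = [m * V for m in masks]           # masked magnitude per source

rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative model error: {rel_err:.3e}")
```

In a real pipeline, `V` would be the magnitude of a short-time Fourier transform, and each masked spectrogram would be combined with the mixture phase and inverted to obtain audio. Those steps are left out here to keep the sketch short.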

Rights Notes
  • Copyright © 2017 the author(s). Theses may be used for non-commercial research, educational, or related academic purposes only. Such uses include personal study, research, scholarship, and teaching. Theses may only be shared by linking to Carleton University Institutional Repository and no part may be used without proper attribution to the author. No part may be used for commercial purposes directly or indirectly via a for-profit platform; no adaptation or derivative works are permitted without consent from the copyright owner.

Date Created
  • 2017
