This dissertation proposes a supervised source-template nonnegative matrix factorization (NMF) algorithm for monaural music source separation. Unlike previous state-of-the-art algorithms, the proposed algorithm models the spectrogram of an audio mixture as a linear combination of note templates. Given prior knowledge of the note templates for each source, we estimate the activity of each template in a recording and use these activities to build a mask for each source. Applying the masks to the mixture reconstructs the audio of the target tracks.
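The template-and-mask idea can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the dissertation's implementation: it assumes a Euclidean cost with the standard multiplicative update, a ratio (soft) mask, and random toy templates. The names `estimate_activations`, `source_masks`, `W_a`, and `W_b` are hypothetical.

```python
import numpy as np

def estimate_activations(V, W, n_iter=200, eps=1e-12):
    """Estimate nonnegative activations H so that V ~ W @ H, with W held fixed."""
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        # Multiplicative update for the Euclidean cost; the templates W are not updated.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

def source_masks(V, templates, n_iter=200, eps=1e-12):
    """Build one ratio mask per source from that source's templates and activations."""
    W = np.hstack(templates)                      # all note templates, side by side
    H = estimate_activations(V, W, n_iter, eps)
    bounds = np.cumsum([0] + [Wk.shape[1] for Wk in templates])
    # Each source's model spectrogram uses only its own rows of H.
    parts = [Wk @ H[bounds[k]:bounds[k + 1]] for k, Wk in enumerate(templates)]
    total = sum(parts) + eps
    return [p / total for p in parts]

# Toy data (assumption): two sources, three templates each, 16 frequency bins, 40 frames.
rng = np.random.default_rng(1)
W_a = rng.random((16, 3))
W_b = rng.random((16, 3))
V = W_a @ rng.random((3, 40)) + W_b @ rng.random((3, 40))

masks = source_masks(V, [W_a, W_b])
estimates = [m * V for m in masks]                # masked magnitude spectrograms
```

In a full pipeline the masks would presumably be applied to the complex short-time Fourier transform of the mixture and inverted to obtain audio; that step is omitted here.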
We review previous research on monaural music source separation and compare it with the proposed algorithm in both mathematical formulation and separation performance. The proposed algorithm has three stages. First, prior knowledge of the note templates is obtained from a dataset of musical instrument audio. The spectrograms of these instruments are computed and factored into a source resonance character matrix and a source impulse excitation matrix, under the assumption that the spectrum of each note is formed by the resonance response to an impulse excitation. Second, given the prior note templates, their onset-offset-like features are estimated with the multiplicative update rule and supervised by the proposed pitch-checking algorithm, which removes misleading estimates. Finally, the supervised onset-offset-like features in turn serve as a constraint that helps the model adapt its prior note templates to the forms given by the recorded instruments.
We used the TRIOS and Bach-10 datasets for the multi-source separation performance tests. The proposed supervised source-template NMF was compared with state-of-the-art algorithms, including the sound-prism and Oracle-toolbox methods. In addition, we added white Gaussian noise to the audio mixtures to simulate random background noise and test the noise robustness of each algorithm. The experimental results, measured by signal-to-distortion ratio (SDR), signal-to-interference ratio (SIR), and signal-to-artifacts ratio (SAR), indicate that, with note templates obtained from side information, the proposed algorithm achieves equivalent or higher performance in two-source separation and better performance under noise.
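The noise test can be illustrated with a short sketch that adds white Gaussian noise to a mixture at a chosen signal-to-noise ratio. The noise levels used in the experiments are not stated here, so the 10 dB target, the sine-wave stand-in for a mixture, and the function name `add_white_noise` are all assumptions made for illustration.

```python
import numpy as np

def add_white_noise(x, snr_db, seed=0):
    """Return x plus white Gaussian noise scaled to a target SNR in dB, and the noise."""
    rng = np.random.default_rng(seed)
    signal_power = np.mean(x ** 2)
    # SNR_dB = 10 * log10(signal_power / noise_power), solved for noise_power.
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=x.shape)
    return x + noise, noise

# Stand-in mixture (assumption): one second of a 440 Hz tone sampled at 44.1 kHz.
sr = 44100
t = np.arange(sr) / sr
mixture = 0.5 * np.sin(2.0 * np.pi * 440.0 * t)

noisy, noise = add_white_noise(mixture, snr_db=10.0)

# The realized SNR differs slightly from the target because the noise is a finite sample.
realized_snr_db = 10.0 * np.log10(np.mean(mixture ** 2) / np.mean(noise ** 2))
```

Each separation algorithm would then be run on `noisy` instead of `mixture`, and its SDR, SIR, and SAR compared with the noise-free case.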