In this thesis, a combined blind source separation (BSS) and speaker recognition approach for teleconferences is studied. Using an array of eight microphones, different methods for performing overdetermined independent vector analysis (IVA) are compared. The first method selects a subset of the microphones, or all available microphones, as input to IVA. The second method, the so-called subspace method, applies a principal component analysis (PCA) for dimensionality reduction prior to IVA.
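The subspace idea can be illustrated with a minimal sketch. It assumes that the PCA is applied per frequency bin to zero-mean complex STFT coefficients and that the number of sources is known. The function name `pca_subspace`, the synthetic mixing matrix, and all parameter values are hypothetical and are not taken from the thesis.

```python
import numpy as np

def pca_subspace(X, n_components):
    """Project multichannel data onto its dominant principal components.

    X: complex array of shape (n_mics, n_frames), assumed zero-mean,
       e.g. the STFT coefficients of one frequency bin (assumption).
    Returns the reduced signals Y (n_components, n_frames) and the
    projection basis E (n_mics, n_components).
    """
    # Sample spatial covariance matrix (Hermitian, n_mics x n_mics).
    R = (X @ X.conj().T) / X.shape[1]
    # eigh returns real eigenvalues in ascending order for Hermitian input.
    eigvals, eigvecs = np.linalg.eigh(R)
    # Keep the eigenvectors that belong to the largest eigenvalues.
    top = np.argsort(eigvals)[::-1][:n_components]
    E = eigvecs[:, top]
    # Reduce the channel dimension from n_mics to n_components.
    Y = E.conj().T @ X
    return Y, E

# Synthetic example: 8 microphones, 2 sources, weak sensor noise.
rng = np.random.default_rng(0)
n_mics, n_sources, n_frames = 8, 2, 500
S = rng.standard_normal((n_sources, n_frames)) + 1j * rng.standard_normal((n_sources, n_frames))
A = rng.standard_normal((n_mics, n_sources)) + 1j * rng.standard_normal((n_mics, n_sources))
N = 0.01 * (rng.standard_normal((n_mics, n_frames)) + 1j * rng.standard_normal((n_mics, n_frames)))
X = A @ S + N

Y, E = pca_subspace(X, n_sources)
retained = np.sum(np.abs(Y) ** 2) / np.sum(np.abs(X) ** 2)
```

In this sketch the reduced signals `Y` have as many channels as sources, so a determined separation algorithm could be applied to them afterwards. The value `retained` is the fraction of signal energy kept by the projection.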
For the evaluation of IVA, the BSS Eval toolbox is used to calculate the source-to-distortion ratio (SDR), the source-to-interferences ratio (SIR), and the source-to-artifacts ratio (SAR), which indicate the quality of the separation.
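For orientation, the three measures are commonly defined via a decomposition of each estimated source into a target component and three error terms. The symbols below follow the usual BSS Eval notation and are an assumption here, since the abstract itself does not introduce them.

```latex
% Decomposition of an estimated source (assumed notation)
\hat{s} = s_{\text{target}} + e_{\text{interf}} + e_{\text{noise}} + e_{\text{artif}}

\mathrm{SDR} = 10 \log_{10}
  \frac{\lVert s_{\text{target}} \rVert^{2}}
       {\lVert e_{\text{interf}} + e_{\text{noise}} + e_{\text{artif}} \rVert^{2}}

\mathrm{SIR} = 10 \log_{10}
  \frac{\lVert s_{\text{target}} \rVert^{2}}
       {\lVert e_{\text{interf}} \rVert^{2}}

\mathrm{SAR} = 10 \log_{10}
  \frac{\lVert s_{\text{target}} + e_{\text{interf}} + e_{\text{noise}} \rVert^{2}}
       {\lVert e_{\text{artif}} \rVert^{2}}
```

All three ratios are expressed in dB, and higher values indicate better separation.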
The speaker recognition system is based on Gaussian mixture models (GMMs), which are trained on the mel-frequency cepstral coefficients (MFCCs) of each speaker. The performance of the speaker recognition is measured by the diarization error rate (DER).
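As a rough illustration of the metric, the DER is commonly computed as the sum of missed speech, false alarm, and speaker confusion time, divided by the total reference speaker time. The frame-based sketch below assumes that hypothesis labels are already mapped to reference speakers and that all frames have equal duration. The function name `frame_der` and the example labels are hypothetical, and the exact scoring used in the thesis may differ.

```python
def frame_der(reference, hypothesis):
    """Frame-level diarization error rate.

    reference, hypothesis: equal-length lists of sets; each set holds the
    speaker labels active in that frame (empty set = silence).
    Assumes hypothesis labels are already mapped to reference labels.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")

    missed = false_alarm = confusion = total = 0
    for ref, hyp in zip(reference, hypothesis):
        n_ref, n_hyp = len(ref), len(hyp)
        n_correct = len(ref & hyp)
        total += n_ref
        # Reference speakers with no hypothesis counterpart.
        missed += max(0, n_ref - n_hyp)
        # Hypothesis speakers beyond the number of reference speakers.
        false_alarm += max(0, n_hyp - n_ref)
        # Speakers detected, but assigned the wrong identity.
        confusion += min(n_ref, n_hyp) - n_correct

    if total == 0:
        raise ValueError("reference contains no speech")
    return (missed + false_alarm + confusion) / total

# Four frames: correct, one of two speakers missed, wrong speaker, false alarm.
reference = [{"A"}, {"A", "B"}, {"B"}, set()]
hypothesis = [{"A"}, {"A"}, {"A"}, {"B"}]
der = frame_der(reference, hypothesis)  # 3 errors / 4 reference speaker frames
```

The second frame of the example mirrors the case of two simultaneously active speakers: without separation only one speaker is detected, which counts as missed speech.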
The evaluation results show that combining BSS with speaker recognition can increase the performance of the speaker recognition system. For the case of two simultaneously active speakers, the rate of detecting both speakers correctly improved from 0% without separation to 66% with separation in an anechoic room. In an echoic office room, 57% was achieved.