Channel and channel subband selection for speaker diarization

Ahmed Isam Ahmed*, John P. Chiverton, David L. Ndzi, Mahmoud M. Al-Faris

*Corresponding author for this work

    Research output: Contribution to journal › Article › peer-review

    Abstract

    Speaker diarization is one of the more complex problems in speaker recognition. A reliable diarization system should accurately determine the variable-length utterances that each speaker contributes to a multi-speaker conversation. This is difficult because text-independent speaker identification and verification are not yet reliable enough to be applied with confidence. While efficient speaker modelling is important for diarization, the acoustical representation of speech is the basic entity that signifies a speaker. This representation should be distinctive enough to prevent a speaker’s utterances from being lost in the acoustical congestion imposed by the other talkers.

    For this purpose, it is proposed here that, in multiple-microphone diarization, the multiple speech signals be used directly in acoustic feature extraction instead of being combined beforehand. The aim is to make optimal use of those signals in order to enrich the acoustical representation of the speaker. To this end, and since not all microphone signals (channels) may be desirable, two selection approaches are proposed in this work: a best-quality channel selection method and a novel approach for diverse channel selection. Furthermore, a novel method is proposed which retains the speech spectrum from selected least-reverberated subbands of the available channels’ spectra. A new model, referred to here as Averaged Joint Gradient (AJG), is introduced for this purpose. The proposed approach reduces the Diarization Error Rate (DER) in both of the diarization systems used in the evaluations. The first system is based on binary keys and achieves a maximum relative reduction in DER of 14%. The second is a Gaussian Mixture Model-Bayesian Information Criterion (GMM-BIC) based system, which achieves a maximum relative reduction in DER of 20%.
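
The abstract reports its gains as relative reductions in DER. As a reading aid, the sketch below shows how such a figure is conventionally computed from a baseline DER and an improved DER. The function name and the numeric values are hypothetical illustrations, not results or code from the paper.

```python
def relative_der_reduction(baseline_der: float, proposed_der: float) -> float:
    """Return the relative DER reduction as a fraction of the baseline DER."""
    if baseline_der <= 0:
        raise ValueError("baseline DER must be positive")
    return (baseline_der - proposed_der) / baseline_der


# Hypothetical DER values in percent, chosen only for illustration.
baseline = 20.0
proposed = 16.0
reduction = relative_der_reduction(baseline, proposed)
print(f"relative DER reduction: {reduction:.0%}")
```

A relative reduction is expressed against the baseline, so in this made-up case an absolute drop of 4 DER points from a baseline of 20 corresponds to a 20% relative reduction.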
    Original language: English
    Article number: 101367
    Number of pages: 20
    Journal: Computer Speech and Language
    Volume: 75
    Early online date: 24 Feb 2022
    Publication status: Published - 30 Sept 2022

    Keywords

    • speaker diarization
    • channel selection
    • reverberation
    • acoustic beamforming
