Speaker recognition using PCA-based feature transformation

Ahmed Isam Ahmed, John P. Chiverton, David L. Ndzi, Victor M. Becerra

Research output: Contribution to journal (Article)

Abstract

This paper introduces Weighted-Correlation Principal Component Analysis (WCR-PCA) for efficient transformation of speech features in speaker recognition. A Recurrent Neural Network (RNN) technique is also introduced to perform the weighted PCA. The weights are taken as the log-likelihood values from a fitted Single Gaussian-Background Model (SG-BM). For speech features, we show that there are large differences between feature variances, which makes covariance-based PCA less suitable. A comparative study of speaker recognition performance is presented using weighted and unweighted, correlation-based and covariance-based PCA. Extensions to improve the extraction of MFCC and LPCC speech features are also proposed: Odd Even filter banks MFCC (OE-MFCC) and Multitaper-Fitted LPCC. The methodologies are evaluated with the i-vector speaker recognition system. Performance is tested on a subset of the 2010 NIST speaker recognition evaluation set and on the VoxCeleb1 dataset. A relative improvement in equal error rate (EER) of 44% is found on the NIST data and 18% on the VoxCeleb1 dataset.
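
The weighted correlation PCA idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes NumPy, synthetic data in place of real speech features, a single full-covariance Gaussian fitted to the same data as the background model, and a plain eigendecomposition instead of the RNN technique the paper introduces. The abstract does not say how log-likelihoods are mapped to weights, so the shift to positive values below is an assumption, and the function names `gaussian_loglik` and `weighted_correlation_pca` are hypothetical.

```python
import numpy as np


def gaussian_loglik(X, mean, cov):
    """Per-frame log-likelihood under one full-covariance Gaussian."""
    d = X.shape[1]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    # Squared Mahalanobis distance of every frame to the mean.
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def weighted_correlation_pca(X, weights, n_components):
    """PCA on the weighted correlation matrix of X (frames x dimensions)."""
    w = weights / weights.sum()          # normalise weights to sum to 1
    mean = w @ X                         # weighted mean per dimension
    centered = X - mean
    cov = (centered * w[:, None]).T @ centered   # weighted covariance
    std = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std, std)      # weighted correlation (unit diagonal)
    eigvals, eigvecs = np.linalg.eigh(corr)      # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    # Project standardised features onto the leading eigenvectors.
    projected = (centered / std) @ components
    return projected, components, eigvals[order]


rng = np.random.default_rng(0)
# Synthetic stand-in for speech features with very different variances,
# the situation in which covariance-based PCA is dominated by one dimension.
X = rng.normal(size=(500, 4)) * np.array([100.0, 10.0, 1.0, 0.1])
X[:, 1] += 0.05 * X[:, 0]                # make two dimensions correlated

loglik = gaussian_loglik(X, X.mean(axis=0), np.cov(X, rowvar=False))
# ASSUMPTION: shift log-likelihoods so every weight is strictly positive.
weights = loglik - loglik.min() + 1e-6

Z, components, eigvals = weighted_correlation_pca(X, weights, n_components=2)
print(Z.shape, eigvals)
```

Because the correlation matrix has a unit diagonal, each feature contributes equally before the decomposition regardless of its raw variance, which is the motivation the abstract gives for preferring correlation over covariance.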
Original language: English
Pages (from-to): 33-46
Number of pages: 14
Journal: Speech Communication
Volume: 110
DOI: 10.1016/j.specom.2019.04.001
Publication status: Published - 2 Apr 2019

Keywords

  • weighted principal component analysis
  • feature fusion
  • i-vector system

Cite this

Ahmed, Ahmed Isam ; Chiverton, John P. ; Ndzi, David L. ; Becerra, Victor M. / Speaker recognition using PCA-based feature transformation. In: Speech Communication. 2019 ; Vol. 110. pp. 33-46.
@article{4066cc54d2e14dc4843745ae3cac5103,
title = "Speaker recognition using PCA-based feature transformation",
keywords = "weighted principal component analysis, feature fusion, i-vector system",
author = "Ahmed, {Ahmed Isam} and Chiverton, {John P.} and Ndzi, {David L.} and Becerra, {Victor M.}",
year = "2019",
month = "4",
day = "2",
doi = "10.1016/j.specom.2019.04.001",
language = "English",
volume = "110",
pages = "33--46",
journal = "Speech Communication",
issn = "0167-6393",
publisher = "Elsevier B.V.",

}

TY - JOUR

T1 - Speaker recognition using PCA-based feature transformation

AU - Ahmed, Ahmed Isam

AU - Chiverton, John P.

AU - Ndzi, David L.

AU - Becerra, Victor M.

PY - 2019/4/2

Y1 - 2019/4/2

KW - weighted principal component analysis

KW - feature fusion

KW - i-vector system

U2 - 10.1016/j.specom.2019.04.001

DO - 10.1016/j.specom.2019.04.001

M3 - Article

VL - 110

SP - 33

EP - 46

JO - Speech Communication

JF - Speech Communication

SN - 0167-6393

ER -