Abstract:
Speech processing applications such as speech enhancement and speaker identification
rely on the estimation of relevant parameters from the speech signal. These
parameters must often be estimated from noisy observations since speech signals are
rarely obtained in ‘clean’ acoustic environments in the real world. As a result, the
parameter estimation algorithms we employ must be robust to environmental factors
such as additive noise and reverberation. In this work, we derive and evaluate approximate
Bayesian algorithms for the following speech processing tasks: 1) speech
enhancement, 2) speaker identification, 3) speaker verification, and 4) voice activity
detection.
Building on previous work in the field of statistical model-based speech enhancement,
we derive speech enhancement algorithms that rely on speaker-dependent priors
over linear prediction parameters. These speaker-dependent priors allow us to handle
speech enhancement and speaker identification in a joint framework. Furthermore,
we show how these priors allow voice activity detection to be performed in a robust
manner.
We also develop algorithms in the log spectral domain with applications in robust
speaker verification. The use of speaker-dependent priors in the log spectral domain
is shown to improve equal error rates in noisy environments and to compensate for
mismatch between training and testing conditions.