The aim of robust speech recognition is to minimize the mismatch between training and test
conditions so that the acoustic models can be used optimally in the recognition process. Several
factors produce such a mismatch: inter-speaker variability, intra-speaker variability, and changes
in the speaker's environment or in the channel characteristics. Environmental changes represent a
particularly challenging area of work and constitute one of the main driving forces of research in
voice processing, which nowadays faces application scenarios such as mobile phones, moving cars,
spontaneous speech, speech masked by other speech, and speech masked by music or non-stationary
noises. This review summarizes the different strategies that counteract the effects of additive
noise on the speech signal and on the recognition process, focusing on normalization techniques
and in particular on non-linear transformations of the MFCC features. Histogram Equalization and
Parametric Histogram Equalization, together with their variants and evolutions, are analyzed as
the main representatives of this family of non-linear feature transformations.
Keywords: robust speech recognition, feature normalization, histogram equalization, parametric equalization,
smoothing filters, temporal information.