MLLR: A SPEAKER ADAPTATION TECHNIQUE FOR LVCSR

Jon Hamaker
Institute for Signal and Information Processing
Mississippi State University, Mississippi State, MS 39762
email: hamaker@isip.msstate.edu

ABSTRACT

In typical state-of-the-art large vocabulary conversational speech recognition (LVCSR) systems, a single model is developed using data from a large number of speakers to cover the variance across dialects, speaking styles, etc. With this, we expect our systems to generalize well to any particular speaker. However, experience shows that some speakers are poorly modeled under this paradigm. Thus, it would be advantageous to adapt the models, at run time, to the new speaker. Following this premise, many methods have been developed that use a small amount of a speaker's data to adapt the speaker-independent model into a speaker-dependent one.

In this talk we will review the motivation and methodology behind these methods. Much of the time will be spent describing one popular method, which uses a maximum likelihood linear regression (MLLR) approach to speaker adaptation. MLLR estimates a linear regression transform of the model parameters so that the transformed parameters of each model better represent the new speaker. Applying a separate transform to every model in an LVCSR system (particularly when mixture models are used) would require an unreasonable number of additional parameters and a large amount of adaptation data for full coverage. To attack this problem, a small number of transforms is built, with each transform tied across a group of models. MLLR has become a standard feature in most LVCSR systems and has proven successful in every major speaker-independent speech recognition task to which it has been applied.
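
To make the transform estimation concrete, the following is a minimal NumPy sketch of the closed-form MLLR mean-transform update for diagonal-covariance Gaussians that share a single tied transform. The function name, the array layout, and the assumption that per-component occupation probabilities have already been computed (e.g., from a forward-backward or Viterbi alignment of the adaptation data) are illustrative choices, not part of the talk itself.

import numpy as np

def estimate_mllr_mean_transform(means, variances, gammas, obs):
    """Estimate a d x (d+1) MLLR mean transform W for a set of
    diagonal-covariance Gaussians tied to one transform.

    means:     (M, d) component mean vectors
    variances: (M, d) diagonal covariances
    gammas:    (M, T) occupation probabilities gamma_m(t) over the
               adaptation data (assumed precomputed)
    obs:       (T, d) adaptation observation vectors
    """
    M, d = means.shape
    # Extended mean vectors xi_m = [1, mu_m], shape (M, d+1)
    xi = np.hstack([np.ones((M, 1)), means])

    # Per-component occupation counts and first-order statistics
    counts = gammas.sum(axis=1)        # (M,)  sum_t gamma_m(t)
    first_order = gammas @ obs         # (M, d) sum_t gamma_m(t) o_t

    W = np.zeros((d, d + 1))
    for i in range(d):
        # Diagonal covariances let each row of W be solved independently
        inv_var_i = 1.0 / variances[:, i]
        # G_i = sum_m (counts_m / var_{m,i}) xi_m xi_m^T
        G_i = (xi * (counts * inv_var_i)[:, None]).T @ xi
        # z_i = sum_m (sum_t gamma_m(t) o_{t,i}) / var_{m,i} * xi_m
        z_i = (first_order[:, i] * inv_var_i) @ xi
        # Row i of W solves G_i w_i = z_i
        W[i] = np.linalg.solve(G_i, z_i)
    return W

The adapted means are then mu_hat_m = W xi_m (in the code, xi @ W.T). Because the covariances are diagonal, the rows of W decouple and each requires only a (d+1) x (d+1) solve, which is what keeps the update practical when one transform is shared by many Gaussians.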

Additional items of interest: