Overview: A speech recognizer compares speech captured from a microphone or telephone against models of how that language should sound when spoken. These are called acoustic models because they represent the sounds of a language numerically. The recognizer decodes words and phrases by comparing acoustic measurements of the incoming speech with the measurements stored in the acoustic models. This section explains how to create different types of acoustic models, including word models and phone models, and how to refine them through initialization, training, mixture splitting, and state tying using our software.

Contents: