Overview:

A speech recognizer must compare speech captured by a microphone or telephone against models of how that language should sound when spoken. These models are called acoustic models because they represent numerically how language sounds. The recognizer decodes words and phrases by comparing measurements of the incoming speech with the measurements stored in the acoustic models. This section explains how to create different types of acoustic models, including word and phone models. It also describes how to refine these models with our software through initialization, training, mixture splitting, and state tying.
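
The comparison described above can be illustrated with a deliberately simplified sketch. It assumes each word is represented by a single diagonal-covariance Gaussian over two-dimensional feature vectors, which is far simpler than the hidden Markov models covered in this section. The names `log_gaussian`, `best_match`, and the toy models and frames are hypothetical and are not part of our software.

```python
import math

def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

# Hypothetical acoustic models: one Gaussian per word (assumption for illustration).
models = {
    "yes": {"mean": [1.0, 2.0], "var": [0.5, 0.5]},
    "no": {"mean": [-1.0, 0.0], "var": [0.5, 0.5]},
}

def best_match(frames, models):
    """Score every model against all frames and return the best-scoring word."""
    scores = {
        name: sum(log_gaussian(f, m["mean"], m["var"]) for f in frames)
        for name, m in models.items()
    }
    return max(scores, key=scores.get), scores

# Toy measurements standing in for features extracted from spoken audio.
frames = [[0.9, 1.8], [1.2, 2.1]]
label, scores = best_match(frames, models)
print(label)
```

A real recognizer scores sequences of frames against sequences of model states rather than a single Gaussian per word, but the principle of choosing the model whose measurements best match the input is the same.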

Contents:

5.1 Overview
5.1.1 Acoustic Modeling
5.1.2 Statistical Methods
5.1.3 Hidden Markov Models
5.1.4 Acoustic Model Types

5.2 Word Models
5.2.1 Initialization
5.2.2 Reestimation
5.2.3 Single-Path Silence Training
5.2.4 Multi-Path Training
5.2.5 Mixture Splitting

5.3 CI Phone Models
5.3.1 Initialization
5.3.2 Single-Path Silence Training
5.3.3 Multi-Path Silence Training
5.3.4 Mixture Splitting

5.4 Word Internal CD Models
5.4.1 Triphone Generation
5.4.2 Triphone Training
5.4.3 State Tying
5.4.4 Training State Tied Triphones
5.4.5 Mixture Splitting

5.5 Cross-Word CD Models
5.5.1 Triphone Generation
5.5.2 Triphone Training
5.5.3 State Tying
5.5.4 Training State Tied Triphones
5.5.5 Mixture Splitting

5.6 Parallel Training
5.6.1 Overview
5.6.2 Word Models

5.7 Command Synopsis
