
5.1.1 Acoustic Modeling: Overview

So far, we have seen how a speech recognizer compares spoken language to a model of how the language should sound when spoken. This comparison is necessary in order to decode a spoken utterance. Merriam-Webster defines acoustics as "a science dealing with the production, control, transmission, receipt and effects of sound". The models used by a speech recognizer for decoding reflect this definition and are therefore known as acoustic models. Acoustic models digitally represent the features of a sound that the recognizer needs for the decoding process. These features are numerical measurements obtained through a process called feature extraction, which is explained in Section 3.

The task of the recognizer is to determine what words were spoken by comparing the acoustic measurements of the spoken language to the measurements contained in the acoustic models. This tutorial describes how to create and refine acoustic models through processes called initialization and training. Because human speech is highly variable, these processes are typically statistical. Continue to Section 5.1.2 for an overview of the fundamental statistical techniques used to build acoustic models.
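
As a rough illustration of what comparing acoustic measurements to acoustic models can look like, the sketch below scores one feature vector against two toy models and picks the better match. Everything in it is an assumption made for illustration: the single diagonal-covariance Gaussian per model, the word labels "yes" and "no", the two-dimensional feature vectors, and the function names log_likelihood and best_match. None of these come from this tutorial or from any particular recognizer, and real systems score sequences of feature vectors rather than a single one.

```python
import math


def log_likelihood(features, mean, variance):
    """Log density of a feature vector under a diagonal-covariance Gaussian."""
    total = 0.0
    for x, m, v in zip(features, mean, variance):
        # Sum the per-dimension Gaussian log densities.
        total += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return total


# Hypothetical acoustic models: one Gaussian per word (illustrative values only).
models = {
    "yes": {"mean": [1.0, 2.0], "variance": [0.5, 0.5]},
    "no": {"mean": [-1.0, 0.0], "variance": [0.5, 0.5]},
}


def best_match(features, models):
    """Return the label of the model that assigns the highest log likelihood."""
    return max(
        models,
        key=lambda name: log_likelihood(
            features, models[name]["mean"], models[name]["variance"]
        ),
    )


# A made-up measurement that lies close to the "yes" model's mean.
observation = [0.8, 1.7]
print(best_match(observation, models))  # prints "yes"
```

Scores are kept in the log domain, a common choice when many small probabilities have to be combined. In this simplified picture, initialization and training would correspond to choosing and then refining the mean and variance values from recorded speech.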
