5.1.1 Acoustic Modeling: Overview
So far, we have seen how a speech recognizer compares spoken language
to a model of how the language should sound when spoken. This comparison
is necessary in order to decode a spoken utterance. Merriam-Webster
defines acoustics as "a science dealing with the production, control,
transmission, receipt and effects of sound". The models used
by a speech recognizer for decoding reflect this definition and are
therefore known as acoustic models.
Acoustic models represent, in numerical form, the features of a sound
that the recognizer needs for the decoding process. These numerical
measurements are obtained through a process called feature extraction,
explained in Section 3.
The task of the recognizer is to determine which words were
spoken by comparing the acoustic measurements of the spoken language to
the measurements contained in the acoustic models. This tutorial
describes how to create and refine acoustic models through
processes called initialization and training. Because human speech is
highly variable, these processes are typically statistical.
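
The comparison between measured features and acoustic models can be
sketched in a few lines of Python. This is only an illustration under
stated assumptions, not the method used by any particular recognizer:
each model is assumed to be a single Gaussian with independent
dimensions, and the unit names ("aa", "iy"), the feature values, and the
function names are all hypothetical.

```python
import math


def log_gaussian(features, mean, var):
    """Log-density of a Gaussian with independent dimensions at `features`."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(features, mean, var)
    )


def estimate_gaussian(examples):
    """Estimate per-dimension mean and variance from example feature vectors.

    This stands in for "training": the model parameters are the
    maximum-likelihood estimates computed from the examples.
    """
    n = len(examples)
    dims = len(examples[0])
    mean = [sum(e[d] for e in examples) / n for d in range(dims)]
    var = [sum((e[d] - mean[d]) ** 2 for e in examples) / n for d in range(dims)]
    return mean, var


# Hypothetical acoustic models: one (mean, variance) pair per sound unit.
# The numbers are invented for illustration only.
ACOUSTIC_MODELS = {
    "aa": ([1.0, 2.0], [0.5, 0.5]),
    "iy": ([4.0, -1.0], [0.5, 0.5]),
}


def best_match(features, models=ACOUSTIC_MODELS):
    """Return the unit whose model assigns `features` the highest log-likelihood."""
    return max(models, key=lambda unit: log_gaussian(features, *models[unit]))
```

Calling `best_match([0.9, 2.2])` selects the model whose mean lies
closest to the measured features, scaled by the variance in each
dimension. Real acoustic models are considerably richer, but the
principle of scoring measurements against statistical models is the same.
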
Continue to Section 5.1.2 for an overview of the fundamental statistical
techniques used to build acoustic models.