BravoBrava
Mississippi State University
Input
Speech
Recognition Architectures
Incorporating Multiple Knowledge Sources
Acoustic
Front-end
•
The signal is converted to a sequence of
feature vectors based on spectral and
temporal measurements.
Acoustic Models
P(A/W)
•
Acoustic models represent sub-word
units, such as phonemes, as a finite-
state machine in which states model
spectral structure and transitions
model temporal structure.
Recognized
Utterance
Search
•
Search is crucial to the system, since
many combinations of words must be
investigated to find the most probable
word sequence.
•
The language model predicts the next
set of words, and controls which models
are hypothesized.
Language Model
P(W)