Automatic Speaker Recognition: Recent Progress, Current Applications, and Future Trends

BravoBrava

Mississippi State University

Input

Speech

Recognition Architectures
Incorporating Multiple Knowledge Sources

Acoustic

Front-end

• The signal is converted to a sequence of feature vectors based on spectral and temporal measurements.

Acoustic Models

P(A/W)

• Acoustic models represent sub-word

units, such as phonemes, as a finite- state machine in which states model spectral structure and transitions

model temporal structure.

Recognized

Utterance

Search

• Search is crucial to the system, since

many combinations of words must be

investigated to find the most probable

word sequence.

• The language model predicts the next

set of words, and controls which models are hypothesized.

Language Model

P(W)