Overview:
Section 4: Recognition

An overview of the speech recognition process is shown below. The process has three main components: acoustic modeling, language modeling, and search. Search, often referred to as recognition, decoding, or evaluation, is the process by which the system uses a fully trained recognizer to produce a hypothesis of what was spoken. It is the main topic of this section. Acoustic modeling is described in Section 5, and language modeling is described in Section 6.
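The way these three components fit together can be summarized with the standard Bayes formulation of recognition, which Section 4.1.1 covers in detail. The symbols A (the acoustic observations) and W (a candidate word sequence) are notation assumed here for illustration:

```latex
\hat{W} = \arg\max_{W} P(W \mid A) = \arg\max_{W} P(A \mid W)\, P(W)
```

In this form, P(A | W) is supplied by the acoustic model, P(W) by the language model, and the arg max over W is the search. The denominator P(A) from Bayes' rule is dropped because it does not depend on W.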

Converting the speech signal to a text message containing the spoken words is only one of many tasks in automatic speech recognition. Once the acoustic and language models are built, recognition requires searching the possibilities generated by these models. The number of possibilities can be too large to search exhaustively, so efficient search techniques are critical to the performance of a recognizer. Most recognition systems use the Viterbi beam search algorithm, but other algorithms may be used and are supported in the software. Continue to Section 4.1 for additional theoretical information on search algorithms for speech recognition.
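To make the idea of pruned search concrete, the following is a minimal Python sketch of Viterbi decoding with beam pruning over a small hidden Markov model. It is not the implementation used in the software: the function names, the dictionary-based model layout, and the `beam_width` parameter (a log-probability margin below the best hypothesis) are all assumptions made for illustration.

```python
def prune(hyps, beam_width):
    """Keep only hypotheses whose log score is within beam_width of the best."""
    best = max(score for score, _ in hyps.values())
    return {s: h for s, h in hyps.items() if h[0] >= best - beam_width}


def viterbi_beam_search(observations, states, log_start, log_trans,
                        log_emit, beam_width):
    """Return (best_path, best_log_score) for a non-empty observation list.

    log_start[s], log_trans[prev][s], and log_emit[s][obs] hold natural-log
    probabilities (hypothetical layout chosen for this sketch).
    """
    # hyps maps each active state to (log score, best path ending there).
    hyps = {
        s: (log_start[s] + log_emit[s][observations[0]], [s])
        for s in states
    }
    hyps = prune(hyps, beam_width)

    for obs in observations[1:]:
        new_hyps = {}
        # Extend only the hypotheses that survived pruning.
        for prev, (prev_score, prev_path) in hyps.items():
            for s in states:
                score = prev_score + log_trans[prev][s] + log_emit[s][obs]
                # Viterbi step: keep the single best path into each state.
                if s not in new_hyps or score > new_hyps[s][0]:
                    new_hyps[s] = (score, prev_path + [s])
        hyps = prune(new_hyps, beam_width)

    best_state = max(hyps, key=lambda s: hyps[s][0])
    best_score, best_path = hyps[best_state]
    return best_path, best_score
```

With an infinite `beam_width` this reduces to ordinary Viterbi decoding. A narrower beam extends fewer hypotheses per frame, at the risk of discarding the path that would have scored best overall.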

Contents:

4.1   Overview
  4.1.1   Bayes' Rule
  4.1.2   Search Techniques
  4.1.3   Modes

4.2   Network Decoding
  4.2.1   Command Line Options and Arguments
  4.2.2   Word Models
  4.2.3   Context-Independent Phones
  4.2.4   Context-Dependent Phones
  4.2.5   Cross-Word Context-Dependent Phones
  4.2.6   The Parameter File
  4.2.7   The Configuration File
  4.2.8   Language Model File
  4.2.9   Statistical Model Pool

4.3   Scoring
  4.3.1   Error Analysis
  4.3.2   Scoring Reports
  4.3.3   Generation
  4.3.4   Conversion
  4.3.5   Evaluation
  4.3.6   Significance Testing

4.4   Forced Alignment
  4.4.1   Overview
  4.4.2   Example

4.5   N-Best Generation
  4.5.1   Generating Multiple Hypotheses

4.6   Word Graph Generation
  4.6.1   Generating a Word Graph

4.7   Word Graph Rescoring
  4.7.1   Overview
  4.7.2   Language Model Rescoring
  4.7.3   Example
  4.7.4   Acoustic Model Rescoring
  4.7.5   Example

4.8   Word Graph Error Rate
  4.8.1   Overview
  4.8.2   Example

4.9   Command Synopsis