Typically, researchers use N-gram language models, which describe the likelihood of a word being spoken given the previous N-1 words. For example, if the speech recognizer produces two equally plausible transcriptions for an utterance, say "fan" and "can", we can look at the prior word to help us make the better decision. If the previous word was "garbage," then "can" is the better choice, i.e. "garbage can" is a far more likely phrase than "garbage fan." N-grams are a very useful tool in natural language processing (NLP). However, this particular data set consists of spoken digits and nothing else, so the likelihood of the digit 1 being followed by 3 is the same as 1 being followed by 8. Thus there isn't really a need for a language model in this experiment. However, HTK does not allow us to omit a language model when decoding, so we'll simply create a word network (wordnet) to take its place.

Procedure
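Since HTK still requires a word network at decode time, one common option is a trivial looping grammar that allows any digit to follow any other with equal likelihood. A minimal sketch, assuming a ten-word vocabulary with labels ONE through ZERO (the exact word labels must match your dictionary):

```
$digit = ONE | TWO | THREE | FOUR | FIVE |
         SIX | SEVEN | EIGHT | NINE | ZERO;
( SENT-START <$digit> SENT-END )
```

The angle brackets denote one-or-more repetitions, so the network accepts arbitrary-length digit strings. HTK's HParse tool can then compile a grammar file like this into the lattice file the decoder expects, e.g. `HParse gram wdnet`.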