TI-Digits Short: Creating Test Transcriptions and Final Results

With acoustic model training complete, we are ready to evaluate our models. To do this, we use the acoustic models to generate a set of transcriptions for our test data, then compare these hypothesized transcriptions to the true (reference) transcriptions to calculate a word error rate (WER) for the system. The HTK tool HVite generates the test data's transcriptions, and HResults reports the evaluation in a clear and concise format.
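To make the WER calculation concrete, here is a minimal Python sketch, not part of HTK, that aligns a reference word string with a hypothesized one using a plain edit-distance alignment and counts substitutions, deletions, and insertions. The function names and the example digit strings are made up for illustration. HResults performs its own alignment and may weight errors differently, so treat this only as an illustration of the idea.

```python
def word_error_counts(reference, hypothesis):
    """Return (substitutions, deletions, insertions) for a minimum-error alignment."""
    ref, hyp = reference.split(), hypothesis.split()
    # cost[i][j] = (total errors, S, D, I) for aligning ref[:i] with hyp[:j]
    cost = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        cost[i][0] = (i, 0, i, 0)  # every reference word deleted
    for j in range(1, len(hyp) + 1):
        cost[0][j] = (j, 0, 0, j)  # every hypothesis word inserted
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            e, s, d, n = cost[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                match = (e, s, d, n)          # correct word, no new error
            else:
                match = (e + 1, s + 1, d, n)  # substitution
            e, s, d, n = cost[i - 1][j]
            delete = (e + 1, s, d + 1, n)     # reference word missing
            e, s, d, n = cost[i][j - 1]
            insert = (e + 1, s, d, n + 1)     # extra hypothesis word
            cost[i][j] = min(match, delete, insert)
    _, subs, dels, ins = cost[-1][-1]
    return subs, dels, ins


def word_error_rate(reference, hypothesis):
    """WER in percent: (S + D + I) / N, where N is the reference length."""
    n_ref = len(reference.split())
    if n_ref == 0:
        raise ValueError("reference transcription must not be empty")
    subs, dels, ins = word_error_counts(reference, hypothesis)
    return 100.0 * (subs + dels + ins) / n_ref


ref = "one two three four"
hyp = "one three three four five"
print(word_error_counts(ref, hyp))  # (1, 0, 1)
print(word_error_rate(ref, hyp))    # 50.0
```

In this example "two" is recognized as "three" (one substitution) and "five" is an extra word (one insertion), giving two errors against four reference words.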

Monophone Decoding Procedure

We first use HVite to generate transcriptions for our testing data. We pass our acoustic models (hmmdefs and macros), the list of testing data (test_list.list), and the language model (wdnet) as inputs, followed by the pronunciation dictionary (dict) and the model list (monophones1). We must also supply a word insertion penalty and a language model weight, given by the values after "-p" and "-s" respectively. These parameters can be tuned to improve results for specific tasks, but it is generally better to test your system without tuning. The output transcriptions are saved as "mono_results.mlf".

Note: For this experiment the language model does not affect our results, because the digit sequences are random: any digit is equally likely to follow any other. In other tasks consisting of regular conversational speech, the language model plays a much larger role.
- From the directory isip/exp/htk_tutorial/decode type (all one line):
  
  HVite -H ../train/hmm14/macros -H ../train/hmm14/hmmdefs -S ../train/test_list.list -l '*' -i mono_results.mlf -w ../data_preparation/grammar/wdnet -p -0.0 -s 10.0 ../data_preparation/dictionary/dict ../train/monophones1
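HVite writes its output as a master label file (MLF): a "#!MLF!#" header, then for each utterance a quoted file pattern, one label per line, and a terminating ".". As a rough sketch for inspecting such a file outside HTK, the Python function below collects the word labels per utterance. The function name, the sample utterance names, and the sample times and scores are invented for illustration; the sketch assumes recognized labels are written as "start end label score" and reference labels as a bare word.

```python
def read_mlf(lines, skip=("SENT-START", "SENT-END")):
    """Map each utterance name in an MLF to its list of word labels."""
    transcripts = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line or line == "#!MLF!#":
            continue  # blank line or file header
        if line.startswith('"'):
            # '"*/utt1.rec"' -> 'utt1'
            name = line.strip('"').rsplit("/", 1)[-1].rsplit(".", 1)[0]
            current = transcripts.setdefault(name, [])
        elif line == ".":
            current = None  # end of this utterance
        elif current is not None:
            fields = line.split()
            has_times = (len(fields) >= 3
                         and fields[0].isdigit() and fields[1].isdigit())
            # The label is the third field when start/end times are present.
            label = fields[2] if has_times else fields[0]
            if label not in skip:
                current.append(label)
    return transcripts


# Made-up sample input, not real recognizer output.
sample = '''#!MLF!#
"*/utt1.rec"
0 3000000 SENT-START -1500.0
3000000 7000000 ONE -2100.5
7000000 11000000 TWO -1980.25
11000000 13000000 SENT-END -900.0
.
"*/utt2.rec"
NINE
OH
.
'''.splitlines()

print(read_mlf(sample))
```

To read the real output you would pass an open file instead, for example read_mlf(open("mono_results.mlf")).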

With our newly created test transcriptions, we now use the HTK tool HResults to evaluate the system. In this step, we pass the true transcriptions (test_trans.mlf), the list of monophones used for the acoustic models (monophones1, since it includes sp), and our generated results (mono_results.mlf). Each "-e ???" option maps the label that follows it to the null class "???", so that "SENT-START" and "SENT-END" are not included in the evaluation.
- From the directory isip/exp/htk_tutorial/decode type (all one line from the command line):
  
  HResults -c -h -t -e '???' 'SENT-END' -e '???' 'SENT-START' -I ../data_preparation/trans/test_trans.mlf ../train/monophones1 mono_results.mlf
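
As a reading aid for the HResults output: its summary reports the number of correct labels H, deletions D, substitutions S, insertions I, and the total number of reference labels N. Assuming the standard HTK definitions of percent correct and accuracy, these quantities relate to the WER as follows:

```latex
\begin{align*}
\%\text{Corr} &= \frac{H}{N} \times 100 = \frac{N - D - S}{N} \times 100 \\
\text{Acc} &= \frac{H - I}{N} \times 100 = \frac{N - D - S - I}{N} \times 100 \\
\text{WER} &= \frac{S + D + I}{N} \times 100 = 100 - \text{Acc}
\end{align*}
```

Percent correct ignores insertions, so accuracy (and therefore WER) is the more representative figure when comparing systems.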

Word Internal Triphone Decoding Procedure

We follow the same procedure for decoding the word-internal (wint) triphone models as we did for the monophones, with three changes: we include config_wint to specify that we are using word-internal models, we pass the updated triphone models in hmm25 as inputs, and we use the list of tied-state triphones (tiedlist) rather than monophones1. The output transcriptions are saved as "wint_results.mlf".
- From the directory isip/exp/htk_tutorial/decode type:
  
  HVite -C ../train/config_wint -H ../train/hmm25/macros -H ../train/hmm25/hmmdefs -S ../train/test_list.list -l '*' -i wint_results.mlf -w ../data_preparation/grammar/wdnet -p -0.0 -s 10.0 ../data_preparation/dictionary/dict ../train/tiedlist
  
  HResults -c -h -t -e '???' 'SENT-END' -e '???' 'SENT-START' -I ../data_preparation/trans/test_trans.mlf ../train/tiedlist wint_results.mlf
