HTK Tutorials

TI Digits Short: Transcription Preparation

Although transcriptions are already provided when you download the data, HTK prefers that we use the master label file (MLF) format for the transcriptions. In this step, we create word-level and phone-level transcriptions for both the training and testing data.
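
To make the target format concrete, the sketch below approximates what a prompts-to-MLF conversion does: it writes the #!MLF!# header, then for each input line a quoted label name, one word per line, and a closing period. The function name prompts_to_mlf and the sample utterance name are hypothetical, and the prompts2mlf script used in the procedure may differ in detail, so treat this as an illustration of the format rather than a replacement for that script.

```shell
# Hypothetical stand-in for a prompts-to-MLF converter, for illustration only.
# Input : one utterance per line, "<name> <word> <word> ..."
# Output: a word-level master label file on standard output.
prompts_to_mlf() {
    awk 'BEGIN { print "#!MLF!#" }          # MLF header line
         NF > 0 {
             printf "\"%s.lab\"\n", $1      # quoted label name for this utterance
             for (i = 2; i <= NF; i++)      # one word per line
                 print $i
             print "."                      # a period ends each entry
         }' "$1"
}
```

Under these assumptions, an input line such as `*/utt1 ONE TWO` would become an entry named "*/utt1.lab" followed by ONE, TWO, and a period, each on its own line.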

Procedure


  1. Generating word-level transcriptions: We need to generate word-level .mlf transcription files for training and testing (train_trans.mlf and test_trans.mlf) from the files trans_list_train_fixed.text and trans_list_test_fixed.text (these are found in the trans directory). Note that the only difference between the files trans_list_test.text and trans_list_test_fixed.text is a "*/" at the beginning of each line. This allows the Perl script to read all files:

    • From the directory isip/exp/htk_tutorial/data_preparation/trans type:
      prompts2mlf train_trans.mlf trans_list_train_fixed.text

    • Repeat the above step for the test data by typing:
      prompts2mlf test_trans.mlf trans_list_test_fixed.text

  2. Generating phone-level transcriptions: Now we need to generate phone-level transcriptions without the short pause sp (train_trans_phones0.mlf) and with sp (train_trans_phones1.mlf). The files mkphones0.led and mkphones1.led control how HTK formats the transcriptions. To do this, we use the recently created file dict (under the dictionary directory) together with train_trans.mlf and test_trans.mlf:

    • From the same directory (i.e., trans) type:
      HLEd -l '*' -d ../dictionary/dict -i train_trans_phones0.mlf mkphones0.led train_trans.mlf
    • Then, from the same directory (i.e., trans), type:
      HLEd -l '*' -d ../dictionary/dict -i train_trans_phones1.mlf mkphones1.led train_trans.mlf
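
For readers who want to see what such edit scripts look like, the sketch below writes two small HLEd scripts modeled on the ones in the HTK Book tutorial. Whether the mkphones0.led and mkphones1.led shipped with this tutorial contain exactly these commands is an assumption, as is the presence of sp in the dictionary pronunciations. The files are written to a scratch directory (led_dir, a hypothetical name) so that the tutorial's own scripts are not overwritten.

```shell
# Scratch directory so the tutorial's own .led files are left untouched.
led_dir=$(mktemp -d)

# Without sp: expand each word to its phones using the dictionary (EX),
# insert sil at the start and end of each utterance (IS sil sil),
# then delete every short-pause label (DE sp).
cat > "$led_dir/mkphones0.led" <<'EOF'
EX
IS sil sil
DE sp
EOF

# With sp: the same script minus the delete command, so sp labels are kept.
cat > "$led_dir/mkphones1.led" <<'EOF'
EX
IS sil sil
EOF
```

The only difference between the two scripts is the DE sp line, which is why the two HLEd runs above produce transcriptions without and with sp from the same word-level MLF.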

