TI Digits Short: Transcription Preparation
Although transcriptions are already provided when you download the data, HTK prefers that we use
a master label file (MLF) format for the transcriptions. In this step, we create word-level and
phone-level transcriptions for both the training and testing data.
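For orientation, an MLF is a plain-text file that begins with the header #!MLF!# and then lists, for each utterance, a quoted label-file pattern, one label per line, and a closing line containing a single period. The Python sketch below only illustrates that layout. The function name prompts_to_mlf, the sample utterance IDs, and the assumption that each input line is a pattern followed by its words are illustrative and are not taken from the tutorial's own scripts.

```python
def prompts_to_mlf(lines):
    """Convert prompt lines ("<pattern> WORD WORD ...") to word-level MLF text."""
    out = ["#!MLF!#"]  # every MLF starts with this header line
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        pattern, words = fields[0], fields[1:]
        out.append('"%s.lab"' % pattern)  # quoted label-file pattern
        out.extend(words)                 # one word label per line
        out.append(".")                   # a lone period closes the entry
    return "\n".join(out) + "\n"


# Hypothetical input lines; real utterance names will differ.
sample = ["*/utt_0001 ONE TWO", "*/utt_0002 OH NINE"]
print(prompts_to_mlf(sample), end="")
```

A leading "*/" in the pattern lets the entry match a label file of that name regardless of the directory it sits in.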
Procedure
- Generating word-level transcriptions: We need to generate word-level .mlf transcription files for
training and testing (train_trans.mlf and test_trans.mlf) from the files trans_list_train_fixed.text
and trans_list_test_fixed.text (these are found in the trans directory). Note that the only
difference between the files trans_list_test.text and trans_list_test_fixed.text is a "*/" at the
beginning of each line. This allows the Perl script to read all files:
- From the directory isip/exp/htk_tutorial/data_preparation/trans type:
prompts2mlf train_trans.mlf trans_list_train_fixed.text
- Repeat the above step for the test data by typing:
prompts2mlf test_trans.mlf trans_list_test_fixed.text
- Now we need to generate phone-level transcriptions without sp (train_trans_phones0.mlf) and with
sp (train_trans_phones1.mlf). The edit scripts mkphones0.led and mkphones1.led control how HTK
formats the transcriptions. We use the recently created dictionary file dict (under the dictionary
directory) together with the word-level file train_trans.mlf to do this:
- From the same directory (i.e., trans) type:
HLEd -l '*' -d ../dictionary/dict -i train_trans_phones0.mlf mkphones0.led train_trans.mlf
- From the same directory (i.e., trans) type:
HLEd -l '*' -d ../dictionary/dict -i train_trans_phones1.mlf mkphones1.led train_trans.mlf
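As a rough picture of what the two HLEd runs produce, the Python sketch below expands each word into its dictionary pronunciation and wraps the utterance in silence. It assumes the edit scripts follow the HTK Book tutorial convention (expand from the dictionary, insert sil at both ends and, for the phones0 variant only, delete sp). The contents of this tutorial's mkphones0.led and mkphones1.led are not shown here, so that behavior is an assumption. The toy dictionary, its pronunciations, and the function name expand_to_phones are hypothetical.

```python
# Toy pronunciation dictionary (hypothetical entries, each ending in "sp").
TOY_DICT = {
    "ONE": ["w", "ah", "n", "sp"],
    "TWO": ["t", "uw", "sp"],
}


def expand_to_phones(words, pron_dict, keep_sp):
    """Expand words to phones, add sil at both ends, optionally drop sp."""
    phones = ["sil"]
    for word in words:
        phones.extend(pron_dict[word])  # raises KeyError if a word is missing
    phones.append("sil")
    if not keep_sp:
        # Assumed phones0 behavior: remove every short-pause label.
        phones = [p for p in phones if p != "sp"]
    return phones


print(expand_to_phones(["ONE", "TWO"], TOY_DICT, keep_sp=False))  # phones0-style
print(expand_to_phones(["ONE", "TWO"], TOY_DICT, keep_sp=True))   # phones1-style
```

The only difference between the two outputs is whether the sp labels survive, which mirrors the stated difference between train_trans_phones0.mlf and train_trans_phones1.mlf.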