This experiment contains a simple MLP that can be trained and decoded on
features. It uses PyTorch and Python3.

The sections are as follows:

 01: Training
 02: Decoding
 03: Scoring

The three sections are implemented in train.py, decode.py and score.py
respectively. There are two ways to run these scripts: all three can be run
separately in order, or they can be run from a master script, run.sh, which
automates the whole process.

===============================================================================
01: Training a model using train.py

train.py takes a model pathname and a training data set, trains a model on
that set, and saves the model to the given pathname. The output model is
stored in $DL_EXP/output/models/, and if the output directory does not
exist, one will be created. A sketch of the kind of training loop this
script contains appears after section 03.

Usage:

 python train.py mpath data

Input:

 mpath: the pathname of the file where the output model is stored
 data: the input data list

===============================================================================
02: Decoding using decode.py

decode.py takes an output directory, a model file saved by train.py
(section 01), and a data set to decode. It then writes a prediction for
each sample to a hyp file in the output directory. A sketch of this step
also appears after section 03.

Usage:

 python decode.py odir mfile data

Input:

 odir: the directory where the hypotheses will be stored
 mfile: the input model file
 data: the input data list to be decoded

===============================================================================
03: Scoring using score.py

score.py takes a ref file and a hyp file, scores the error rate and
generates a confusion matrix. A sketch of this step appears below.

Usage:

 python score.py ref hyp

Input:

 ref: a file containing the reference labels
 hyp: a matching file containing the hypotheses

===============================================================================
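The actual implementations live in the scripts directory. The three sketches
that follow only illustrate what each step typically looks like; they are
not the real code.

Below is a minimal sketch of the kind of PyTorch training loop train.py
contains. The network size, optimizer, learning rate and data-list format
(an integer label followed by the feature values on each line) are
assumptions made for illustration, and the sketch uses one full-batch update
per epoch, whereas the sample log below suggests mini-batch training
(526 steps per epoch).

import os
import sys

import torch
import torch.nn as nn

NUM_CLASSES = 2     # two classes, matching the 2x2 confusion matrices below
NUM_EPOCHS = 100
LEARNING_RATE = 0.01

class SimpleMLP(nn.Module):
    """A small fully connected network with one hidden layer."""

    def __init__(self, num_feats, num_hidden, num_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_feats, num_hidden),
            nn.ReLU(),
            nn.Linear(num_hidden, num_classes)
        )

    def forward(self, x):
        return self.net(x)

def load_data(fname):
    """Assumed data-list format: one sample per line, label then features."""
    labels, feats = [], []
    with open(fname, "r") as fp:
        for line in fp:
            parts = line.split()
            if not parts:
                continue
            labels.append(int(parts[0]))
            feats.append([float(v) for v in parts[1:]])
    return torch.tensor(feats), torch.tensor(labels)

def main():
    # usage: python train.py mpath data
    mpath, data = sys.argv[1], sys.argv[2]

    feats, labels = load_data(data)
    model = SimpleMLP(feats.shape[1], 32, NUM_CLASSES)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=LEARNING_RATE)

    for epoch in range(1, NUM_EPOCHS + 1):
        optimizer.zero_grad()
        loss = criterion(model(feats), labels)
        loss.backward()
        optimizer.step()
        if epoch % 10 == 0:
            print("Epoch [%d/%d], Loss: %.4f" % (epoch, NUM_EPOCHS, loss.item()))

    # create the output directory if it does not exist, then save the model
    os.makedirs(os.path.dirname(mpath) or ".", exist_ok=True)
    torch.save(model.state_dict(), mpath)

if __name__ == "__main__":
    main()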
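A corresponding sketch of the decoding step in decode.py. It reuses the
SimpleMLP class and load_data helper from the training sketch above, and the
hyp-file naming and hidden-layer size are assumptions made for illustration.

import os
import sys

import torch

from train import SimpleMLP, load_data   # definitions from the sketch above

def main():
    # usage: python decode.py odir mfile data
    odir, mfile, data = sys.argv[1], sys.argv[2], sys.argv[3]

    feats, _ = load_data(data)
    model = SimpleMLP(feats.shape[1], 32, 2)
    model.load_state_dict(torch.load(mfile))
    model.eval()

    # write one prediction per line to a hyp file in the output directory;
    # the naming convention here (e.g. dev_set.hyp) is assumed
    os.makedirs(odir, exist_ok=True)
    hyp_name = os.path.splitext(os.path.basename(data))[0] + ".hyp"
    hyp_path = os.path.join(odir, hyp_name)

    with torch.no_grad(), open(hyp_path, "w") as fp:
        for i, x in enumerate(feats, start=1):
            pred = torch.argmax(model(x.unsqueeze(0)), dim=1).item()
            fp.write("%d\n" % pred)
            if i % 1000 == 0:
                print("decoding %d out of %d" % (i, len(feats)))

if __name__ == "__main__":
    main()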
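And a sketch of the scoring step in score.py. It assumes the ref and hyp
files hold one integer label per line, in matching order, and that there are
two classes, which matches the 2x2 confusion matrices in the sample run
below.

import sys

def load_labels(fname):
    """Read one integer label per line (assumed ref/hyp format)."""
    with open(fname, "r") as fp:
        return [int(line.split()[0]) for line in fp if line.strip()]

def main():
    # usage: python score.py ref hyp
    ref = load_labels(sys.argv[1])
    hyp = load_labels(sys.argv[2])

    # accumulate a 2x2 confusion matrix: rows are references, columns are
    # hypotheses, matching the r/h table in the results display
    conf = [[0, 0], [0, 0]]
    for r, h in zip(ref, hyp):
        conf[r][h] += 1

    errors = conf[0][1] + conf[1][0]
    error_rate = 100.0 * errors / len(ref)

    print(" r/h:   h[0]   h[1]")
    print(" r[0]: %6d %6d" % (conf[0][0], conf[0][1]))
    print(" r[1]: %6d %6d" % (conf[1][0], conf[1][1]))
    print(" error rate = %.4f%%" % error_rate)

if __name__ == "__main__":
    main()

===============================================================================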
A typical run will look something like this:

nedc_000_[1]: p /data/isip/exp/tuh_dpath/exp_0074/v1.0
nedc_000_[1]: d
total 122
drwxrwxr-x 5 picone isip 9 Nov 2 18:16 ./
drwxrwxr-x 4 tug35668 isip 4 Oct 30 08:00 ../
-r--r--r-- 1 picone isip 1952 Nov 2 18:16 _AAREADME.txt
drwxrwxr-x 2 picone isip 5 Oct 30 21:07 data/
drwxrwxr-x 4 picone isip 10 Nov 2 14:54 output/
-r-xr-xr-x 1 picone isip 3326 Nov 2 14:34 run.sh*
-r-xr-xr-x 1 picone isip 7841 Nov 2 14:34 run.sh,v*
drwxrwxr-x 2 picone isip 11 Nov 2 14:51 scripts/

nedc_000_[1]: ./run.sh data/train_set.txt data/dev_set.txt data/eval_set.txt
... starting training on data/train_set.txt ...
Epoch [10/100], Step[1/526], Loss: 0.6605
Epoch [20/100], Step[1/526], Loss: 0.6637
Epoch [30/100], Step[1/526], Loss: 0.6666
Epoch [40/100], Step[1/526], Loss: 0.6527
Epoch [50/100], Step[1/526], Loss: 0.6567
Epoch [60/100], Step[1/526], Loss: 0.6588
Epoch [70/100], Step[1/526], Loss: 0.6519
Epoch [80/100], Step[1/526], Loss: 0.6573
Epoch [90/100], Step[1/526], Loss: 0.6564
Epoch [100/100], Step[1/526], Loss: 0.6547
... finished training on data/train_set.txt ...
... starting evaluation of data/train_set.txt ...
decoding 1000 out of 18936
decoding 2000 out of 18936
decoding 3000 out of 18936
decoding 4000 out of 18936
decoding 5000 out of 18936
decoding 6000 out of 18936
decoding 7000 out of 18936
decoding 8000 out of 18936
decoding 9000 out of 18936
decoding 10000 out of 18936
decoding 11000 out of 18936
decoding 12000 out of 18936
decoding 13000 out of 18936
decoding 14000 out of 18936
decoding 15000 out of 18936
decoding 16000 out of 18936
decoding 17000 out of 18936
decoding 18000 out of 18936
... finished evaluation of data/train_set.txt ...
... starting evaluation of data/dev_set.txt ...
decoding 100 out of 2094
decoding 200 out of 2094
decoding 300 out of 2094
decoding 400 out of 2094
decoding 500 out of 2094
decoding 600 out of 2094
decoding 700 out of 2094
decoding 800 out of 2094
decoding 900 out of 2094
decoding 1000 out of 2094
decoding 1100 out of 2094
decoding 1200 out of 2094
decoding 1300 out of 2094
decoding 1400 out of 2094
decoding 1500 out of 2094
decoding 1600 out of 2094
decoding 1700 out of 2094
decoding 1800 out of 2094
decoding 1900 out of 2094
decoding 2000 out of 2094
... finished evaluation of data/dev_set.txt ...
... starting evaluation of data/eval_set.txt ...
decoding 100 out of 1155
decoding 200 out of 1155
decoding 300 out of 1155
decoding 400 out of 1155
decoding 500 out of 1155
decoding 600 out of 1155
decoding 700 out of 1155
decoding 800 out of 1155
decoding 900 out of 1155
decoding 1000 out of 1155
decoding 1100 out of 1155
... finished evaluation of data/eval_set.txt ...
... starting scoring of data/train_set.txt ...
... finished scoring of data/train_set.txt ...
... starting scoring of data/dev_set.txt ...
... finished scoring of data/dev_set.txt ...

===== displaying results =====

TRAINING DATA RESULTS:
 r/h:   h[0]   h[1]
 r[0]:  6363   4892
 r[1]:  2981   4700
 error rate = 41.5769%

TEST DATA RESULTS:
 r/h:   h[0]   h[1]
 r[0]:   762    484
 r[1]:   285    563
 error rate = 36.7240%

======= end of results =======

Once you run this script, you will see these directories:

nedc_000_[1]: p /data/isip/exp/tuh_dpath/exp_0074/v1.0
nedc_000_[1]: d
total 133
drwxrwxr-x 5 picone isip 9 Nov 2 18:40 ./
drwxrwxr-x 4 tug35668 isip 4 Oct 30 08:00 ../
-rw-r--r-- 1 picone isip 6114 Nov 2 18:40 _AAREADME.txt
-r--r--r-- 1 picone isip 2193 Nov 2 18:11 _AAREADME.txt,v
drwxrwxr-x 2 picone isip 5 Oct 30 21:07 data/
drwxrwxr-x 4 picone isip 10 Nov 2 18:18 output/
-r-xr-xr-x 1 picone isip 3326 Nov 2 14:34 run.sh*
-r-xr-xr-x 1 picone isip 7841 Nov 2 14:34 run.sh,v*
drwxrwxr-x 2 picone isip 11 Nov 2 18:37 scripts/

nedc_000_[1]: d data
total 5590
drwxrwxr-x 2 picone isip 5 Oct 30 21:07 ./
drwxrwxr-x 5 picone isip 9 Nov 2 18:40 ../
-rw-rw-r-- 1 picone isip 521347 Oct 30 08:00 dev_set.txt
-rw-rw-r-- 1 tug35668 isip 287596 Oct 30 21:07 eval_set.txt
-rw-r--r-- 1 picone isip 4715794 Oct 30 08:00 train_set.txt

nedc_000_[1]: d output
total 3034
drwxrwxr-x 4 picone isip 10 Nov 2 18:18 ./
drwxrwxr-x 5 picone isip 9 Nov 2 18:40 ../
drwxrwxr-x 2 picone isip 3 Nov 2 18:17 00_train/
-rw-rw-r-- 1 picone isip 2299392 Nov 2 18:17 00_train.log
-rw-rw-r-- 1 picone isip 54444 Nov 2 18:18 01_decode_dev.log
-rw-rw-r-- 1 picone isip 30030 Nov 2 18:18 01_decode_eval.log
-rw-rw-r-- 1 picone isip 520209 Nov 2 18:18 01_decode_train.log
drwxrwxr-x 2 picone isip 5 Nov 2 18:18 01_hyp/
-rw-rw-r-- 1 picone isip 77 Nov 2 18:18 02_results_dev.dat
-rw-rw-r-- 1 picone isip 77 Nov 2 18:18 02_results_train.dat

The directory "data" contains the input data. The directory "output"
contains the output hypotheses and the logs generated when the job is run.
In the output directory, the files "*_results.dat" contain the output of
the scoring program, the directory "00_train" contains the output model,
and the directory "01_hyp" contains the hypotheses. Finally, the source
code is in scripts. You will want to change train.py and decode.py to
introduce new algorithms.
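For reference, run.sh (a shell script) is what chains the three steps
together in the sample run above. The sketch below is only a rough Python
illustration of that sequence, reusing the assumptions from the earlier
sketches: the model path, the hyp-file naming, and the way the reference
files are produced are all hypothetical and do not necessarily match the
real run.sh.

import os
import subprocess
import sys

def write_ref(data, ref_path):
    # pull the first column (the label) out of the assumed data-list format
    with open(data, "r") as fin, open(ref_path, "w") as fout:
        for line in fin:
            if line.strip():
                fout.write(line.split()[0] + "\n")

def main():
    # usage: python run_all.py train_set dev_set eval_set (hypothetical driver)
    train_set, dev_set, eval_set = sys.argv[1], sys.argv[2], sys.argv[3]
    model = "output/00_train/model.pth"   # hypothetical model path
    hyp_dir = "output/01_hyp"
    os.makedirs(hyp_dir, exist_ok=True)

    # 01: train a model on the training set
    subprocess.run(["python", "train.py", model, train_set], check=True)

    # 02: decode all three sets with the trained model
    for data in (train_set, dev_set, eval_set):
        subprocess.run(["python", "decode.py", hyp_dir, model, data], check=True)

    # 03: score the training and dev sets (as in the sample run above)
    for data in (train_set, dev_set):
        name = os.path.splitext(os.path.basename(data))[0]
        ref_path = os.path.join(hyp_dir, name + ".ref")   # hypothetical ref file
        hyp_path = os.path.join(hyp_dir, name + ".hyp")   # matches the decode sketch
        write_ref(data, ref_path)
        subprocess.run(["python", "score.py", ref_path, hyp_path], check=True)

if __name__ == "__main__":
    main()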