This experiment contains a simple MLP that can be trained on features and
used to decode them. It uses TensorFlow and Python3. The sections are as
follows:

 01: Training
 02: Decoding
 03: Scoring

The three sections are implemented in train.py, decode.py and score.py
respectively. There are two ways to run these scripts: all three can be run
separately, in order, or they can be run from a master script, run.sh, which
automates the whole process. Minimal sketches of what each script might look
like appear at the end of this file.

===============================================================================
01: Training a model using train.py

train.py takes the pathname of an output directory and a training data set,
trains a model on that set, and saves the model in the output directory. The
output is stored in $DL_EXP/output/models/; if the output directory does not
exist, it will be created.

Usage:

 python train.py mpath data

Inputs:

 mpath: the pathname of the directory where the output model is stored
 data: the input data list

===============================================================================
02: Decoding using decode.py

decode.py takes an output directory, the model saved by train.py in the
previous section, and a data set to decode. It then writes a prediction for
each sample to a hyp (hypothesis) file in the output directory.

Usage:

 python decode.py odir mfile data

Inputs:

 odir: the directory where the hypotheses will be stored
 mfile: the directory containing the input model
 data: the input data list to be decoded

===============================================================================
03: Scoring using score.py

score.py takes a ref file and a hyp file, computes the error rate, and
generates a confusion matrix.

Usage:

 python score.py ref hyp

Inputs:

 ref: a file containing the reference labels
 hyp: a matching file containing the hypotheses

===============================================================================
A typical run will look something like this:

 nedc_000_[1]: p /data/isip/exp/tuh_dpath/exp_0086/
 nedc_000_[1]: d
 total 122
 drwxrwxr-x 5 picone   isip    9 Nov  2 18:16 ./
 drwxrwxr-x 4 tug35668 isip    4 Oct 30 08:00 ../
 -r--r--r-- 1 picone   isip 1952 Nov  2 18:16 _AAREADME.txt
 drwxrwxr-x 2 picone   isip    5 Oct 30 21:07 data/
 drwxrwxr-x 4 picone   isip   10 Nov  2 14:54 output/
 -r-xr-xr-x 1 picone   isip 3326 Nov  2 14:34 run.sh*
 -r-xr-xr-x 1 picone   isip 7841 Nov  2 14:34 run.sh,v*
 drwxrwxr-x 2 picone   isip   11 Nov  2 14:51 scripts/

 nedc_000_[1]: ./run.sh ./data/2D/train.txt ./data/2D/dev.txt ./data/2D/eval.txt

 Epoch 10/100
 10000/10000 - 0s - loss: 0.2302 - accuracy: 0.9025
 --
 Epoch 20/100
 10000/10000 - 0s - loss: 0.2234 - accuracy: 0.9052
 --
 Epoch 30/100
 10000/10000 - 0s - loss: 0.2166 - accuracy: 0.9082
 --
 Epoch 40/100
 10000/10000 - 0s - loss: 0.2122 - accuracy: 0.9101
 --
 Epoch 50/100
 10000/10000 - 0s - loss: 0.2097 - accuracy: 0.9119
 --
 Epoch 60/100
 10000/10000 - 0s - loss: 0.2078 - accuracy: 0.9132
 --
 Epoch 70/100
 10000/10000 - 0s - loss: 0.2065 - accuracy: 0.9141
 --
 Epoch 80/100
 10000/10000 - 0s - loss: 0.2056 - accuracy: 0.9141
 --
 Epoch 90/100
 10000/10000 - 0s - loss: 0.2049 - accuracy: 0.9139
 --
 Epoch 100/100
 10000/10000 - 0s - loss: 0.2045 - accuracy: 0.9139

 ... finished training on ./data/2D/train.txt ...
 ... starting evaluation of ./data/2D/train.txt ...

 decoding 1000 out of 10000
 decoding 2000 out of 10000
 decoding 3000 out of 10000
 decoding 4000 out of 10000
 decoding 5000 out of 10000
 decoding 6000 out of 10000
 decoding 7000 out of 10000
 decoding 8000 out of 10000
 decoding 9000 out of 10000
 decoding 10000 out of 10000

 ... finished evaluation of ./data/2D/train.txt ...
 ... starting evaluation of ./data/2D/dev.txt ...

 decoding 100 out of 2000
 decoding 200 out of 2000
 decoding 300 out of 2000
 decoding 400 out of 2000
 decoding 500 out of 2000
 decoding 600 out of 2000
 decoding 700 out of 2000
 decoding 800 out of 2000
 decoding 900 out of 2000
 decoding 1000 out of 2000
 decoding 1100 out of 2000
 decoding 1200 out of 2000
 decoding 1300 out of 2000
 decoding 1400 out of 2000
 decoding 1500 out of 2000
 decoding 1600 out of 2000
 decoding 1700 out of 2000
 decoding 1800 out of 2000
 decoding 1900 out of 2000
 decoding 2000 out of 2000

 ... finished evaluation of ./data/2D/dev.txt ...
 ... starting evaluation of ./data/2D/eval.txt ...

 decoding 100 out of 2000
 decoding 200 out of 2000
 decoding 300 out of 2000
 decoding 400 out of 2000
 decoding 500 out of 2000
 decoding 600 out of 2000
 decoding 700 out of 2000
 decoding 800 out of 2000
 decoding 900 out of 2000
 decoding 1000 out of 2000
 decoding 1100 out of 2000
 decoding 1200 out of 2000
 decoding 1300 out of 2000
 decoding 1400 out of 2000
 decoding 1500 out of 2000
 decoding 1600 out of 2000
 decoding 1700 out of 2000
 decoding 1800 out of 2000
 decoding 1900 out of 2000
 decoding 2000 out of 2000

 ... finished evaluation of ./data/2D/eval.txt ...
 ... starting scoring of ./data/2D/train.txt ...
 ... finished scoring of ./data/2D/train.txt ...
 ... starting scoring of ./data/2D/dev.txt ...
 ... finished scoring of ./data/2D/dev.txt ...

 ===== displaying results =====

 TRAINING DATA RESULTS:
  r/h:    h[0]    h[1]
  r[0]:   4492     508
  r[1]:    355    4645
  error rate = 8.6300%

 TEST DATA RESULTS:
  r/h:    h[0]    h[1]
  r[0]:    894     106
  r[1]:     76     924
  error rate = 9.1000%

 ======= end of results =======

Once you run this script, you will see these directories:

 nedc_000_[1]: p /data/isip/exp/tuh_dpath/exp_0074/v1.0
 nedc_000_[1]: d
 total 133
 drwxrwxr-x 5 picone   isip       9 Nov  2 18:40 ./
 drwxrwxr-x 4 tug35668 isip       4 Oct 30 08:00 ../
 -rw-r--r-- 1 picone   isip    6114 Nov  2 18:40 _AAREADME.txt
 -r--r--r-- 1 picone   isip    2193 Nov  2 18:11 _AAREADME.txt,v
 drwxrwxr-x 2 picone   isip       5 Oct 30 21:07 data/
 drwxrwxr-x 4 picone   isip      10 Nov  2 18:18 output/
 -r-xr-xr-x 1 picone   isip    3326 Nov  2 14:34 run.sh*
 -r-xr-xr-x 1 picone   isip    7841 Nov  2 14:34 run.sh,v*
 drwxrwxr-x 2 picone   isip      11 Nov  2 18:37 scripts/

 nedc_000_[1]: d data
 total 5590
 drwxrwxr-x 2 picone   isip       5 Oct 30 21:07 ./
 drwxrwxr-x 5 picone   isip       9 Nov  2 18:40 ../
 -rw-rw-r-- 1 picone   isip  521347 Oct 30 08:00 dev_set.txt
 -rw-rw-r-- 1 tug35668 isip  287596 Oct 30 21:07 eval_set.txt
 -rw-r--r-- 1 picone   isip 4715794 Oct 30 08:00 train_set.txt

 nedc_000_[1]: d output
 total 3034
 drwxrwxr-x 4 picone   isip      10 Nov  2 18:18 ./
 drwxrwxr-x 5 picone   isip       9 Nov  2 18:40 ../
 drwxrwxr-x 2 picone   isip       3 Nov  2 18:17 00_train/
 -rw-rw-r-- 1 picone   isip 2299392 Nov  2 18:17 00_train.log
 -rw-rw-r-- 1 picone   isip   54444 Nov  2 18:18 01_decode_dev.log
 -rw-rw-r-- 1 picone   isip   30030 Nov  2 18:18 01_decode_eval.log
 -rw-rw-r-- 1 picone   isip  520209 Nov  2 18:18 01_decode_train.log
 drwxrwxr-x 2 picone   isip       5 Nov  2 18:18 01_hyp/
 -rw-rw-r-- 1 picone   isip      77 Nov  2 18:18 02_results_dev.dat
 -rw-rw-r-- 1 picone   isip      77 Nov  2 18:18 02_results_train.dat

The directory "data" contains the input data. The directory "output" contains
the output hypotheses and the logs generated when the job is run. In the
output directory, the files "02_results_*.dat" contain the output of the
scoring program, the directory "00_train" contains the output model, and the
directory "01_hyp" contains the hypotheses. Finally, the source code is in
the directory "scripts". You will want to change train.py and decode.py to
introduce new algorithms; the sketches below are one possible starting point.
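
===============================================================================
The sketches below are illustrative only. They are not the actual contents of
train.py, decode.py and score.py; they only show the kind of structure such
scripts might have. The data file format (one sample per line: a class label
followed by its feature values), the model filename "model.h5" and the
hypothesis filename "hyp.txt" are assumptions made for these sketches and may
not match the real scripts.

A minimal training sketch, in the spirit of train.py: it loads the training
list, trains a small fully connected network (the layer sizes are
illustrative) and saves the model under mpath.

#!/usr/bin/env python
#
# sketch_train.py: a minimal training sketch (not the actual train.py).
# assumption: each line of the data file is "label feat1 feat2 ...".
#
import os
import sys

import numpy as np
import tensorflow as tf

def load_data(fname):

    # assumption: whitespace-separated lines of "label feat1 feat2 ..."
    #
    raw = np.loadtxt(fname)
    labels = raw[:, 0].astype(int)
    feats = raw[:, 1:].astype(np.float32)
    return feats, labels

def main():
    mpath, data = sys.argv[1], sys.argv[2]
    feats, labels = load_data(data)

    # a small fully connected network with a softmax output layer
    #
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(feats.shape[1],)),
        tf.keras.layers.Dense(16, activation='relu'),
        tf.keras.layers.Dense(2, activation='softmax')])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(feats, labels, epochs=100, verbose=2)

    # create the output directory if necessary and save the model
    #
    os.makedirs(mpath, exist_ok=True)
    model.save(os.path.join(mpath, 'model.h5'))

if __name__ == '__main__':
    main()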
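
A minimal decoding sketch, in the spirit of decode.py: it loads the model
written by the training sketch, predicts a class for every sample in the
input list, and writes one hypothesis per line to a hyp file in odir (the
filename "hyp.txt" is an assumption).

#!/usr/bin/env python
#
# sketch_decode.py: a minimal decoding sketch (not the actual decode.py).
#
import os
import sys

import numpy as np
import tensorflow as tf

def main():
    odir, mfile, data = sys.argv[1], sys.argv[2], sys.argv[3]

    # assumption: same "label feat1 feat2 ..." format used for training
    #
    raw = np.loadtxt(data)
    feats = raw[:, 1:].astype(np.float32)

    # load the model saved by the training sketch
    #
    model = tf.keras.models.load_model(os.path.join(mfile, 'model.h5'))

    # choose the most likely class for each sample
    #
    hyps = np.argmax(model.predict(feats), axis=1)

    # write one hypothesis per line
    #
    os.makedirs(odir, exist_ok=True)
    with open(os.path.join(odir, 'hyp.txt'), 'w') as fp:
        for h in hyps:
            fp.write("%d\n" % int(h))

if __name__ == '__main__':
    main()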
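
A minimal scoring sketch, in the spirit of score.py: it assumes the ref and
hyp files each contain one integer class label per line, in the same order,
and prints a confusion matrix and the error rate.

#!/usr/bin/env python
#
# sketch_score.py: a minimal scoring sketch (not the actual score.py).
#
import sys

import numpy as np

def main():
    ref_file, hyp_file = sys.argv[1], sys.argv[2]

    # assumption: one integer label per line, same order in both files
    #
    ref = np.loadtxt(ref_file, dtype=int)
    hyp = np.loadtxt(hyp_file, dtype=int)

    # confusion matrix: rows are reference labels, columns are hypotheses
    #
    nclasses = int(max(ref.max(), hyp.max())) + 1
    conf = np.zeros((nclasses, nclasses), dtype=int)
    for r, h in zip(ref, hyp):
        conf[r, h] += 1

    # error rate: fraction of samples whose hypothesis differs from the ref
    #
    err = float(np.sum(ref != hyp)) / float(len(ref))

    print(conf)
    print("error rate = %.4f%%" % (100.0 * err))

if __name__ == '__main__':
    main()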