This experiment contains a simple MLP that can be trained on features and
used to decode them. It uses TensorFlow and Python3. The sections are as
follows:

 01: Training
 02: Decoding
 03: Scoring

The three sections are implemented in train.py, decode.py and score.py
respectively. There are two ways to run these scripts: all three can be run
separately, in order, or they can be run from a master script, run.sh, which
automates the whole process. Minimal sketches of what each script might look
like appear at the end of this file.

===============================================================================
01: Training a model using train.py

train.py takes the pathname of an output directory and a training data set,
trains a model on that set, and saves the model in the output directory. The
output is stored in $DL_EXP/output/models/; if the output directory does not
exist, it will be created.

Usage:

 python train.py mpath data

Inputs:

 mpath: the pathname of the directory where the output model is stored
 data: the input data list

===============================================================================
02: Decoding using decode.py

decode.py takes an output directory, the model saved by train.py in the
previous section, and a data set to decode. It then writes a prediction for
each sample to a hyp (hypothesis) file in the output directory.

Usage:

 python decode.py odir mfile data

Inputs:

 odir: the directory where the hypotheses will be stored
 mfile: the directory containing the input model
 data: the input data list to be decoded

===============================================================================
03: Scoring using score.py

score.py takes a ref file and a hyp file, computes the error rate, and
generates a confusion matrix.

Usage:

 python score.py ref hyp

Inputs:

 ref: a file containing the reference labels
 hyp: a matching file containing the hypotheses

===============================================================================
A typical run will look something like this:

 nedc_000_[1]: p /data/isip/exp/tuh_dpath/exp_0086/
 nedc_000_[1]: d
 total 122
 drwxrwxr-x 5 picone   isip    9 Nov  2 18:16 ./
 drwxrwxr-x 4 tug35668 isip    4 Oct 30 08:00 ../
 -r--r--r-- 1 picone   isip 1952 Nov  2 18:16 _AAREADME.txt
 drwxrwxr-x 2 picone   isip    5 Oct 30 21:07 data/
 drwxrwxr-x 4 picone   isip   10 Nov  2 14:54 output/
 -r-xr-xr-x 1 picone   isip 3326 Nov  2 14:34 run.sh*
 -r-xr-xr-x 1 picone   isip 7841 Nov  2 14:34 run.sh,v*
 drwxrwxr-x 2 picone   isip   11 Nov  2 14:51 scripts/

 nedc_000_[1]: ./run.sh ./data/2D/train.txt ./data/2D/dev.txt ./data/2D/eval.txt

 Epoch 10/100
 10000/10000 - 0s - loss: 0.2302 - accuracy: 0.9025
 --
 Epoch 20/100
 10000/10000 - 0s - loss: 0.2234 - accuracy: 0.9052
 --
 Epoch 30/100
 10000/10000 - 0s - loss: 0.2166 - accuracy: 0.9082
 --
 Epoch 40/100
 10000/10000 - 0s - loss: 0.2122 - accuracy: 0.9101
 --
 Epoch 50/100
 10000/10000 - 0s - loss: 0.2097 - accuracy: 0.9119
 --
 Epoch 60/100
 10000/10000 - 0s - loss: 0.2078 - accuracy: 0.9132
 --
 Epoch 70/100
 10000/10000 - 0s - loss: 0.2065 - accuracy: 0.9141
 --
 Epoch 80/100
 10000/10000 - 0s - loss: 0.2056 - accuracy: 0.9141
 --
 Epoch 90/100
 10000/10000 - 0s - loss: 0.2049 - accuracy: 0.9139
 --
 Epoch 100/100
 10000/10000 - 0s - loss: 0.2045 - accuracy: 0.9139

 ... finished training on ./data/2D/train.txt ...
 ... starting evaluation of ./data/2D/train.txt ...

 decoding 1000 out of 10000
 decoding 2000 out of 10000
 decoding 3000 out of 10000
 decoding 4000 out of 10000
 decoding 5000 out of 10000
 decoding 6000 out of 10000
 decoding 7000 out of 10000
 decoding 8000 out of 10000
 decoding 9000 out of 10000
 decoding 10000 out of 10000

 ... finished evaluation of ./data/2D/train.txt ...
 ... starting evaluation of ./data/2D/dev.txt ...

 decoding 100 out of 2000
 decoding 200 out of 2000
 decoding 300 out of 2000
 decoding 400 out of 2000
 decoding 500 out of 2000
 decoding 600 out of 2000
 decoding 700 out of 2000
 decoding 800 out of 2000
 decoding 900 out of 2000
 decoding 1000 out of 2000
 decoding 1100 out of 2000
 decoding 1200 out of 2000
 decoding 1300 out of 2000
 decoding 1400 out of 2000
 decoding 1500 out of 2000
 decoding 1600 out of 2000
 decoding 1700 out of 2000
 decoding 1800 out of 2000
 decoding 1900 out of 2000
 decoding 2000 out of 2000

 ... finished evaluation of ./data/2D/dev.txt ...
 ... starting evaluation of ./data/2D/eval.txt ...

 decoding 100 out of 2000
 decoding 200 out of 2000
 decoding 300 out of 2000
 decoding 400 out of 2000
 decoding 500 out of 2000
 decoding 600 out of 2000
 decoding 700 out of 2000
 decoding 800 out of 2000
 decoding 900 out of 2000
 decoding 1000 out of 2000
 decoding 1100 out of 2000
 decoding 1200 out of 2000
 decoding 1300 out of 2000
 decoding 1400 out of 2000
 decoding 1500 out of 2000
 decoding 1600 out of 2000
 decoding 1700 out of 2000
 decoding 1800 out of 2000
 decoding 1900 out of 2000
 decoding 2000 out of 2000

 ... finished evaluation of ./data/2D/eval.txt ...
 ... starting scoring of ./data/2D/train.txt ...
 ... finished scoring of ./data/2D/train.txt ...
 ... starting scoring of ./data/2D/dev.txt ...
 ... finished scoring of ./data/2D/dev.txt ...

 ===== displaying results =====

 TRAINING DATA RESULTS:
  r/h:    h[0]    h[1]
  r[0]:   4492     508
  r[1]:    355    4645
  error rate = 8.6300%

 TEST DATA RESULTS:
  r/h:    h[0]    h[1]
  r[0]:    894     106
  r[1]:     76     924
  error rate = 9.1000%

 ======= end of results =======

Once you run this script, you will see these directories:

 nedc_000_[1]: p /data/isip/exp/tuh_dpath/exp_0074/v1.0
 nedc_000_[1]: d
 total 133
 drwxrwxr-x 5 picone   isip       9 Nov  2 18:40 ./
 drwxrwxr-x 4 tug35668 isip       4 Oct 30 08:00 ../
 -rw-r--r-- 1 picone   isip    6114 Nov  2 18:40 _AAREADME.txt
 -r--r--r-- 1 picone   isip    2193 Nov  2 18:11 _AAREADME.txt,v
 drwxrwxr-x 2 picone   isip       5 Oct 30 21:07 data/
 drwxrwxr-x 4 picone   isip      10 Nov  2 18:18 output/
 -r-xr-xr-x 1 picone   isip    3326 Nov  2 14:34 run.sh*
 -r-xr-xr-x 1 picone   isip    7841 Nov  2 14:34 run.sh,v*
 drwxrwxr-x 2 picone   isip      11 Nov  2 18:37 scripts/

 nedc_000_[1]: d data
 total 5590
 drwxrwxr-x 2 picone   isip       5 Oct 30 21:07 ./
 drwxrwxr-x 5 picone   isip       9 Nov  2 18:40 ../
 -rw-rw-r-- 1 picone   isip  521347 Oct 30 08:00 dev_set.txt
 -rw-rw-r-- 1 tug35668 isip  287596 Oct 30 21:07 eval_set.txt
 -rw-r--r-- 1 picone   isip 4715794 Oct 30 08:00 train_set.txt

 nedc_000_[1]: d output
 total 3034
 drwxrwxr-x 4 picone   isip      10 Nov  2 18:18 ./
 drwxrwxr-x 5 picone   isip       9 Nov  2 18:40 ../
 drwxrwxr-x 2 picone   isip       3 Nov  2 18:17 00_train/
 -rw-rw-r-- 1 picone   isip 2299392 Nov  2 18:17 00_train.log
 -rw-rw-r-- 1 picone   isip   54444 Nov  2 18:18 01_decode_dev.log
 -rw-rw-r-- 1 picone   isip   30030 Nov  2 18:18 01_decode_eval.log
 -rw-rw-r-- 1 picone   isip  520209 Nov  2 18:18 01_decode_train.log
 drwxrwxr-x 2 picone   isip       5 Nov  2 18:18 01_hyp/
 -rw-rw-r-- 1 picone   isip      77 Nov  2 18:18 02_results_dev.dat
 -rw-rw-r-- 1 picone   isip      77 Nov  2 18:18 02_results_train.dat

The directory "data" contains the input data. The directory "output" contains
the output hypotheses and the logs generated when the job is run. In the
output directory, the files "02_results_*.dat" contain the output of the
scoring program, the directory "00_train" contains the output model, and the
directory "01_hyp" contains the hypotheses. Finally, the source code is in
the directory "scripts". You will want to change train.py and decode.py to
introduce new algorithms; the sketches below are one possible starting point.
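
===============================================================================
The sketches below are illustrative only. They are not the actual contents of
train.py, decode.py and score.py; they only show the kind of structure such
scripts might have. The data file format (one sample per line: a class label
followed by its feature values), the model filename "model.h5" and the
hypothesis filename "hyp.txt" are assumptions made for these sketches and may
not match the real scripts.

A minimal training sketch, in the spirit of train.py: it loads the training
list, trains a small fully connected network (the layer sizes are
illustrative) and saves the model under mpath.

#!/usr/bin/env python
#
# sketch_train.py: a minimal training sketch (not the actual train.py).
# assumption: each line of the data file is "label feat1 feat2 ...".
#
import os
import sys

import numpy as np
import tensorflow as tf

def load_data(fname):

    # assumption: whitespace-separated lines of "label feat1 feat2 ..."
    #
    raw = np.loadtxt(fname)
    labels = raw[:, 0].astype(int)
    feats = raw[:, 1:].astype(np.float32)
    return feats, labels

def main():
    mpath, data = sys.argv[1], sys.argv[2]
    feats, labels = load_data(data)

    # a small fully connected network with a softmax output layer
    #
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(feats.shape[1],)),
        tf.keras.layers.Dense(16, activation='relu'),
        tf.keras.layers.Dense(2, activation='softmax')])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(feats, labels, epochs=100, verbose=2)

    # create the output directory if necessary and save the model
    #
    os.makedirs(mpath, exist_ok=True)
    model.save(os.path.join(mpath, 'model.h5'))

if __name__ == '__main__':
    main()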
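
A minimal decoding sketch, in the spirit of decode.py: it loads the model
written by the training sketch, predicts a class for every sample in the
input list, and writes one hypothesis per line to a hyp file in odir (the
filename "hyp.txt" is an assumption).

#!/usr/bin/env python
#
# sketch_decode.py: a minimal decoding sketch (not the actual decode.py).
#
import os
import sys

import numpy as np
import tensorflow as tf

def main():
    odir, mfile, data = sys.argv[1], sys.argv[2], sys.argv[3]

    # assumption: same "label feat1 feat2 ..." format used for training
    #
    raw = np.loadtxt(data)
    feats = raw[:, 1:].astype(np.float32)

    # load the model saved by the training sketch
    #
    model = tf.keras.models.load_model(os.path.join(mfile, 'model.h5'))

    # choose the most likely class for each sample
    #
    hyps = np.argmax(model.predict(feats), axis=1)

    # write one hypothesis per line
    #
    os.makedirs(odir, exist_ok=True)
    with open(os.path.join(odir, 'hyp.txt'), 'w') as fp:
        for h in hyps:
            fp.write("%d\n" % int(h))

if __name__ == '__main__':
    main()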
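
A minimal scoring sketch, in the spirit of score.py: it assumes the ref and
hyp files each contain one integer class label per line, in the same order,
and prints a confusion matrix and the error rate.

#!/usr/bin/env python
#
# sketch_score.py: a minimal scoring sketch (not the actual score.py).
#
import sys

import numpy as np

def main():
    ref_file, hyp_file = sys.argv[1], sys.argv[2]

    # assumption: one integer label per line, same order in both files
    #
    ref = np.loadtxt(ref_file, dtype=int)
    hyp = np.loadtxt(hyp_file, dtype=int)

    # confusion matrix: rows are reference labels, columns are hypotheses
    #
    nclasses = int(max(ref.max(), hyp.max())) + 1
    conf = np.zeros((nclasses, nclasses), dtype=int)
    for r, h in zip(ref, hyp):
        conf[r, h] += 1

    # error rate: fraction of samples whose hypothesis differs from the ref
    #
    err = float(np.sum(ref != hyp)) / float(len(ref))

    print(conf)
    print("error rate = %.4f%%" % (100.0 * err))

if __name__ == '__main__':
    main()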