This experiment contains a simple MLP that can be trained and decoded on
features. It uses PyTorch and Python3.

The sections are as follows:

 01: Training
 02: Decoding
 03: Scoring

The three sections are implemented in train.py, decode.py and score.py
respectively. There are two ways to run these scripts: all three can be run
separately in order, or they can be run from a master script, run.sh, which
automates the whole process.

===============================================================================
01: Training a model using train.py

train.py takes a model pathname and a training data set, trains a model on
that set, and saves the model to the given pathname. The output model is
stored in $DL_EXP/output/models/, and if the output directory does not
exist, one will be created. A sketch of the kind of training loop this
script contains appears after section 03.

Usage:

 python train.py mpath data

Input:

 mpath: the pathname of the file where the output model is stored
 data: the input data list

===============================================================================
02: Decoding using decode.py

decode.py takes an output directory, a model file saved by train.py
(section 01), and a data set to decode. It then writes a prediction for
each sample to a hyp file in the output directory. A sketch of this step
also appears after section 03.

Usage:

 python decode.py odir mfile data

Input:

 odir: the directory where the hypotheses will be stored
 mfile: the input model file
 data: the input data list to be decoded

===============================================================================
03: Scoring using score.py

score.py takes a ref file and a hyp file, scores the error rate and
generates a confusion matrix. A sketch of this step appears below.

Usage:

 python score.py ref hyp

Input:

 ref: a file containing the reference labels
 hyp: a matching file containing the hypotheses

===============================================================================
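The actual implementations live in the scripts directory. The three sketches
that follow only illustrate what each step typically looks like; they are
not the real code.

Below is a minimal sketch of the kind of PyTorch training loop train.py
contains. The network size, optimizer, learning rate and data-list format
(an integer label followed by the feature values on each line) are
assumptions made for illustration, and the sketch uses one full-batch update
per epoch, whereas the sample log below suggests mini-batch training
(526 steps per epoch).

import os
import sys

import torch
import torch.nn as nn

NUM_CLASSES = 2     # two classes, matching the 2x2 confusion matrices below
NUM_EPOCHS = 100
LEARNING_RATE = 0.01

class SimpleMLP(nn.Module):
    """A small fully connected network with one hidden layer."""

    def __init__(self, num_feats, num_hidden, num_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_feats, num_hidden),
            nn.ReLU(),
            nn.Linear(num_hidden, num_classes)
        )

    def forward(self, x):
        return self.net(x)

def load_data(fname):
    """Assumed data-list format: one sample per line, label then features."""
    labels, feats = [], []
    with open(fname, "r") as fp:
        for line in fp:
            parts = line.split()
            if not parts:
                continue
            labels.append(int(parts[0]))
            feats.append([float(v) for v in parts[1:]])
    return torch.tensor(feats), torch.tensor(labels)

def main():
    # usage: python train.py mpath data
    mpath, data = sys.argv[1], sys.argv[2]

    feats, labels = load_data(data)
    model = SimpleMLP(feats.shape[1], 32, NUM_CLASSES)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=LEARNING_RATE)

    for epoch in range(1, NUM_EPOCHS + 1):
        optimizer.zero_grad()
        loss = criterion(model(feats), labels)
        loss.backward()
        optimizer.step()
        if epoch % 10 == 0:
            print("Epoch [%d/%d], Loss: %.4f" % (epoch, NUM_EPOCHS, loss.item()))

    # create the output directory if it does not exist, then save the model
    os.makedirs(os.path.dirname(mpath) or ".", exist_ok=True)
    torch.save(model.state_dict(), mpath)

if __name__ == "__main__":
    main()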
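A corresponding sketch of the decoding step in decode.py. It reuses the
SimpleMLP class and load_data helper from the training sketch above, and the
hyp-file naming and hidden-layer size are assumptions made for illustration.

import os
import sys

import torch

from train import SimpleMLP, load_data   # definitions from the sketch above

def main():
    # usage: python decode.py odir mfile data
    odir, mfile, data = sys.argv[1], sys.argv[2], sys.argv[3]

    feats, _ = load_data(data)
    model = SimpleMLP(feats.shape[1], 32, 2)
    model.load_state_dict(torch.load(mfile))
    model.eval()

    # write one prediction per line to a hyp file in the output directory;
    # the naming convention here (e.g. dev_set.hyp) is assumed
    os.makedirs(odir, exist_ok=True)
    hyp_name = os.path.splitext(os.path.basename(data))[0] + ".hyp"
    hyp_path = os.path.join(odir, hyp_name)

    with torch.no_grad(), open(hyp_path, "w") as fp:
        for i, x in enumerate(feats, start=1):
            pred = torch.argmax(model(x.unsqueeze(0)), dim=1).item()
            fp.write("%d\n" % pred)
            if i % 1000 == 0:
                print("decoding %d out of %d" % (i, len(feats)))

if __name__ == "__main__":
    main()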
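And a sketch of the scoring step in score.py. It assumes the ref and hyp
files hold one integer label per line, in matching order, and that there are
two classes, which matches the 2x2 confusion matrices in the sample run
below.

import sys

def load_labels(fname):
    """Read one integer label per line (assumed ref/hyp format)."""
    with open(fname, "r") as fp:
        return [int(line.split()[0]) for line in fp if line.strip()]

def main():
    # usage: python score.py ref hyp
    ref = load_labels(sys.argv[1])
    hyp = load_labels(sys.argv[2])

    # accumulate a 2x2 confusion matrix: rows are references, columns are
    # hypotheses, matching the r/h table in the results display
    conf = [[0, 0], [0, 0]]
    for r, h in zip(ref, hyp):
        conf[r][h] += 1

    errors = conf[0][1] + conf[1][0]
    error_rate = 100.0 * errors / len(ref)

    print(" r/h:   h[0]   h[1]")
    print(" r[0]: %6d %6d" % (conf[0][0], conf[0][1]))
    print(" r[1]: %6d %6d" % (conf[1][0], conf[1][1]))
    print(" error rate = %.4f%%" % error_rate)

if __name__ == "__main__":
    main()

===============================================================================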
A typical run will look something like this:

nedc_000_[1]: p /data/isip/exp/tuh_dpath/exp_0074/v1.0
nedc_000_[1]: d
total 122
drwxrwxr-x 5 picone isip 9 Nov 2 18:16 ./
drwxrwxr-x 4 tug35668 isip 4 Oct 30 08:00 ../
-r--r--r-- 1 picone isip 1952 Nov 2 18:16 _AAREADME.txt
drwxrwxr-x 2 picone isip 5 Oct 30 21:07 data/
drwxrwxr-x 4 picone isip 10 Nov 2 14:54 output/
-r-xr-xr-x 1 picone isip 3326 Nov 2 14:34 run.sh*
-r-xr-xr-x 1 picone isip 7841 Nov 2 14:34 run.sh,v*
drwxrwxr-x 2 picone isip 11 Nov 2 14:51 scripts/

nedc_000_[1]: ./run.sh data/train_set.txt data/dev_set.txt data/eval_set.txt
... starting training on data/train_set.txt ...
Epoch [10/100], Step[1/526], Loss: 0.6605
Epoch [20/100], Step[1/526], Loss: 0.6637
Epoch [30/100], Step[1/526], Loss: 0.6666
Epoch [40/100], Step[1/526], Loss: 0.6527
Epoch [50/100], Step[1/526], Loss: 0.6567
Epoch [60/100], Step[1/526], Loss: 0.6588
Epoch [70/100], Step[1/526], Loss: 0.6519
Epoch [80/100], Step[1/526], Loss: 0.6573
Epoch [90/100], Step[1/526], Loss: 0.6564
Epoch [100/100], Step[1/526], Loss: 0.6547
... finished training on data/train_set.txt ...
... starting evaluation of data/train_set.txt ...
decoding 1000 out of 18936
decoding 2000 out of 18936
decoding 3000 out of 18936
decoding 4000 out of 18936
decoding 5000 out of 18936
decoding 6000 out of 18936
decoding 7000 out of 18936
decoding 8000 out of 18936
decoding 9000 out of 18936
decoding 10000 out of 18936
decoding 11000 out of 18936
decoding 12000 out of 18936
decoding 13000 out of 18936
decoding 14000 out of 18936
decoding 15000 out of 18936
decoding 16000 out of 18936
decoding 17000 out of 18936
decoding 18000 out of 18936
... finished evaluation of data/train_set.txt ...
... starting evaluation of data/dev_set.txt ...
decoding 100 out of 2094
decoding 200 out of 2094
decoding 300 out of 2094
decoding 400 out of 2094
decoding 500 out of 2094
decoding 600 out of 2094
decoding 700 out of 2094
decoding 800 out of 2094
decoding 900 out of 2094
decoding 1000 out of 2094
decoding 1100 out of 2094
decoding 1200 out of 2094
decoding 1300 out of 2094
decoding 1400 out of 2094
decoding 1500 out of 2094
decoding 1600 out of 2094
decoding 1700 out of 2094
decoding 1800 out of 2094
decoding 1900 out of 2094
decoding 2000 out of 2094
... finished evaluation of data/dev_set.txt ...
... starting evaluation of data/eval_set.txt ...
decoding 100 out of 1155
decoding 200 out of 1155
decoding 300 out of 1155
decoding 400 out of 1155
decoding 500 out of 1155
decoding 600 out of 1155
decoding 700 out of 1155
decoding 800 out of 1155
decoding 900 out of 1155
decoding 1000 out of 1155
decoding 1100 out of 1155
... finished evaluation of data/eval_set.txt ...
... starting scoring of data/train_set.txt ...
... finished scoring of data/train_set.txt ...
... starting scoring of data/dev_set.txt ...
... finished scoring of data/dev_set.txt ...

===== displaying results =====

TRAINING DATA RESULTS:
 r/h:   h[0]   h[1]
 r[0]:  6363   4892
 r[1]:  2981   4700
 error rate = 41.5769%

TEST DATA RESULTS:
 r/h:   h[0]   h[1]
 r[0]:   762    484
 r[1]:   285    563
 error rate = 36.7240%

======= end of results =======

Once you run this script, you will see these directories:

nedc_000_[1]: p /data/isip/exp/tuh_dpath/exp_0074/v1.0
nedc_000_[1]: d
total 133
drwxrwxr-x 5 picone isip 9 Nov 2 18:40 ./
drwxrwxr-x 4 tug35668 isip 4 Oct 30 08:00 ../
-rw-r--r-- 1 picone isip 6114 Nov 2 18:40 _AAREADME.txt
-r--r--r-- 1 picone isip 2193 Nov 2 18:11 _AAREADME.txt,v
drwxrwxr-x 2 picone isip 5 Oct 30 21:07 data/
drwxrwxr-x 4 picone isip 10 Nov 2 18:18 output/
-r-xr-xr-x 1 picone isip 3326 Nov 2 14:34 run.sh*
-r-xr-xr-x 1 picone isip 7841 Nov 2 14:34 run.sh,v*
drwxrwxr-x 2 picone isip 11 Nov 2 18:37 scripts/

nedc_000_[1]: d data
total 5590
drwxrwxr-x 2 picone isip 5 Oct 30 21:07 ./
drwxrwxr-x 5 picone isip 9 Nov 2 18:40 ../
-rw-rw-r-- 1 picone isip 521347 Oct 30 08:00 dev_set.txt
-rw-rw-r-- 1 tug35668 isip 287596 Oct 30 21:07 eval_set.txt
-rw-r--r-- 1 picone isip 4715794 Oct 30 08:00 train_set.txt

nedc_000_[1]: d output
total 3034
drwxrwxr-x 4 picone isip 10 Nov 2 18:18 ./
drwxrwxr-x 5 picone isip 9 Nov 2 18:40 ../
drwxrwxr-x 2 picone isip 3 Nov 2 18:17 00_train/
-rw-rw-r-- 1 picone isip 2299392 Nov 2 18:17 00_train.log
-rw-rw-r-- 1 picone isip 54444 Nov 2 18:18 01_decode_dev.log
-rw-rw-r-- 1 picone isip 30030 Nov 2 18:18 01_decode_eval.log
-rw-rw-r-- 1 picone isip 520209 Nov 2 18:18 01_decode_train.log
drwxrwxr-x 2 picone isip 5 Nov 2 18:18 01_hyp/
-rw-rw-r-- 1 picone isip 77 Nov 2 18:18 02_results_dev.dat
-rw-rw-r-- 1 picone isip 77 Nov 2 18:18 02_results_train.dat

The directory "data" contains the input data. The directory "output"
contains the output hypotheses and the logs generated when the job is run.
In the output directory, the files "*_results.dat" contain the output of
the scoring program, the directory "00_train" contains the output model,
and the directory "01_hyp" contains the hypotheses. Finally, the source
code is in scripts. You will want to change train.py and decode.py to
introduce new algorithms.
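For reference, run.sh (a shell script) is what chains the three steps
together in the sample run above. The sketch below is only a rough Python
illustration of that sequence, reusing the assumptions from the earlier
sketches: the model path, the hyp-file naming, and the way the reference
files are produced are all hypothetical and do not necessarily match the
real run.sh.

import os
import subprocess
import sys

def write_ref(data, ref_path):
    # pull the first column (the label) out of the assumed data-list format
    with open(data, "r") as fin, open(ref_path, "w") as fout:
        for line in fin:
            if line.strip():
                fout.write(line.split()[0] + "\n")

def main():
    # usage: python run_all.py train_set dev_set eval_set (hypothetical driver)
    train_set, dev_set, eval_set = sys.argv[1], sys.argv[2], sys.argv[3]
    model = "output/00_train/model.pth"   # hypothetical model path
    hyp_dir = "output/01_hyp"
    os.makedirs(hyp_dir, exist_ok=True)

    # 01: train a model on the training set
    subprocess.run(["python", "train.py", model, train_set], check=True)

    # 02: decode all three sets with the trained model
    for data in (train_set, dev_set, eval_set):
        subprocess.run(["python", "decode.py", hyp_dir, model, data], check=True)

    # 03: score the training and dev sets (as in the sample run above)
    for data in (train_set, dev_set):
        name = os.path.splitext(os.path.basename(data))[0]
        ref_path = os.path.join(hyp_dir, name + ".ref")   # hypothetical ref file
        hyp_path = os.path.join(hyp_dir, name + ".hyp")   # matches the decode sketch
        write_ref(data, ref_path)
        subprocess.run(["python", "score.py", ref_path, hyp_path], check=True)

if __name__ == "__main__":
    main()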