The data for the final exam can be found at:

  http://www.isip.msstate.edu/publications/courses/ece_8990_pr/exams/1999/data

There are the following files:

  train/train_1.data => training data for set 1
  train/train_2.data => training data for set 2

  test/test_1.data   => development testing data for set 1
  test/test_1.class  => answers for testing data for set 1
  test/test_2.data   => development testing data for set 2
  test/test_2.class  => answers for testing data for set 2

  eval/eval_1.data   => blind evaluation data for set 1
  eval/eval_2.data   => blind evaluation data for set 2

There is also a scoring program, score.cc, and a binary for Sun Sparcs
(score.exe). It is run as follows:

  score.exe test_1.class test_1.example

and outputs the following:

  error on token no. 19: ref = 11, hyp = 10
  error on token no. 375: ref = 9, hyp = 8
  (377 correct out of 379 tokens; percent error = 0.53%)

Each data file consists of a class tag, followed by a vector of data:

  isip02_[2]: m test_1.data
  11: -3.189 2.620 -0.833 0.450 0.009 0.444 -0.348 0.126 -0.614 0.081

Students must email me their hypotheses for the two evaluation sets, and
we will score them using the answers (not available to the students).

Development of an algorithm should proceed as follows:

  - train on train_1.data
  - evaluate the models on test_1.data (using score and test_1.class)
  - for the final evaluation, train on both train_1.data and test_1.data,
    and evaluate eval_1.data

This should be done for set 1 and set 2.

Here is a brief description of the data.

Data set 1: static classification

   10  dimension of vectors
   11  classes
   83  eval set vectors
  379  development set vectors
  528  training set vectors

Data set 2: temporal modeling

   39  dimension of vectors
    5  classes
  225  eval set vectors (sets of 5 vectors for each class)
  350  development set vectors (sets of 5 vectors for each class)
  925  training set vectors (sets of 5 vectors for each class)

On set no. 2, the data can be assumed to be sequences of 5 vectors (every
group of 5 vectors has the same class assignment and can be thought of as
having occurred sequentially in time). The motivated student might try to
do some temporal modeling of the data.

Let the games begin. If you have any questions, or would like your data
evaluated, send mail to help@isip.msstate.edu.

Regards,

Joe Picone
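
For reference, the data layout and scoring described above can be sketched in Python. This is a minimal illustration, not the course's score.cc. It assumes each line holds an integer class tag, a colon, and whitespace-separated values (as in the test_1.data sample line), and that set 2 stores one vector per line with the class tag repeated on each. The function names parse_line, load, group_sequences, and percent_error are hypothetical.

```python
def parse_line(line):
    """Split 'tag: v1 v2 ...' into (int tag, list of floats)."""
    tag, _, rest = line.partition(":")
    return int(tag), [float(x) for x in rest.split()]


def load(path):
    """Read a data file, skipping blank lines (assumed one vector per line)."""
    with open(path) as f:
        return [parse_line(line) for line in f if line.strip()]


def group_sequences(samples, length=5):
    """Group consecutive vectors into sequences of `length` (set 2).

    Every group is expected to share a single class tag.
    """
    if len(samples) % length != 0:
        raise ValueError("number of vectors is not a multiple of the sequence length")
    sequences = []
    for i in range(0, len(samples), length):
        chunk = samples[i:i + length]
        tags = {tag for tag, _ in chunk}
        if len(tags) != 1:
            raise ValueError(f"mixed class tags in group starting at vector {i}")
        sequences.append((chunk[0][0], [vec for _, vec in chunk]))
    return sequences


def percent_error(refs, hyps):
    """Percentage of tokens whose hypothesis differs from the reference."""
    if len(refs) != len(hyps):
        raise ValueError("reference and hypothesis lists differ in length")
    errors = sum(r != h for r, h in zip(refs, hyps))
    return 100.0 * errors / len(refs)
```

With 2 errors out of 379 tokens, percent_error returns roughly 0.53, matching the sample output of score.exe shown earlier.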