COMMON EVALUATION


A popular and important way to measure algorithm advances is a testing scenario known as a common evaluation. In this approach, participants are given training data typical of the application, development test data for testing their trained algorithms, and blind evaluation data for which only the evaluation organization knows the answers. Researchers submit their final results on the evaluation data, and an independent testing organization scores them.

In this pattern recognition course, we created such a scenario for one of the problems on the final exam. What follows is complete documentation of this experiment, including the data, the scoring software, and the results of the students' algorithms.

Instructions about the nature of the data and how to process it can be found here. The data consists of ASCII files containing class assignments and the associated vectors. It is a fairly easy format to manipulate. The source of the data is kept secret since we would like to limit the use of domain-specific knowledge. The data is located here.

A program to score your results can be found here. This program simply compares your class assignments to the reference information, and outputs every token that is in error, along with the overall performance. It can be compiled using:

    gcc -o score.exe score.cc -lm
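
For readers who want to see the idea behind the scoring step, the following is a minimal sketch of a token-by-token comparison. It is not the code in score.cc. The names ScoreSummary and score_labels are hypothetical, and the sketch assumes that class assignments are integers and that the reference and hypothesis labels are already loaded, in the same order and with the same length.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <vector>

// Counts produced by comparing hypothesized labels against reference labels.
// (Hypothetical type, introduced only for this illustration.)
struct ScoreSummary {
    std::size_t total = 0;   // number of tokens scored
    std::size_t errors = 0;  // number of tokens whose labels disagree

    // Error rate in percent; defined as 0 when there are no tokens.
    double error_rate() const {
        return total == 0
                   ? 0.0
                   : 100.0 * static_cast<double>(errors) /
                         static_cast<double>(total);
    }
};

// Compare two label sequences token by token, write every token in error
// to `out`, then write the overall performance and return the counts.
// Assumption: both sequences have the same length and the same ordering.
ScoreSummary score_labels(const std::vector<int>& reference,
                          const std::vector<int>& hypothesis,
                          std::ostream& out) {
    assert(reference.size() == hypothesis.size() &&
           "label sequences must have the same length");

    ScoreSummary summary;
    summary.total = reference.size();

    for (std::size_t i = 0; i < reference.size(); ++i) {
        if (reference[i] != hypothesis[i]) {
            ++summary.errors;
            out << "error: token " << i << " ref=" << reference[i]
                << " hyp=" << hypothesis[i] << '\n';
        }
    }

    out << "error rate: " << summary.error_rate() << "% ("
        << summary.errors << "/" << summary.total << ")\n";
    return summary;
}
```

A caller would read the two label lists from the ASCII files and then call score_labels(reference, hypothesis, std::cout). Reporting the individual errors as well as the overall rate makes it easier to see which tokens a given algorithm gets wrong.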

Now for the fun stuff. We have tabulated results for student projects involving this data. It is quite interesting to observe the variation in performance. For example, the same algorithms deliver different performance; different algorithms deliver the same performance; some algorithms do better on one type of data than the other. This is quite typical in the world of pattern recognition: implementation details can make a big difference for any algorithm. This is why the common evaluation framework is so important.