COMMON EVALUATION
A popular and important way to measure algorithmic advances is
a testing scenario known as a common evaluation.
In this approach, participants are given training data typical
of the application, development test data for
testing their algorithms, and blind evaluation data
for which no one except the evaluation organization knows the answers.
Researchers submit their final results on the evaluation data,
and those results are scored by an independent testing organization.
In this pattern recognition course, we created such a scenario for
one of the problems on the final exam. What follows is complete
documentation of this experiment, including the data,
the scoring software, and the results of the students' algorithms.
Instructions about the nature of the data and how to process it
can be found here. The data consists of
ASCII files containing class assignments and the associated vectors.
It is a fairly easy format to manipulate. The source of the data
is kept secret since we would like to limit the use of domain-specific
knowledge. The data is located
here.
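
The exact file layout is defined by the instructions mentioned above and is not reproduced here. As a minimal sketch, assuming each line holds a class label followed by whitespace-separated vector components (this layout and the function name parse_labeled_vectors are assumptions for illustration only), such a file could be read like this:

```python
def parse_labeled_vectors(text):
    """Parse lines of the ASSUMED form '<class> <x1> <x2> ...'.

    The real layout is set by the course instructions; both this
    format and this function name are hypothetical.
    """
    labels, vectors = [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        labels.append(fields[0])  # keep the class label as a string
        vectors.append([float(x) for x in fields[1:]])
    return labels, vectors


# Two made-up records, used only to exercise the parser.
sample = "0 1.5 -2.0\n1 0.25 3.0\n"
labels, vectors = parse_labeled_vectors(sample)
print(labels, vectors)
```

Keeping the label as a string avoids assuming that classes are numbered; convert it to an integer if the real files use numeric class indices.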
A program to score your results can be found here.
This program simply compares your class assignments to the
reference information and outputs every token that is in error,
along with the overall performance. It can be compiled using:
gcc -o score.exe score.cc -lm
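
The comparison the scoring program performs can be illustrated with a short sketch. This is not the course's score.cc. It assumes that hypothesis and reference labels are aligned one-to-one and that overall performance is reported as an error rate; the function name score and the sample labels are hypothetical.

```python
def score(hypothesis, reference):
    """Compare hypothesized class labels against reference labels.

    Returns (errors, error_rate), where errors is a list of
    (token_index, reference_label, hypothesis_label) tuples.
    ASSUMPTION: both lists are aligned token by token.
    """
    if len(hypothesis) != len(reference):
        raise ValueError("hypothesis and reference must have the same length")
    errors = [
        (i, ref, hyp)
        for i, (ref, hyp) in enumerate(zip(reference, hypothesis))
        if ref != hyp
    ]
    error_rate = len(errors) / len(reference) if reference else 0.0
    return errors, error_rate


# Made-up labels, used only to exercise the function.
reference = ["0", "1", "1", "0"]
hypothesis = ["0", "1", "0", "0"]
errors, error_rate = score(hypothesis, reference)
for index, ref, hyp in errors:
    print(f"token {index}: reference={ref} hypothesis={hyp}")
print(f"error rate: {100.0 * error_rate:.2f}%")
```

Listing each misclassified token alongside the summary figure makes it easier to see which inputs an algorithm gets wrong, which is not visible from the error rate alone.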
Now for the fun stuff.
We have tabulated results for student projects involving this data.
It is quite interesting to observe the variation in performance.
For example, the same algorithms deliver different performance;
different algorithms deliver the same performance; and some
algorithms do better on one type of data than the other. This is
quite typical in the world of pattern recognition: implementation
details can make a big difference for any algorithm. This is why
the common evaluation framework is so important.