PERFORMANCE VS. PERPLEXITY
Though perplexity is not the best measure of task complexity, it provides some useful insights:
Corpus                                   Vocabulary Size   Perplexity   Word Error Rate
TI Digits                                11                11           ~0.0%
OGI Alphadigits                          36                36           8%
Resource Management (RM)                 1,000             60           4%
Air Travel Information Service (ATIS)    1,800             12           4%
Wall Street Journal                      20,000            200 - 250    15%
Broadcast News                           > 80,000          200 - 250    20%
Conversational Speech                    > 50,000          100 - 150    30%
Acoustic confusability of highly probable and interchangeable words most often dominates performance.
WER ~= -12.37 + 6.48 * log2(Perplexity) [William Fisher, NIST, May 2000]
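To see how the fitted line compares with the corpora listed above, the formula can be evaluated at each corpus's perplexity. The following is a minimal sketch, not part of the original slide. It assumes that WER is expressed in percent (as in the table) and that a perplexity range such as 200 - 250 can be represented by its midpoint. The names predicted_wer and TABLE are hypothetical, chosen only for illustration.

```python
import math


def predicted_wer(perplexity: float) -> float:
    """Evaluate the fit WER ~= -12.37 + 6.48 * log2(perplexity).

    Assumption: the result is a word error rate in percent.
    """
    if perplexity <= 0:
        raise ValueError("perplexity must be positive")
    return -12.37 + 6.48 * math.log2(perplexity)


# (corpus, perplexity, reported WER in percent), copied from the table.
# Assumption: ranges are replaced by their midpoints (200 - 250 -> 225,
# 100 - 150 -> 125), and "~0.0%" is entered as 0.0.
TABLE = [
    ("TI Digits", 11, 0.0),
    ("OGI Alphadigits", 36, 8.0),
    ("Resource Management (RM)", 60, 4.0),
    ("ATIS", 12, 4.0),
    ("Wall Street Journal", 225, 15.0),
    ("Broadcast News", 225, 20.0),
    ("Conversational Speech", 125, 30.0),
]

for corpus, ppl, reported in TABLE:
    fitted = predicted_wer(ppl)
    print(f"{corpus:26s} perplexity={ppl:4d} "
          f"reported={reported:5.1f}% fitted={fitted:5.1f}%")
```

Because the fit is linear in log2(Perplexity), each doubling of perplexity adds 6.48 points to the fitted WER, and the fitted value is negative for very small perplexities (for example, -12.37 at a perplexity of 1). The fit is therefore best read as a rough trend rather than a per-corpus prediction.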