• two preliminary experiments on Alphadigits

    • WER decreased from 44.4% to 37.7% when training a monophone system whose monophones were drawn from a set of cross-word models with 12 mixture components

    • achieved a WER of 54.2% when training a monophone system from a flat start using 8 mixture components

  • results are poor for several reasons

    • the number of mixture components is low for a monophone system (typically 32)

    • not enough training iterations (three and two, respectively)

    • forcing silence between every word

    • model mismatch
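• for reference, the WER figures quoted above are the standard Levenshtein-based word error rate; a minimal sketch of the computation (hypothetical example strings, not from the Alphadigits runs):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / #ref words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# "too" is a substitution, "four" a deletion: 2 errors over 4 words
print(round(100 * wer("one two three four", "one too three"), 1))  # 50.0
```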