BN RECOGNITION SYSTEMS - III

LIMSI Philips SRI
Front-end
  • 39 features (MFCC, energy)
  • 33 features (MFCC, energy)
  • VTN
  • LDA + normalization
  • 39 features (MFCC, energy)
Segmentation
  • GMMs with 64 Gaussians
  • Viterbi decoding
  • Agglomerative clustering
  • Kullback-Leibler distance
  • Merge adjacent segments
  • Phone-tied GMMs
  • Viterbi search
Acoustic Model
  • Filler words
  • Gender-independent models
  • Position dependent
  • Xword triphones
  • Laplacian densities
  • DT clustering for triphones
  • Gender-independent models
  • Focus-independent training
  • Gender-dependent
  • Genones
  • Focus-specific adaptation
  • GMS algorithm
Language Model
  • CU-CMU SLM toolkit
  • 65K words
  • 1st pass - bigrams
  • filler words in trigrams
  • 75K vocabulary
  • Phrase models
  • Adapted remote corpora
  • 48K lexicon
  • 1st pass - bigrams
  • 2nd pass - trigram lattice
  • Final pass - 5-grams
Decoder
  • Word-graph, bigrams
  • Trigram pass
  • Cluster-based MLLR
  • All passes Xword
  • Multipass wordgraph
  • MLLR + Xword trigram
  • 1st pass WI, ML adapt
  • 2nd pass WI, lattice adapt
  • FB pass, N-best lists
  • Xword models adapt
  • N-best rescoring
Performance
  • 18.5% WER
  • 23.1% WER
  • 26.7% WER
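
Note on the metric: WER (word error rate) is conventionally computed as (substitutions + deletions + insertions) / number of reference words, using a word-level edit-distance alignment between the reference transcript and the recognizer output. The sketch below illustrates that definition only. It assumes whitespace-tokenized transcripts, the function name `wer` and the example strings are hypothetical, and it omits the text normalization that official scoring tools apply, so it is not the scoring procedure used for the figures above.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length.

    Illustrative sketch; tokenization is a plain whitespace split (an assumption).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(f"{wer('the cat sat on the mat', 'the cat sat on mat'):.1%}")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.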