Point Size of Corpus

  • the CMU dictionary contains a limited set of words
  • it is impossible to include every word, especially proper nouns and unusual words
  • a larger database provides better coverage but takes much more time
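The coverage problem can be illustrated with a minimal sketch that looks words up in a pronunciation dictionary and reports the ones that are missing. The three entries below are a toy excerpt in the CMU dictionary's plain-text format, and the helper names load_pronunciations and find_oov are hypothetical, chosen only for this example.

```python
# Toy excerpt in the CMU dictionary text format: a word followed by its phones.
# A real dictionary file is far larger; this sample is only for illustration.
SAMPLE_DICT = """\
;;; comment lines in the dictionary file start with three semicolons
HELLO  HH AH0 L OW1
WORLD  W ER1 L D
SPEECH  S P IY1 CH
"""


def load_pronunciations(text):
    """Parse 'WORD  PH1 PH2 ...' lines into a {word: [phones]} mapping."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comments
        word, *phones = line.split()
        entries[word] = phones
    return entries


def find_oov(words, entries):
    """Return the words that have no entry (out-of-vocabulary words)."""
    return [w for w in words if w.upper() not in entries]


entries = load_pronunciations(SAMPLE_DICT)
oov = find_oov(["hello", "Zaphod", "speech"], entries)
print(oov)  # the proper noun is the only word without an entry
```

Any word returned by find_oov would need a fallback, such as letter-to-sound rules or a manually added entry.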

Point Sparse Data

  • only a small percentage of all possible triphones is actually used
  • similar phones are merged, so that rarely seen contexts can share data
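The two points above can be made concrete with a small counting sketch. It assumes a 39-phone inventory (stress markers ignored), pads each utterance with a silence symbol, and uses a hypothetical BROAD_CLASS table to merge similar context phones; none of these choices come from the original notes.

```python
NUM_PHONES = 39                  # assumed phone inventory size
possible = NUM_PHONES ** 3       # every (left, center, right) combination

# Hypothetical merging table: similar phones share one broad class.
BROAD_CLASS = {"P": "stop", "B": "stop", "T": "stop", "D": "stop",
               "M": "nasal", "N": "nasal"}


def triphones(phones):
    """Return (left, center, right) triples, padding both ends with silence."""
    padded = ["sil"] + list(phones) + ["sil"]
    return [tuple(padded[i:i + 3]) for i in range(len(padded) - 2)]


def merge_context(tri):
    """Replace the left and right context by their broad class, if any."""
    left, center, right = tri
    return (BROAD_CLASS.get(left, left), center, BROAD_CLASS.get(right, right))


# Two toy utterances standing in for a training corpus.
utterances = [["B", "AH", "T"], ["P", "AH", "D"]]

observed = {tri for utt in utterances for tri in triphones(utt)}
merged = {merge_context(tri) for tri in observed}

print(possible)        # 59319 possible triphones
print(len(observed))   # 6 distinct triphones seen
print(len(merged))     # 5 after merging similar contexts
```

Here (B, AH, T) and (P, AH, D) collapse into the single unit (stop, AH, stop), so the two occurrences can be pooled when estimating that unit.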