Typically, researchers use N-gram language models, which describe the likelihood of a word being spoken given the previous N-1 words. For example, if the speech recognizer produces two equally plausible transcriptions for an utterance, say "fan" and "can", we can look at the prior word to help us make the better decision. If the previous word was "garbage," then "can" is the better choice, i.e. "garbage can" is a far more likely phrase than "garbage fan." N-grams are a very useful tool in natural language processing (NLP). However, this particular data set consists of spoken digits and nothing else, so the likelihood of the digit 1 being followed by 3 is the same as 1 being followed by 8. Thus there isn't really a need for a language model in this experiment. However, HTK does not allow us to omit a language model when decoding, so we'll simply create a word network (wordnet) to take its place.

Procedure
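Since HTK still requires a word network at decode time, one common option is a trivial looping grammar that allows any digit to follow any other with equal likelihood. A minimal sketch, assuming a ten-word vocabulary with labels ONE through ZERO (the exact word labels must match your dictionary):

```
$digit = ONE | TWO | THREE | FOUR | FIVE |
         SIX | SEVEN | EIGHT | NINE | ZERO;
( SENT-START <$digit> SENT-END )
```

The angle brackets denote one-or-more repetitions, so the network accepts arbitrary-length digit strings. HTK's HParse tool can then compile a grammar file like this into the lattice file the decoder expects, e.g. `HParse gram wdnet`.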