
6.3.1 N-Gram Modeling: Overview

Consider the sequence of words, "I am here." The probability of this word sequence can be estimated by measuring how often it occurs in a set of training data. To calculate this probability, we need to count both the number of times "am" is preceded by "I" and the number of times "here" is preceded by "I am."
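The counting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy corpus, the whitespace tokenization, and the function names count_ngrams and sequence_probability are hypothetical and do not come from the tutorial. The estimate multiplies relative frequencies, P(I) x P(am | I) x P(here | I am), where each conditional term is a ratio of counts.

```python
from collections import Counter


def count_ngrams(tokens, max_n):
    """Count every contiguous word sequence of length 1..max_n."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def sequence_probability(words, tokens):
    """Relative-frequency estimate of P(words) from the training tokens."""
    if not words or not tokens:
        return 0.0
    counts = count_ngrams(tokens, len(words))
    # P(first word): how often it occurs among all training tokens.
    prob = counts[(words[0],)] / len(tokens)
    for k in range(1, len(words)):
        history = tuple(words[:k])
        if counts[history] == 0:
            # The history was never observed, so the estimate is zero.
            return 0.0
        # P(word k | history) = count(history + word k) / count(history)
        prob *= counts[tuple(words[:k + 1])] / counts[history]
    return prob


# Hypothetical toy training data; a real corpus would be far larger.
toy_corpus = "I am here . I am there . you are here .".split()
p = sequence_probability(["I", "am", "here"], toy_corpus)
print(p)
```

In this toy corpus "I" occurs twice among 12 tokens, "I am" occurs twice, and "I am here" occurs once, so the estimate is (2/12) x (2/2) x (1/2), or about 0.083. Any sequence that never occurs in the training data receives a probability of zero, which is one reason this direct approach does not scale.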

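Clearly, estimating this probability for every possible word sequence is not feasible: with a vocabulary of V words there are V^n possible sequences of length n, and most of them never appear in any training set. A practical approach is to assume that the probability of a word depends only on the equivalence class of the words that precede it. For example, all nouns can be grouped into one equivalence class.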

We can simplify this further by limiting the length of the word sequence that is considered:

Unigram: one-word sequence
Bigram: two-word sequence
Trigram: three-word sequence
N-gram: n-word sequence
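
As a rough illustration of the cases listed above, the following Python sketch extracts the unigrams, bigrams, and trigrams from the example sentence. The helper name ngrams and the whitespace tokenization are assumptions made for this example, not part of the tutorial.

```python
def ngrams(tokens, n):
    """Return all contiguous n-word sequences in tokens, in order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = "I am here".split()
unigrams = ngrams(sentence, 1)  # [('I',), ('am',), ('here',)]
bigrams = ngrams(sentence, 2)   # [('I', 'am'), ('am', 'here')]
trigrams = ngrams(sentence, 3)  # [('I', 'am', 'here')]
```

A sentence of three words yields three unigrams, two bigrams, and one trigram; asking for sequences longer than the sentence returns an empty list.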
