THE BACKOFF MODEL: A FLEXIBLE TRADE-OFF BETWEEN ACCURACY AND COMPLEXITY
- Backoff smoothing: approximate the probability of an unobserved
N-gram using lower-order N-grams, which occur more frequently and
therefore have more reliable counts.
- If an N-gram count is zero, we approximate its probability using
the scaled lower-order (N-1)-gram estimate, backing off further if
that count is also zero.
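- The scaling factor (backoff weight) applied to the lower-order
estimate is chosen so that the conditional distribution sums to one.
Probabilities of observed N-grams are discounted to leave this mass.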
- Extremely popular for N-gram modeling in speech recognition because
it lets you control model complexity as well as generalization.
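A minimal sketch of the idea for two levels (bigram backing off to unigram). It assumes absolute discounting, where every observed bigram count is reduced by a fixed `discount` d (0 < d < 1). Katz's original backoff model uses Good-Turing discounts instead. Seen bigrams get P(w|h) = (c(h,w) - d) / c(h). Unseen bigrams get alpha(h) * P(w), where alpha(h) is the scaling factor that makes the distribution for history h sum to one. The function name `train_backoff_bigram` and the toy corpus are hypothetical and used only for illustration.

```python
from collections import Counter


def train_backoff_bigram(tokens, discount=0.5):
    """Return prob(h, w) for a bigram model that backs off to unigrams.

    Hypothetical helper. Assumption: absolute discounting (subtract a fixed
    `discount` from each observed bigram count) frees probability mass;
    Katz's original backoff uses Good-Turing discounts instead.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = sum(unigrams.values())
    p_uni = {w: c / total for w, c in unigrams.items()}

    # How often each word occurs as a history, and which words followed it.
    history_counts = Counter()
    followers = {}
    for (h, w), c in bigrams.items():
        history_counts[h] += c
        followers.setdefault(h, set()).add(w)

    # Backoff weight alpha(h): mass removed from the seen bigrams of h,
    # divided by the unigram mass of the words never seen after h,
    # so that sum_w prob(h, w) == 1.
    alpha = {}
    for h, seen in followers.items():
        left_over = discount * len(seen) / history_counts[h]
        unseen_uni_mass = 1.0 - sum(p_uni[w] for w in seen)
        # If every vocabulary word has followed h there is nothing to back
        # off to; this sketch then leaves the left-over mass unassigned.
        alpha[h] = left_over / unseen_uni_mass if unseen_uni_mass > 0 else 0.0

    def prob(h, w):
        c_hw = bigrams.get((h, w), 0)
        if c_hw > 0:
            # Seen bigram: discounted relative frequency.
            return (c_hw - discount) / history_counts[h]
        # Zero count: scaled unigram estimate. A history never seen as a
        # bigram prefix gets alpha = 1, i.e. the plain unigram distribution.
        return alpha.get(h, 1.0) * p_uni.get(w, 0.0)

    return prob


tokens = "the cat sat on the mat the cat ran".split()
prob = train_backoff_bigram(tokens)
print(prob("the", "cat"))  # seen bigram: (2 - 0.5) / 3 = 0.5
print(prob("the", "sat"))  # unseen bigram: alpha("the") * P("sat")
```

Summing `prob(h, w)` over the vocabulary gives one for every history in the toy corpus, which is the constraint the scaling factor is chosen to satisfy. Adding a trigram level would repeat the same pattern, with the bigram model above playing the role of the lower-order estimate.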