HW 13: N-GRAM LANGUAGE MODEL SMOOTHING
- As a continuation of homework assignment no. 12, consider the
unigram distribution computed from the SWB (Switchboard) data.
Implement Good-Turing smoothing and compare the result to simple
smoothing, using entropy as the measure of comparison (a sketch
follows this list).
- Smooth the bigram distribution using Katz smoothing, and compare
the result to the unsmoothed distribution (see the Katz sketch
below).
- Group all uppercase words into one equivalence class and sort all
other words into 26 equivalence classes according to the first
letter of the word. Smooth the bigram distribution using these
equivalence classes (see the class-based sketch below).
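
One possible Python sketch for the first task, assuming HW 12 produced a
dict mapping each word to its raw count (the names unigram_counts,
vocab_size, and the <unseen-i> placeholders are assumptions, not part of
the assignment). It applies the plain count-of-counts Good-Turing
estimate and add-one smoothing as the "simple" baseline, then compares
the two by entropy.

  import math
  from collections import Counter

  def good_turing_probs(counts, vocab_size):
      # counts: word -> raw count (assumed output of HW 12);
      # vocab_size: number of word types, including unseen ones.
      total = sum(counts.values())
      n = Counter(counts.values())   # N_c: types occurring exactly c times
      probs = {}
      for word, c in counts.items():
          # Adjusted count c* = (c+1) * N_{c+1} / N_c; fall back to the
          # raw count where N_{c+1} is empty (a full implementation would
          # fit a regression to the sparse counts-of-counts).
          c_star = (c + 1) * n[c + 1] / n[c] if n[c + 1] > 0 else c
          probs[word] = c_star / total
      # Good-Turing reserves N_1 / N of the mass for unseen types.
      unseen = vocab_size - len(counts)
      if unseen > 0:
          p0 = n[1] / total
          for i in range(unseen):
              probs["<unseen-%d>" % i] = p0 / unseen
      z = sum(probs.values())        # renormalize to sum to one
      return {w: p / z for w, p in probs.items()}

  def add_one_probs(counts, vocab_size):
      # Simple add-one (Laplace) smoothing for comparison.
      total = sum(counts.values()) + vocab_size
      probs = {w: (c + 1) / total for w, c in counts.items()}
      for i in range(vocab_size - len(counts)):
          probs["<unseen-%d>" % i] = 1 / total
      return probs

  def entropy(probs):
      # Entropy in bits: H = -sum_w p(w) * log2 p(w).
      return -sum(p * math.log2(p) for p in probs.values() if p > 0)

With the real SWB counts, the comparison is then just
entropy(good_turing_probs(counts, V)) versus
entropy(add_one_probs(counts, V)).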
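
A sketch of Katz back-off for the second task, under the same assumed
count dicts (bigram_counts maps (w1, w2) to a count; unigram_counts maps
w to a count). Seen bigrams are discounted by the Good-Turing ratio d_c
for low counts c <= k (conventionally k = 5), and the freed mass
alpha(w1) is redistributed over unseen successors in proportion to the
unigram distribution. Sparse counts-of-counts would again need the usual
regression fit in a full implementation.

  from collections import Counter, defaultdict

  def katz_bigram_model(bigram_counts, unigram_counts, k=5):
      n = Counter(bigram_counts.values())   # N_c over bigram counts
      total = sum(unigram_counts.values())
      p_uni = {w: c / total for w, c in unigram_counts.items()}

      def discount(c):
          # Katz discount ratio, applied only for c <= k:
          # d_c = (c*/c - (k+1)N_{k+1}/N_1) / (1 - (k+1)N_{k+1}/N_1),
          # with c* the Good-Turing adjusted count.
          if c > k or n[1] == 0:
              return 1.0
          c_star = (c + 1) * n[c + 1] / n[c]
          common = (k + 1) * n[k + 1] / n[1]
          if common >= 1.0:
              return 1.0                    # degenerate counts-of-counts
          # Clamped at zero: sparse N_{c+1} can drive the ratio negative.
          return max(0.0, (c_star / c - common) / (1 - common))

      # Discounted conditionals for seen bigrams, grouped by history w1.
      seen = defaultdict(dict)
      for (w1, w2), c in bigram_counts.items():
          seen[w1][w2] = discount(c) * c / unigram_counts[w1]

      # alpha(w1): left-over mass, renormalized over unseen successors.
      alpha = {}
      for w1, succ in seen.items():
          left = 1.0 - sum(succ.values())
          denom = 1.0 - sum(p_uni[w2] for w2 in succ)
          alpha[w1] = left / denom if denom > 0 else 0.0

      def prob(w2, w1):
          if w2 in seen.get(w1, {}):
              return seen[w1][w2]
          if w1 in seen:
              return alpha[w1] * p_uni.get(w2, 0.0)
          return p_uni.get(w2, 0.0)   # unseen history: back off fully
      return prob

Comparing prob(w2, w1) against the unsmoothed relative frequency
c(w1, w2) / c(w1) shows directly where the discounted mass went.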
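
For the third task, one standard reading of "smooth using equivalence
classes" is the class-based bigram decomposition
p(w2 | w1) = p(C(w2) | C(w1)) * p(w2 | C(w2)), sketched below with the
same assumed count dicts. word_class implements the 27 classes from the
assignment: one for all-uppercase words, 26 keyed by the first letter
(words not starting with a letter are not covered by the assignment and
are lumped by their first character here).

  from collections import Counter

  def word_class(w):
      # One class for all-uppercase words; otherwise keyed by the first
      # letter (assumption: words begin with a letter).
      return "UPPER" if w.isupper() else w[0].lower()

  def class_bigram_model(bigram_counts, unigram_counts):
      class_bi = Counter()
      class_uni = Counter()
      for (w1, w2), c in bigram_counts.items():
          class_bi[(word_class(w1), word_class(w2))] += c
      for w, c in unigram_counts.items():
          class_uni[word_class(w)] += c

      def prob(w2, w1):
          # p(w2 | w1) = p(C2 | C1) * p(w2 | C2)
          c1, c2 = word_class(w1), word_class(w2)
          if class_uni[c1] == 0 or class_uni[c2] == 0:
              return 0.0
          return (class_bi[(c1, c2)] / class_uni[c1]) * \
                 (unigram_counts.get(w2, 0) / class_uni[c2])
      return prob

Because every history falls into one of only 27 classes, this model
assigns nonzero probability to many bigrams the word-level model never
saw, at the cost of flattening distinctions within each class.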