HW 14: SEARCH ALGORITHMS

  1. As a continuation of homework assignment no. 12, consider the unigram distribution computed from the SWB data. Implement Good-Turing smoothing and, using entropy as the measure, compare the result to simple smoothing.

  2. Smooth the bigram distribution using Katz smoothing, and compare this result to the unsmoothed distribution.

  3. Group all uppercase words into one equivalence class; assign every other word to one of 26 equivalence classes according to its first letter. Smooth the bigram distribution using these equivalence classes.
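
For Problem 1, a minimal Python sketch of Good-Turing smoothing on unigram counts may help as a starting point. It uses the adjusted count r* = (r + 1) N_{r+1} / N_r, where N_r is the number of word types seen exactly r times, and reserves probability N_1 / N for unseen words. Several things here are assumptions, not part of the assignment: the names good_turing and entropy, the toy counts, the fallback to the raw count when N_{r+1} is zero (real implementations usually smooth the N_r values first), and the choice to treat all unseen words as a single event when computing entropy.

```python
from collections import Counter
import math


def good_turing(counts):
    """Return (probabilities of seen words, probability mass reserved for unseen words)."""
    total = sum(counts.values())
    freq_of_freq = Counter(counts.values())  # N_r: number of types seen r times

    adjusted = {}
    for word, r in counts.items():
        n_r = freq_of_freq[r]
        n_r1 = freq_of_freq.get(r + 1, 0)
        # Assumption: keep the raw count when N_{r+1} is zero,
        # otherwise the adjusted count would collapse to zero.
        adjusted[word] = (r + 1) * n_r1 / n_r if n_r1 > 0 else float(r)

    # Mass reserved for unseen words: N_1 / N.
    p_unseen = freq_of_freq.get(1, 0) / total

    # Renormalize so seen words share the remaining 1 - p_unseen.
    scale = (1.0 - p_unseen) / sum(adjusted.values())
    probs = {w: c * scale for w, c in adjusted.items()}
    return probs, p_unseen


def entropy(probabilities):
    """Entropy in bits of an iterable of probabilities; zero entries are skipped."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Toy counts standing in for the SWB unigram counts (hypothetical data).
toy_counts = {"the": 3, "a": 2, "of": 2, "cat": 1, "dog": 1, "ran": 1}
gt_probs, gt_unseen = good_turing(toy_counts)

# Assumption: unseen words are lumped into one event for the entropy comparison.
gt_entropy = entropy(list(gt_probs.values()) + [gt_unseen])
```

The same entropy function can be applied to the simply smoothed distribution from the earlier assignment, so the two values are directly comparable.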
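
For Problem 3, the mapping from words to equivalence classes could be sketched in Python as below. The names word_class and class_bigram_counts are hypothetical. Two interpretations are assumed: an "uppercase word" is one whose letters are all uppercase, and a word that does not begin with a letter a-z (for example a bracketed noise marker) falls into an extra OTHER class that the assignment does not mention. The sketch only collapses word bigram counts into class bigram counts; how those counts are then used to smooth the word bigram distribution is left open.

```python
from collections import Counter


def word_class(word):
    """Map a word to its equivalence class label."""
    # Assumption: "uppercase word" means every cased character is uppercase.
    if word.isupper():
        return "UPPER"
    first = word[0].lower() if word else ""
    if "a" <= first <= "z":
        return first  # one class per first letter, 26 in total
    # Assumption: anything else goes to an extra class not in the assignment.
    return "OTHER"


def class_bigram_counts(bigram_counts):
    """Collapse word bigram counts into class bigram counts."""
    collapsed = Counter()
    for (w1, w2), count in bigram_counts.items():
        collapsed[(word_class(w1), word_class(w2))] += count
    return collapsed


# Toy bigram counts (hypothetical data).
toy_bigrams = {("the", "cat"): 2, ("the", "cow"): 1, ("IBM", "the"): 1}
toy_class_counts = class_bigram_counts(toy_bigrams)
```

Because many word pairs share one class pair, the class bigram table is far denser than the word bigram table, which is what makes it useful as a backoff or interpolation source.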