HW 12: N-GRAM LANGUAGE MODELING

  1. Download the Switchboard transcriptions and replicate the plots of N-gram frequencies shown on page 2 of lecture 32.
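
     A minimal sketch of one way to count and plot N-gram frequencies,
     assuming the transcriptions have been concatenated into a single
     plain-text file (the path "switchboard.txt" and the word regex are
     placeholders, not part of the assignment):

         import re
         from collections import Counter
         import matplotlib.pyplot as plt

         def ngram_counts(tokens, n):
             # Count every N-gram of order n in the token sequence.
             return Counter(tuple(tokens[i:i + n])
                            for i in range(len(tokens) - n + 1))

         with open("switchboard.txt") as f:            # placeholder path
             tokens = re.findall(r"[a-z']+", f.read().lower())

         for n in (1, 2, 3):
             freqs = sorted(ngram_counts(tokens, n).values(), reverse=True)
             plt.loglog(range(1, len(freqs) + 1), freqs, label=f"{n}-gram")

         plt.xlabel("rank")
         plt.ylabel("frequency")
         plt.legend()
         plt.show()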

  2. Consider all possible N-grams that could occur, assuming any word in the unigram distribution can follow any other word in that distribution. Add one to every count (a simple form of smoothing). Compute the entropy of each distribution and compare it to the total number of N-grams in the distribution. Interpret these results.
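
     A sketch of the entropy computation, assuming a token list `tokens` built
     as in part 1 (the helper name smoothed_entropy is hypothetical). With
     add-one smoothing every one of the V**n possible N-grams gets a nonzero
     probability, and the unseen ones can be handled in closed form rather
     than enumerated:

         import math
         from collections import Counter

         def smoothed_entropy(tokens, n):
             # Entropy in bits of the add-one-smoothed distribution over
             # ALL V**n possible N-grams, where V is the vocabulary size.
             vocab_size = len(set(tokens))
             possible = vocab_size ** n
             counts = Counter(tuple(tokens[i:i + n])
                              for i in range(len(tokens) - n + 1))
             total = sum(counts.values()) + possible   # +1 for every possible type

             # Observed N-grams: probability (c + 1) / total.
             h = -sum((c + 1) / total * math.log2((c + 1) / total)
                      for c in counts.values())
             # Unseen N-grams: each has probability 1 / total; add their
             # combined contribution without enumerating them.
             unseen = possible - len(counts)
             h += unseen / total * math.log2(total)
             return h, possible

         # Compare the entropy with log2 of the number of possible N-grams,
         # which is the entropy of a uniform distribution over them.
         for n in (1, 2, 3):
             h, possible = smoothed_entropy(tokens, n)
             print(f"{n}-gram: H = {h:.2f} bits, "
                   f"log2(#N-grams) = {math.log2(possible):.2f} bits")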