EE 8993: Speech Recognition
Homework Assignment #5
Linear Prediction
May 4, 1998

submitted to:
Dr. Joseph Picone
Department of Electrical and Computer Engineering
413 Simrall, Hardy Rd.
Mississippi State University
Box 9571
MS State, MS 39762

submitted by:
Julie Ngan
Department of Electrical and Computer Engineering
Mississippi State University
Box 9571
Mississippi State, Mississippi 39762
Tel: 601-325-8335
Fax: 601-325-3149
email: ngan@isip.msstate.edu

I. Problem Definition

Implement a capability to plot a signal's FFT spectrum and the gain-matched spectrum produced by a linear prediction model. The tool must read speech from a binary file and allow the user to select the sampling frequency of the signal, the pre-emphasis constant, the window duration in seconds, the center time of the window in seconds, a rectangular or Hamming window, and the linear prediction order. The output of the tool should be a plot of the signal spectrum and the corresponding linear prediction model spectrum on a log amplitude vs. linear frequency scale.

II. Overview

The linear prediction (LP) model is based on the fact that a signal conveying a message is never completely random: there is correlation between successive samples. Linear prediction therefore models a signal as a linear combination of its past values and the present and past values of a hypothetical input to a system whose output is the given signal [1]. The objective of an LP analysis is to estimate the parameters of an all-pole model of the speech data. Define a speech signal $s(n)$ and a predicted value $\hat{s}(n) = \sum_{k=1}^{p} a_k s(n-k)$, where $p$ is the LP order; then the prediction error is given by:

$$e(n) = s(n) - \hat{s}(n) = s(n) - \sum_{k=1}^{p} a_k s(n-k) \qquad (1)$$

The total squared error is then:

$$E = \sum_{n} e^2(n) = \sum_{n} \left[ s(n) - \sum_{k=1}^{p} a_k s(n-k) \right]^2 \qquad (2)$$

We can therefore minimize the total squared error by differentiating equation (2) with respect to each $a_k$, $1 \le k \le p$, and setting the result to zero. The $a_k$ are known as the linear prediction coefficients.

III. Linear Prediction Calculation

The linear prediction system is a block processing model in which a window of samples is processed and a vector of features is computed. Using the given window duration and window center, a number of consecutive speech samples are read from the speech file. The window is de-biased by subtracting the average sample value from each sample:

$$s'(n) = s(n) - \mu \qquad (3)$$

where $\mu = \frac{1}{N} \sum_{n=0}^{N-1} s(n)$ is the mean of the $N$ samples in the window.

The signal is then pre-emphasized, that is, processed by a first-order digital network in order to spectrally flatten it:

$$y(n) = s'(n) - \alpha \, s'(n-1) \qquad (4)$$

where $\alpha$ is the user-selected pre-emphasis constant.

Depending on the user's choice, either a Hamming window or a rectangular window is then applied to the signal to minimize the adverse effects of chopping the section out of the continuous speech signal. The Hamming window is defined as:

$$w(n) = 0.54 - 0.46 \cos\left( \frac{2 \pi n}{N-1} \right), \quad 0 \le n \le N-1 \qquad (5)$$

A rectangular window leaves the samples inside the window unchanged and zeros all samples outside it.

Using the number of samples in the window, $N$, an $N$-point DFT is applied to the windowed signal $x(n) = w(n) \, y(n)$ to plot the spectrum:

$$X(k) = \sum_{n=0}^{N-1} x(n) \, e^{-j 2 \pi k n / N} \qquad (6)$$

where $k = 0, 1, \ldots, N-1$.

The window of samples is then autocorrelated to give a set of $p+1$ coefficients, where $p$ is the order of the desired LPC analysis:

$$R(i) = \sum_{n=0}^{N-1-i} x(n) \, x(n+i), \quad 0 \le i \le p \qquad (7)$$

A vector of LPC coefficients is computed from the autocorrelation vector using the Levinson-Durbin recursion. This recursion also yields the reflection coefficients $k_i$ as intermediate variables. Durbin's algorithm gives the gain as:

$$G^2 = E_p = R(0) \prod_{i=1}^{p} \left( 1 - k_i^2 \right) \qquad (8)$$

The derived predictor coefficients are then used to compute the gain-matched LP spectrum, $G^2 / |A(e^{j\omega})|^2$ with $A(z) = 1 - \sum_{k=1}^{p} a_k z^{-k}$.

IV. The LPC Program

The LPC program builds on the basic code from homework 4. The program reads a binary speech file and a parameter file. The user can also specify parameters on the command line, such as the number of channels, the window type (rectangular or Hamming), the sampling frequency, the center time of the window, the pre-emphasis constant, and the LP order. The program generates two sets of values on a log amplitude versus linear frequency scale.
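The analysis chain of Section III that underlies these values can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the original program: the function names `levinson_durbin` and `lp_analyze`, the default pre-emphasis constant of 0.95, the predictor sign convention (predicted sample = sum of a_k times past samples), and the random test frame are all chosen here for illustration only.

```python
import numpy as np

def levinson_durbin(r, order):
    """Levinson-Durbin recursion on autocorrelation lags r[0..order].

    Returns predictor coefficients a_1..a_p, reflection coefficients
    k_1..k_p, and the final prediction error E_p (the squared gain).
    """
    a = np.zeros(order + 1)  # a[0] is unused so that a[j] holds a_j
    k = np.zeros(order + 1)
    err = float(r[0])
    for i in range(1, order + 1):
        if err <= 0.0:  # degenerate frame (e.g. all zeros): stop early
            break
        # k_i = (R(i) - sum_{j<i} a_j R(i-j)) / E_{i-1}
        k[i] = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        prev = a.copy()
        a[i] = k[i]
        a[1:i] = prev[1:i] - k[i] * prev[i - 1:0:-1]
        err *= 1.0 - k[i] ** 2  # E_i = (1 - k_i^2) E_{i-1}
    return a[1:], k[1:], err

def lp_analyze(frame, order, preemph=0.95, window="hamming"):
    """De-bias, pre-emphasize, window, autocorrelate, and run the recursion."""
    x = np.asarray(frame, dtype=float)
    n = len(x)
    x = x - x.mean()                                         # eq. (3)
    # eq. (4); the sample before the window is assumed to be zero
    x = np.concatenate(([x[0]], x[1:] - preemph * x[:-1]))
    if window == "hamming":
        x = x * np.hamming(n)                                # eq. (5)
    # eq. (7); lags at or beyond the window length are zero
    r = np.array([np.dot(x[:n - i], x[i:]) if i < n else 0.0
                  for i in range(order + 1)])
    a, k, gain_sq = levinson_durbin(r, order)                # eq. (8)
    return x, r, a, k, gain_sq

# Hypothetical input: white noise standing in for a 28 msec frame at 8 kHz.
rng = np.random.default_rng(0)
frame = rng.standard_normal(224)
x, r, a, k, gain_sq = lp_analyze(frame, order=8)
print("predictor coefficients:", a)
print("squared gain:", gain_sq)
```

Keeping the recursion separate from the front-end processing makes it easy to check against a hand-computed case, such as the lags 1, 0.5, 0.25, for which the recursion should return a first coefficient of 0.5 and a residual of 0.75.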
The two sets of values are the spectrum of the original input signal, computed using a 256-point DFT, and the spectrum of the linear prediction model. The two spectra can then be plotted with xmgr.

V. Experimental Results

The LPC program is tested on the audio data (710_b_8k.raw) used in homework 4. The goal of this experiment is to compare the spectra obtained with different LP orders while keeping the DFT spectrum constant. The file is analyzed using a 28 msec rectangular window centered at 20 msec, and LP spectra of order 2, 4, 8, 16, 32, 64, 128, 256, and 512 are plotted against a 256-point DFT spectrum. The results are shown in Figures 1 - 9. As the spectra show, LP orders of 2, 4, 8, or 16 produce a very smooth curve that does not represent the original signal well. These spectra fail to register the two high-amplitude spikes at 0 Hz and 8,000 Hz, as well as the two low-amplitude spikes at around 2,200 Hz and 5,700 Hz. As the LP order increases to 32 or 64, the LP spectra give a very close estimate of the DFT spectrum. With even higher LP orders (128, 256, and 512), the LP spectra are almost exactly the same as the DFT spectrum. We therefore conclude that the higher the LP order, the better the LP spectrum estimates the DFT spectrum.

VI. REFERENCES

[1] J. Makhoul, "Linear Prediction: A Tutorial Review," Proc. IEEE, vol. 63, pp. 561-580, 1975.

Figure 1 The 2nd order LP spectrum (red) vs. the 256 point DFT spectrum (black).
Figure 2 The 4th order LP spectrum (red) vs. the 256 point DFT spectrum (black).
Figure 3 The 8th order LP spectrum (red) vs. the 256 point DFT spectrum (black).
Figure 4 The 16th order LP spectrum (red) vs. the 256 point DFT spectrum (black).
Figure 5 The 32nd order LP spectrum (red) vs. the 256 point DFT spectrum (black).
Figure 6 The 64th order LP spectrum (red) vs. the 256 point DFT spectrum (black).
Figure 7 The 128th order LP spectrum (red) vs. the 256 point DFT spectrum (black).
Figure 8 The 256th order LP spectrum (red) vs. the 256 point DFT spectrum (black).
Figure 9 The 512th order LP spectrum (red) vs. the 256 point DFT spectrum (black).
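
The kind of comparison shown in the figures can be explored with a short Python sketch. It is only an illustration under stated assumptions: a synthetic frame (two sinusoids plus noise) stands in for 710_b_8k.raw, the helper names `autocorr`, `levinson`, and `lp_spectrum_db` are hypothetical, the predictor convention is that the predicted sample equals the sum of a_k times past samples, and only orders below the 256-point FFT length are used so that the inverse filter is zero-padded rather than truncated. The sketch prints, for each order, the residual energy (the squared gain) and the mean absolute difference in dB between the gain-matched LP spectrum and the DFT spectrum.

```python
import numpy as np

def autocorr(x, order):
    """Autocorrelation lags R(0..order); assumes order < len(x)."""
    n = len(x)
    return np.array([np.dot(x[:n - i], x[i:]) for i in range(order + 1)])

def levinson(r, order):
    """Levinson-Durbin recursion; returns (a_1..a_p, residual energy E_p)."""
    a = np.zeros(order + 1)  # a[0] is unused so that a[j] holds a_j
    err = float(r[0])
    for i in range(1, order + 1):
        if err <= 0.0:  # degenerate frame: stop early
            break
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        prev = a.copy()
        a[i] = k
        a[1:i] = prev[1:i] - k * prev[i - 1:0:-1]
        err *= 1.0 - k * k
    return a[1:], err

def lp_spectrum_db(a, gain_sq, nfft):
    """Gain-matched LP power spectrum G^2 / |A|^2 in dB on an nfft-point grid."""
    inverse_filter = np.concatenate(([1.0], -a))  # A(z) = 1 - sum a_k z^-k
    return 10.0 * np.log10(gain_sq / np.abs(np.fft.fft(inverse_filter, nfft)) ** 2)

# Hypothetical test frame: 28 msec at 8 kHz, rectangular window, no pre-emphasis.
fs = 8000
n = round(0.028 * fs)
t = np.arange(n) / fs
rng = np.random.default_rng(0)
frame = (np.sin(2 * np.pi * 700 * t) + 0.5 * np.sin(2 * np.pi * 2200 * t)
         + 0.05 * rng.standard_normal(n))
frame -= frame.mean()

nfft = 256
dft_db = 10.0 * np.log10(np.abs(np.fft.fft(frame, nfft)) ** 2 + 1e-12)

residual = {}
for order in (2, 4, 8, 16, 32, 64):
    a, gain_sq = levinson(autocorr(frame, order), order)
    lp_db = lp_spectrum_db(a, gain_sq, nfft)
    residual[order] = gain_sq
    print(order, gain_sq, np.mean(np.abs(lp_db - dft_db)))
```

Because each step of the recursion multiplies the residual energy by a factor (1 - k_i^2) that is at most 1, the residual cannot grow as the order increases. How closely the LP spectrum tracks the DFT spectrum at a given order depends on the signal, so the printed dB differences for this synthetic frame should not be read as a reproduction of the figures.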