homework #5
Linear Prediction Analysis

EE 8993: Fundamentals of Speech Recognition
March 11, 1999

submitted to: Dr. Joseph Picone
submitted by: Suresh Balakrishnama
Institute for Signal and Information Processing
Department of Electrical and Computer Engineering
Mississippi State University
MS 39762, USA
Email: balakris@isip.msstate.edu

1. INTRODUCTION

Linear prediction (LP) analysis has been among the most popular methods for extracting spectral information from speech, and it is an important method for estimating the shape of a spectrum. In linear prediction, the signal is modeled as a linear combination of its past values and of the present and past values of a hypothetical input to a system whose output is the given signal. Each continuous-time signal $x_a(t)$ is sampled to obtain a discrete-time signal $x(n) = x_a(nT)$, also known as a time series, where $n$ is an integer variable and $T$ is the sampling interval.

2. Problem Description

Implement a capability to plot a signal's FFT spectrum and the gain-matched spectrum produced by a linear prediction model. The tool must read speech from a binary file (assume 16-bit linear sampling) and allow the user to select the following:

- the sample frequency of the signal
- the preemphasis constant
- the window duration in secs
- the center time for the window in secs
- a rectangular or Hamming window
- the linear prediction order

You can approach this problem in one of two ways:

- implement the signal processing in MATLAB and figure out how to read binary files into MATLAB
- implement everything in C++ (preferred)

In the latter case, the interface should be something like this:

my_prog 8000.0 0.95 0.03 28.7 1 10 foo.raw | xmgr -source stdin

The net result should be a plot of the signal spectrum computed using the following parameters:

fs = 8 kHz
preemphasis = argv[1]
window_duration = argv[2]
center time of the window = argv[3]
hamming window = yes
lp_order = argv[4]

and plotted on a log amplitude vs. linear frequency scale.
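The signal-spectrum half of the tool described above might be sketched in C++ as follows. This is a minimal sketch under stated assumptions, not the submitted program: the function names (`read_raw16`, `preemphasize`, `extract_window`, `dft_log_magnitude`, `print_spectrum`) are hypothetical, the raw file is assumed to be headerless with the host's byte order, the window is assumed to start `len/2` samples before the center time, samples falling outside the file are treated as zero, and the frame is zero-stuffed to `nfft` points before a direct (non-fast) DFT.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

// Read 16-bit linear samples from a headerless binary file.
// Assumption: the file's byte order matches the host (no byte swapping).
std::vector<double> read_raw16(const char* path) {
    std::vector<double> x;
    std::FILE* f = std::fopen(path, "rb");
    if (f == nullptr) return x;
    std::int16_t sample;
    while (std::fread(&sample, sizeof sample, 1, f) == 1) x.push_back(sample);
    std::fclose(f);
    return x;
}

// Preemphasis filter 1 - alpha z^-1: y(n) = x(n) - alpha x(n-1), with x(-1) = 0.
std::vector<double> preemphasize(const std::vector<double>& x, double alpha) {
    std::vector<double> y(x.size());
    for (std::size_t n = 0; n < x.size(); ++n)
        y[n] = x[n] - (n > 0 ? alpha * x[n - 1] : 0.0);
    return y;
}

// Take round(duration * fs) samples centered on center_sec and apply a
// Hamming or rectangular window. Samples outside the signal are taken as 0.
std::vector<double> extract_window(const std::vector<double>& x, double fs,
                                   double duration_sec, double center_sec,
                                   bool hamming) {
    const double kPi = std::acos(-1.0);
    const long len = std::lround(duration_sec * fs);
    assert(len > 0);
    const long start = std::lround(center_sec * fs) - len / 2;
    std::vector<double> frame(static_cast<std::size_t>(len), 0.0);
    for (long i = 0; i < len; ++i) {
        const long n = start + i;
        const double v = (n >= 0 && n < static_cast<long>(x.size())) ? x[n] : 0.0;
        const double w = (hamming && len > 1)
            ? 0.54 - 0.46 * std::cos(2.0 * kPi * i / (len - 1)) : 1.0;
        frame[i] = v * w;
    }
    return frame;
}

// Direct DFT of the frame zero-stuffed to nfft points; returns
// 10 log10 |X(k)|^2 (i.e. 20 log10 |X(k)|) for k = 0 .. nfft/2.
std::vector<double> dft_log_magnitude(const std::vector<double>& frame,
                                      std::size_t nfft) {
    assert(nfft >= frame.size());  // zero stuffing only, never truncation
    const double kPi = std::acos(-1.0);
    std::vector<double> db(nfft / 2 + 1);
    for (std::size_t k = 0; k < db.size(); ++k) {
        double re = 0.0, im = 0.0;
        for (std::size_t n = 0; n < frame.size(); ++n) {
            const double angle = 2.0 * kPi * k * n / nfft;
            re += frame[n] * std::cos(angle);
            im -= frame[n] * std::sin(angle);
        }
        db[k] = 10.0 * std::log10(re * re + im * im + 1e-12);  // floor avoids log10(0)
    }
    return db;
}

// Print "frequency_Hz dB" pairs, one per line, as one xmgr data set.
void print_spectrum(const std::vector<double>& db, double fs, std::size_t nfft) {
    for (std::size_t k = 0; k < db.size(); ++k)
        std::printf("%f %f\n", k * fs / nfft, db[k]);
}
```

A hypothetical `main` would parse the command-line values, then chain `read_raw16`, `preemphasize`, `extract_window`, `dft_log_magnitude` and `print_spectrum`, and afterwards print a blank line followed by the LP model spectrum as the second xmgr set. Only `window_duration * fs` real samples enter the DFT; the remaining `nfft` points are zeros, which interpolates the spectrum without adding data.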
The spectrum of the corresponding linear prediction model should be plotted as well. Xmgr accepts multiple sets of data, so simply print the xy points for both plots to stdout, with the second set separated by a newline, and xmgr will take care of the rest. You can use a DFT to compute the spectrum of the signal, or a zero-stuffed FFT. The important thing is to use only window_duration seconds of real data, i.e. window_duration * fs samples (note that window_duration is specified in secs). For example,

my_prog.exe 8000.0 0.95 0.03 3.0 12 | xmgr -source stdin

should produce a signal spectrum and an LP model spectrum for a 30 msec window of the signal centered at 3 secs. The LP analysis will be of order 12. A preemphasis filter $1 - 0.95z^{-1}$ is applied to the data.

For most of you, this should be a useful tool to have around. Feel free to pull the LP analysis software off the net. The main thing is to get the visualization component working, and to understand gain matching of the two spectra. The resulting plots will typically have about a 60 dB dynamic range for studio-quality data.

3. Description of Algorithms

One of the most powerful models currently in use is one in which a signal $s(n)$ is considered to be the output of a system with some unknown input $u(n)$ such that the following relation holds:

$$s(n) = -\sum_{k=1}^{p} a_k\, s(n-k) + G \sum_{l=0}^{q} b_l\, u(n-l) \qquad (1)$$

where $b_0 = 1$, and $a_k$ ($1 \le k \le p$), $b_l$ ($1 \le l \le q$), and the gain $G$ are the parameters of the hypothesized system. The output in equation (1) is a linear function of past outputs and of present and past inputs; that is, the signal is predictable from linear combinations of past outputs and inputs, which is why this approach is called linear prediction. In the all-pole case used in LP analysis ($b_l = 0$ for $l \ge 1$), the predicted value is a linear combination of previous values of the signal alone. The linear prediction error is therefore central: the predictor coefficients in LP analysis are chosen to minimize it.
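To make the gain matching mentioned in the problem description concrete, the LP half of the tool might be sketched as below. This is a minimal sketch, assuming the autocorrelation method, the Levinson-Durbin recursion described later in this section, and the sign convention $A(z) = 1 + \sum_{k=1}^{p} a_k z^{-k}$ of equation (1). The names `LpModel`, `autocorrelation`, `levinson_durbin` and `lp_log_spectrum` are hypothetical. The autocorrelation is meant to be taken over the same preemphasized, windowed frame that feeds the DFT, and the gain is chosen as $G^2 = E^{(p)}$; with that choice the model spectrum's total energy equals $R(0)$, the frame energy, which is what puts the unnormalized DFT spectrum and the LP spectrum on a common dB scale.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Result of LP analysis: a[0] = 1, a[1..p] are the predictor coefficients of
// A(z) = 1 + sum_k a_k z^-k, and error is the final prediction error E^(p).
struct LpModel {
    std::vector<double> a;
    double error;
};

// Autocorrelation R(i) = sum_n s(n) s(n+i), i = 0..p, of an already windowed frame.
std::vector<double> autocorrelation(const std::vector<double>& s, int p) {
    std::vector<double> r(p + 1, 0.0);
    for (int i = 0; i <= p; ++i)
        for (std::size_t n = 0; n + i < s.size(); ++n)
            r[i] += s[n] * s[n + i];
    return r;
}

// Levinson-Durbin recursion on R(0..p), building order i from order i-1.
LpModel levinson_durbin(const std::vector<double>& r, int p) {
    assert(static_cast<int>(r.size()) > p && r[0] > 0.0);
    std::vector<double> a(p + 1, 0.0);
    a[0] = 1.0;
    double err = r[0];                            // E^(0) = R(0)
    for (int i = 1; i <= p; ++i) {
        assert(err > 0.0);                        // fails for a perfectly predictable frame
        double acc = r[i];
        for (int j = 1; j < i; ++j) acc += a[j] * r[i - j];
        const double k = -acc / err;              // reflection coefficient k_i
        const std::vector<double> prev = a;       // coefficients a^(i-1)
        a[i] = k;
        for (int j = 1; j < i; ++j) a[j] = prev[j] + k * prev[i - j];
        err *= 1.0 - k * k;                       // E^(i) = (1 - k_i^2) E^(i-1)
    }
    return {a, err};
}

// Gain-matched LP model spectrum in dB, 10 log10(G^2 / |A(e^jw)|^2) with G^2 = E^(p),
// at w = 2 pi k / nfft for k = 0..nfft/2 (the same grid as an nfft-point DFT).
std::vector<double> lp_log_spectrum(const LpModel& m, std::size_t nfft) {
    const double kPi = std::acos(-1.0);
    std::vector<double> db(nfft / 2 + 1);
    for (std::size_t k = 0; k < db.size(); ++k) {
        const double w = 2.0 * kPi * k / nfft;
        double re = 0.0, im = 0.0;
        for (std::size_t j = 0; j < m.a.size(); ++j) {
            re += m.a[j] * std::cos(w * j);
            im -= m.a[j] * std::sin(w * j);
        }
        db[k] = 10.0 * std::log10(m.error / (re * re + im * im));
    }
    return db;
}
```

Printing the `lp_log_spectrum` values as frequency/dB pairs, after a blank line that follows the signal spectrum, would give xmgr its second data set as the problem description requests. Evaluating $A(e^{j\omega})$ directly on the DFT frequency grid keeps the sketch short; a zero-stuffed FFT of the coefficient vector would give the same values faster.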
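For a speech signal $s(n)$, the predicted value $\tilde{s}(n)$ is given by

$$\tilde{s}(n) = -\sum_{k=1}^{p} a_k\, s(n-k) \qquad (2)$$

and the prediction error is given by

$$e(n) = s(n) - \tilde{s}(n) = s(n) + \sum_{k=1}^{p} a_k\, s(n-k) \qquad (3)$$

By Parseval's theorem, the total energy of the error in the time domain equals its total energy in the frequency domain, so minimizing the squared error in time also minimizes the integrated squared error spectrum. The error is minimized by finding the optimal values of $a_k$. To explain the computation of $a_k$, consider the short-time prediction error

$$E = \sum_n e^2(n) \qquad (4)$$

$$E = \sum_n \Big( s(n) + \sum_{k=1}^{p} a_k\, s(n-k) \Big)^2 \qquad (5)$$

$E$ is minimized with respect to each $a_i$ by differentiating and setting the result equal to zero:

$$\frac{\partial E}{\partial a_i} = 0, \qquad 1 \le i \le p \qquad (6)$$

Rearranging terms, we get

$$\sum_{k=1}^{p} a_k \sum_n s(n-k)\, s(n-i) = -\sum_n s(n)\, s(n-i), \qquad 1 \le i \le p \qquad (7)$$

Equation (7) is known as the linear prediction equation, and the $a_k$ are known as the linear prediction coefficients or predictor coefficients.

Levinson-Durbin Recursion Method

The L-D recursion is a recursive-in-model-order solution of the autocorrelation equations. In the autocorrelation method, the sums in (7) run over a windowed segment of the signal, so they depend only on the lag: $\sum_n s(n-k)\,s(n-i) = R(|i-k|)$ with $R(i) = \sum_n s(n)\,s(n+i)$, and (7) reduces to $\sum_{k=1}^{p} a_k R(|i-k|) = -R(i)$, $1 \le i \le p$. The solution for the desired order-$p$ model is built successively from lower-order models, beginning with the 0th-order predictor, which has no predictor coefficients at all. The method uses the autocorrelation coefficients to determine both the prediction coefficients and the reflection coefficients. The prediction coefficients are computed with the following equations:

$$\begin{aligned} k_i &= -\frac{R(i) + \sum_{j=1}^{i-1} a_j^{(i-1)} R(i-j)}{E^{(i-1)}} \\ a_i^{(i)} &= k_i \\ a_j^{(i)} &= a_j^{(i-1)} + k_i\, a_{i-j}^{(i-1)}, \qquad 1 \le j \le i-1 \\ E^{(i)} &= (1 - k_i^2)\, E^{(i-1)} \end{aligned} \qquad (8)$$

with $E^{(0)} = R(0)$ and $1 \le i \le p$, where the superscript $(i)$ indicates the current iteration, $(i-1)$ indicates the previous iteration, and $p$ is the total number of iterations, which equals the order of the prediction. $E^{(i)}$ is the error term, $R(i)$ is the autocorrelation coefficient, $k_i$ is the reflection coefficient, and $a_j^{(i)}$ is the predictor coefficient; the final coefficients are $a_j = a_j^{(p)}$.

DFT Method

As in the previous case, there are many algorithms for calculating the DFT coefficients. However, since the focus here is the linear prediction task rather than the DFT, no fast implementation was used; the DFT is computed directly from its defining equation, Equation (9):

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N}, \qquad 0 \le k \le N-1 \qquad (9)$$

where $k$ indicates the current iteration (the frequency index) and $N$ is the total number of iterations, which equals the order (length) of the DFT.

4. Results

Figure 1. Plot showing the DFT spectrum of the speech file.

5.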
Conclusions

The DFT spectrum was obtained, but computing the LP-derived spectrum proved difficult, and no plot of it could be produced to show how the LP-derived spectrum compares with the DFT spectrum. As a result, the error between the LP-derived and DFT spectra could not be analyzed. Based on theory [1], however, this error becomes smaller as the LP model order increases.

6. References

[1] J. Makhoul, "Linear Prediction: A Tutorial Review," Proceedings of the IEEE, Vol. 63, April 1975.
[2] J. Picone, "ECE 8993: Fundamentals of Speech Recognition Lecture Notes," Mississippi State University, May 1998.