Linear Prediction

Let us define a speech signal, $s(n)$. Consider the problem of predicting the current value, $s(n)$, from the previous value, $s(n-1)$:

$\tilde{s}(n) = \alpha\, s(n-1)$

This prediction will be in error by some amount:

$e(n) = s(n) - \tilde{s}(n) = s(n) - \alpha\, s(n-1)$

We would like to minimize the error by finding the best, or optimal, value of $\alpha$. Let us define the short-time average prediction error:

$E = \sum_n e^2(n) = \sum_n \left[ s(n) - \alpha\, s(n-1) \right]^2$

We can minimize the error w.r.t. $\alpha$ by differentiating and setting the result equal to zero. Differentiating w.r.t. $\alpha$:

$\frac{\partial E}{\partial \alpha} = -2 \sum_n \left[ s(n) - \alpha\, s(n-1) \right] s(n-1) = 0$

or,

$\sum_n s(n)\, s(n-1) = \alpha \sum_n s^2(n-1)$

which implies:

$\alpha = \dfrac{\sum_n s(n)\, s(n-1)}{\sum_n s^2(n-1)}$

Notes:
· $\alpha$ is related to the correlation structure of the signal (it is essentially the lag-one autocorrelation normalized by the signal energy)
· $\alpha$ is independent of the energy level of the signal (scaling $s(n)$ scales numerator and denominator equally)

What short-term technique do we use to compute correlation/covariance?

Linear Prediction (Again, But More General)

Let us define a speech signal, $s(n)$, and a predicted value:

$\tilde{s}(n) = \sum_{k=1}^{p} \alpha_k\, s(n-k)$

Why the added terms? The prediction error is given by:

$e(n) = s(n) - \tilde{s}(n) = s(n) - \sum_{k=1}^{p} \alpha_k\, s(n-k)$

We would like to minimize the error by finding the best, or optimal, value of each $\alpha_k$. Let us define the short-time average prediction error:

$E = \sum_n e^2(n) = \sum_n \left[ s(n) - \sum_{k=1}^{p} \alpha_k\, s(n-k) \right]^2$

We can minimize the error w.r.t. $\alpha_i$ for each $i$ by differentiating and setting the result equal to zero:

$\frac{\partial E}{\partial \alpha_i} = -2 \sum_n \left[ s(n) - \sum_{k=1}^{p} \alpha_k\, s(n-k) \right] s(n-i) = 0$

Linear Prediction (Cont.)

Rearranging terms:

$\sum_n s(n)\, s(n-i) = \sum_{k=1}^{p} \alpha_k \sum_n s(n-k)\, s(n-i)$

or, defining $\phi(i,k) = \sum_n s(n-i)\, s(n-k)$,

$\phi(i,0) = \sum_{k=1}^{p} \alpha_k\, \phi(i,k), \qquad 1 \le i \le p$

This equation is known as the linear prediction (Yule-Walker) equation. The $\{\alpha_k\}$ are known as linear prediction coefficients, or predictor coefficients. By enumerating the equations for each value of $i$, we can express this in matrix form:

$C \boldsymbol{\alpha} = \mathbf{c}$

where,

$C = \begin{bmatrix} \phi(1,1) & \phi(1,2) & \cdots & \phi(1,p) \\ \phi(2,1) & \phi(2,2) & \cdots & \phi(2,p) \\ \vdots & \vdots & \ddots & \vdots \\ \phi(p,1) & \phi(p,2) & \cdots & \phi(p,p) \end{bmatrix}, \qquad \boldsymbol{\alpha} = \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_p \end{bmatrix}, \qquad \mathbf{c} = \begin{bmatrix} \phi(1,0) \\ \phi(2,0) \\ \vdots \\ \phi(p,0) \end{bmatrix}$

The solution to this equation involves a matrix inversion:

$\boldsymbol{\alpha} = C^{-1} \mathbf{c}$

and is known as the covariance method. Under what conditions does $C^{-1}$ exist?

Note that the covariance matrix $C$ is symmetric. A fast algorithm to find the solution to this equation is known as the Cholesky decomposition (an approach in which the covariance matrix is factored into lower and upper triangular matrices).

The Autocorrelation Method

Using a different interpretation of the limits on the error minimization (forcing only data within the frame to be used, i.e., windowing the signal so that $s(n) = 0$ outside $0 \le n \le N-1$), we can compute the solution to the linear prediction equation using the autocorrelation method:

$\sum_{k=1}^{p} \alpha_k\, r(|i-k|) = r(i), \qquad 1 \le i \le p$

where,

$r(k) = \sum_{n=0}^{N-1-k} s(n)\, s(n+k)$

In matrix form, $R \boldsymbol{\alpha} = \mathbf{r}$:

$\begin{bmatrix} r(0) & r(1) & \cdots & r(p-1) \\ r(1) & r(0) & \cdots & r(p-2) \\ \vdots & \vdots & \ddots & \vdots \\ r(p-1) & r(p-2) & \cdots & r(0) \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_p \end{bmatrix} = \begin{bmatrix} r(1) \\ r(2) \\ \vdots \\ r(p) \end{bmatrix}$

Note that $R$ is symmetric, and all of the elements along each diagonal are equal (a Toeplitz matrix), which means (1) an inverse always exists; (2) the roots lie inside the unit circle (the discrete-time counterpart of the left-half plane), so the resulting model is stable and minimum phase.

The linear prediction process can be viewed as a filter by noting:

$e(n) = s(n) - \sum_{k=1}^{p} \alpha_k\, s(n-k)$

and

$A(z) = \frac{E(z)}{S(z)} = 1 - \sum_{k=1}^{p} \alpha_k z^{-k}$

$A(z)$ is called the analyzer; what type of filter is it? (pole/zero? phase?) $H(z) = 1/A(z)$ is called the synthesizer; under what conditions is it stable?

Linear Prediction Error

We can return to our expression for the error:

$E = \sum_n e^2(n)$

and substitute our expression for $e(n)$ and show that:

Autocorrelation Method:

$E = r(0) - \sum_{k=1}^{p} \alpha_k\, r(k)$

Covariance Method:

$E = \phi(0,0) - \sum_{k=1}^{p} \alpha_k\, \phi(0,k)$

Later, we will discuss the properties of these equations as they relate to the magnitude of the error. For now, note that the same linear prediction equation that applied to the signal applies to the autocorrelation function, except that signal samples are replaced by autocorrelation values (and hence the time index is replaced by a lag index). Since the same coefficients satisfy both equations, this confirms our hypothesis that this is a model of the minimum-phase version of the input signal.

Linear prediction has numerous formulations, including the covariance method, the autocorrelation method, the lattice method, the inverse filter formulation, the spectral estimation formulation, the maximum likelihood formulation, and the inner product formulation. Discussions are found in disciplines ranging from system identification and econometrics to signal processing, probability, statistical mechanics, and operations research.
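
To make the autocorrelation method concrete, here is a minimal sketch in Python (assuming NumPy and SciPy are available; the function name `lpc_autocorrelation` and the toy frame below are my own, not from the notes). It computes the short-time autocorrelation $r(k)$, solves the Toeplitz normal equations $\sum_k \alpha_k\, r(|i-k|) = r(i)$, and evaluates the minimum error $E = r(0) - \sum_k \alpha_k\, r(k)$:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_autocorrelation(s, p):
    """Estimate p predictor coefficients from a windowed frame s
    using the autocorrelation method."""
    N = len(s)
    # Short-time autocorrelation: r(k) = sum_n s(n) s(n+k),
    # summed only over samples inside the frame.
    r = np.array([np.dot(s[:N - k], s[k:]) for k in range(p + 1)])
    # Normal equations R a = r: R is Toeplitz with first column
    # [r(0), ..., r(p-1)], right-hand side [r(1), ..., r(p)].
    a = solve_toeplitz(r[:p], r[1:])
    # Minimum prediction error: E = r(0) - sum_k a_k r(k).
    E = r[0] - np.dot(a, r[1:])
    return a, E

# Toy usage: fit a 10th-order predictor to a Hamming-windowed frame.
rng = np.random.default_rng(0)
frame = np.hamming(240) * rng.standard_normal(240)
a, E = lpc_autocorrelation(frame, p=10)
```

Because $R$ is Toeplitz, `solve_toeplitz` uses a Levinson-type recursion that runs in $O(p^2)$ rather than the $O(p^3)$ of a general solve, which is one reason the autocorrelation method is popular in practice.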
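A corresponding sketch of the covariance method, under the same assumptions (`lpc_covariance` is a hypothetical name; SciPy's `cho_factor`/`cho_solve` supply the Cholesky factorization the notes mention). Here the error is summed over $n = p, \dots, N-1$ so that every sample the predictor needs lies inside the frame, with no windowing:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def lpc_covariance(s, p):
    """Estimate p predictor coefficients with the covariance method:
    phi(i, k) = sum_{n=p}^{N-1} s(n-i) s(n-k)."""
    N = len(s)
    # Build the (p+1) x (p+1) table of covariance values phi(i, k).
    phi = np.empty((p + 1, p + 1))
    for i in range(p + 1):
        for k in range(p + 1):
            phi[i, k] = np.dot(s[p - i:N - i], s[p - k:N - k])
    C = phi[1:, 1:]   # phi(i, k), 1 <= i, k <= p (symmetric)
    c = phi[1:, 0]    # phi(i, 0), 1 <= i <= p
    # C is symmetric and, for non-degenerate frames, positive
    # definite, so a Cholesky factorization solves C a = c.
    a = cho_solve(cho_factor(C), c)
    # Minimum prediction error: E = phi(0,0) - sum_k a_k phi(0,k).
    E = phi[0, 0] - np.dot(a, c)
    return a, E
```

Note the trade-off the sketch makes visible: $C$ is symmetric but not Toeplitz, so the Levinson-type shortcut does not apply and the Cholesky factorization is the natural fast solver, at the cost of losing the guaranteed minimum-phase property of the autocorrelation method.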