The Orthogonality Principle and System Identification

Consider the problem of modeling the output of a system using an all-pole filter and a white noise input:

[Figure: block diagram of an unknown system driven by white noise; its output is the measured signal s(n)]

Let's assume that the system is of the form of an all-pole model:

    s(n) = sum_{k=1}^{p} a_k s(n-k) + u(n)

If the model is sufficiently accurate, the prediction error can be assumed to be white Gaussian noise. Suppose we are given measurements of the output, s(n). Can we compute the coefficients, a_k, that minimize the error between our measurements and the linear prediction estimate of the measurements?

    E = sum_n e^2(n) = sum_n [ s(n) - sum_{k=1}^{p} a_k s(n-k) ]^2

We can simplify the above expression by noting that s(n) is itself generated by the all-pole model plus the white noise input. Rearranging terms, the error energy separates into a term that depends only on the signal (through the mismatch between the estimated and true coefficients), a term that depends only on the noise, and cross-correlation terms between the signal and the noise.

But recall that the input to the system is white Gaussian noise, and hence is uncorrelated with the past output samples used by the predictor. This implies all correlation terms between the signal and the noise are zero. This argument is known as the orthogonality principle (the noise is orthogonal to the signal when the minimum mean-squared error predictor is used). Our expression for the error energy simplifies to the signal-only term plus the noise energy. Differentiating with respect to the predictor coefficients as before gives the normal linear prediction equations:

    sum_{k=1}^{p} a_k R(|i-k|) = R(i),    i = 1, ..., p

This view of the linear prediction problem is known as system identification: we are discovering an underlying model of the system using only measurements of its output. We can show that the LP estimate is, on average, the estimator with the least error (or bias).

What if the system is not an all-pole system? We can approximate zeros with additional poles, since a single zero can be expanded as an all-pole factor: (1 - a z^{-1}) = 1 / (1 + a z^{-1} + a^2 z^{-2} + ...) for |a| < 1. It is also possible to postulate an autoregressive moving average (ARMA) model with both poles and zeros, H(z) = B(z)/A(z). Computation of these coefficients is a non-linear optimization problem that is difficult to solve efficiently and reliably, and marginal gains, if any, have been reported.

How does the linear prediction problem relate to more classical applications such as channel equalization (modems)? What modifications do we need to make to the LP approach?

Relationship to the Lattice Filters and Reflection Coefficients

The standard direct-form FIR filter can be implemented in a lattice structure. The inverse, or Infinite Impulse Response (IIR), equivalent is an all-pole filter. The lattice coefficients, k_i, are called reflection coefficients, and can be computed directly from the signal. For the filter to be stable, these reflection coefficients must be bounded: |k_i| < 1.

Levinson-Durbin Recursion

The predictor coefficients can be efficiently computed for the autocorrelation method using the Levinson-Durbin recursion:

    E_0 = R(0)
    for i = 1, ..., p:
        k_i = [ R(i) - sum_{j=1}^{i-1} a_j^{(i-1)} R(i-j) ] / E_{i-1}
        a_i^{(i)} = k_i
        a_j^{(i)} = a_j^{(i-1)} - k_i a_{i-j}^{(i-1)},    j = 1, ..., i-1
        E_i = (1 - k_i^2) E_{i-1}

This recursion gives us great insight into the linear prediction process. First, we note that the intermediate variables, k_i, are referred to as reflection coefficients.

Example: p = 2

    k_1 = R(1)/R(0),    a_1^{(1)} = k_1,    E_1 = (1 - k_1^2) R(0)
    k_2 = [ R(2) - a_1^{(1)} R(1) ] / E_1,    a_2^{(2)} = k_2,    a_1^{(2)} = a_1^{(1)} - k_2 a_1^{(1)}

This reduces the LP problem to O(p^2) operations, saves an order of magnitude in computational complexity, and makes this analysis amenable to fixed-point digital signal processors and microprocessors.

Linear Prediction Coefficient Transformations

The predictor coefficients and reflection coefficients can be transformed back and forth with no loss of information.

Predictor to reflection coefficient transformation (step-down), for i = p, ..., 1:

    k_i = a_i^{(i)}
    a_j^{(i-1)} = [ a_j^{(i)} + k_i a_{i-j}^{(i)} ] / (1 - k_i^2),    j = 1, ..., i-1

Reflection to predictor coefficient transformation (step-up), for i = 1, ..., p:

    a_i^{(i)} = k_i
    a_j^{(i)} = a_j^{(i-1)} - k_i a_{i-j}^{(i-1)},    j = 1, ..., i-1

Also, note that these recursions require intermediate storage for the lower-order coefficient sets a_j^{(i)}. From the above recursions, it is clear that k_i = a_i^{(i)}. In fact, there are several important results related to k_i:

(1) |k_i| < 1 implies a stable synthesis filter (poles inside the unit circle).
(2) |k_i| = 1 implies a harmonic process (poles on the unit circle).
(3) |k_i| > 1 implies an unstable synthesis filter (poles outside the unit circle).
(4) k_i near 0 implies E_i is nearly E_{i-1}, so increasing the order beyond i-1 buys little additional prediction gain.

This gives us insight into how to determine the LP order during the calculations.
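As a concrete illustration, here is a minimal Python/NumPy sketch of the Levinson-Durbin recursion described above, assuming the convention sum_k a_k R(|i-k|) = R(i). The function name levinson_durbin and the biased autocorrelation estimate in the usage lines are illustrative choices, not part of the original notes.

```python
import numpy as np

def levinson_durbin(r, p):
    """Solve the order-p autocorrelation normal equations sum_k a_k R(|i-k|) = R(i).

    r : autocorrelation values R(0) ... R(p)
    Returns (a, k, E): predictor coefficients a_1..a_p, reflection
    coefficients k_1..k_p, and the final prediction error energy E_p.
    """
    a = np.zeros(p + 1)              # a[j] holds a_j; a[0] is unused
    k = np.zeros(p + 1)              # k[i] holds the i-th reflection coefficient
    E = r[0]                         # order-0 prediction error energy
    for i in range(1, p + 1):
        # partial correlation of s(n-i) with the residual of the order-(i-1) predictor
        acc = r[i] - np.dot(a[1:i], r[i - 1:0:-1])
        k[i] = acc / E
        # order update: a_j(i) = a_j(i-1) - k_i * a_{i-j}(i-1), and a_i(i) = k_i
        prev = a[1:i].copy()
        a[i] = k[i]
        a[1:i] = prev - k[i] * prev[::-1]
        # the error energy shrinks by a factor (1 - k_i^2) at each order
        E *= 1.0 - k[i] ** 2
    return a[1:], k[1:], E

# Usage sketch: biased autocorrelation estimates from a short signal
x = np.random.randn(400)
r = np.array([np.dot(x[: len(x) - m], x[m:]) for m in range(4)]) / len(x)
a, k, E = levinson_durbin(r, p=3)
```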
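The two coefficient transformations above can likewise be sketched in a few lines; this is an illustrative implementation under the same conventions (the step-up recursion mirrors the Levinson-Durbin order update, and the step-down recursion inverts it), with hypothetical function names.

```python
import numpy as np

def reflection_to_predictor(k):
    """Step-up: reflection coefficients k_1..k_p -> predictor coefficients a_1..a_p."""
    k = np.asarray(k, dtype=float)
    a = np.zeros(len(k))
    for i in range(1, len(k) + 1):
        prev = a[: i - 1].copy()         # order-(i-1) coefficients
        a[i - 1] = k[i - 1]              # a_i(i) = k_i
        a[: i - 1] = prev - k[i - 1] * prev[::-1]
    return a

def predictor_to_reflection(a):
    """Step-down: predictor coefficients a_1..a_p -> reflection coefficients k_1..k_p."""
    a = np.array(a, dtype=float)
    k = np.zeros(len(a))
    for i in range(len(a), 0, -1):
        k[i - 1] = a[i - 1]              # k_i = a_i(i)
        denom = 1.0 - k[i - 1] ** 2      # breaks down if |k_i| = 1 (pole on the unit circle)
        a = (a[: i - 1] + k[i - 1] * a[: i - 1][::-1]) / denom
    return k

# Round trip: the two recursions are exact inverses, so no information is lost
k_in = np.array([0.5, -0.3, 0.2])
assert np.allclose(predictor_to_reflection(reflection_to_predictor(k_in)), k_in)
```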
We also see that the reflection coefficients are orthogonal in the sense that the best order-p model is also given by the first p coefficients of the order-(p+1) LP model (very important!).

Generalized Lattice Solutions

One of the most famous lattice formulations is the Burg algorithm, originally introduced in the mid-1960s, prior to the introduction of LP in speech. There is actually a family of lattice solutions of a similar form. The Burg algorithm requires the reflection coefficients to be computed from the forward and backward prediction errors of the lattice, f_m(n) and b_m(n):

    k_m = 2 sum_n f_{m-1}(n) b_{m-1}(n-1) / [ sum_n f_{m-1}^2(n) + sum_n b_{m-1}^2(n-1) ]

We see this is a weighted combination of the forward and backward error terms, and that the reflection coefficients are forced to be bounded: |k_m| <= 1, since 2|ab| <= a^2 + b^2 term by term.
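To make the boundedness argument concrete, here is a minimal sketch of the Burg reflection-coefficient estimate, assuming the lattice error updates f_m(n) = f_{m-1}(n) - k_m b_{m-1}(n-1) and b_m(n) = b_{m-1}(n-1) - k_m f_{m-1}(n); the function name and test signal are illustrative.

```python
import numpy as np

def burg_reflection(x, p):
    """Estimate p reflection coefficients from signal x with the Burg lattice recursion."""
    f = np.asarray(x, dtype=float).copy()    # forward prediction error
    b = f.copy()                             # backward prediction error
    k = np.zeros(p)
    for m in range(p):
        fc, bc = f[1:], b[:-1]               # align f(n) with b(n-1)
        # cross-correlation normalized by the sum of forward and backward error energies
        k[m] = 2.0 * np.dot(fc, bc) / (np.dot(fc, fc) + np.dot(bc, bc))
        # |k[m]| <= 1 is guaranteed, since 2|f b| <= f^2 + b^2 sample by sample
        f, b = fc - k[m] * bc, bc - k[m] * fc
    return k

# Usage sketch: a decaying sinusoid yields reflection coefficients bounded by one in magnitude
n = np.arange(200)
x = np.exp(-0.01 * n) * np.sin(0.2 * np.pi * n)
print(burg_reflection(x, p=4))
```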