Digital Speech Production Models

Recall our concatenated lossless tube model:


We can approximate this as a digital filter using the sampling theorem:


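As a rough worked example, assume the usual setup in which a tract of length l is split into N equal sections and the sampling period equals the round-trip delay of one section, T = 2l/(Nc). The sampling rate implied by a given number of sections can then be computed as sketched below. The function name, the 17.5 cm tract length, and the 35,000 cm/s speed of sound are illustrative assumptions, not values taken from these notes.

```python
def implied_sample_rate_hz(num_sections, tract_length_cm=17.5, sound_speed_cm_s=35000.0):
    """Sampling rate for which one sample period equals the round-trip
    delay of a single tube section (assumed relation: T = 2*l / (N*c))."""
    section_length_cm = tract_length_cm / num_sections
    round_trip_delay_s = 2.0 * section_length_cm / sound_speed_cm_s
    return 1.0 / round_trip_delay_s


if __name__ == "__main__":
    for n in (8, 10, 16):
        print(n, "sections ->", implied_sample_rate_hz(n), "Hz")
```

Under these assumptions, ten sections correspond to a 10 kHz sampling rate, which sits in the range quoted for typical speech systems.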
The transfer function of an N-tube model is:

where

We can compute the denominator of the transfer function recursively:

Alternate Digital Filter Implementations Using Digital Resonators
Note that for the filter to have real coefficients, zeros must occur in complex conjugate pairs. We can transform zeros in the Laplace domain:

The corresponding complex conjugate poles in the discrete domain are:

Note that the magnitude of the pole in the z-plane is related to the bandwidth.
We can write a transfer function as a product of these poles:

where

This is an all-pole filter. It can be realized using a number of structures:
Under what conditions is this filter stable?
where,

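A minimal sketch of one digital resonator follows. It assumes the common mapping in which a Laplace-domain pole pair at -pi*B ± j*2*pi*F (center frequency F and bandwidth B in Hz) is carried to the z-plane by z = exp(sT). The function names are hypothetical. The stability check uses the standard condition that every pole lies strictly inside the unit circle.

```python
import cmath
import math


def resonator(center_hz, bandwidth_hz, sample_rate_hz):
    """Return (pole, denominator) for one second-order all-pole section.

    denominator = [1, d1, d2] describes 1 / (1 + d1*z^-1 + d2*z^-2).
    Assumed mapping: s = -pi*B + j*2*pi*F, z = exp(s*T).
    """
    period = 1.0 / sample_rate_hz
    radius = math.exp(-math.pi * bandwidth_hz * period)  # pole magnitude set by bandwidth
    angle = 2.0 * math.pi * center_hz * period           # pole angle set by center frequency
    pole = cmath.rect(radius, angle)
    denominator = [1.0, -2.0 * radius * math.cos(angle), radius * radius]
    return pole, denominator


def is_stable(poles):
    """An all-pole filter is stable when every pole is inside the unit circle."""
    return all(abs(p) < 1.0 for p in poles)


if __name__ == "__main__":
    pole, denom = resonator(500.0, 60.0, 8000.0)
    print(abs(pole), denom, is_stable([pole, pole.conjugate()]))
```

A cascade of such sections, one per formant, gives the product form of the transfer function; a narrower bandwidth pushes the pole closer to the unit circle.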
Excitation Models
How do we couple energy into the vocal tract?
The glottal impedance can be approximated by:

The boundary condition for the volume velocity is:

For voiced sounds, the glottal volume velocity looks something like this:
[Figure: glottal volume velocity versus time (ms)]
The Complete Digital Model (Vocoder)
[Figure: block diagram. Blocks: Impulse Train Generator, Glottal Pulse Model, Random Noise Generator, Vocal Tract Model V(z), Lip Radiation Model]
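The blocks in the diagram can be strung together in a few lines of code. The sketch below makes several assumptions that are not stated in these notes: the glottal pulse is a Rosenberg-style raised-cosine pulse, the vocal tract is an all-pole filter 1 / (1 - sum of alpha_k z^-k), the lip radiation is a first difference (1 - z^-1), and the voiced/unvoiced choice is a simple switch. All function names and default values are hypothetical.

```python
import math
import random


def impulse_train(num_samples, pitch_period):
    """Unit impulses every pitch_period samples (voiced source)."""
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(num_samples)]


def rosenberg_pulse(rise_samples, fall_samples):
    """Assumed glottal pulse shape: raised-cosine opening, cosine closing."""
    rise = [0.5 * (1.0 - math.cos(math.pi * n / rise_samples)) for n in range(rise_samples)]
    fall = [math.cos(math.pi * n / (2.0 * fall_samples)) for n in range(fall_samples)]
    return rise + fall


def fir_filter(x, taps):
    """Direct convolution, truncated to the input length."""
    return [sum(taps[k] * x[n - k] for k in range(len(taps)) if n - k >= 0)
            for n in range(len(x))]


def all_pole_filter(x, alpha):
    """y[n] = x[n] + sum_k alpha[k-1] * y[n-k]  (assumed sign convention)."""
    y = []
    for n, sample in enumerate(x):
        acc = sample
        for k, coeff in enumerate(alpha, start=1):
            if n - k >= 0:
                acc += coeff * y[n - k]
        y.append(acc)
    return y


def synthesize(num_samples, alpha, voiced, pitch_period=80, gain=1.0, seed=0):
    """Excitation -> vocal tract -> lip radiation."""
    if voiced:
        excitation = fir_filter(impulse_train(num_samples, pitch_period),
                                rosenberg_pulse(25, 10))
    else:
        rng = random.Random(seed)
        excitation = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    tract_output = all_pole_filter([gain * e for e in excitation], alpha)
    return fir_filter(tract_output, [1.0, -1.0])  # first-difference lip radiation


if __name__ == "__main__":
    frame = synthesize(160, [1.2, -0.5], voiced=True)
    print(len(frame), max(frame))
```

With an 8 kHz sample rate, the default pitch period of 80 samples corresponds to a 100 Hz fundamental.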
Notes:
· Sample frequency is typically 8 kHz to 16 kHz
· Frame duration is typically 10 msec to 20 msec
· Window duration is typically 30 msec
· Fundamental frequency ranges from 50 Hz to 500 Hz
· Three resonant frequencies are usually found within 4 kHz bandwidth
· Some sounds, such as sibilants ("s"), have extremely high bandwidths

Questions:
What does the overall spectrum look like?
What happened to the nasal cavity?
What is the form of V(z)?
Linear Prediction
How do we estimate the vocal tract parameters?
Recall our digital filter model:

This corresponds to a finite difference equation of the form:

We predict the current sample from its previous values and the new input value; this is known as linear prediction.
We can define the energy of the prediction error as:

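Here the error is the difference between the actual sample and its predicted value. We can derive an equation for the predictor coefficients by minimizing the mean-square error: differentiate the energy of the error with respect to each coefficient, set the result to zero, and solve. This yields: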

where:

and,
.
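The minimization described above leads to a set of linear equations in the predictor coefficients. The sketch below shows one common variant, the autocorrelation method, in which the equations are sum_k alpha_k R(|i-k|) = R(i) for i = 1..p. The equations lost from this page may instead be the covariance form, so treat the choice of method, the function names, and the sign convention (prediction = sum of alpha_k times past samples) as assumptions.

```python
def autocorrelation(signal, max_lag):
    """Short-time autocorrelation R(0..max_lag) of a finite frame."""
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def solve_linear_system(matrix, rhs):
    """Gaussian elimination with partial pivoting (small systems only)."""
    size = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, size):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, size + 1):
                aug[row][j] -= factor * aug[col][j]
    solution = [0.0] * size
    for row in range(size - 1, -1, -1):
        tail = sum(aug[row][j] * solution[j] for j in range(row + 1, size))
        solution[row] = (aug[row][size] - tail) / aug[row][row]
    return solution


def linear_predictor(signal, order):
    """Predictor coefficients alpha_1..alpha_p from the normal equations."""
    r = autocorrelation(signal, order)
    matrix = [[r[abs(i - k)] for k in range(order)] for i in range(order)]
    return solve_linear_system(matrix, r[1:])


if __name__ == "__main__":
    # Impulse response of 1 / (1 - 1.2 z^-1 + 0.5 z^-2)
    s = [1.0, 1.2]
    for _ in range(2, 300):
        s.append(1.2 * s[-1] - 0.5 * s[-2])
    print(linear_predictor(s, 2))
```

Because the test signal is itself the output of a second-order all-pole filter, the recovered coefficients should be close to the ones used to generate it.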
Relationship to the Lattice Filters and Reflection Coefficients
The standard direct-form FIR filter can be implemented in a lattice structure:

The inverse, or Infinite Impulse Response (IIR) equivalent, is an all-pole filter:


The lattice coefficients are called reflection coefficients, and can be computed directly from the signal:

For the filter to be stable, every reflection coefficient must have magnitude less than one.
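A minimal sketch of the FIR (analysis) lattice and the stability test follows. The sign convention is an assumption, since the lattice equations did not survive on this page: each stage computes f_i(n) = f_{i-1}(n) - k_i b_{i-1}(n-1) and b_i(n) = b_{i-1}(n-1) - k_i f_{i-1}(n). The function names are hypothetical.

```python
def lattice_fir(signal, reflection):
    """Run a signal through an FIR lattice and return the forward error.

    Assumed stage equations:
        f_i(n) = f_{i-1}(n) - k_i * b_{i-1}(n-1)
        b_i(n) = b_{i-1}(n-1) - k_i * f_{i-1}(n)
    """
    delayed = [0.0] * len(reflection)  # delayed[i] holds the stage input b(n-1)
    output = []
    for sample in signal:
        forward = sample
        backward = sample
        for i, k in enumerate(reflection):
            next_forward = forward - k * delayed[i]
            next_backward = delayed[i] - k * forward
            delayed[i] = backward  # store this stage's backward input for the next sample
            forward, backward = next_forward, next_backward
        output.append(forward)
    return output


def lattice_is_stable(reflection):
    """The all-pole inverse is stable when every |k_i| < 1."""
    return all(abs(k) < 1.0 for k in reflection)


if __name__ == "__main__":
    print(lattice_fir([1.0, 0.0, 0.0], [0.5, -0.3]))
    print(lattice_is_stable([0.5, -0.3]))
```

Feeding a unit impulse through the lattice returns the coefficients of the equivalent direct-form FIR filter, which is a convenient way to check an implementation.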
Transformations Between Parameters
The predictor coefficients, reflection coefficients, and area ratios represent alternate descriptions of the same information:

Predictor to reflection coefficient transformation:


Reflection to predictor coefficient transformation:


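The two transformations can be sketched as the step-down and step-up recursions. The convention assumed here is that the inverse filter is A(z) = 1 - sum of alpha_k z^-k and that the highest-order predictor coefficient at each stage equals the reflection coefficient; the equations lost from this page may use the opposite sign. The function names are hypothetical.

```python
def reflection_to_predictor(reflection):
    """Step-up: alpha_j(i) = alpha_j(i-1) - k_i * alpha_{i-j}(i-1), alpha_i(i) = k_i."""
    alpha = []
    for k in reflection:
        order = len(alpha)
        alpha = [alpha[j] - k * alpha[order - 1 - j] for j in range(order)] + [k]
    return alpha


def predictor_to_reflection(alpha):
    """Step-down: peel off k_i = alpha_i(i), then recover the order i-1 predictor."""
    current = list(alpha)
    reflection = []
    while current:
        k = current[-1]
        if abs(k) >= 1.0:
            raise ValueError("reflection coefficient magnitude must be below one")
        reflection.append(k)
        order = len(current) - 1
        scale = 1.0 - k * k
        current = [(current[j] + k * current[order - 1 - j]) / scale
                   for j in range(order)]
    reflection.reverse()
    return reflection


if __name__ == "__main__":
    alpha = reflection_to_predictor([0.5, -0.3])
    print(alpha)
    print(predictor_to_reflection(alpha))
```

The step-down direction doubles as a stability test: it fails as soon as a coefficient with magnitude of one or more appears.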
Durbin recursion (an efficient algorithm for solving linear equations involving symmetric Toeplitz matrices):


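A compact sketch of the recursion, written for the autocorrelation normal equations sum_k alpha_k R(|i-k|) = R(i), is given below. It returns the predictor coefficients, the reflection coefficients produced along the way, and the final prediction error energy. The function name and the sign convention (prediction = sum of alpha_k times past samples) are assumptions.

```python
def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations in O(order^2) operations.

    r: autocorrelation values R(0)..R(order)
    Returns (alpha, reflection, error_energy).
    """
    alpha = []
    reflection = []
    error = r[0]
    for i in range(1, order + 1):
        # Correlation not yet explained by the order i-1 predictor
        acc = r[i] - sum(alpha[j] * r[i - 1 - j] for j in range(i - 1))
        k = acc / error
        # Update lower-order coefficients, then append the new one
        alpha = [alpha[j] - k * alpha[i - 2 - j] for j in range(i - 1)] + [k]
        reflection.append(k)
        error *= 1.0 - k * k
    return alpha, reflection, error


if __name__ == "__main__":
    # Autocorrelation shape of a first-order process with coefficient 0.5
    print(levinson_durbin([1.0, 0.5, 0.25], 2))
```

For this input the second reflection coefficient is zero, reflecting the fact that a first-order predictor already accounts for the given correlations.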
Log of the ratio of the areas of adjacent sections of a lossless tube:

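As an illustration, the log area ratio can be computed from a reflection coefficient as sketched below. The relation assumed here is A_{i+1}/A_i = (1 - k_i)/(1 + k_i); texts differ on the sign of k_i, so the formula may need flipping to match the equation that was lost from this page. The function names are hypothetical.

```python
import math


def log_area_ratio(k):
    """g = log((1 - k) / (1 + k)), defined for |k| < 1 (assumed sign convention)."""
    if abs(k) >= 1.0:
        raise ValueError("reflection coefficient magnitude must be below one")
    return math.log((1.0 - k) / (1.0 + k))


def reflection_from_log_area_ratio(g):
    """Inverse mapping: k = (1 - e^g) / (1 + e^g) = -tanh(g / 2)."""
    return -math.tanh(g / 2.0)


if __name__ == "__main__":
    g = log_area_ratio(0.4)
    print(g, reflection_from_log_area_ratio(g))
```

The mapping stretches the region near |k| = 1, which is one reason log area ratios are often preferred over raw reflection coefficients for quantization.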
Fundamental Frequency Analysis
How do we determine the fundamental frequency?
We use the (statistical) autocorrelation function:


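A minimal sketch of autocorrelation-based fundamental frequency estimation follows. It uses the plain short-time autocorrelation of one frame and picks the lag with the largest value inside the 50 Hz to 500 Hz search range quoted earlier. The statistical (normalized) form referred to above may differ in scaling, and the function names are hypothetical.

```python
import math


def short_time_autocorrelation(frame, lag):
    """Sum of products of the frame with itself shifted by lag samples."""
    return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))


def estimate_f0_autocorrelation(frame, sample_rate_hz, f0_min_hz=50.0, f0_max_hz=500.0):
    """Return the frequency whose period maximizes the autocorrelation."""
    min_lag = int(sample_rate_hz / f0_max_hz)
    max_lag = min(int(sample_rate_hz / f0_min_hz), len(frame) - 1)
    best_lag = max(range(min_lag, max_lag + 1),
                   key=lambda lag: short_time_autocorrelation(frame, lag))
    return sample_rate_hz / best_lag


if __name__ == "__main__":
    fs = 8000
    frame = [math.sin(2.0 * math.pi * 100.0 * n / fs) for n in range(240)]  # 30 ms window
    print(estimate_f0_autocorrelation(frame, fs))
```

Real speech needs more care than this sketch provides, for example a voicing decision and protection against picking a multiple of the true period.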
Other common representations:
Average Magnitude Difference Function (AMDF):

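The AMDF replaces the products in the autocorrelation with absolute differences, so a periodic signal produces a deep null, rather than a peak, at its period. The sketch below assumes the common definition, the mean of |s(n) - s(n + k)| over the overlapping samples; the function names are hypothetical.

```python
import math


def amdf(frame, lag):
    """Average magnitude difference at one lag (assumed definition)."""
    count = len(frame) - lag
    return sum(abs(frame[n] - frame[n + lag]) for n in range(count)) / count


def estimate_f0_amdf(frame, sample_rate_hz, f0_min_hz=50.0, f0_max_hz=500.0):
    """Return the frequency whose period minimizes the AMDF."""
    min_lag = int(sample_rate_hz / f0_max_hz)
    max_lag = min(int(sample_rate_hz / f0_min_hz), len(frame) - 1)
    best_lag = min(range(min_lag, max_lag + 1), key=lambda lag: amdf(frame, lag))
    return sample_rate_hz / best_lag


if __name__ == "__main__":
    fs = 8000
    frame = [math.sin(2.0 * math.pi * 100.0 * n / fs) for n in range(240)]
    print(estimate_f0_amdf(frame, fs, f0_min_hz=60.0))
```

Nulls also occur at multiples of the period, so the lower frequency limit of the search range matters more here than in the autocorrelation case.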
Zero Crossing Rate:
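A minimal sketch of a zero crossing rate computation follows. It assumes the common definition based on sign changes between adjacent samples, with sgn(0) taken as +1, and normalizes by the number of adjacent sample pairs; the function names are hypothetical.

```python
def sign(x):
    """Two-level sign: +1 for x >= 0, -1 otherwise (assumed convention)."""
    return 1 if x >= 0 else -1


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    changes = sum(abs(sign(frame[n]) - sign(frame[n - 1])) for n in range(1, len(frame)))
    return changes / (2.0 * (len(frame) - 1))


if __name__ == "__main__":
    print(zero_crossing_rate([1.0, -1.0, 1.0, -1.0]))
    print(zero_crossing_rate([1.0, 2.0, 3.0]))
```

Multiplying the result by the sample rate gives crossings per second; a pure tone at F Hz crosses zero about 2F times per second.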