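Digital Speech Production Models

Recall our concatenated lossless tube model: $N$ sections of equal length $\Delta x$ with cross-sectional areas $A_1, A_2, \ldots, A_N$.

We can approximate this as a digital filter using the sampling theorem. A wave needs $\tau = \Delta x / c$ seconds to traverse one section ($c$ is the speed of sound), so the model can be represented as a discrete-time system with sampling period $T = 2\tau$, i.e. $z = e^{sT} = e^{2s\tau}$, and $z^{-1}$ corresponds to a round trip through one section.

The transfer function of an $N$-tube model is:

$$V(z) = \frac{0.5\,(1 + r_G)\prod_{k=1}^{N}(1 + r_k)\; z^{-N/2}}{D(z)}$$

where

$$D(z) = \begin{bmatrix} 1 & -r_G \end{bmatrix}
\begin{bmatrix} 1 & -r_1 \\ -r_1 z^{-1} & z^{-1} \end{bmatrix} \cdots
\begin{bmatrix} 1 & -r_N \\ -r_N z^{-1} & z^{-1} \end{bmatrix}
\begin{bmatrix} 1 \\ 0 \end{bmatrix}$$

Here $r_k = (A_{k+1} - A_k)/(A_{k+1} + A_k)$ is the reflection coefficient at the junction between sections $k$ and $k+1$ ($r_N$ is the reflection coefficient at the lips), and $r_G$ is the reflection coefficient at the glottis.

For $r_G = 1$, we can compute $D(z)$ recursively:

$$D_0(z) = 1, \qquad D_k(z) = D_{k-1}(z) + r_k\, z^{-k} D_{k-1}(z^{-1}), \quad k = 1, 2, \ldots, N, \qquad D(z) = D_N(z)$$

$D(z)$ is a polynomial of degree $N$ in $z^{-1}$, so apart from the pure delay $z^{-N/2}$ the tube model is an all-pole filter.

Alternate Digital Filter Implementations Using Digital Resonators

Note that for $V(z)$ to have real coefficients, its poles must occur in complex conjugate pairs. We can transform each conjugate pair of poles from the Laplace domain,

$$s_k,\, s_k^* = -\sigma_k \pm j\Omega_k,$$

to the corresponding complex conjugate poles in the discrete-time domain:

$$z_k,\, z_k^* = e^{s_k T},\, e^{s_k^* T} = e^{-\sigma_k T} e^{\pm j\Omega_k T}$$

Note that the magnitude of a pole in the $z$-plane, $|z_k| = e^{-\sigma_k T}$, is related to the bandwidth: a resonance with a 3-dB bandwidth of $B_k$ Hz has $\sigma_k \approx \pi B_k$, so $|z_k| \approx e^{-\pi B_k T}$, and a wider bandwidth moves the pole farther inside the unit circle. The pole angle $\theta_k = \Omega_k T = 2\pi F_k T$ sets the resonant (formant) frequency $F_k$.

We can write a transfer function as a product of these poles:

$$V(z) = \prod_{k=1}^{M} \frac{1 - 2|z_k|\cos\theta_k + |z_k|^2}{1 - 2|z_k|\cos\theta_k\, z^{-1} + |z_k|^2 z^{-2}}$$

where each second-order factor is a digital resonator, and the numerator normalizes the gain so that $V(1) = 1$ (unity gain at zero frequency).

This is an all-pole filter. It can be realized using a number of structures, for example a cascade of second-order resonators, a parallel (partial-fraction) connection of second-order sections, or a single direct-form filter obtained by multiplying out the factors:

$$V(z) = \frac{G}{1 - \sum_{k=1}^{p} \alpha_k z^{-k}}, \qquad p = 2M$$

where $G$ is the gain and the $\alpha_k$ are the filter coefficients. Under what conditions is this filter stable? It is stable if and only if every pole lies strictly inside the unit circle, $|z_k| < 1$, which holds whenever $\sigma_k > 0$, i.e. every resonance has a nonzero bandwidth.

Excitation Models

How do we couple energy into the vocal tract? The glottal impedance can be approximated by a resistance in series with an inductance:

$$Z_G(\Omega) = R_G + j\Omega L_G$$

The boundary condition for the volume velocity at the glottis ($x = 0$) is:

$$U(0, \Omega) = U_G(\Omega) - \frac{P(0, \Omega)}{Z_G(\Omega)}$$

where $U_G(\Omega)$ is the glottal source volume velocity and $P(0, \Omega)$ is the pressure at the glottal end of the tract.

For voiced sounds, the glottal volume velocity looks something like this:

[Figure: typical glottal volume velocity waveform for voiced sounds; horizontal axis: time (ms).]

The Complete Digital Model (Vocoder)

[Figure: block diagram of the vocoder. An impulse train generator followed by a glottal pulse model provides voiced excitation, and a random noise generator provides unvoiced excitation; the excitation drives the vocal tract model V(z), followed by the lip radiation model.]

Notes:
· Sample frequency is typically 8 kHz to 16 kHz
· Frame duration is typically 10 ms to 20 ms
· Window duration is typically 30 ms
· Fundamental frequency ranges from 50 Hz to 500 Hz
· Three resonant frequencies (formants) are usually found within a 4 kHz bandwidth
· Some sounds, such as sibilants ("s"), have extremely wide bandwidths, with significant energy well above 4 kHz

Questions:
What does the overall spectrum look like?
What happened to the nasal cavity?
What is the form of V(z)?

Linear Prediction

How do we estimate the vocal tract parameters? Recall our digital filter model:

$$H(z) = \frac{S(z)}{U(z)} = \frac{G}{1 - \sum_{k=1}^{p} \alpha_k z^{-k}}$$

This corresponds to a finite difference equation of the form:

$$s(n) = \sum_{k=1}^{p} \alpha_k\, s(n-k) + G\,u(n)$$

We predict the current value, $s(n)$, from its $p$ previous values; the new input term $G\,u(n)$ is the part that cannot be predicted. This is known as linear prediction, and the predicted value is:

$$\tilde{s}(n) = \sum_{k=1}^{p} \alpha_k\, s(n-k)$$

We can define the energy of the prediction error, for the speech segment $s_n(m)$ analyzed at time $n$, as:

$$E_n = \sum_m e_n^2(m) = \sum_m \left( s_n(m) - \tilde{s}_n(m) \right)^2 = \sum_m \left( s_n(m) - \sum_{k=1}^{p} \alpha_k\, s_n(m-k) \right)^2$$

where $\tilde{s}_n(m)$ is the predicted value.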
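To tie the resonator, vocoder, and difference-equation views together, here is a minimal Python sketch under stated assumptions: it multiplies second-order resonator sections into direct-form coefficients $\alpha_k$, excites the resulting all-pole filter with an impulse train through the difference equation, and then inverse-filters the output to obtain the prediction error $e(n) = s(n) - \tilde{s}(n)$. The function names, formant frequencies, and bandwidths are hypothetical choices for illustration, and the glottal pulse model, noise generator, and lip radiation model are omitted.

```python
import math


def resonator_coeffs(formants_hz, bandwidths_hz, fs):
    """Multiply second-order resonator sections into direct-form
    coefficients alpha_1..alpha_p of V(z) = G / (1 - sum_k alpha_k z^-k)."""
    T = 1.0 / fs
    a = [1.0]                                   # denominator polynomial in z^-1
    for F, B in zip(formants_hz, bandwidths_hz):
        r = math.exp(-math.pi * B * T)          # pole radius |z_k| = exp(-pi B_k T)
        theta = 2.0 * math.pi * F * T           # pole angle theta_k = 2 pi F_k T
        section = [1.0, -2.0 * r * math.cos(theta), r * r]
        prod = [0.0] * (len(a) + 2)             # polynomial product a(z) * section(z)
        for i, ai in enumerate(a):
            for j, sj in enumerate(section):
                prod[i + j] += ai * sj
        a = prod
    return [-c for c in a[1:]]                  # a = [1, -alpha_1, ..., -alpha_p]


def impulse_train(n_samples, period):
    """Voiced excitation u(n): unit impulses every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def synthesize(alpha, u, gain=1.0):
    """All-pole synthesis s(n) = sum_k alpha_k s(n-k) + G u(n), zero initial state."""
    s = []
    for n, un in enumerate(u):
        acc = gain * un
        for k, ak in enumerate(alpha, start=1):
            if n >= k:
                acc += ak * s[n - k]
        s.append(acc)
    return s


def prediction_error(alpha, s):
    """Inverse filtering e(n) = s(n) - s~(n), with s~(n) = sum_k alpha_k s(n-k)."""
    return [s[n] - sum(ak * s[n - k] for k, ak in enumerate(alpha, start=1) if n >= k)
            for n in range(len(s))]


if __name__ == "__main__":
    fs = 8000                                   # within the 8-16 kHz range in the notes
    # Hypothetical formant frequencies and bandwidths, chosen only for illustration.
    alpha = resonator_coeffs([500, 1500, 2500], [60, 90, 120], fs)
    u = impulse_train(400, 80)                  # 100 Hz pitch at 8 kHz, 50 ms
    s = synthesize(alpha, u)
    e = prediction_error(alpha, s)
    print("p =", len(alpha))
    print("error energy:", sum(v * v for v in e))
```

Because the same coefficients are used for synthesis and prediction, $e(n)$ reproduces the excitation $G\,u(n)$ up to rounding, so the printed error energy should equal $G^2$ times the number of pulses (5 here). Since every bandwidth is positive, every pole radius is below 1 and the synthesis filter is stable.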
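We can derive an equation for the computation of the $\alpha_k$ by minimizing the mean-square error: differentiate the energy of the error with respect to each $\alpha_i$, set the result to zero, and solve for the $\alpha_k$. This yields the normal equations:

$$\sum_{k=1}^{p} \alpha_k\, \phi_n(i, k) = \phi_n(i, 0), \qquad 1 \le i \le p$$

where:

$$\phi_n(i, k) = \sum_m s_n(m - i)\, s_n(m - k)$$

and, in the autocorrelation method (the segment is windowed and taken to be zero outside the window of length $N$), $\phi_n(i, k) = R_n(|i - k|)$ with $R_n(k) = \sum_{m=0}^{N-1-k} s_n(m)\, s_n(m+k)$, so the matrix of the equations is symmetric and Toeplitz. The resulting minimum error energy is $E_n = \phi_n(0, 0) - \sum_{k=1}^{p} \alpha_k\, \phi_n(0, k)$.

Relationship to the Lattice Filters and Reflection Coefficients

The standard direct-form FIR (inverse) filter, $A(z) = 1 - \sum_{k=1}^{p} \alpha_k z^{-k}$, can be implemented in a lattice structure:

$$e^{(i)}(n) = e^{(i-1)}(n) - k_i\, b^{(i-1)}(n-1)$$
$$b^{(i)}(n) = b^{(i-1)}(n-1) - k_i\, e^{(i-1)}(n), \qquad i = 1, 2, \ldots, p$$

with $e^{(0)}(n) = b^{(0)}(n) = s(n)$ and $e^{(p)}(n) = e(n)$, the prediction error.

The inverse, or Infinite Impulse Response (IIR), equivalent is an all-pole filter, $1/A(z)$, obtained by running the forward recursion in the opposite direction:

$$e^{(i-1)}(n) = e^{(i)}(n) + k_i\, b^{(i-1)}(n-1), \qquad i = p, p-1, \ldots, 1$$

The coefficients $k_i$ are called reflection coefficients, and can be computed directly from the signal as the normalized cross-correlation between the forward and backward errors of the previous stage:

$$k_i = \frac{\sum_m e^{(i-1)}(m)\, b^{(i-1)}(m-1)}{\left[ \sum_m \left( e^{(i-1)}(m) \right)^2 \sum_m \left( b^{(i-1)}(m-1) \right)^2 \right]^{1/2}}$$

For the filter to be stable, these reflection coefficients must be bounded: $|k_i| < 1$.

Transformations Between Parameters

The predictor coefficients, reflection coefficients, and area ratios represent alternate descriptions of the same information:

Predictor to reflection coefficient transformation (start from $\alpha_j^{(p)} = \alpha_j$ and iterate for $i = p, p-1, \ldots, 1$):

$$k_i = \alpha_i^{(i)}, \qquad \alpha_j^{(i-1)} = \frac{\alpha_j^{(i)} + k_i\, \alpha_{i-j}^{(i)}}{1 - k_i^2}, \quad 1 \le j \le i-1$$

Reflection to predictor coefficient transformation (iterate for $i = 1, 2, \ldots, p$; then $\alpha_j = \alpha_j^{(p)}$):

$$\alpha_i^{(i)} = k_i, \qquad \alpha_j^{(i)} = \alpha_j^{(i-1)} - k_i\, \alpha_{i-j}^{(i-1)}, \quad 1 \le j \le i-1$$

Durbin Recursion (an efficient algorithm for solving linear equations whose matrix is symmetric and Toeplitz, such as the autocorrelation normal equations):

$$E^{(0)} = R(0)$$
$$k_i = \frac{R(i) - \sum_{j=1}^{i-1} \alpha_j^{(i-1)} R(i-j)}{E^{(i-1)}}, \qquad 1 \le i \le p$$
$$\alpha_i^{(i)} = k_i, \qquad \alpha_j^{(i)} = \alpha_j^{(i-1)} - k_i\, \alpha_{i-j}^{(i-1)}, \quad 1 \le j \le i-1$$
$$E^{(i)} = (1 - k_i^2)\, E^{(i-1)}$$

with the final solution $\alpha_j = \alpha_j^{(p)}$, $1 \le j \le p$.

Log of the ratio of the areas of adjacent sections of a lossless tube (log area ratio):

$$g_i = \log\frac{A_{i+1}}{A_i} = \log\frac{1 - k_i}{1 + k_i}, \qquad 1 \le i \le p$$

Fundamental Frequency Analysis

How do we determine the fundamental frequency? We use the (statistical) autocorrelation function,

$$\phi(k) = \lim_{N \to \infty} \frac{1}{2N + 1} \sum_{m=-N}^{N} x(m)\, x(m + k),$$

estimated in practice over a short analysis window:

$$R_n(k) = \sum_{m=0}^{N-1-k} x_n(m)\, x_n(m + k), \qquad x_n(m) = x(n + m)\, w(m)$$

For a periodic signal, $R_n(k)$ peaks at lags equal to multiples of the pitch period, so the fundamental frequency is estimated from the lag of the largest peak within the expected pitch range.

Other common representations:
· Average Magnitude Difference Function (AMDF): $\gamma_n(k) = \sum_{m=0}^{N-1-k} \left| x_n(m + k) - x_n(m) \right|$, which dips toward zero (rather than peaking) at multiples of the pitch period
· Zero Crossing Rate: $Z_n = \sum_m \left| \operatorname{sgn}[x(m)] - \operatorname{sgn}[x(m-1)] \right| w(n - m)$, with $w(n) = 1/(2N)$ for $0 \le n \le N-1$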
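The recursions above translate almost line for line into code. The sketch below is one possible Python rendering, assuming the autocorrelation method with a rectangular window; all function names are hypothetical choices for illustration. It computes $R(k)$, runs the Durbin recursion, converts between predictor and reflection coefficients in both directions, forms log area ratios, and estimates $F_0$ from the largest autocorrelation value within the 50 Hz to 500 Hz range.

```python
import math


def autocorrelation(x, max_lag):
    """Short-time autocorrelation R(k) = sum_{m=0}^{N-1-k} x(m) x(m+k), k = 0..max_lag."""
    n = len(x)
    return [sum(x[m] * x[m + k] for m in range(n - k)) for k in range(max_lag + 1)]


def durbin(R, p):
    """Durbin recursion for the autocorrelation normal equations.

    Returns (alpha, ks, E): predictor coefficients alpha_1..alpha_p,
    reflection coefficients k_1..k_p, and the final error energy E^(p).
    """
    alpha = []                      # alpha[j-1] holds alpha_j^(i-1)
    ks = []
    E = R[0]                        # E^(0) = R(0)
    for i in range(1, p + 1):
        k = (R[i] - sum(alpha[j - 1] * R[i - j] for j in range(1, i))) / E
        # alpha_j^(i) = alpha_j^(i-1) - k_i alpha_{i-j}^(i-1); alpha_i^(i) = k_i
        alpha = [alpha[j - 1] - k * alpha[i - j - 1] for j in range(1, i)] + [k]
        ks.append(k)
        E *= 1.0 - k * k            # E^(i) = (1 - k_i^2) E^(i-1)
    return alpha, ks, E


def reflection_to_predictor(ks):
    """Step-up recursion: reflection coefficients -> predictor coefficients."""
    alpha = []
    for i, k in enumerate(ks, start=1):
        alpha = [alpha[j - 1] - k * alpha[i - j - 1] for j in range(1, i)] + [k]
    return alpha


def predictor_to_reflection(alpha):
    """Step-down recursion: predictor coefficients -> reflection coefficients.
    Assumes |k_i| < 1 (a stable filter) so that 1 - k_i^2 is never zero."""
    a = list(alpha)
    ks = [0.0] * len(a)
    for i in range(len(a), 0, -1):
        k = a[i - 1]                # k_i = alpha_i^(i)
        ks[i - 1] = k
        a = [(a[j - 1] + k * a[i - j - 1]) / (1.0 - k * k) for j in range(1, i)]
    return ks


def log_area_ratios(ks):
    """g_i = log(A_{i+1}/A_i) = log((1 - k_i)/(1 + k_i))."""
    return [math.log((1.0 - k) / (1.0 + k)) for k in ks]


def pitch_autocorr(x, fs, f_min=50.0, f_max=500.0):
    """Return fs / lag of the largest R(k) over lags in the F0 search range."""
    lag_min, lag_max = int(fs / f_max), int(fs / f_min)
    R = autocorrelation(x, lag_max)
    best_lag = max(range(lag_min, lag_max + 1), key=lambda k: R[k])
    return fs / best_lag


if __name__ == "__main__":
    fs = 8000
    # Hypothetical test frame: 400 samples (50 ms) of five equal harmonics of 125 Hz.
    x = [sum(math.cos(2 * math.pi * 125 * h * n / fs) for h in range(1, 6))
         for n in range(400)]
    alpha, ks, E = durbin(autocorrelation(x, 10), 10)
    print("stable:", all(abs(k) < 1 for k in ks))
    print("log area ratios:", [round(g, 3) for g in log_area_ratios(ks)])
    print("F0 estimate (Hz):", pitch_autocorr(x, fs))
```

Passing the reflection coefficients from `durbin` through `reflection_to_predictor` reproduces `alpha`, and `predictor_to_reflection(alpha)` recovers `ks`, which is a quick consistency check on both recursions. A practical pitch tracker would also normalize $R(k)$, apply a tapered window, and make a voiced/unvoiced decision; this sketch does none of that, and swapping in the AMDF would mean searching for a minimum instead of a maximum.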