Homework #4
Signal to Noise Ratio

EE 8993: Fundamentals of Speech Recognition
March 11, 1999

submitted to: Dr. Joseph Picone
submitted by: Suresh Balakrishnama

Institute for Signal and Information Processing
Department of Electrical and Computer Engineering
Mississippi State University
MS 39762, USA
Email: balakris@isip.msstate.edu

1. INTRODUCTION

Signal-to-noise ratio (SNR) is the most widely used measure for analog and waveform coding systems, and it is also useful for assessing enhancement algorithms for broadband noise distortions. SNR measurements are only appropriate for coding or enhancement systems that seek to reproduce the original input waveform.

2. Problem Description

Implement the algorithm described in class to compute the signal to noise ratio using a histogram of the energy distribution. Validate this design by:

1. Processing the four files below:

   ece_8993_speech/homework/1996/data/710_b_8k.raw
   ece_8993_speech/homework/1996/data/710_s_8k.raw
   ece_8993_speech/homework/1996/data/711_g_8k.raw
   ece_8993_speech/homework/1996/data/712_f_8k.raw

   and comparing your answers to the results from the class of 1996.

   First, plot the average SNR of the four files for the following conditions (do a scatter plot):

   - frame duration of 5, 10, 20, and 40 msec
   - window duration of 10, 20, 30, 60 msec

   Use a signal threshold of 80% and a noise threshold of 20%.

   Next, for the best set of parameters above, plot the average SNR as a function of the thresholds:

   - signal threshold 80%, 85%, 90%, 95%
   - noise threshold 10%, 15%, 20%, 25%

2. Processing a large chunk of Switchboard:

   /isip/d02/switchboard/data/sw2151_ec.raw

3. Implementation

The main objective here is to process each frame of data so that all the samples are covered when the signal is divided into frames. Each frame of the speech signal for all channels is pre-emphasized using:

(1)

where in our signal to noise calculation.
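The framing and pre-emphasis steps can be sketched as follows. This is a minimal illustration, not the code written for this homework: the first-order filter form y[n] = x[n] - a*x[n-1], the coefficient value 0.95, the 8 kHz sampling rate (suggested only by the "_8k" file names), and the function names pre_emphasize and split_into_frames are all assumptions.

```python
def pre_emphasize(samples, coeff=0.95):
    """Apply y[n] = x[n] - coeff * x[n-1], treating x[-1] as 0.

    Both the filter form and coeff=0.95 are assumptions; the report's
    own equation and coefficient are not recoverable from the text.
    """
    out = []
    previous = 0.0
    for x in samples:
        out.append(x - coeff * previous)
        previous = x
    return out


def split_into_frames(samples, sample_rate_hz=8000, frame_ms=10, window_ms=20):
    """Return one window of samples per frame step, zero-padded at the end.

    The frame duration sets the step between windows and the window
    duration sets how many samples are analyzed at each step, so every
    sample is covered whenever window_ms >= frame_ms.
    """
    step = sample_rate_hz * frame_ms // 1000
    width = sample_rate_hz * window_ms // 1000
    frames = []
    for start in range(0, len(samples), step):
        chunk = list(samples[start:start + width])
        chunk.extend([0.0] * (width - len(chunk)))  # pad the final window
        frames.append(chunk)
    return frames
```

With these helpers, a call such as [pre_emphasize(f) for f in split_into_frames(signal)] would pre-emphasize each frame in turn, where signal is a hypothetical list of samples read from one of the raw files.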
To pre-emphasize a signal means to apply a high-pass filter that increases the relative energy of the high-frequency spectrum. The energy of noise increases in proportion to the square of the channel frequency; by introducing a high-pass filter, we are able to get a more accurate signal to noise ratio. Furthermore, the use of pre-emphasis can eliminate the spectral contributions of the larynx and lips, so that the analysis seeks parameters corresponding to the vocal tract only [1].

Then a Hamming window is applied to the signal:

(2)

This is used to smooth the abrupt discontinuity at the window boundaries. The energy is computed using:

(3)

The energy for each frame is stored until all the signals are processed. Then a probability density function (pdf) of the energy values is calculated. A total of 10,000 bins are used to plot the energy histogram.

4. Results

Frame duration    Window duration (msec)
(msec)            10          20          30          60
5                 9.134826    8.996714    8.871572    8.542184
10                9.004938    9.030701    8.852352    8.422152
20                9.066294    8.928496    8.853244    8.511646
40                8.977531    9.085185    8.775523    8.622317

Table 1: Average signal to noise ratio (in dB) using different window and frame duration for the file 710_b_8k.raw

Frame duration    Window duration (msec)
(msec)            10          20          30          60
5                 9.134826    8.996714    8.871572    8.542184
10                9.004939    9.030701    8.852352    8.422152
20                9.066294    8.928496    8.853244    8.511646
40                8.977531    9.085185    8.775523    8.910591

Table 2: Average signal to noise ratio (in dB) using different window and frame duration for the file 710_s_8k.raw

Frame duration    Window duration (msec)
(msec)            10          20          30          60
5                 9.355579    9.332850    9.317355    9.018757
10                9.380290    9.439940    9.363338    9.009154
20                9.314431    9.166645    9.301335    9.024555
40                9.632618    9.323182    9.099382    8.913860

Table 3: Average signal to noise ratio (in dB) using different window and frame duration for the file 711_g_8k.raw

Frame duration    Window duration (msec)
(msec)            10          20          30          60
5                 9.617392    9.623659    9.514186    9.231970
10                9.677206    9.601563    9.580981    9.207649
20                9.745012    9.526969    9.626323    9.258477
40                9.632618    9.820978    9.419830    9.233452

Table 4: Average signal to noise ratio (in dB) using different window and frame duration for the file 712_f_8k.raw

Noise threshold   Signal threshold (%)
(%)               80          85          90          95
10                10.740457   10.740457   10.740457   10.740457
15                9.847071    9.847071    9.847071    9.847071
20                9.134826    9.134826    9.134826    9.134826
25                8.082599    8.082599    8.082599    8.082599

Table 5: Average signal to noise ratio (in dB) using different signal and noise threshold for the file 710_b_8k.raw

Noise threshold   Signal threshold (%)
(%)               80          85          90          95
10                10.740457   10.740457   10.740457   10.740457
15                9.847071    9.847071    9.847071    9.847071
20                9.134826    9.134826    9.134826    9.134826
25                8.082599    8.082599    8.082599    8.082599

Table 6: Average signal to noise ratio (in dB) using different signal and noise threshold for the file 710_s_8k.raw

Noise threshold   Signal threshold (%)
(%)               80          85          90          95
10                10.778791   10.778791   10.778791   10.778791
15                10.140963   10.140963   10.140963   10.140963
20                9.632618    9.632618    9.632618    9.632618
25                8.693860    8.693860    8.693860    8.693860

Table 7: Average signal to noise ratio (in dB) using different signal and noise threshold for the file 711_g_8k.raw

Noise threshold   Signal threshold (%)
(%)               80          85          90          95
10                10.657279   10.657279   10.657279   10.657279
15                10.188071   10.188071   10.188071   10.188071
20                8.997402    8.997402    8.997402    8.997402
25                8.997402    8.997402    8.997402    8.997402

Table 8: Average signal to noise ratio (in dB) using different signal and noise threshold for the file 712_f_8k.raw

5. PLOTS

Figure 1. Plot showing average SNR against frame duration (msec)
Figure 2. Plot showing average SNR against window duration (msec)
Figure 3. Plot showing average SNR against noise threshold (%)
Figure 4. Plot showing average SNR against signal threshold (%)

6. CONCLUSIONS

We have implemented the algorithm to compute the signal-to-noise ratio of a speech file. For any speech file a high signal-to-noise ratio is desirable. Tables 1-8 show the SNR values; the highlighted values correspond to the optimum window and frame duration for each of the speech files given in the assignment.
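For reference, one plausible reading of the histogram method is sketched below. It is an illustration under stated assumptions rather than the code behind Tables 1-8: the Hamming window uses the common 0.54/0.46 form, the frame energy is taken as the unnormalized sum of squared windowed samples, the noise and signal+noise energies are read off the cumulative histogram at the two thresholds, and the SNR is computed as 10 log10((E_signal+noise - E_noise) / E_noise). All function names are hypothetical.

```python
import math


def hamming(length):
    """Common Hamming window form: 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def frame_energy(frame):
    """Sum of squared windowed samples (the normalization is an assumption)."""
    window = hamming(len(frame))
    return sum((w * x) ** 2 for w, x in zip(window, frame))


def percentile_energy(energies, fraction, num_bins=10000):
    """Energy at which the cumulative histogram first reaches `fraction`."""
    lo, hi = min(energies), max(energies)
    if hi == lo:
        return lo
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for e in energies:
        index = min(int((e - lo) / width), num_bins - 1)
        counts[index] += 1
    target = fraction * len(energies)
    running = 0
    for index, count in enumerate(counts):
        running += count
        if running >= target:
            return lo + (index + 0.5) * width  # center of the bin
    return hi


def estimate_snr_db(energies, signal_fraction=0.80, noise_fraction=0.20,
                    num_bins=10000):
    """SNR in dB from the two histogram thresholds (assumed formula)."""
    noise = percentile_energy(energies, noise_fraction, num_bins)
    signal_plus_noise = percentile_energy(energies, signal_fraction, num_bins)
    if noise <= 0.0 or signal_plus_noise <= noise:
        raise ValueError("thresholds do not separate signal from noise")
    return 10.0 * math.log10((signal_plus_noise - noise) / noise)
```

Under this reading, raising the noise threshold raises the noise energy estimate and therefore lowers the SNR, which is the same trend seen down the rows of Tables 5-8.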
In the next part, holding the frame and window duration constant, the noise and signal thresholds are varied for each file. The SNR remains constant when the signal threshold is increased, whereas the SNR decreases when the noise threshold is increased. This suggests that the estimate is governed by the energy assigned to noise rather than by the energy assigned to signal+noise, which is not the behavior of an optimum SNR estimator. Hence the selected window and frame durations should not be regarded as ideal parameters for the SNR estimator.

7. SOFTWARE

All Matlab code written for this project is publicly available from our website at www.isip.msstate.edu

8. REFERENCES

[1] F. Jelinek, Statistical Methods for Speech Recognition, The MIT Press, Cambridge, Massachusetts, USA.
[2] J. Deller, J. G. Proakis and J. Hansen, Discrete-Time Processing of Speech Signals, Macmillan Publishing Company, New York, USA.