EE 8993: Speech Recognition
Homework Assignment #4
Signal to Noise Ratio
May 3, 1998

submitted to:
Dr. Joseph Picone
Department of Electrical and Computer Engineering
413 Simrall, Hardy Rd.
Mississippi State University
Box 9571
MS State, MS 39762

submitted by:
Julie Ngan
Department of Electrical and Computer Engineering
Mississippi State University
Box 9571
Mississippi State, Mississippi 39762
Tel: 601-325-8335
Fax: 601-325-3149
email: ngan@isip.msstate.edu

Table 1 The average signal to noise ratios using different window and frame durations for the file 710_b_8k.raw.
Table 2 The average signal to noise ratios using different window and frame durations for the file 710_s_8k.raw.
Table 3 The average signal to noise ratios using different window and frame durations for the file 711_g_8k.raw.
Table 4 The average signal to noise ratios using different window and frame durations for the file 712_f_8k.raw.
Table 5 The average signal to noise ratios using different signal and noise thresholds for the file 710_b_8k.raw.
Table 6 The average signal to noise ratios using different signal and noise thresholds for the file 710_s_8k.raw.
Table 7 The average signal to noise ratios using different signal and noise thresholds for the file 711_g_8k.raw.
Table 8 The average signal to noise ratios using different signal and noise thresholds for the file 712_f_8k.raw.
Table 9 The average signal to noise ratios using different window and frame durations for the left channel of the switchboard file.
Table 10 The average signal to noise ratios using different window and frame durations for the right channel of the switchboard file.
Table 11 The average signal to noise ratios using different signal and noise thresholds for the left channel of the switchboard file.
Table 12 The average signal to noise ratios using different signal and noise thresholds for the right channel of the switchboard file.

I. Problem Definition

Implement the algorithm described in class to compute the signal to noise ratio using a histogram of the energy distribution. Validate this design by:

1. Processing the four files below:

   ece_8993_speech/homework/1996/data/710_b_8k.raw
   ece_8993_speech/homework/1996/data/710_s_8k.raw
   ece_8993_speech/homework/1996/data/711_g_8k.raw
   ece_8993_speech/homework/1996/data/712_f_8k.raw

   and comparing your answers to the results from the class of 1996. First, plot the average SNR of the four files for the following conditions (do a scatter plot):

   - frame duration of 5, 10, 20, and 40 msec
   - window duration of 10, 20, 30, 60 msec

   Use a signal threshold of 80% and a noise threshold of 20%. Next, for the best set of parameters above, plot the average SNR as a function of the thresholds:

   - signal threshold 80%, 85%, 90%, 95%
   - noise threshold 10%, 15%, 20%, 25%

2. Processing a large chunk of Switchboard:

   /isip/d02/switchboard/data/...

II. Overview

The speech signals for all channels are read from a raw file. Depending on the window and frame durations, the beginning and the end of the signal are zero-padded so that the first frame starts at the first sample of the speech signal while its window begins at negative time. The frame is always positioned in the center of the window. Each window is pre-emphasized and multiplied by a Hamming window before the energy of the frame is calculated. From the energy values, a probability density function (pdf) and a cumulative distribution function (cdf) are computed and plotted. The nominal noise level and the nominal signal and noise level are then read from the cdf at the noise and signal thresholds, and the signal to noise ratio of the speech file is computed from them.

III. Signal to Noise Ratio Calculation

Each frame of the speech signals for all channels is pre-emphasized using:

$y(n) = x(n) - a\,x(n-1)$    (1)

where $x(n)$ is the input sample, $y(n)$ is the pre-emphasized sample, and $a$ is the pre-emphasis coefficient used in our signal to noise calculation.
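The framing and pre-emphasis described above can be sketched as follows. This is a minimal illustration, not the code used for the reported results. The function names are hypothetical, the 8 kHz sampling rate is inferred from the "8k" in the file names, and the coefficient value 0.95 is only a commonly used default, since the value actually used is not stated here.

```python
def ms_to_samples(duration_ms, sample_rate_hz=8000):
    """Convert a duration in ms to samples (8 kHz is an assumption)."""
    return int(round(duration_ms * sample_rate_hz / 1000))


def frame_windows(signal, frame_len, window_len):
    """Return one window of samples per frame, frame centered in the window.

    The signal is zero-padded so that the first frame starts at sample 0
    of the signal while its window starts at "negative time".
    """
    if window_len < frame_len:
        raise ValueError("window must be at least as long as the frame")
    lead = (window_len - frame_len) // 2          # samples before the frame
    n_frames = -(-len(signal) // frame_len)       # ceiling division
    tail = n_frames * frame_len - len(signal) + (window_len - frame_len - lead)
    padded = [0.0] * lead + list(signal) + [0.0] * tail
    return [padded[i * frame_len: i * frame_len + window_len]
            for i in range(n_frames)]


def preemphasize(samples, a=0.95):
    """First-order pre-emphasis y(n) = x(n) - a*x(n-1); a=0.95 is assumed."""
    out = []
    previous = 0.0
    for x in samples:
        out.append(x - a * previous)
        previous = x
    return out
```

With these helpers, a 5 ms frame and a 10 ms window at the assumed 8 kHz rate correspond to `frame_windows(signal, ms_to_samples(5), ms_to_samples(10))`, and each returned window would then be passed through `preemphasize` before windowing.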
To pre-emphasize a signal means to apply a high-pass filter that increases the relative energy of the high-frequency spectrum. Because the energy of noise increases in proportion to the square of the channel frequency, applying this filter lets us obtain a more accurate signal to noise ratio. Furthermore, pre-emphasis can remove the spectral contributions of the larynx and lips, so that the analysis yields parameters corresponding to the vocal tract only [1].

A Hamming window is then applied to the signal:

$w(n) = 0.54 - 0.46 \cos\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1$    (2)

where $N$ is the number of samples in the window. The window smooths the abrupt discontinuities at the window boundaries. The energy is computed using:

$E = \sum_{n=0}^{N-1} \left[ w(n)\, y(n) \right]^2$    (3)

The energy of each frame is stored until the whole signal has been processed. Then a probability density function (pdf) of the energy values is calculated. A total of 10,000 bins is used for the energy histogram. Figure 1 illustrates the pdf generated using the file 710_b_8k.raw. From the probability density function, we compute the cumulative distribution function (cdf) of the energy values, as shown in Figure 2. Because the data is very widespread, the nominal noise level and the nominal signal and noise level are very close together in the plot. Figure 3 provides a better illustration of the calculation of the signal to noise ratio.

The signal to noise ratio is then calculated as:

$\mathrm{SNR} = 10 \log_{10}\left(\frac{E_{s+n} - E_n}{E_n}\right)$    (4)

where $E_{s+n}$ is the nominal signal and noise level and $E_n$ is the nominal noise level.

Note that one problem with this method is that the nominal signal and noise level and the nominal noise level are each estimated using a threshold value. If the two thresholds are set equal, the signal to noise ratio should be zero. However, in this method, when the two thresholds are equal, the value of $E_{s+n} - E_n$ becomes 0 and the method returns $-\infty$ as the signal to noise ratio. Therefore, the user should be careful not to set the two thresholds too close together.

IV. Experimental Results

The average SNRs of the four files were found using different frame and window durations, with a signal threshold of 80% and a noise threshold of 20% as stated in the problem definition. The results for each of the files are tabulated in Tables 1-4. To show the effects of different frame and window durations, a histogram is generated for each speech file giving the signal to noise ratio for each combination, as shown in Figures 4-7. Since the values of the signal to noise ratio are very close, the histograms are plotted relative to the smallest value in the group, so that each bar shows a min-subtracted signal to noise ratio.

Using the best window and frame durations for each file, different signal thresholds (80%, 85%, 90%, and 95%) and noise thresholds (10%, 15%, 20%, and 25%) are applied to the speech files to observe the effects of different thresholds. The results of the experiments are shown in Tables 5-8. For a clearer illustration of these tables, a histogram is plotted for each file to show the effects of different signal and noise thresholds, as shown in Figures 8-11.

The figures show that the signal to noise ratio decreases as the margin between the nominal noise level and the nominal signal and noise level decreases. This is because the same cdf is used to calculate the signal to noise ratio for every pair of cutoff thresholds.

V. Experiment Using Switchboard Data

A telephone conversation from the Switchboard data collection is used to test the signal to noise ratio calculation code. This conversation differs from the files tested above in that it has both left and right channels. Even though the original assignment asked for only the left channel, both channels are processed and reported.
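The histogram-based calculation of Section III, which is run on each channel separately, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the code behind the reported numbers. The function names, the unnormalized sum-of-squares energy, and the choice of the bin center as the level are assumptions, and the SNR is taken as 10 log10((E_sn - E_n) / E_n), which is consistent with the remark that equal thresholds yield negative infinity.

```python
import math


def hamming(n_points):
    """Standard Hamming window of n_points samples."""
    if n_points == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def frame_energy(window_samples):
    """Sum of squared Hamming-weighted samples (normalization is assumed)."""
    weights = hamming(len(window_samples))
    return sum((w * x) ** 2 for w, x in zip(weights, window_samples))


def level_at(energies, threshold, n_bins=10000):
    """Energy at which the histogram cdf first reaches threshold (0..1)."""
    lo, hi = min(energies), max(energies)
    if hi == lo:
        return lo
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for e in energies:
        counts[min(int((e - lo) / width), n_bins - 1)] += 1
    cumulative = 0
    for b, count in enumerate(counts):
        cumulative += count
        if cumulative / len(energies) >= threshold:
            return lo + (b + 0.5) * width  # bin center (an assumption)
    return hi


def snr_db(energies, signal_threshold=0.80, noise_threshold=0.20):
    """SNR in dB from the nominal signal+noise and noise levels."""
    e_sn = level_at(energies, signal_threshold)
    e_n = level_at(energies, noise_threshold)
    if e_sn <= e_n:
        # Equal (or too close) thresholds: log of zero, reported as -inf.
        return float("-inf")
    # e_n is a bin center here, so it is positive for non-negative energies.
    return 10 * math.log10((e_sn - e_n) / e_n)
```

Here `energies` would hold one `frame_energy` value per frame of a single channel. Calling `snr_db` with two equal thresholds returns negative infinity, which is the pitfall noted in Section III.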
Figures 12 and 13 show the cumulative distributions of the left and right channels, respectively, using a frame duration of 5 ms, a window duration of 10 ms, a signal threshold of 80% and a noise threshold of 20%. Tables 9 and 10 show the signal to noise ratios of the two channels of the switchboard data using different frame and window durations. Tables 11 and 12 show the signal to noise ratios of the two channels using the best frame and window durations but with different signal and noise thresholds.

VI. REFERENCES

[1] K. Fukunaga, "Introduction to Statistical Pattern Recognition," Academic Press, San Diego, California, 1990.

Figure 1 The energy histogram of the file 710_b_8k.raw.
Figure 2 The cumulative distribution of the file 710_b_8k.raw.
Figure 3 The calculation of signal to noise ratio.
Figure 4 Plot of signal to noise ratio for file 710_b_8k.raw with different window and frame durations. (The number on top of each histogram corresponds to the frame duration in ms.)
Figure 5 Plot of signal to noise ratio for file 710_s_8k.raw with different window and frame durations. (The number on top of each histogram corresponds to the frame duration in ms.)
Figure 6 Plot of signal to noise ratio for file 711_g_8k.raw with different window and frame durations. (The number on top of each histogram corresponds to the frame duration in ms.)
Figure 7 Plot of signal to noise ratio for file 712_f_8k.raw with different window and frame durations. (The number on top of each histogram corresponds to the frame duration in ms.)
Figure 8 Plot of signal to noise ratio for file 710_b_8k.raw with different signal and noise thresholds. (The number on top of each histogram corresponds to the noise threshold.)
Figure 9 Plot of signal to noise ratio for file 710_s_8k.raw with different signal and noise thresholds. (The number on top of each histogram corresponds to the noise threshold.)
Figure 10 Plot of signal to noise ratio for file 711_g_8k.raw with different signal and noise thresholds. (The number on top of each histogram corresponds to the noise threshold.)
Figure 11 Plot of signal to noise ratio for file 712_f_8k.raw with different signal and noise thresholds. (The number on top of each histogram corresponds to the noise threshold.)
Figure 12 Cumulative distribution of left channel energy for the switchboard data.
Figure 13 Cumulative distribution of right channel energy for the switchboard data.

Tables 5-8. Rows are noise thresholds (%), columns are signal thresholds (%).

noise        80       85       90       95
10       17.283   18.464   20.485   22.727
15       16.790   17.973   19.997   22.241
20       16.790   17.973   19.997   22.241
25       15.229   16.421   18.456   20.707

noise        80       85       90       95
10       31.109   31.982   33.590   38.425
15       31.109   31.982   33.590   38.425
20       31.109   31.982   33.590   38.425
25       23.731   24.606   26.218   31.058

noise        80       85       90       95
10       10.646   11.455   13.019   14.663
15       10.435   11.248   12.815   14.463
20       10.094   10.911   12.487   14.139
25        9.838   10.660   12.241   13.898

noise        80       85       90       95
10       11.224   12.525   13.931   15.058
15       10.871   12.179   13.590   14.720
20       10.532   11.847   13.263   14.397
25       10.038   11.363   12.789   13.928

Tables 1-4. Rows are frame durations (ms), columns are window durations (ms). A "/" appears where the frame is longer than the window.

frame        10       20       30       60
5        16.382   16.721   16.721   16.790
10       16.345   16.192   16.188   16.659
20            /   16.216   16.032   16.450
40            /        /        /   16.416

frame        10       20       30       60
5        29.916   30.393   30.393   30.637
10       30.668   30.393   30.324   30.687
20            /   29.972   30.604   30.701
40            /        /        /   31.109

frame        10       20       30       60
5         9.150    9.826    9.923    9.878
10        9.417   10.034   10.094    9.921
20            /    9.836    9.972    9.829
40            /        /        /    9.784

frame        10       20       30       60
5        10.429   10.532   10.501   10.465
10       10.198   10.396   10.461   10.485
20            /   10.235   10.245   10.439
40            /        /        /   10.404

Tables 9-10. Rows are frame durations (ms), columns are window durations (ms).

frame        10       20       30       60
5        17.002   18.707   19.667   21.946
10       18.385   19.095   19.605   21.912
20            /   22.493   21.725   22.498
40            /        /        /   25.067

frame        10       20       30       60
5        25.105   25.843   25.946   26.323
10       23.142   25.853   26.046   26.367
20            /   25.873   26.077   26.282
40            /        /        /   26.438

Tables 11-12. Rows are noise thresholds (%), columns are signal thresholds (%).

noise        80       85       90       95
10       26.438   28.602   30.794   33.549
15       26.438   28.602   30.794   33.549
20       26.438   28.602   30.794   33.549
25       26.438   28.602   30.794   33.549

noise        80       85       90       95
10       25.067   28.109   31.523   34.912
15       25.067   28.109   31.523   34.912
20       25.067   28.109   31.523   34.912
25       25.067   28.109   31.523   34.912
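Two steps from Section IV can be sketched from one of the window/frame tables above: selecting the best durations, and deriving the min-subtracted values plotted in Figures 4-7. The values below are copied from the first window/frame table listed (the one whose largest entry is 16.790). The function names are hypothetical, and treating "/" entries as missing is an assumption.

```python
# Window durations (ms) for the table columns.
WINDOWS_MS = [10, 20, 30, 60]

# Rows keyed by frame duration (ms); None stands for a "/" entry.
TABLE = {
    5: [16.382, 16.721, 16.721, 16.790],
    10: [16.345, 16.192, 16.188, 16.659],
    20: [None, 16.216, 16.032, 16.450],
    40: [None, None, None, 16.416],
}


def best_setting(table, windows_ms):
    """Return (frame_ms, window_ms, snr) for the largest SNR in the table."""
    candidates = [
        (value, frame_ms, windows_ms[i])
        for frame_ms, row in table.items()
        for i, value in enumerate(row)
        if value is not None
    ]
    value, frame_ms, window_ms = max(candidates)
    return frame_ms, window_ms, value


def min_subtracted(table):
    """Subtract the smallest SNR in the table from every entry."""
    present = [v for row in table.values() for v in row if v is not None]
    floor = min(present)
    return {
        frame_ms: [None if v is None else round(v - floor, 3) for v in row]
        for frame_ms, row in table.items()
    }
```

For this table, `best_setting(TABLE, WINDOWS_MS)` picks a 5 ms frame with a 60 ms window, and `min_subtracted(TABLE)` measures every entry relative to 16.032, the smallest value in the table.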