OVERSAMPLING IMPROVES PERFORMANCE

The spectrum is oversampled to avoid biased estimates and to reduce variation in the measurements caused by quantization of the frequency scale (a concern, for instance, for formants with narrow bandwidths).
To illustrate, consider the parameters of a typical front end:

sample frequency = 8 kHz
frame duration = 10 msec
window duration = 25 msec (200 points)
FFT length = 256 points
max frequency = sample frequency / 2 = 4 kHz
max mel frequency = max frequency in mel = 2146.1 mel
number of mel frequency scale bins = 24 bins
mel frequency resolution = max mel frequency / (24 + 1) = 85.84 mel
center frequency of bin i = i * 85.84 mel (i = 1, ..., 24)
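
The arithmetic behind these parameters can be sketched in a few lines of Python. This is a minimal sketch, not the front end's actual code: it assumes the common mel mapping mel = 2595 * log10(1 + f / 700), which reproduces the 2146.1 mel figure quoted above, and the names hz_to_mel and mel_to_hz are introduced here purely for illustration.

```python
import math

def hz_to_mel(f_hz):
    # Assumed mel mapping (not stated in the notes): 2595 * log10(1 + f / 700)
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    # Inverse of the assumed mapping above
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

sample_freq = 8000.0   # Hz
window_dur = 0.025     # seconds
num_bins = 24

window_points = round(sample_freq * window_dur)   # 200 points
max_freq = sample_freq / 2.0                      # 4 kHz
max_mel = hz_to_mel(max_freq)                     # about 2146.1 mel
mel_res = max_mel / (num_bins + 1)                # about 85.84 mel

# Center of bin i sits at i * mel_res on the mel scale, i = 1..24
centers_mel = [i * mel_res for i in range(1, num_bins + 1)]
centers_hz = [mel_to_hz(m) for m in centers_mel]

print(window_points, round(max_mel, 1), round(mel_res, 2))
```

Dividing by 24 + 1 rather than 24 leaves room for the stop edge of the last bin to land exactly on the maximum frequency.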

This approach generates the table shown below:

Bin #   Continuous Frequency (Hz / Mel)                       Discrete Frequency
        Start             Center            Stop              Range (Index)
  1        0.0 /    0.0     55.4 /   85.8    115.2 /  171.7   0 - 3
  2       55.4 /   85.8    115.2 /  171.7    179.7 /  257.5   2 - 5
  3      115.2 /  171.7    179.7 /  257.5    249.3 /  343.4   4 - 7
  4      179.7 /  257.5    249.3 /  343.4    324.5 /  429.2   6 - 10
  5      249.3 /  343.4    324.5 /  429.2    405.5 /  515.1   8 - 12
  6      324.5 /  429.2    405.5 /  515.1    493.0 /  600.9   11 - 15
  7      405.5 /  515.1    493.0 /  600.9    587.5 /  686.7   13 - 18
  8      493.0 /  600.9    587.5 /  686.7    689.4 /  772.6   16 - 22
  9      587.5 /  686.7    689.4 /  772.6    799.3 /  858.4   19 - 25
 10      689.4 /  772.6    799.3 /  858.4    918.0 /  944.3   23 - 29
 11      799.3 /  858.4    918.0 /  944.3   1046.1 / 1030.1   26 - 33
 12      918.0 /  944.3   1046.1 / 1030.1   1184.2 / 1116.0   30 - 37
 13     1046.1 / 1030.1   1184.2 / 1116.0   1333.4 / 1201.8   34 - 42
 14     1184.2 / 1116.0   1333.4 / 1201.8   1494.3 / 1287.6   38 - 47
 15     1333.4 / 1201.8   1494.3 / 1287.6   1668.0 / 1373.5   43 - 53
 16     1494.3 / 1287.6   1668.0 / 1373.5   1855.4 / 1459.3   48 - 59
 17     1668.0 / 1373.5   1855.4 / 1459.3   2057.6 / 1545.2   54 - 65
 18     1855.4 / 1459.3   2057.6 / 1545.2   2275.9 / 1631.0   60 - 72
 19     2057.6 / 1545.2   2275.9 / 1631.0   2511.4 / 1716.9   66 - 80
 20     2275.9 / 1631.0   2511.4 / 1716.9   2765.6 / 1802.7   73 - 88
 21     2511.4 / 1716.9   2765.6 / 1802.7   3039.9 / 1888.5   81 - 97
 22     2765.6 / 1802.7   3039.9 / 1888.5   3335.9 / 1974.4   89 - 106
 23     3039.9 / 1888.5   3335.9 / 1974.4   3655.3 / 2060.2   98 - 116
 24     3335.9 / 1974.4   3655.3 / 2060.2   4000.0 / 2146.1   107 - 127
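
The rows of this table can be regenerated with a short script. The sketch below rests on several assumptions that the notes do not state: the mel mapping 2595 * log10(1 + f / 700), an FFT bin spacing of sample frequency / FFT length = 31.25 Hz, and an index rule that takes the first FFT index at or above the start edge and the last index at or below the stop edge, capped at FFT length / 2 - 1. The function name mel_bin_table is hypothetical. Under these assumptions the computed index ranges appear to agree with the tabulated ones.

```python
import math

def hz_to_mel(f_hz):
    # Assumed mel mapping: 2595 * log10(1 + f / 700)
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_bin_table(sample_freq=8000.0, fft_len=256, num_bins=24):
    """Return (bin, start_hz, center_hz, stop_hz, first_index, last_index) rows."""
    fft_spacing = sample_freq / fft_len                    # 31.25 Hz per FFT index
    mel_res = hz_to_mel(sample_freq / 2.0) / (num_bins + 1)
    rows = []
    for i in range(1, num_bins + 1):
        # Each bin runs from the previous center to the next center,
        # so adjacent bins overlap by half their width.
        start_hz = mel_to_hz((i - 1) * mel_res)
        center_hz = mel_to_hz(i * mel_res)
        stop_hz = mel_to_hz((i + 1) * mel_res)
        # Assumed index rule: round inward, cap at the last index below Nyquist.
        first = math.ceil(start_hz / fft_spacing)
        last = min(math.floor(stop_hz / fft_spacing), fft_len // 2 - 1)
        rows.append((i, start_hz, center_hz, stop_hz, first, last))
    return rows

for i, start_hz, center_hz, stop_hz, first, last in mel_bin_table():
    print(f"{i:3d} {start_hz:7.1f} {center_hz:7.1f} {stop_hz:7.1f}   {first} - {last}")
```

Note how the number of FFT indices per bin grows with frequency: the bins are uniform in mel but widen in Hz, so the higher bins average over more spectral samples.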

Finally, these 24 points (extended to form a 48-point periodic, even sequence) are used to compute a forward DCT. The first 12 coefficients are retained.
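
The link between the DCT and the 48-point even extension can be checked numerically. The sketch below is illustrative only: it assumes an unnormalized DCT-II, C[k] = 2 * sum of x[n] * cos(pi * k * (2n + 1) / (2N)), since the notes do not specify a scaling convention, and the function names are hypothetical. It computes the first 12 coefficients twice, once directly and once from the DFT of the mirrored sequence with a half-sample phase correction.

```python
import cmath
import math

def dct_direct(x, num_keep=12):
    # Unnormalized DCT-II (assumed convention), first num_keep coefficients
    n = len(x)
    return [
        2.0 * sum(x[j] * math.cos(math.pi * k * (2 * j + 1) / (2 * n)) for j in range(n))
        for k in range(num_keep)
    ]

def dct_via_even_extension(x, num_keep=12):
    n = len(x)
    # Mirror the N points to get a 2N-point sequence that is even and periodic
    y = list(x) + list(reversed(x))
    m = len(y)
    coeffs = []
    for k in range(num_keep):
        # Naive DFT of the extended sequence at index k
        y_k = sum(y[j] * cmath.exp(-2j * math.pi * k * j / m) for j in range(m))
        # Half-sample phase correction makes the result purely real
        coeffs.append((cmath.exp(-1j * math.pi * k / (2 * n)) * y_k).real)
    return coeffs

# Hypothetical stand-in for 24 filter bank outputs (not data from the notes)
x = [((7 * i) % 11) / 3.0 for i in range(24)]
a = dct_direct(x)
b = dct_via_even_extension(x)
print(max(abs(u - v) for u, v in zip(a, b)))  # difference at rounding-error level
```

The naive DFT keeps the example dependency-free; in practice an FFT routine would replace the inner sum.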

The forward DCT is used because of its energy compaction property (a property shared by many orthogonal transforms). This transform allows us to approximate the data with fewer coefficients, since the energy is concentrated in the lower-indexed coefficients. Hence, we can truncate the representation to 12 coefficients and still retain most of the important information. Because the DCT basis functions are orthogonal to one another, the retained coefficients are also largely decorrelated.
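
Energy compaction can be illustrated with a small experiment. The sketch below assumes an orthonormal DCT-II scaling, so that the sum of squared coefficients equals the sum of squared inputs, and uses a synthetic smooth bump as a hypothetical stand-in for 24 filter bank outputs; it is not data from a real front end. It measures the fraction of total energy captured by the first 12 coefficients.

```python
import math

def dct_orthonormal(x):
    # Orthonormal DCT-II (assumed scaling): energy is preserved across the transform
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(
            scale * sum(x[j] * math.cos(math.pi * k * (2 * j + 1) / (2 * n)) for j in range(n))
        )
    return out

def energy_fraction(x, num_keep):
    # Fraction of total energy held by the first num_keep DCT coefficients
    c = dct_orthonormal(x)
    total = sum(v * v for v in c)
    return sum(v * v for v in c[:num_keep]) / total

# Synthetic smooth sequence (hypothetical): a Gaussian-shaped bump over 24 bins
smooth = [math.exp(-((j - 8) / 6.0) ** 2) for j in range(24)]
frac = energy_fraction(smooth, 12)
print(f"fraction of energy in first 12 coefficients: {frac:.6f}")
```

For a smooth input like this one, nearly all of the energy falls in the low-index coefficients; a rapidly fluctuating input would spread its energy more widely and lose more under truncation.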