MEL-FREQUENCY CEPSTRUM
Recall our filterbank, which we construct in the mel-frequency domain
using a triangularly shaped weighting function applied to
mel-transformed log-magnitude spectral samples.
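As a rough illustration of how such a filterbank might be built, the
sketch below places triangular weights at equal spacing on a mel axis
and averages the spectral samples under each triangle. Several details
are assumptions, not taken from these notes: the mel mapping
2595 log10(1 + f/700), the 256-point DFT used in the usage note, the
weighted-average normalization, and the function names, which are
hypothetical. The weights are applied to magnitude samples, following
the note in this section that the logarithm is taken after averaging.

```python
import math

def hz_to_mel(f_hz):
    # One common mel-scale formula (an assumption; the notes do not give one).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(num_filters, n_fft, fs):
    """Return num_filters rows of triangular weights over DFT bins 0..n_fft//2."""
    num_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(fs / 2.0)
    # num_filters + 2 edge frequencies, equally spaced in mel, mapped back to Hz.
    edges_hz = [mel_to_hz(mel_max * i / (num_filters + 1))
                for i in range(num_filters + 2)]
    bin_hz = [k * fs / n_fft for k in range(num_bins)]
    bank = []
    for j in range(1, num_filters + 1):
        lo, mid, hi = edges_hz[j - 1], edges_hz[j], edges_hz[j + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                w = (f - lo) / (mid - lo)      # rising edge of the triangle
            elif mid < f < hi:
                w = (hi - f) / (hi - mid)      # falling edge of the triangle
            else:
                w = 0.0
            row.append(w)
        bank.append(row)
    return bank

def apply_filterbank(bank, magnitude):
    """Weighted average of the magnitude-spectrum samples under each triangle."""
    outputs = []
    for row in bank:
        total_weight = sum(row)
        weighted_sum = sum(w * x for w, x in zip(row, magnitude))
        outputs.append(weighted_sum / total_weight if total_weight > 0.0 else 0.0)
    return outputs
```

For example, mel_filterbank(24, 256, 8000.0) gives 24 rows of 129
weights each, and apply_filterbank(bank, magnitude) reduces a 129-point
magnitude spectrum to 24 filterbank samples.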
After computing the DFT and the log magnitude spectrum (to obtain the
real cepstrum), we compute the filterbank outputs, and then
use a discrete cosine transform
to compute the mel-frequency cepstrum coefficients. Note that
the triangular weighting functions are applied directly to the
magnitude spectrum, and the logarithm is taken after the spectral
samples are averaged. The resulting coefficients are an approximation
to the cepstrum, and in reality simply provide an orthogonal and
compact representation of the log magnitude spectrum.
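For reference, one commonly used form of the discrete cosine transform
in this setting is sketched below. The symbols are assumptions
introduced here: S_k is the k-th averaged filterbank output, K is the
number of filters, and L is the number of coefficients retained. The
indexing and scaling used elsewhere in these notes may differ.

```latex
c_n = \sum_{k=1}^{K} \log\left(S_k\right)
      \cos\left[ n \left( k - \tfrac{1}{2} \right) \frac{\pi}{K} \right],
\qquad n = 1, 2, \ldots, L
```

Because the cosine basis functions are orthogonal over the K filterbank
samples, truncating to the first L terms keeps a smooth, compact
description of the log spectrum.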
We typically use 24 filterbank samples at an 8 kHz sampling frequency,
and truncate the DCT to 12 MFCC coefficients. Adding energy gives us a
total of 13 coefficients for our base feature vector.
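The final step can be sketched as follows: take the log of the 24
filterbank samples, apply a DCT truncated to 12 coefficients, and add
an energy term to give 13 values. This is a minimal sketch under stated
assumptions: the energy term is taken to be the log of the frame's sum
of squares, it is placed first in the vector, the DCT is unnormalized,
and the function names are hypothetical. None of these choices is
specified in the notes.

```python
import math

def mfcc_from_filterbank(fbank_out, num_ceps=12, floor=1e-10):
    """Log of each filterbank output followed by a truncated DCT."""
    num_filters = len(fbank_out)
    # Floor the outputs so the logarithm is defined for silent frames.
    log_out = [math.log(max(v, floor)) for v in fbank_out]
    ceps = []
    for n in range(1, num_ceps + 1):
        # k is zero-based here, so (k + 0.5) is the midpoint of filter k.
        c_n = sum(log_out[k] * math.cos(n * (k + 0.5) * math.pi / num_filters)
                  for k in range(num_filters))
        ceps.append(c_n)
    return ceps

def log_energy(frame, floor=1e-10):
    """Log of the sum of squared samples (an assumed definition of energy)."""
    return math.log(max(sum(x * x for x in frame), floor))

def base_feature_vector(frame, fbank_out, num_ceps=12):
    """Energy followed by num_ceps MFCCs: 13 values when num_ceps is 12."""
    return [log_energy(frame)] + mfcc_from_filterbank(fbank_out, num_ceps)
```

A call such as base_feature_vector(frame, fbank_out), with fbank_out
holding the 24 filterbank samples for that frame, returns the
13-element base feature vector. A perfectly flat set of filterbank
outputs yields cepstral coefficients near zero, which reflects the fact
that the coefficients for n >= 1 describe spectral shape and not
overall level.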