An example of source-filter separation using voiced speech:
(a) Windowed Signal
(b) Log Spectrum
(c) Filtered Cepstrum (n < N)
(d) Smoothed Log Spectrum
(e) Excitation Signal
(f) Log Spectrum (high freq.)
An example of source-filter separation using unvoiced speech:
(a) Windowed Signal
(b) Log Spectrum
(c) Filtered Cepstrum (n < N)
(d) Smoothed Log Spectrum
The reason this works is simple: in voiced speech, the speaker's
fundamental frequency produces a peak in the cepstrum at the pitch
period, which lies far (n > N) from the low-quefrency region (n < N)
dominated by the vocal tract. You can also demonstrate this using an
autocorrelation function.
What happens for an extremely high-pitched female or child speaker?
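
The separation described above can be sketched in a few lines of Python
with NumPy. This is a minimal illustration under stated assumptions, not
the procedure used to generate the figures: the "speech" is a synthetic
impulse train passed through two hypothetical two-pole resonators, and
the names resonator, cepstral_separation, cutoff, and pitch_period_true,
along with all numeric values, are made up for the example. The cutoff
plays the role of N in the text.

```python
import numpy as np

def resonator(x, freq_hz, radius, fs):
    """Two-pole resonator: a crude stand-in for one vocal tract formant."""
    theta = 2.0 * np.pi * freq_hz / fs
    a1, a2 = 2.0 * radius * np.cos(theta), -radius ** 2
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = x[n]
        if n >= 1:
            y[n] += a1 * y[n - 1]
        if n >= 2:
            y[n] += a2 * y[n - 2]
    return y

def cepstral_separation(frame, cutoff):
    """Split the log spectrum into low- and high-quefrency parts."""
    # Log magnitude spectrum; the small floor avoids log(0).
    log_mag = np.log(np.abs(np.fft.fft(frame)) + 1e-12)
    # Real cepstrum: inverse DFT of the log magnitude spectrum.
    cep = np.fft.ifft(log_mag).real
    n = len(cep)
    # Low-time lifter keeps n < cutoff and the mirrored samples,
    # because the real cepstrum is symmetric about n = 0.
    low_lifter = np.zeros(n)
    low_lifter[:cutoff] = 1.0
    low_lifter[n - cutoff + 1:] = 1.0
    smooth_log = np.fft.fft(cep * low_lifter).real              # vocal tract part
    excitation_log = np.fft.fft(cep * (1.0 - low_lifter)).real  # excitation part
    return log_mag, cep, smooth_log, excitation_log

# Assumed example values: 8 kHz sampling, 100 Hz pitch, cutoff of 30 samples.
fs = 8000
pitch_period_true = 80
cutoff = 30

# Synthetic voiced signal: impulse train through two resonators.
excitation = np.zeros(2048)
excitation[::pitch_period_true] = 1.0
speech = resonator(resonator(excitation, 700.0, 0.95, fs), 1800.0, 0.95, fs)

# Take a frame from the middle, away from the start-up transient.
frame = speech[1024:1536] * np.hamming(512)

log_mag, cep, smooth_log, excitation_log = cepstral_separation(frame, cutoff)

# The pitch peak is the largest cepstral value in the high-time region.
pitch_period_est = cutoff + int(np.argmax(cep[cutoff:len(cep) // 2]))
print("estimated pitch period (samples):", pitch_period_est)
```

Because the two lifters sum to one, smooth_log and excitation_log add
back up to log_mag. Lowering pitch_period_true toward cutoff is one way
to explore the question above: once the pitch period falls below the
cutoff, the pitch peak lands inside the low-quefrency region and the two
components no longer separate cleanly.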