This page describes an LDM (linear dynamic model) phonetic speech recognition experiment designed to evaluate the potential of LDMs for speech classification.

Setup:
1) 13-dimensional observation vector Y[] and 13-dimensional state vector X[]
2) features: 12 MFCCs + energy (13 coefficients per frame)
3) database: 2 speakers, [tm] and [ss]; 3 sounds, /aa/, /m/, and /sh/, with 9 examples of each sound per speaker.
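For reference, the standard state-space form of a linear dynamic model is written out below. The page does not spell out its parameterization, so the symbols F, H, Q, R, pi, and Lambda are assumptions introduced here; with the setup above, both the state x_t (X[]) and the observation y_t (Y[]) are 13-dimensional.

```latex
\[
\begin{aligned}
\mathbf{x}_t &= F\,\mathbf{x}_{t-1} + \boldsymbol{\eta}_t, &\quad \boldsymbol{\eta}_t &\sim \mathcal{N}(\mathbf{0},\,Q), \\
\mathbf{y}_t &= H\,\mathbf{x}_t + \boldsymbol{\epsilon}_t, &\quad \boldsymbol{\epsilon}_t &\sim \mathcal{N}(\mathbf{0},\,R), \\
\mathbf{x}_1 &\sim \mathcal{N}(\boldsymbol{\pi},\,\Lambda), &\quad \mathbf{x}_t,\ \mathbf{y}_t &\in \mathbb{R}^{13},\quad F,\,H \in \mathbb{R}^{13\times 13}.
\end{aligned}
\]
```

Under this form, EM training estimates (F, H, Q, R, pi, Lambda) separately for each sound from its training tokens, with the E-step typically carried out by a Kalman smoother.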

Experiment steps:
(1) train model /aa/ using 14 examples: tm_aa1 ... tm_aa7, ss_aa1 ... ss_aa7.
(2) train model /m/ using 14 examples: tm_m1 ... tm_m7, ss_m1 ... ss_m7.
(3) train model /sh/ using 14 examples: tm_sh1 ... tm_sh7, ss_sh1 ... ss_sh7.
(4) test sound /aa/ using 4 examples: tm_aa8, tm_aa9, ss_aa8, ss_aa9.
(5) test sound /m/ using 4 examples: tm_m8, tm_m9, ss_m8, ss_m9.
(6) test sound /sh/ using 4 examples: tm_sh8, tm_sh9, ss_sh8, ss_sh9.
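The train/test split in steps (1) to (6) follows directly from the token naming. The sketch below regenerates the lists for each sound. It assumes every token is named `<speaker>_<sound><index>` exactly as listed above; `split_tokens`, `N_TRAIN`, and `N_TOTAL` are hypothetical names introduced for illustration.

```python
# Hypothetical sketch: rebuild the per-sound train/test token lists from the
# naming pattern <speaker>_<sound><index> used on this page.
SPEAKERS = ["tm", "ss"]
SOUNDS = ["aa", "m", "sh"]
N_TRAIN = 7  # indices 1..7 of each speaker are used for training
N_TOTAL = 9  # indices 8..9 of each speaker are held out for testing


def split_tokens(sound):
    """Return (train, test) lists of token names for one sound."""
    train = [f"{spk}_{sound}{i}"
             for spk in SPEAKERS for i in range(1, N_TRAIN + 1)]
    test = [f"{spk}_{sound}{i}"
            for spk in SPEAKERS for i in range(N_TRAIN + 1, N_TOTAL + 1)]
    return train, test


if __name__ == "__main__":
    for sound in SOUNDS:
        train, test = split_tokens(sound)
        print(f"/{sound}/: {len(train)} train, {len(test)} test {test}")
```

If the number of examples per speaker changes, only `N_TRAIN` and `N_TOTAL` need to be updated.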

Figure: EM training of the /sh/ model, 50 iterations

Figure: Confusion matrix

Results and Analysis

I tried both traditional EM and our new stochastic EM training, and they gave very similar results. So far, the stochastic EM training works well.
The models discriminate well between /aa/ and /m/ and between /sh/ and /m/. However, /aa/ and /sh/ are easily confused with each other. This is the major problem in this experiment, and Sundar and I are working on it now.
As Dr. Picone suggested, Sundar and I are now recording the remaining 40 examples of each sound, and I will rerun the experiment using 50 examples per sound.
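The page does not show how test tokens were scored. A common choice with one LDM per class is to assign each test token to the model that gives it the highest log-likelihood (for an LDM this is usually computed with a Kalman filter) and to tally the decisions into a confusion matrix. The sketch below assumes that decision rule; `classify`, `confusion_matrix`, and `accuracy` are hypothetical helper names, and the scores they consume would come from whatever scorer the experiment actually used.

```python
# Hypothetical sketch: maximum-score classification and a confusion matrix.
# Scores are assumed to be per-model log-likelihoods (higher is better);
# they are not computed here.
SOUNDS = ["aa", "m", "sh"]


def classify(scores):
    """Return the model name with the highest score; ties keep the first."""
    return max(scores, key=scores.get)


def confusion_matrix(results, labels=SOUNDS):
    """Tally decisions into a matrix: rows = true sound, columns = decision.

    `results` is an iterable of (true_label, {model_name: score}) pairs.
    """
    index = {label: k for k, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, scores in results:
        matrix[index[true_label]][index[classify(scores)]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of test tokens on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / total if total else 0.0
```

With four test tokens per sound, each row of the matrix sums to 4, and the /aa/ versus /sh/ confusions described above would show up in the off-diagonal cells (aa, sh) and (sh, aa).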

--
August 21, 2007 by Tao.