"Distance" between two tokens i and j = d[i,j] + d[j,i], where d[i,j] is obtained by linearly stretching the shorter-duration token to the length of the longer one (ideally, the stretch would be computed by dynamic programming). A mismatch in state label incurs a penalty of 2; a mismatch of Gaussian within the same state incurs a penalty of 1.
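The distance above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the original implementation: each token is assumed to be a list of per-frame (state, gaussian) pairs, d[i,j] is assumed to walk the frames of token i and compare each one with the linearly corresponding frame of token j, and the names frame_penalty, directed_distance and token_distance are hypothetical.

```python
def frame_penalty(f1, f2):
    """Penalty between two frames, each an assumed (state, gaussian) pair."""
    state1, gauss1 = f1
    state2, gauss2 = f2
    if state1 != state2:
        return 2  # mismatch in state label
    if gauss1 != gauss2:
        return 1  # same state, different Gaussian
    return 0


def directed_distance(a, b):
    """d[a,b]: compare every frame of a with the linearly mapped frame of b.

    Assumption: the linear stretch is a simple index scaling with no
    interpolation; a dynamic-programming alignment would replace this step.
    """
    if not a or not b:
        raise ValueError("tokens must contain at least one frame")
    n, m = len(a), len(b)
    total = 0
    for t in range(n):
        u = min(m - 1, t * m // n)  # frame of b aligned with frame t of a
        total += frame_penalty(a[t], b[u])
    return total


def token_distance(a, b):
    """Symmetric distance d[a,b] + d[b,a]."""
    return directed_distance(a, b) + directed_distance(b, a)


# Hypothetical tokens: 4 frames versus 2 frames.
long_token = [(0, 1), (0, 1), (1, 2), (1, 2)]
short_token = [(0, 1), (1, 3)]
print(token_distance(long_token, short_token))
```

With this reading the two directed terms generally differ when the durations differ, which is why both are summed. Under a stricter reading, where the shorter token is always the one stretched, the two terms would coincide.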
Two of the four clusters for "__they":
1st cluster: probability of occurrence in tokens = 0.19, duration = 20 frames.
{1,4} -> {7} -> {7,2,5} -> {1,8,6} -> {7,7,4,4,3,3,3,3} -> {3,3,5}
2nd cluster: probability = 0.27, duration = 12 frames.
{6,6,2,5,8} -> {1,1} -> {5,5} -> {2} -> {5,5}
Histogram of durations
HMM topology used for recognition