In HTK, regression class trees are built with a centroid splitting algorithm that clusters together mixture components lying in a similar portion of the acoustic space:
- select a terminal node that is to be split
- calculate the mean and variance from the mixture components clustered at this node
- create two children and initialize their means to the parent mean, perturbed in opposite directions (one direction per child) by a fraction of the variance
- for each component at the parent node, assign it to the child whose mean is closest under a Euclidean distance measure
- once all the components have been assigned, calculate the new means for the children, based on the component assignments
- keep re-assigning components to the children and re-estimating the child means until there is no change in assignments from one iteration to the next
- finalize the split
- repeat the steps above until the tree has the desired number of terminal nodes
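
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not HTK's actual implementation: each mixture component is represented solely by its mean vector, the perturbation fraction (`perturb=0.2`) is an arbitrary choice, the node picked for splitting is simply the one holding the most components (the steps do not say how the node is selected), and the names `split_node` and `build_terminals` are hypothetical.

```python
from typing import List, Optional, Tuple

Vector = List[float]


def mean_and_variance(points: List[Vector]) -> Tuple[Vector, Vector]:
    """Per-dimension mean and (population) variance of a set of vectors."""
    n, dim = len(points), len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dim)]
    var = [sum((p[d] - mean[d]) ** 2 for p in points) / n for d in range(dim)]
    return mean, var


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (same ordering as the true distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split_node(points: List[Vector], perturb: float = 0.2,
               max_iter: int = 100) -> Tuple[List[Vector], List[Vector]]:
    """Split one node into two children by centroid splitting."""
    mean, var = mean_and_variance(points)
    # Child means: parent mean pushed in opposite directions.
    # The fraction 0.2 is an assumption made for this sketch.
    child_a = [m + perturb * v for m, v in zip(mean, var)]
    child_b = [m - perturb * v for m, v in zip(mean, var)]
    assign: Optional[List[int]] = None
    group_a: List[Vector] = list(points)
    group_b: List[Vector] = []
    for _ in range(max_iter):
        new_assign = [0 if sq_dist(p, child_a) <= sq_dist(p, child_b) else 1
                      for p in points]
        if new_assign == assign:
            break  # assignments are stable, so the split is final
        assign = new_assign
        group_a = [p for p, a in zip(points, assign) if a == 0]
        group_b = [p for p, a in zip(points, assign) if a == 1]
        if not group_a or not group_b:
            break  # degenerate split, e.g. all components identical
        # Re-estimate the child means from the current assignments.
        child_a = mean_and_variance(group_a)[0]
        child_b = mean_and_variance(group_b)[0]
    return group_a, group_b


def build_terminals(points: List[Vector], n_terminals: int) -> List[List[Vector]]:
    """Keep splitting until n_terminals terminal nodes exist (or no split is possible)."""
    terminals = [list(points)]
    while len(terminals) < n_terminals:
        # Assumption: split the terminal that holds the most components.
        idx = max(range(len(terminals)), key=lambda i: len(terminals[i]))
        if len(terminals[idx]) < 2:
            break
        a, b = split_node(terminals[idx])
        if not a or not b:
            break
        terminals[idx:idx + 1] = [a, b]
    return terminals
```

For example, calling `build_terminals(component_means, 4)` on a list of component mean vectors returns up to four lists, one per terminal node. The sketch only tracks the terminal nodes; a full regression class tree would also record the parent/child links.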