SPLITTING CRITERIA

To split the data at a node, we choose the question that gives the greatest entropy reduction, that is, the question whose answer removes the most uncertainty about the data.
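
As an illustration of this criterion, here is a minimal sketch that assumes the entropy reduction is measured as the parent's entropy minus the occupancy-weighted average entropy of the two children. The function names, the sample layout (a list of (features, label) pairs) and the yes/no questions are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the empirical label distribution."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def entropy_reduction(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


def best_question(samples, questions):
    """Return (name, gain) of the yes/no question with the largest reduction.

    samples   : list of (features, label) pairs (assumed layout)
    questions : dict mapping a question name to a predicate on features
    """
    labels = [y for _, y in samples]
    best_name, best_gain = None, 0.0
    for name, ask in questions.items():
        left = [y for x, y in samples if ask(x)]
        right = [y for x, y in samples if not ask(x)]
        if not left or not right:
            continue  # a question that does not split the data is useless
        gain = entropy_reduction(labels, left, right)
        if gain > best_gain:
            best_name, best_gain = name, gain
    return best_name, best_gain
```

A question that separates the labels perfectly recovers all of the parent's entropy, while a question whose children have the same label mix as the parent recovers none.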


In speech recognition, it can be shown that this amounts to choosing the question that maximizes the increase in log-likelihood obtained by splitting the parent node into two children:
dL = L(left child) + L(right child) - L(parent)

These likelihoods can be computed from the state occupancy statistics accumulated during training, without going back to the training data (see decision tree-based state tying for a detailed derivation and the important references).
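
To show how the occupancies enter the computation, the sketch below assumes each node is modeled by a single diagonal-covariance Gaussian estimated from the pooled statistics of its states. Under that assumption the node log-likelihood is L = -0.5 * occ * (n * (1 + log(2*pi)) + sum of log variances), where occ is the total occupancy and n the feature dimension. The statistics layout (occupancy, weighted sum of observations, weighted sum of squared observations), the variance floor and the function names are assumptions made for this example, not the method of any particular toolkit.

```python
import math


def pool(stats):
    """Sum (occupancy, sum_x, sum_x2) statistics over a set of states."""
    dim = len(stats[0][1])
    occ = sum(s[0] for s in stats)
    sx = [sum(s[1][d] for s in stats) for d in range(dim)]
    sxx = [sum(s[2][d] for s in stats) for d in range(dim)]
    return occ, sx, sxx


def log_likelihood(stats, var_floor=1e-6):
    """Log-likelihood of the pooled data under one diagonal Gaussian.

    Uses only the accumulated statistics, never the raw frames.
    var_floor is an assumed safeguard against zero variances.
    """
    occ, sx, sxx = pool(stats)
    dim = len(sx)
    log_det = 0.0
    for d in range(dim):
        mean = sx[d] / occ
        var = max(sxx[d] / occ - mean * mean, var_floor)
        log_det += math.log(var)
    return -0.5 * occ * (dim * (1.0 + math.log(2.0 * math.pi)) + log_det)


def likelihood_gain(left, right):
    """L(left child) + L(right child) - L(parent), parent = left + right."""
    return log_likelihood(left) + log_likelihood(right) - log_likelihood(left + right)
```

The gain is never negative under this model, because each child is free to fit its own mean and variance; it is zero when the two children have identical pooled statistics and grows as their distributions move apart.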