DATA-DRIVEN OPERATION
There are four important operations in constructing a decision tree:
- Question selection: choosing a set of questions to categorize your
data (some algorithms can derive questions automatically).
- Splitting: partitioning the data assigned to a node into N groups
(N=2 for binary trees).
- Growing: expanding the tree to better represent the training data.
- Pruning: removing nodes to improve generalization.
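The four operations above can be sketched in a few lines of Python.
This is a minimal illustration under stated assumptions, not a
reference implementation: the names Node, split, grow, prune and
neg_sq_error are hypothetical, the questions are supplied by hand
rather than derived automatically, and the score is a simple negative
squared error standing in for whatever criterion a real system uses.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple

Item = Dict[str, Any]                          # hypothetical data record
Question = Tuple[str, Callable[[Item], bool]]  # (name, yes/no test)
Score = Callable[[List[Item]], float]          # higher means a better fit


@dataclass
class Node:
    items: List[Item]
    question: Optional[str] = None
    yes: Optional["Node"] = None
    no: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.yes is None


def split(items: List[Item], test: Callable[[Item], bool]):
    """Splitting: partition a node's data into two groups (N=2)."""
    yes = [it for it in items if test(it)]
    no = [it for it in items if not test(it)]
    return yes, no


def grow(node: Node, questions: List[Question], score: Score,
         min_gain: float) -> None:
    """Growing: greedily apply the question with the largest score gain."""
    best = None
    for name, test in questions:
        yes, no = split(node.items, test)
        if not yes or not no:
            continue  # this question does not separate the data
        gain = score(yes) + score(no) - score(node.items)
        if best is None or gain > best[0]:
            best = (gain, name, yes, no)
    if best is None or best[0] < min_gain:
        return  # no useful question left: the node stays a leaf
    _, node.question, yes, no = best
    node.yes, node.no = Node(yes), Node(no)
    grow(node.yes, questions, score, min_gain)
    grow(node.no, questions, score, min_gain)


def prune(node: Node, score: Score, min_gain: float) -> None:
    """Pruning: bottom-up, merge leaf pairs whose split gain is too small."""
    if node.is_leaf():
        return
    prune(node.yes, score, min_gain)
    prune(node.no, score, min_gain)
    if node.yes.is_leaf() and node.no.is_leaf():
        gain = (score(node.yes.items) + score(node.no.items)
                - score(node.items))
        if gain < min_gain:
            node.question, node.yes, node.no = None, None, None


def neg_sq_error(items: List[Item]) -> float:
    """Assumed score: negative squared deviation of 'x' from its mean."""
    if not items:
        return 0.0
    xs = [float(it["x"]) for it in items]
    mean = sum(xs) / len(xs)
    return -sum((x - mean) ** 2 for x in xs)


# Question selection: a small hand-written question set (an assumption).
NASALS = {"m", "n"}
questions: List[Question] = [
    ("left_is_nasal", lambda it: it["left"] in NASALS),
    ("left_is_m", lambda it: it["left"] == "m"),
]

data = [
    {"left": "m", "x": 1.0},
    {"left": "n", "x": 1.2},
    {"left": "s", "x": 5.0},
    {"left": "t", "x": 5.2},
]

root = Node(data)
grow(root, questions, neg_sq_error, min_gain=1.0)
```

With this toy data, the tree grows by one split at the root and then
stops, because no remaining question improves the score by at least
min_gain. Calling prune with a larger threshold collapses that split
again, which is the trade-off between fitting the training data and
generalizing.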
In speech recognition, we operate on continuous-valued feature
vectors, and score candidate splits with likelihood computations
derived directly from HMM training. This is a major reason why
decision trees are so popular in speech recognition systems: the
tree reuses quantities the recognizer already computes, so the
implementation is very elegant.
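One common way to realize a likelihood-based split criterion is
sketched below. It is an assumption, not something these notes
specify: each node is modeled by a single diagonal-covariance
Gaussian, and the log-likelihood is computed from sufficient
statistics (count, sum, sum of squares) of the kind accumulated
during HMM training. The names Stats, accumulate, log_likelihood and
split_gain are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Stats:
    n: float          # number of frames (occupancy count)
    s: List[float]    # per-dimension sum of x
    ss: List[float]   # per-dimension sum of x squared

    def __add__(self, other: "Stats") -> "Stats":
        # Pooling two nodes only requires adding their statistics.
        return Stats(
            self.n + other.n,
            [a + b for a, b in zip(self.s, other.s)],
            [a + b for a, b in zip(self.ss, other.ss)],
        )


def accumulate(vectors: Sequence[Sequence[float]]) -> Stats:
    """Collect sufficient statistics from a set of feature vectors."""
    dim = len(vectors[0])
    s = [0.0] * dim
    ss = [0.0] * dim
    for v in vectors:
        for d, x in enumerate(v):
            s[d] += x
            ss[d] += x * x
    return Stats(float(len(vectors)), s, ss)


def log_likelihood(st: Stats, var_floor: float = 1e-4) -> float:
    """Log-likelihood of the data under its own ML diagonal Gaussian.

    With ML mean and variance this reduces to
    -0.5 * n * (D*log(2*pi) + sum_d log(var_d) + D).
    The variance floor is an added safeguard; when it is active the
    closed form is only approximate.
    """
    if st.n <= 0:
        return 0.0
    dim = len(st.s)
    sum_log_var = 0.0
    for s, ss in zip(st.s, st.ss):
        mean = s / st.n
        var = max(ss / st.n - mean * mean, var_floor)
        sum_log_var += math.log(var)
    return -0.5 * st.n * (dim * math.log(2.0 * math.pi) + sum_log_var + dim)


def split_gain(yes: Stats, no: Stats) -> float:
    """Increase in log-likelihood from modeling two groups separately."""
    return log_likelihood(yes) + log_likelihood(no) - log_likelihood(yes + no)


# Toy one-dimensional data: two well-separated clusters.
a = [[0.0], [0.2], [0.1], [-0.1]]
b = [[5.0], [5.2], [5.1], [4.9]]

# A question that separates the clusters versus one that mixes them.
gain_good = split_gain(accumulate(a), accumulate(b))
gain_bad = split_gain(accumulate(a[:2] + b[:2]), accumulate(a[2:] + b[2:]))
```

Because the gain depends only on the pooled statistics, evaluating a
candidate question does not require revisiting the training frames.
In this toy case the question that separates the two clusters yields
a much larger gain than the one that mixes them.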