LECTURE 27: DECISION TREES

THE CART ALGORITHM
The classification and regression tree (CART) algorithm can be summarized as follows:

Create a set of questions that consists of all possible questions about the measured variables (phonetic context).
Select a splitting criterion (likelihood).
Initialization: create a tree with one node containing all the training data.
Splitting: for each terminal node, find the best question for splitting it. Then split only the one terminal node whose best question yields the greatest increase in the likelihood.
Stopping: if each leaf node contains data samples from a single class, or if no remaining split improves the likelihood by at least a pre-set threshold, stop. Otherwise, continue splitting.
Pruning: use an independent test set or cross-validation to prune the tree.
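
The steps above, up to but not including pruning, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: questions are assumed to be equality tests of the form "is feature == value?", the likelihood is taken to be the log-likelihood of the class labels under each node's empirical class distribution, and the names log_likelihood, best_split, grow_tree, classify, and min_gain, as well as the toy phonetic-context data, are all hypothetical.

```python
import math
from collections import Counter

def log_likelihood(labels):
    """Log-likelihood of labels under the node's empirical class distribution."""
    n = len(labels)
    return sum(c * math.log(c / n) for c in Counter(labels).values())

def best_split(samples, questions):
    """Return (gain, question, yes, no) for the best question at one node."""
    parent = log_likelihood([y for _, y in samples])
    best = (0.0, None, None, None)
    for name, value in questions:
        yes = [s for s in samples if s[0][name] == value]
        no = [s for s in samples if s[0][name] != value]
        if not yes or not no:
            continue  # a question that does not divide the data is useless
        gain = (log_likelihood([y for _, y in yes])
                + log_likelihood([y for _, y in no]) - parent)
        if gain > best[0]:
            best = (gain, (name, value), yes, no)
    return best

def grow_tree(samples, min_gain=1e-9):
    """Greedy growth: repeatedly split the single leaf with the largest gain."""
    # Question set: every (feature, value) pair seen in the training data.
    questions = sorted({(k, v) for x, _ in samples for k, v in x.items()})
    # Initialization: one node holding all the training data.
    root = {"samples": samples, "question": None}
    leaves = [root]
    while True:
        candidates = [(best_split(leaf["samples"], questions), leaf)
                      for leaf in leaves]
        (gain, question, yes, no), leaf = max(candidates, key=lambda c: c[0][0])
        # Stopping: pure leaves have zero gain, so both conditions end up here.
        if question is None or gain < min_gain:
            return root
        leaf["question"] = question
        leaf["yes"] = {"samples": yes, "question": None}
        leaf["no"] = {"samples": no, "question": None}
        leaves = [l for l in leaves if l is not leaf] + [leaf["yes"], leaf["no"]]

def classify(tree, x):
    """Follow the questions down to a leaf and return its majority class."""
    node = tree
    while node["question"] is not None:
        name, value = node["question"]
        node = node["yes"] if x[name] == value else node["no"]
    return Counter(y for _, y in node["samples"]).most_common(1)[0][0]

# Toy data (hypothetical): left/right phonetic context -> class label.
data = [
    ({"left": "vowel", "right": "nasal"}, "A"),
    ({"left": "vowel", "right": "stop"}, "A"),
    ({"left": "nasal", "right": "stop"}, "B"),
    ({"left": "stop", "right": "nasal"}, "B"),
]
tree = grow_tree(data)
print(tree["question"])
print(classify(tree, {"left": "vowel", "right": "stop"}))
```

The sketch recomputes the best question for every leaf on each pass, which keeps the code short at the cost of efficiency. Pruning is deliberately left out; it would operate on the returned tree using held-out data or cross-validation.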