INFORMATION THEORY BASED DECISION TREES FOR DATA CLASSIFICATION

DISCUSSION

Scenic beauty estimation problem
- Both the C4.5 and Bayesian decision tree systems perform better than the PCA system on the training data but worse on the test data
- C4.5 achieves 0% error on all closed-loop (training data) tests but performs worse on open-loop (test data) tests, which suggests that it may be memorizing, i.e., overfitting, the training data
- C4.5 appears to produce more errors when classifying data with continuous attributes
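To illustrate the continuous-attribute point above, here is a minimal Python sketch, assuming a single numeric attribute and a binary split of the form x <= t, of how an entropy-based tree can choose a threshold. Candidate thresholds are the midpoints between consecutive distinct sorted values, and the one with the largest information gain is kept. The function names (`entropy`, `best_threshold`) are hypothetical, and C4.5 itself applies further criteria (such as the gain ratio) on top of this basic search.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain) for the split x <= t with the highest information gain.

    Hypothetical helper: candidates are midpoints between consecutive distinct
    sorted values; returns (None, 0.0) if no split reduces entropy.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best_t, best_gain = None, 0.0
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # equal values cannot be separated by any threshold
        t = (lo + hi) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        # Weighted entropy remaining after the split
        remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        gain = base - remainder
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


if __name__ == "__main__":
    x = [0.1, 0.4, 0.35, 0.8, 0.9, 0.75]
    y = ["low", "low", "low", "high", "high", "high"]
    print(best_threshold(x, y))
```

Because every gap between distinct training values is a candidate split, a continuous attribute offers far more ways to partition the data than a discrete one. This is one plausible, unverified reason such attributes make it easier for a tree to fit noise in small training sets.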
Surname pronunciation problem
- C4.5 does not perform as well as the Bayesian approaches on small data sets
- Otherwise, the results produced by the C4.5 trees are comparable to those of the Bayesian trees
- Further investigation of alternative pruning and smoothing methods might help keep the decision trees from overfitting the training data
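As one example of the smoothing mentioned above, the sketch below (an assumption, not necessarily the method used in these experiments) replaces a leaf's raw relative frequencies with additive (Laplace) estimates, or alternatively shrinks them toward the parent node's class distribution (an m-estimate). The function names and the default `alpha` and `m` values are hypothetical.

```python
from collections import Counter


def laplace_leaf_probs(leaf_labels, classes, alpha=1.0):
    """Additive smoothing of a leaf's class distribution (Laplace when alpha == 1).

    p(c) = (n_c + alpha) / (n + alpha * K), where K is the number of classes.
    """
    counts = Counter(leaf_labels)
    n = len(leaf_labels)
    k = len(classes)
    return {c: (counts[c] + alpha) / (n + alpha * k) for c in classes}


def m_estimate_leaf_probs(leaf_labels, parent_probs, m=2.0):
    """Shrink a leaf's class distribution toward its parent's distribution.

    p(c) = (n_c + m * p_parent(c)) / (n + m); larger m means stronger smoothing.
    """
    counts = Counter(leaf_labels)
    n = len(leaf_labels)
    return {c: (counts[c] + m * p) / (n + m) for c, p in parent_probs.items()}


if __name__ == "__main__":
    leaf = ["A", "A"]  # a small, pure leaf
    print(laplace_leaf_probs(leaf, ["A", "B"]))
    print(m_estimate_leaf_probs(leaf, {"A": 0.6, "B": 0.4}))
```

For a leaf containing two examples of class A and none of class B, the raw estimate of 1.0/0.0 becomes 0.75/0.25 under Laplace smoothing, so a small leaf no longer assigns zero probability to a class it happened not to see during training.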