INFORMATION THEORY BASED DECISION TREES FOR DATA CLASSIFICATION


Julie Ngan
Candidate for Master of Science in Computer Engineering
Institute for Signal and Information Processing
Department of Electrical and Computer Engineering
Mississippi State University
Phone/Fax: (601) 325-8335/3149 Email: ngan@isip.msstate.edu

ABSTRACT

Previously, a Bayesian statistical decision tree approach was shown to provide high performance on two fundamentally different problems: automatic generation of proper noun pronunciations and scenic beauty estimation of forestry images. In this work, an information theory based approach is applied to decision tree construction for the same two problems. The resulting trees are then pessimistically pruned using a confidence limit parameter. With this method, we achieve an error rate of 39% on the proper noun problem, matching the Bayesian decision tree. On the scenic beauty estimation problem, we achieve a 44% error rate, compared to the best reported result of 36%. This presentation will focus on the details of decision tree construction and the role of the various parameters in tree pruning. We will also present an analysis of the experimental results, along with the advantages and disadvantages of this information theory based algorithm.
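
The abstract does not specify the splitting criterion or the exact pruning rule, so the following is only a minimal illustrative sketch of the two ideas it names. It assumes entropy-based information gain as the split measure and a binomial upper confidence limit as the pessimistic error estimate. The function names and the default confidence value of 0.25 are hypothetical choices made for this example, not details taken from the work described above.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, attribute_values):
    """Entropy reduction obtained by partitioning the labels on one
    discrete attribute (assumed split criterion for this sketch)."""
    n = len(labels)
    partitions = {}
    for value, label in zip(attribute_values, labels):
        partitions.setdefault(value, []).append(label)
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


def pessimistic_error_rate(errors, n, confidence=0.25):
    """Upper confidence limit on the true error rate of a node that
    misclassifies `errors` of its `n` training cases.

    Solves P(X <= errors | n, p) = confidence for p by bisection.
    The default confidence of 0.25 is an assumption for illustration.
    """
    if errors >= n:
        return 1.0

    def cdf(p):
        # Binomial probability of observing at most `errors` errors.
        return sum(
            math.comb(n, k) * p**k * (1 - p) ** (n - k)
            for k in range(errors + 1)
        )

    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if cdf(mid) > confidence:
            lo = mid  # cdf decreases in p, so the root lies above mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    labels = ["yes", "yes", "no", "no"]
    print(information_gain(labels, ["a", "a", "b", "b"]))  # perfectly separating attribute
    print(information_gain(labels, ["a", "b", "a", "b"]))  # uninformative attribute
    print(pessimistic_error_rate(0, 6))
```

In a scheme of this kind, a subtree would be replaced by a leaf when the leaf's pessimistic error estimate is no worse than the combined estimate of the subtree's leaves. A smaller confidence value gives larger error estimates for small nodes and therefore heavier pruning.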