Theory and applications of agnostic PAC-learning with small decision
trees
P. Auer, R. C. Holte, and W. Maass
Abstract:
We exhibit a theoretically founded algorithm T2 for agnostic PAC-learning of
decision trees of at most 2 levels, whose computation time is almost linear
in the size of the training set. We evaluate the performance of this learning
algorithmT2 on 15 common real-world datasets, and show that formost of these
datasets T2 provides simple decision trees with little or no loss in
predictive power (compared with C4.5). In fact, for datasets with continuous
attributes its error rate tends to be lower than that of C4.5. To the best of
our knowledge this is the first time that a PAC-learning algorithmis shown to
be applicable to real-world classification problems. Since one can prove that
T2 is an agnostic PAClearning algorithm, T2 is guaranteed to produce close to
optimal 2-level decision trees from sufficiently large training sets for any
(!) distribution of data. In this regard T2 differs strongly fromall other
learning algorithms that are considered in applied machine learning, for
which no guarantee can be given about their performance on new datasets. We
also demonstrate that this algorithm T2 can be used as a diagnostic tool for
the investigation of the expressive limits of 2-level decision trees.
Finally, T2, in combination with new bounds on the VC-dimension of decision
trees of bounded depth that we derive, provides us now for the first time
with the tools necessary for comparing learning curves of decision trees for
real-world datasets with the theoretical estimates of PAClearning theory.
Reference: P. Auer, R. C. Holte, and W. Maass.
Theory and applications of agnostic PAC-learning with small decision trees.
In Proc. of the 12th International Machine Learning Conference, Tahoe
City (USA), pages 21-29. Morgan Kaufmann (San Francisco), 1995.