[S] tech report on boosting

tibs@utstat.toronto.edu
Thu, 23 Jul 98 21:47 EDT


*** Technical Report Available ***

Additive Logistic Regression: a Statistical View of Boosting

Jerome Friedman
(jhf@stat.stanford.edu)

Trevor Hastie
(trevor@stat.stanford.edu)

Robert Tibshirani
(tibs@utstat.toronto.edu)

ABSTRACT

Boosting (Freund & Schapire 1996, Schapire & Singer 1998) is one of
the most important recent developments in classification
methodology. The performance of many classification algorithms can
often be dramatically improved by sequentially applying them to
reweighted versions of the input data, and taking a weighted majority
vote of the sequence of classifiers thereby produced. We show that
this seemingly mysterious phenomenon can be understood in terms of
well-known statistical principles, namely additive modeling and
maximum likelihood. For the two-class problem, boosting can be viewed
as an approximation to additive modeling on the logistic scale using
maximum Bernoulli likelihood as a criterion. We develop more direct
approximations and show that they exhibit nearly identical results to
those of boosting. Direct multi-class generalizations based on
multinomial likelihood are derived that exhibit performance comparable
to other recently proposed multi-class generalizations of boosting in
most situations, and far superior in some. We suggest a minor
modification to boosting that can reduce computation, often by factors
of 10 to 50. Finally, we apply these insights to produce an
alternative formulation of boosting decision trees. This approach,
based on best-first truncated tree induction, often leads to better
performance, and can provide interpretable descriptions of the
aggregate decision rule. It is also much faster computationally, making
it more suitable for large-scale data mining applications.
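
For readers new to the procedure the abstract describes (fit a classifier
to reweighted data, repeat, then take a weighted majority vote), here is
a minimal sketch in Python. It is not code from the report. It assumes
the usual discrete AdaBoost recipe with labels in {-1, +1}, uses one-split
decision stumps as the base classifier (an illustrative choice), and
weights each vote by 0.5*log((1-err)/err); other conventions differ by a
constant factor that does not change the vote. The names fit_stump,
stump_predict, adaboost and predict are hypothetical.

```python
import math


def fit_stump(X, y, w):
    """Return the stump (err, feature, threshold, polarity) with the lowest
    weighted error. It predicts `polarity` if x[feature] > threshold,
    otherwise `-polarity`."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate splits: one below the minimum, plus midpoints between neighbours.
        thresholds = [values[0] - 1.0]
        thresholds += [(a + b) / 2.0 for a, b in zip(values, values[1:])]
        for t in thresholds:
            for polarity in (1, -1):
                err = sum(
                    wi
                    for row, yi, wi in zip(X, y, w)
                    if (polarity if row[j] > t else -polarity) != yi
                )
                if best is None or err < best[0]:
                    best = (err, j, t, polarity)
    return best


def stump_predict(stump, row):
    _, j, t, polarity = stump
    return polarity if row[j] > t else -polarity


def adaboost(X, y, rounds=10):
    """Fit stumps to successively reweighted data; return [(alpha, stump), ...]."""
    n = len(X)
    w = [1.0 / n] * n  # start from uniform observation weights
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        err = stump[0]
        if err >= 0.5:  # base classifier no better than chance: stop
            break
        err = max(err, 1e-12)  # guard against log of infinity on a perfect fit
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, stump))
        # Upweight misclassified points, downweight the rest, then renormalize.
        w = [
            wi * math.exp(-alpha * yi * stump_predict(stump, row))
            for row, yi, wi in zip(X, y, w)
        ]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    """Weighted majority vote of the fitted stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy one-dimensional data that no single stump can separate.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [1, 1, -1, -1, 1, 1]

ensemble = adaboost(X, y, rounds=3)
print([predict(ensemble, row) for row in X])
```

On this toy set the best single stump misclassifies two of the six
points, while the three-round vote classifies all six correctly. The
real-valued score summed inside predict is the additive function that
the abstract relates to modeling on the logistic scale.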

Available by ftp from:

ftp://stat.stanford.edu/pub/friedman/boost.ps.Z

or in

www://utstat.toronto.edu/tibs/research.html

or

ftp://utstat.toronto.edu/pub/tibs/boost.ps.Z

Comments welcome.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Rob Tibshirani, Dept of Public Health Sciences, and Dept of Statistics
Univ of Toronto, Toronto, Canada M5S 1A8.
Phone: 416-978-4642 (PMB), 416-978-0673 (stats). FAX: 416 978-8299
computer fax 416-978-1525 (please call or email me to inform)
tibs@utstat.toronto.edu. ftp://utstat.toronto.edu/pub/tibs
http://www.utstat.toronto.edu/~tibs
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Another turning point, a fork stuck in the road.
Time grabs you by the wrist, directs you where to go.
So make the best of this test, and don't ask why.
It's not a question, but a lesson learned in time.
It's something unpredictable, but in the end is right.
I hope you had the time of your life.

So take the photographs, and still frames in your mind.
Hang it on shelf of good health and good time.
Tattoos of memories and dead skin on trial.
For what it's worth, it was worth all the while.
I hope you had the time of your life.

Green Day