RE: [S] Multiple events

Thomas J. Downing
Mon, 13 Jul 1998 11:46:38 -0700 (PDT)


We are using Classification and Regression Trees to model
a two-level factor response on 4 continuous predictors. Our initial
"training set" contains approximately 65,000 observations.
Out-of-sample testing using unadulterated classification trees
yields impressive results. However, when we model using random
samples (30-50%) of the training set, out-of-sample results
are often very different. (Sampling is necessary because the
number of observations grows quickly as out-of-sample data
enters the training space.)

Little attempt has been made at simplifying the trees, since the
trees and corresponding out-of-sample results seem to be determined
primarily by the sample that is drawn.
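
To make the comparison concrete, here is a rough sketch of the
experiment described above. It is written in Python rather than
S-PLUS, and everything in it is an assumption made for illustration:
the data are simulated (four standard-normal predictors, of which only
the first shifts the class probability, with a slope of 0.07 chosen so
that a linear fit would have an R-squared of roughly 0.02), and a
one-split "stump" chosen by misclassification error stands in for a
full classification tree. None of the function names correspond to
S-PLUS routines or to our actual data.

```python
import random

def simulate(n, rng, slope=0.07):
    """Simulated stand-in data: 4 N(0,1) predictors, weak signal in the first."""
    rows, labels = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(4)]
        p = min(1.0, max(0.0, 0.5 + slope * x[0]))  # P(y = 1 | x), clipped to [0, 1]
        rows.append(x)
        labels.append(1 if rng.random() < p else 0)
    return rows, labels

def fit_stump(rows, labels):
    """Return (feature, threshold, left_label, right_label) with the lowest
    training misclassification count. Labels must be 0 or 1."""
    n = len(rows)
    total_ones = sum(labels)
    best = None
    for j in range(len(rows[0])):
        order = sorted(range(n), key=lambda i: rows[i][j])
        ones_left = 0
        for k in range(1, n):  # k = number of observations sent left
            ones_left += labels[order[k - 1]]
            lo, hi = rows[order[k - 1]][j], rows[order[k]][j]
            if lo == hi:
                continue  # cannot split between tied values
            ones_right = total_ones - ones_left
            # Errors when each side predicts its own majority class.
            err = min(ones_left, k - ones_left) + min(ones_right, n - k - ones_right)
            if best is None or err < best[0]:
                left = int(2 * ones_left > k)
                right = int(2 * ones_right > n - k)
                best = (err, j, (lo + hi) / 2, left, right)
    return best[1:]

def predict(stump, row):
    j, threshold, left, right = stump
    return left if row[j] <= threshold else right

def accuracy(stump, rows, labels):
    return sum(predict(stump, r) == y for r, y in zip(rows, labels)) / len(rows)

def agreement(stump_a, stump_b, rows):
    """Fraction of rows on which two fitted stumps give the same prediction."""
    return sum(predict(stump_a, r) == predict(stump_b, r) for r in rows) / len(rows)

def compare(n_train=6000, n_test=2000, fraction=0.4, n_draws=5, seed=1):
    """Fit on the full training set and on random subsamples (drawn without
    replacement), then score every fit on the same held-out set."""
    rng = random.Random(seed)
    train_x, train_y = simulate(n_train, rng)
    test_x, test_y = simulate(n_test, rng)
    full = fit_stump(train_x, train_y)
    results = []
    for _ in range(n_draws):
        idx = rng.sample(range(n_train), int(fraction * n_train))
        sub = fit_stump([train_x[i] for i in idx], [train_y[i] for i in idx])
        results.append((sub, accuracy(sub, test_x, test_y),
                        agreement(full, sub, test_x)))
    return full, accuracy(full, test_x, test_y), results

if __name__ == "__main__":
    full, full_acc, results = compare()
    print("full set :", full, round(full_acc, 3))
    for sub, acc, agree in results:
        print("subsample:", sub, round(acc, 3), round(agree, 3))
```

Printing the chosen split variable and threshold for each draw, next
to its agreement with the full-set fit on the held-out rows, is one
way to put a number on how much the fitted model depends on the
sample. The training size, sampling fraction and number of draws are
arguments to compare(), so the same check can be repeated at other
settings.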

The relationship between predictors and response is expected
to be weak; OLS regressions yield R-squareds in the neighborhood
of 0.02. We do not believe that our sampling technique is biased.

Is the disagreement between our results simply a function of
the weak underlying relationship? Are there any parameters we
can vary or techniques we can use (short of increasing our sample
size) which will increase the similarity between results from the
full sets and results from samples of those sets?

We are using S-PLUS 4.0 for Windows.


Thomas J. Downing
Research Assistant
Quantitative Research
Value Line, Inc.
