[S] cross validating regression models

Roy Pardee (roy@u.arizona.edu)
Tue, 10 Mar 1998 14:24:48 -0800


Greetings All,

Please forgive the not-explicitly-about-S-Plus nature of this question,
but I figured that folks on this list would be able to help me on this
without much effort. Please feel free to just point me to relevant writings
on this topic if there are any.

I'm in the midst of validating a set of regression models that we
developed on a randomly selected 2/3rds of a dataset. Our goal is to come
up with prediction equations that will best predict data from new samples.
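
To make the setup concrete, here is a minimal sketch of a single
two-thirds / one-third division, written in Python rather than S-Plus
purely for illustration. Everything in it is an assumption on my part: the
data are simulated, there is only one predictor, and the names fit_line
and rmse are made up for this example rather than taken from our actual
analysis.

```python
import random

def fit_line(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def rmse(xs, ys, intercept, slope):
    """Root mean squared prediction error of the fitted line on (xs, ys)."""
    sq = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return (sum(sq) / len(sq)) ** 0.5

rng = random.Random(1998)  # fixed seed so the sketch is repeatable

# Simulated stand-in data (an assumption): y = 2 + 0.5 * x + noise
xs = [rng.uniform(0, 10) for _ in range(150)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]

# One random division: two-thirds base sample, one-third validation sample
idx = list(range(len(xs)))
rng.shuffle(idx)
cut = (2 * len(idx)) // 3
base, valid = idx[:cut], idx[cut:]

# Fit on the base sample only, then score both samples
intercept, slope = fit_line([xs[i] for i in base], [ys[i] for i in base])
base_rmse = rmse([xs[i] for i in base], [ys[i] for i in base], intercept, slope)
valid_rmse = rmse([xs[i] for i in valid], [ys[i] for i in valid], intercept, slope)

print("base RMSE:", round(base_rmse, 3), "validation RMSE:", round(valid_rmse, 3))
```

The validation RMSE is the number of interest here, since it comes from
cases the equation never saw; it depends on which cases happened to land
in each sample.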

In the course of working on this, I'm starting to wonder about how
advisable it is to rely on a *single* division of our total sample into
base and validation samples for this purpose. In these days of
'computer-intensive' analyses, is there a reasonable way of trying out lots
of different divisions? For instance, would it be reasonable to generate
bootstrapped sampling distributions of the various parameter estimates and
then use the medians of those distributions for predicting to new samples?
(If so, is that still reasonable if the predictors are not orthogonal to
one another?) Or is this sort of thing not worth the bother?
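
In case it helps to see what I mean, here is a rough sketch of both ideas
in Python (again only for illustration, not S-Plus). It assumes simulated
data with two deliberately correlated predictors, and the helpers solve,
ols, predict, rmse and take are hypothetical names I made up for this
example. Part (a) repeats the random two-thirds / one-third division many
times; part (b) bootstraps the coefficients and takes their medians.

```python
import random
import statistics

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ols(rows, ys):
    """Least-squares coefficients (intercept first) via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * y for x, y in zip(X, ys)) for i in range(p)]
    return solve(xtx, xty)

def predict(beta, row):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))

def rmse(beta, rows, ys):
    sq = [(y - predict(beta, r)) ** 2 for r, y in zip(rows, ys)]
    return (sum(sq) / len(sq)) ** 0.5

rng = random.Random(1998)
n = 150

# Simulated stand-in data (an assumption): x2 is correlated with x1
rows, ys = [], []
for _ in range(n):
    x1 = rng.gauss(0, 1)
    x2 = 0.7 * x1 + rng.gauss(0, 0.7)
    rows.append((x1, x2))
    ys.append(1.0 + 2.0 * x1 - 1.0 * x2 + rng.gauss(0, 1))

def take(idx):
    return [rows[i] for i in idx], [ys[i] for i in idx]

# (a) Many random divisions: fit on two-thirds, score on the held-out third
split_rmse = []
for _ in range(200):
    idx = list(range(n))
    rng.shuffle(idx)
    cut = (2 * n) // 3
    bx, by = take(idx[:cut])
    vx, vy = take(idx[cut:])
    split_rmse.append(rmse(ols(bx, by), vx, vy))

# (b) Bootstrap: resample cases with replacement and refit each time
boot = []
for _ in range(200):
    bx, by = take([rng.randrange(n) for _ in range(n)])
    boot.append(ols(bx, by))

# Medians are taken one coefficient at a time
median_beta = [statistics.median(col) for col in zip(*boot)]
full_beta = ols(rows, ys)

print("mean held-out RMSE over splits:", round(statistics.mean(split_rmse), 3))
print("spread of held-out RMSE (sd):", round(statistics.stdev(split_rmse), 3))
print("bootstrap median coefficients:", [round(b, 3) for b in median_beta])
print("full-sample coefficients:", [round(b, 3) for b in full_beta])
```

Note that the medians in part (b) are taken coefficient by coefficient,
which ignores any correlation among the estimates. That is exactly the
situation my question about non-orthogonal predictors is getting at.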

Many thanks!

-Roy
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+
| Roy Pardee, J.D., M.A.       |                                          |
| Psychology & Law Student     | The difference between science and       |
| University of Arizona        | advocacy is that scientists expose their |
|                              | ideas to risk of falsification.          |
| (Writing from Seattle!)      |                                          |
| roy@u.arizona.edu            |                                          |
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+
-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu.