Re: [S] cross validating regression models

Frank E Harrell Jr (fharrell@virginia.edu)
Tue, 10 Mar 1998 21:10:24 -0500


Yes Roy, Efron showed that the bootstrap has far less variance than
cross-validation. All of this is built into my Design library, which is only
bombing for Marc Feldesman, who is using a brand-new version of Windows 95.
I think you would have to repeat 10-fold cross-validation 20 times to get the
same accuracy as a simple bootstrap optimism correction, which is done with
the following:

library(Hmisc, T)                    # attach the Hmisc and Design libraries
library(Design, T)
f <- ols(y ~ x1 + ...., x=T, y=T)    # or lrm, psm, cph, bj, etc.; x=T, y=T keep the design matrix and response for resampling
validate(f, B=150)                   # bootstrap (B=150 resamples) optimism-corrected indexes of predictive accuracy
plot(calibrate(f, B=150))            # bootstrap overfitting-corrected calibration curve
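
(For a rough sense of the comparison, validate() can also do the
cross-validated version; the sketch below is only an illustration, and it
assumes the method="crossvalidation" argument, with B as the number of
groups, as described in the Design documentation; check the help file for
your version.)

# Sketch only: average 20 repeats of 10-fold cross-validation and compare
# with the single bootstrap run above.  Assumes validate() accepts
# method="crossvalidation" with B = number of groups.
cv <- 0
for (i in 1:20) cv <- cv + validate(f, method="crossvalidation", B=10)
cv/20                 # indexes averaged over the 20 repeats
validate(f, B=150)    # the bootstrap run shown above, for comparison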

It is not very fruitful to summarize the bootstrapped regression coefficients in order
to develop a new model; instead, one bootstraps the original model fit to obtain
estimates of its performance. See, e.g., Harrell, Lee, and Mark, Statistics in
Medicine, 28 Feb 1996.
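
(To make the optimism idea concrete, here is a minimal hand-rolled sketch
for R^2 using plain lm(); the data frame d and the variables y, x1, x2 are
hypothetical names, and validate() does all of this, for several indexes,
inside the Design library.)

# Illustration of the bootstrap optimism correction for R^2; not Design code.
rsq <- function(fit, data) cor(predict(fit, data), data$y)^2
full     <- lm(y ~ x1 + x2, data=d)   # fit on the original sample
apparent <- rsq(full, d)              # apparent (training-sample) R^2
B <- 150
optimism <- numeric(B)
for (i in 1:B) {
  db <- d[sample(1:nrow(d), replace=T), ]   # bootstrap sample of the data
  fb <- lm(y ~ x1 + x2, data=db)            # refit the model on it
  optimism[i] <- rsq(fb, db) - rsq(fb, d)   # training minus test performance
}
apparent - mean(optimism)                   # optimism-corrected R^2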

-Frank Harrell

-----Original Message-----
From: Roy Pardee <roy@u.arizona.edu>
To: s-news@wubios.wustl.edu <s-news@wubios.wustl.edu>
Date: Tuesday, March 10, 1998 5:29 PM
Subject: [S] cross validating regression models

>Greetings All,
>
> Please forgive the not-explicitly-about-S-Plus nature of this question,
>but I figured that folks on this list would be able to help me without much
>effort. Please feel free to simply point me to relevant writings on this
>topic, if there are any.
>
> I'm in the midst of validating a set of regression models that we
>developed on a randomly selected two-thirds of a dataset. Our goal is to
>come up with prediction equations that will best predict data from new
>samples.
>
> In the course of working on this, I'm starting to wonder how advisable
>it is to rely on a *single* division of our total sample into base and
>validation samples for this purpose. In these days of 'computer-intensive'
>analyses, is there a reasonable way of trying out lots of different
>divisions? For instance, would it be reasonable to generate bootstrapped
>sampling distributions of the various parameter estimates and then use the
>medians of those distributions for prediction in new samples? (If so, is
>that still reasonable if the predictors are not orthogonal to one another?)
>Or is this sort of thing not worth the bother?
>
>Many thanks!
>
>-Roy
>Roy Pardee, J.D., M.A.
>Psychology & Law Student, University of Arizona
>(Writing from Seattle!)
>roy@u.arizona.edu
>
>"The difference between science and advocacy is that scientists expose
>their ideas to risk of falsification."

-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu. To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message: unsubscribe s-news