Fw: [S] cross validating regression models

Frank E Harrell Jr (fharrell@virginia.edu)
Wed, 11 Mar 1998 14:02:29 -0500

-----Original Message-----
From: Frank E Harrell Jr <fharrell@virginia.edu>
To: Prof Brian Ripley <ripley@stats.ox.ac.uk>
Date: Wednesday, March 11, 1998 2:01 PM
Subject: Re: [S] cross validating regression models

>-----Original Message-----
>From: Prof Brian Ripley <ripley@stats.ox.ac.uk>
>To: fharrell@virginia.edu <fharrell@virginia.edu>
>Date: Wednesday, March 11, 1998 10:08 AM
>Subject: Re: [S] cross validating regression models
>>Frank E Harrell Jr wrote:
>>> Yes Roy, Efron showed that the bootstrap has far less variance than
>>> cross-validation.
>>Leave-one-out cross-validation, I think. What paper were you referring to?
>If I'm not mistaken, Efron's work also addressed cross-validation (e.g., 10-fold).
>In one setting he showed that grouped cross-validation is better than leave-one-out.
>The references are as follows (see mainly the first one).
>Efron, B. (1983). "Estimating the error rate of a prediction rule: Improvement on cross-validation." JASA 78, 316-331.
>Efron, B. and Gong, G. (1983). "A leisurely look at the bootstrap, the jackknife, and cross-validation." American Statistician 37, 36-48.
>Efron, B. (1986). "How biased is the apparent error rate of a prediction rule?" JASA 81, 461-470.
>>Many other people have shown that in some problems (such as error-rate
>>estimation) the bootstrap method has a small variance, but achieves this
>>at the expense of bias. I think small variance is a red herring.
>I look at the mean squared error or at Prob(|estimate - truth| < epsilon). Many
>simulations I've done (and regret not having published so far) have shown that for most indexes of
>predictive accuracy, the bootstrap is better than 10- or 20-fold cross-validation. I have a technical
>report if anyone wants a copy regular-mailed to them (I've lost the electronic version, unfortunately).
>>> All this is built into my Design library, which is only
>>> bombing for Marc Feldesman, who is using a brand-new version of Windows 95.
>>Called Windows 98? :)
>Don't think this is quite Win98. And Marc has solved the problem by re-installing the libraries.
>>> I think you would have to repeat 10-fold cross-validation 20 times to get the
>>> same accuracy as a simple bootstrap optimism correction method as is
>>> done with the following:
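>>> library(Hmisc, T)
>>> library(Design, T)
>>> # x=T, y=T keep the design matrix and response in the fit for resampling
>>> f <- ols(y ~ x1 + ...., x=T, y=T) # or lrm, psm, cph, bj, etc.
>>> validate(f, B=150)        # optimism-corrected accuracy indexes, 150 bootstrap resamples
>>> plot(calibrate(f,B=150))  # bootstrap-corrected calibration curve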
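For readers without the Design library, the idea behind a bootstrap optimism correction can be sketched in a few lines. This is only an illustration in plain Python, not the code inside validate(): the one-predictor least-squares model, the choice of R^2 as the accuracy index, the simulated data, and the helper names fit_line, r_squared, and optimism_corrected_r2 are all assumptions made for the example.

```python
import random

def fit_line(xs, ys):
    """Least-squares (intercept, slope) for a single predictor."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def r_squared(model, xs, ys):
    """R^2 of a fixed (intercept, slope) model evaluated on (xs, ys)."""
    a, b = model
    my = sum(ys) / len(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - my) ** 2 for y in ys)
    return 1.0 - sse / sst

def optimism_corrected_r2(xs, ys, B=150, seed=0):
    """Return (apparent, optimism, corrected) R^2 using B bootstrap resamples."""
    rng = random.Random(seed)
    n = len(xs)
    # Apparent accuracy: the final model, fit on ALL the data, scored on that same data.
    apparent = r_squared(fit_line(xs, ys), xs, ys)
    optimism = 0.0
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        model = fit_line(bx, by)
        # Optimism of this resample: accuracy on the data used for fitting
        # minus accuracy of the same model on the original data.
        optimism += r_squared(model, bx, by) - r_squared(model, xs, ys)
    optimism /= B
    return apparent, optimism, apparent - optimism

# Simulated data, purely for illustration.
data_rng = random.Random(1)
x = [data_rng.gauss(0, 1) for _ in range(40)]
y = [1.0 + 0.5 * xi + data_rng.gauss(0, 1) for xi in x]

apparent, optimism, corrected = optimism_corrected_r2(x, y, B=150)
print("apparent R^2: ", round(apparent, 3))
print("optimism:     ", round(optimism, 3))
print("corrected R^2:", round(corrected, 3))
```

The corrected index is the apparent index minus the average optimism, so the quantity being validated is the fit on the full data set rather than a fit on a subset.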
>>Well, that is about the same amount of work! Do you have any theory for
>>this? My experience is that averaging across multiple 10-fold
>>cross-validations is more accurate than a comparable amount of work in
>>bootstrapping, which is what I would expect from the general ideas of
>>variance reduction.
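The averaging of multiple 10-fold cross-validations described above can be sketched as follows. Again this is a plain-Python illustration under assumptions, not anything from the Design library: the one-predictor model, mean squared prediction error as the index, the simulated data, and the names kfold_mse and repeated_kfold_mse are all hypothetical. With 20 repeats of 10 folds it costs 200 model fits, which is the same order of work as 150 bootstrap fits.

```python
import random

def fit_line(xs, ys):
    """Least-squares (intercept, slope) for a single predictor."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def kfold_mse(xs, ys, k, rng):
    """One k-fold cross-validation estimate of mean squared prediction error."""
    n = len(xs)
    idx = list(range(n))
    rng.shuffle(idx)  # a fresh random partition into folds on every call
    sq_err = 0.0
    for fold in range(k):
        test = idx[fold::k]  # every k-th shuffled index forms one fold
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sq_err += sum((ys[i] - (a + b * xs[i])) ** 2 for i in test)
    return sq_err / n  # each observation is predicted exactly once

def repeated_kfold_mse(xs, ys, k=10, repeats=20, seed=0):
    """Average several k-fold estimates, each using a different random partition."""
    rng = random.Random(seed)
    estimates = [kfold_mse(xs, ys, k, rng) for _ in range(repeats)]
    return sum(estimates) / repeats, estimates

# Simulated data, purely for illustration.
data_rng = random.Random(1)
x = [data_rng.gauss(0, 1) for _ in range(40)]
y = [1.0 + 0.5 * xi + data_rng.gauss(0, 1) for xi in x]

mean_mse, estimates = repeated_kfold_mse(x, y, k=10, repeats=20)
print("single 10-fold estimates range from",
      round(min(estimates), 3), "to", round(max(estimates), 3))
print("average over 20 repeats:", round(mean_mse, 3))
```

The spread of the single-run estimates shows how much one 10-fold estimate depends on the random partition; averaging over repeats removes that source of variance.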
>My work in averaging cross-vals is quite limited so I'll defer on this point.
>The bootstrap still has two advantages here: (1) It validates the final fit, i.e., the
>one developed on ALL of the data; and (2) if you've done variable selection (why?),
>the bootstrap incorporates the right amount of variation due to model uncertainty.
>Leave-one-out definitely doesn't work in that context, and 10-fold doesn't work very well.
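Point (2) amounts to repeating the variable selection inside every bootstrap resample, so that the correction reflects the selection step as well as the coefficient estimates. A minimal sketch, assuming a deliberately crude selection rule (keep the single candidate predictor with the highest R^2) and pure-noise simulated data; the rule, the data, and the names select_and_fit and validate_with_selection are hypothetical and are not taken from the Design library.

```python
import random

def fit_line(xs, ys):
    """Least-squares (intercept, slope) for a single predictor."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def r_squared(model, xs, ys):
    """R^2 of a fixed (intercept, slope) model evaluated on (xs, ys)."""
    a, b = model
    my = sum(ys) / len(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - my) ** 2 for y in ys)
    return 1.0 - sse / sst

def select_and_fit(columns, ys):
    """Hypothetical selection rule: keep the one predictor with the highest R^2."""
    best = None
    for j, col in enumerate(columns):
        model = fit_line(col, ys)
        score = r_squared(model, col, ys)
        if best is None or score > best[0]:
            best = (score, j, model)
    return best[1], best[2]

def validate_with_selection(columns, ys, B=150, seed=0):
    """Optimism bootstrap in which selection is redone in each resample."""
    rng = random.Random(seed)
    n = len(ys)
    j, model = select_and_fit(columns, ys)
    apparent = r_squared(model, columns[j], ys)
    optimism = 0.0
    picks = [0] * len(columns)  # how often each candidate is selected
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]
        bcols = [[col[i] for i in idx] for col in columns]
        by = [ys[i] for i in idx]
        bj, bmodel = select_and_fit(bcols, by)  # selection repeated, not reused
        picks[bj] += 1
        optimism += (r_squared(bmodel, bcols[bj], by)
                     - r_squared(bmodel, columns[bj], ys))
    optimism /= B
    return apparent, optimism, apparent - optimism, picks

# Five pure-noise candidate predictors and an unrelated response, for illustration.
data_rng = random.Random(2)
n_obs = 30
columns = [[data_rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(5)]
y = [data_rng.gauss(0, 1) for _ in range(n_obs)]

apparent, optimism, corrected, picks = validate_with_selection(columns, y, B=150)
print("apparent R^2: ", round(apparent, 3))
print("corrected R^2:", round(corrected, 3))
print("times each candidate was selected:", picks)
```

The selection counts give a rough picture of model uncertainty: when several candidates are each picked in a sizeable share of resamples, the "selected" model is unstable, and that instability is what the optimism estimate absorbs.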
>Great stuff to chat about -Frank
>>Brian D. Ripley, ripley@stats.ox.ac.uk
>>Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
>>University of Oxford, Tel: +44 1865 272861 (self)
>>1 South Parks Road, +44 1865 272860 (secr)
>>Oxford OX1 3TG, UK Fax: +44 1865 272595
