>
>-----Original Message-----
>From: Prof Brian Ripley <ripley@stats.ox.ac.uk>
>To: fharrell@virginia.edu <fharrell@virginia.edu>
>Date: Wednesday, March 11, 1998 10:08 AM
>Subject: Re: [S] cross validating regression models
>
>
>>Frank E Harrell Jr wrote:
>>>
>>> Yes Roy, Efron showed that the bootstrap has far less variance than
>>> cross-validation.
>>
>>Leave-one-out cross-validation, I think. What paper were you referring to?
>
>If I'm not mistaken, Efron's work also addressed cross-validation (e.g., 10-fold).
>In one setting he showed that grouped cross-val is better than leave out one.
>The references are as follows (see mainly the first one).
>
>
>>
>>Many other people have shown that in some problems (such as error-rate
>>estimation) the bootstrap method has a small variance, but achieves this
>>at the expense of bias. I think small variance is a red herring.
>
>I look at the mean squared error or at Prob(|estimate - truth| < epsilon). Many
>simulations I've done (and regretted not publishing so far) have shown that for most indexes of
>predictive accuracy, the bootstrap is better than 10- or 20-fold cross val. I have a technical
>report if anyone wants a copy regular-mailed to them (I've lost the electronic version unfortunately).
>
>>
>>> All this is built into my Design library, which is only
>>> bombing for Marc Feldesman who is using a brand new version of Windows 95.
>>
>>Called Windows 98? :)
>Don't think this is quite Win98. And Marc has solved the problem by re-installing the libraries.
>
>>
>>> I think you would have to repeat 10-fold cross-validation 20 times to get the
>>> same accuracy as a simple bootstrap optimism correction method as is
>>> done with the following:
>>>
>>> library(Hmisc, T)
>>> library(Design, T)
>>> f <- ols(y ~ x1 + ...., x=T, y=T) # or lrm, psm, cph, bj, etc.
>>> validate(f, B=150)
>>> plot(calibrate(f,B=150))
>>>
>>Well, that is about the same amount of work! Do you have any theory for
>>this? My experience is that averaging across multiple 10-fold
>>cross-validations is more accurate than a comparable amount of work in
>>bootstrapping, which is what I would expect from the general ideas of
>>variance reduction.
>
>My work in averaging cross-vals is quite limited so I'll defer on this point.
>The bootstrap still has two advantages here: (1) It validates the final fit, i.e., the
>one developed on ALL of the data; and (2) if you've done variable selection (why?),
>the bootstrap incorporates the right amount of variation due to model uncertainty.
>Leave out one definitely doesn't work in that context, and 10-fold doesn't work very well.
>
>Great stuff to chat about -Frank
>
>>
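
For readers following the thread without S and the Design library, here is a minimal sketch of the general bootstrap optimism-correction idea discussed above. It is not a reproduction of validate(). It assumes a single predictor, ordinary least squares, and R-squared as the accuracy index. The function names (fit_line, r_squared, optimism_corrected_r2) and the simulated data are hypothetical and chosen only for illustration. B=150 mirrors the value in the quoted call.

```python
import random


def fit_line(xs, ys):
    """Least-squares (intercept, slope) for a single predictor."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(model, xs, ys):
    """1 - SSE/SST for a fixed (intercept, slope) evaluated on xs, ys."""
    a, b = model
    my = sum(ys) / len(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - my) ** 2 for y in ys)
    return 1.0 - sse / sst


def optimism_corrected_r2(xs, ys, B=150, seed=0):
    """Apparent R^2 minus the average bootstrap optimism."""
    rng = random.Random(seed)
    n = len(xs)
    # Apparent accuracy: the final model, fit on ALL of the data,
    # evaluated on that same data.
    apparent = r_squared(fit_line(xs, ys), xs, ys)
    total = 0.0
    for _ in range(B):
        # Draw n observations with replacement; redraw in the (very
        # unlikely) case that the resample has no spread in x or y.
        while True:
            idx = [rng.randrange(n) for _ in range(n)]
            bx = [xs[i] for i in idx]
            by = [ys[i] for i in idx]
            if len(set(bx)) > 1 and len(set(by)) > 1:
                break
        model = fit_line(bx, by)
        # Optimism of this replicate: accuracy on the resample it was
        # fit to, minus accuracy of the same fit on the original data.
        total += r_squared(model, bx, by) - r_squared(model, xs, ys)
    optimism = total / B
    return {
        "apparent": apparent,
        "optimism": optimism,
        "corrected": apparent - optimism,
    }


# Simulated data, for illustration only.
data_rng = random.Random(1)
xs = [data_rng.gauss(0, 1) for _ in range(40)]
ys = [1.0 + 0.5 * x + data_rng.gauss(0, 1) for x in xs]
result = optimism_corrected_r2(xs, ys, B=150, seed=2)
print(result)
```

If variable selection were part of the modelling, it would have to be repeated inside the loop for every resample; that is how the bootstrap picks up the variation due to model uncertainty mentioned in point (2) above.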
