[S] Bootstrap Default Sampling Scheme

Marc Feldesman (feldesmanm@pdx.edu)
Thu, 26 Feb 1998 10:48:43 -0800


Thank you to all who replied to my question about the default sampling
scheme used in the function bootstrap(). I received more than 25 private
replies, ranging from "RTM" (not helpful), to critiques of my application,
to, finally, a very informative and detailed answer to my specific question
from Charlie Roosen at Mathsoft. Following Charlie's reply, we exchanged a
series of emails to make certain that we were on the same wavelength, and
Charlie then provided the answer I needed.

I deliberately left out some details of the analysis when I asked the
original question because they were not, to me, germane to the question I
asked. I merely wanted to know how the bootstrap routine constructed
default samples when the statistic was something like the F-statistic from
an ANOVA, which was, in turn, based on unequal sample sizes (in this
particular case, 13, 15, and 27). My question was: does bootstrap()
construct the samples by taking a random sample of 55 cases with
replacement and then assign the first 13 cases to category 1, the next 15
cases to category 2, and the last 27 cases to category 3? Or does it
sample randomly within each of the 3 groups, confining the sampling frames
to 13, 15, and 27 cases? It appeared to me that the group parameter was
designed for the second circumstance, which led me to believe that the
default was actually the first. Since I specifically did NOT want the
second condition, I felt I needed to know for sure how the samples were
being formed (in particular because the results I was obtaining deviated
significantly from the results I got using a different program where I knew
exactly what the program was doing).
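
The two schemes in the question can be sketched as follows. This is a minimal Python sketch, not S-PLUS code and not how bootstrap() is implemented internally. The function names pooled_then_assign and stratified_resample are hypothetical, and the data are placeholder values that only reproduce the group sizes from the post (13, 15, 27).

```python
import random

SIZES = (13, 15, 27)  # group sizes from the post

def pooled_then_assign(values, sizes, rng):
    """Scheme 1 (hypothetical name): draw len(values) cases with replacement
    from the pooled sample, then give the first sizes[0] draws to group 1,
    the next sizes[1] to group 2, and so on."""
    draws = [rng.choice(values) for _ in values]
    groups, start = [], 0
    for n in sizes:
        groups.append(draws[start:start + n])
        start += n
    return groups

def stratified_resample(groups, rng):
    """Scheme 2 (hypothetical name): resample with replacement inside each
    group, so group k's replicate uses only group k's own cases."""
    return [[rng.choice(g) for _ in g] for g in groups]

# Placeholder data with the post's group sizes (not the real measurements).
rng = random.Random(1)
values = [float(i) for i in range(sum(SIZES))]
original = [values[0:13], values[13:28], values[28:55]]
scheme1 = pooled_then_assign(values, SIZES, rng)
scheme2 = stratified_resample(original, rng)
print([len(g) for g in scheme1], [len(g) for g in scheme2])  # [13, 15, 27] [13, 15, 27]
```

Both schemes keep the group sizes at 13, 15, and 27. The difference is that in scheme 2 each group's replicate can contain only that group's own observations, while in scheme 1 any case can land in any group.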

As it turns out, my reading was both right and wrong. bootstrap() does
sample from the entire set of 55 cases, but what I didn't realize until
Charlie pointed it out was that, because the GROUP variable (as in
aov(VARIABLE~GROUP, data=dataset)) was part of the data being sampled, it
was resampled along with VARIABLE. That changed the number of individuals
per group with each replicate, something I clearly did not want to happen
(and which explained the odd results).
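
The effect Charlie described can be illustrated with a minimal Python sketch. This is not S-PLUS code, and the data and variable names are placeholders. When whole (value, group) rows are drawn with replacement, each draw carries its label along, so the group counts can differ from replicate to replicate. Shuffling only the values against a fixed label vector, which is what the orig.group fix amounts to, keeps the counts at 13, 15, and 27.

```python
import random
from collections import Counter

# Placeholder data: 55 (value, group) rows with the post's group sizes.
labels = ["g1"] * 13 + ["g2"] * 15 + ["g3"] * 27
values = [float(i) for i in range(55)]
rows = list(zip(values, labels))

rng = random.Random(2)

# Resampling whole rows with replacement: each draw brings its group label
# with it, so the per-group counts can change from replicate to replicate.
replicate = [rng.choice(rows) for _ in rows]
print(Counter(label for _, label in replicate))

# Keeping the labels fixed and shuffling only the values: the counts are
# 13, 15 and 27 in every replicate.
shuffled = values[:]
rng.shuffle(shuffled)
fixed = list(zip(shuffled, labels))
print(Counter(label for _, label in fixed))
```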

The solution was to detach the GROUP variable from the data frame by
preceding the call to bootstrap() with an assignment like:

orig.group<-GROUP

and then writing the statistic passed to bootstrap() like:

...summary(aov(VARIABLE~orig.group))[1,4]...

This fixes the group labels, varies VARIABLE over all 55 cases, and ensures
that the group sizes remain constant throughout the analysis. Once I made
this change, the results were as I had expected.
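
Outside S-PLUS, the fix can be sketched as a permutation test in which the group labels stay fixed and only the response is shuffled. This is a minimal Python sketch assuming a one-way layout. f_statistic and permutation_f_test are hypothetical names. The sketch computes the one-way ANOVA F directly rather than through aov(). Its (count + 1) / (n_perm + 1) p-value is one common convention chosen here, not necessarily what bootstrap() with samp.permute reports.

```python
import random

def f_statistic(values, labels):
    """One-way ANOVA F: between-group mean square over within-group mean
    square (assumes at least two groups and nonzero within-group spread)."""
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    n, k = len(values), len(groups)
    grand = sum(values) / n
    means = {g: sum(vs) / len(vs) for g, vs in groups.items()}
    ss_between = sum(len(vs) * (means[g] - grand) ** 2 for g, vs in groups.items())
    ss_within = sum((v - means[g]) ** 2 for g, vs in groups.items() for v in vs)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def permutation_f_test(values, labels, n_perm=999, seed=0):
    """Shuffle the response against a FIXED label vector (the analogue of
    keeping orig.group out of the resampled data), so every replicate keeps
    the original group sizes. Returns the observed F and a p-value."""
    rng = random.Random(seed)
    observed = f_statistic(values, labels)
    shuffled = list(values)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if f_statistic(shuffled, labels) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)

# Placeholder data with the post's group sizes (not the real measurements).
rng = random.Random(42)
labels = ["g1"] * 13 + ["g2"] * 15 + ["g3"] * 27
values = [rng.gauss(0.0, 1.0) for _ in labels]
observed, p = permutation_f_test(values, labels)
print(observed, p)
```

Because the labels never move, every permuted replicate has the same 13/15/27 layout as the data, and only the assignment of responses to groups varies.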

For those who wondered why we didn't use a permutation test, we did. In
fact, the bootstrap() call above used sampler=samp.permute. However, this
didn't seem (to me) relevant to the particular question I had asked, so in
the interest of parsimony I left this detail (and many others about the
analysis) out.

Thanks again for all your help and suggestions.
