[S] Fast approximate Bayesian bootstrap: summary

Frank E Harrell Jr (fharrell@virginia.edu)
Sat, 8 Aug 1998 11:14:28 -0400


Thanks to Charles Berry and Jeff Longmate for two thoughtful solutions.

To reiterate the goal:

Sample with replacement n of n values
Sample with replacement m of these n sampled with replacement

Charles Berry's solution: (cberry@tajo.ucsd.edu)

"samp.samp" <- function(pop,n,m,n.reps )
{
x <- matrix(
sample(pop,n*n.reps,repl=T),
nc=n.reps)
y <- runif(m*n.reps,0,n)
z <- x[ceiling(y) +
rep( seq(from=0,by=n,length=n.reps), rep(m, n.reps))]
dim(z) <- c(m,n.reps)
list(z=z,x=x)
}

I checked a couple hundred replicates to see that the second samples
agreed with the first stage samples---the p-values from a chisq
goodness of fit are a pretty good fit to a uniform

Some of the timing (old SPARC 5 - about to be retired):

> unix.time(junk <- samp.samp(1:100,200,300,100))
[1] 0.5333328 0.1333332 1.0000000 0.0000000 0.0000000
unix.time(junk <- samp.samp(1:100,200,300,1000))
[1] 4.9333344 0.7166667 6.0000000 0.0000000 0.0000000
unix.time(junk <- samp.samp(1:100,200,3000,100))
[1] 3.8500023 0.5500002 5.0000000 0.0000000 0.0000000
unix.time(junk <- samp.samp(1:100,2000,300,100))
[1] 1.4000015 0.2666669 1.0000000 0.0000000 0.0000000
unix.time(junk <- samp.samp(1:1000,200,300,100))
[1] 0.5 0.0 1.0 0.0 0.0

Chuck later sent an improved version:

"samp.samp" <- function( pop,n,m,n.reps )
{
x <- matrix(sample(pop, n * n.reps, repl = T), nc = n.reps)
z <- x[sample(n, m * n.reps, rep = T) +
rep(seq(from = 0, by = n, length = n.reps), rep(m, n.reps))]
dim(z) <- c(m, n.reps)
list(stage.2 = z, stage.1 = x)
}

Principally to get rid of the ceiling() stuff which was only in there
because it was late and for some reason I didn't see that sample() would
do this just as well.

How it works? It generalizes sample(pop,n,repl=T)[sample(1:n,m,repl=T]
which is equivalent to sample(sample(pop,n,repl=T),m,repl=T)

Jeff Longmate's solution (jlongmat@coh.org)

This uses rep to match the samples of m to samples of n. I haven't timed it.

# r = residual vector to resample
# m = number sampled in each imputation
# B = number of imputations
# Result = m by B matrix, sampling n then m w/repl. for each row.

foo <- function(r,m,B){
n <- length(r)
i <- ceiling(runif(n*B, 0, n))
j <- ceiling(runif(m*B, 0, n)) + rep((0:(B-1))*n, rep(m,B))
matrix(r[i[j]], m, B)
}

Jeff then compared the two solutions:

They seem to use the same basic idea, i.e. generate all the reps together,
use indexing to get the second stage, matched to the first
by adding a stride vector, calculted with rep.

The two ways of sampling m * n.reps integers w/ replacement from 1:n
are:
sample(n, m * n.reps, rep = T)
ceiling(runif(m*B, 0, n))

Berry's use of sample seems to avoid some nesting of functions, as
well as one use of subsetting. Whether it's faster probably depends
on how efficiently sample is implented.
His initial call to matrix is unnecesary if you don't display
the first stage -- in case your shaving time.

Despite the apparently different ways of sampling:
> set.seed(321)
> a <- samp.samp(letters,26,10,4)[[1]]
> set.seed(321)
> b <- foo(letters,10,4)
> all(a==b)
[1] T

Thanks to both! -Frank
---------------------------------------------------------------------------
Frank E Harrell Jr
Professor of Biostatistics and Statistics
Director, Division of Biostatistics and Epidemiology
Dept of Health Evaluation Sciences
University of Virginia School of Medicine
http://www.med.virginia.edu/medicine/clinical/hes/biostat.htm

-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu. To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message: unsubscribe s-news