Re: [S] How to do a "one-to-many merge" in SPlus?

Frank E Harrell Jr (fharrell@virginia.edu)
Wed, 16 Sep 1998 14:29:25 -0400


John,

The bootcov, validate, and calibrate functions in my Design
library allow for cluster bootstrapping. It's quite easy using
split() and unlist(). Here's a piece of the code to give you an idea.
This code creates a list of record numbers, split by cluster (e.g.,
subject ID). Clusters are sampled with replacement, and unlist()
then returns the full vector of original observation numbers for
the sampled clusters. You can use this "unsplit" vector to
subscript every variable analyzed, so that all records of a sampled
cluster are carried along together.

cluster <- as.character(cluster)   # e.g., cluster = patient IDs with repeats

clusters <- unique(cluster)
nc <- length(clusters)
Obsno <- split(1:n, cluster)       # n = # total records, not subjects

for(i in 1:B) {
  j <- sample(clusters, nc, replace=T)   # resample whole clusters
  obs <- unlist(Obsno[j])                # record numbers of sampled clusters
  f <- fitter(X[obs,], Y[obs,], maxit=maxit, penalty.matrix=penalty.matrix)
  cof <- as.vector(f$coef)
  .....
}
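
To tie this back to the question: the same split()/unlist() trick
can stand in for the for loop and rbind() when building one
bootstrap visit dataset. What follows is only a minimal sketch on
made-up data, written in R-compatible S syntax. The names visits,
ids, and boot.visits and the toy values are assumptions for
illustration; they are not part of the Design library or of the
original dataset.

```r
# Toy visit-level data: 3 patients with 2, 3, and 1 visits (made-up values)
visits <- data.frame(patkey = c("a", "a", "b", "b", "b", "c"),
                     visit  = c(1, 2, 1, 2, 3, 1),
                     stringsAsFactors = FALSE)

n     <- nrow(visits)                  # total records, not subjects
ids   <- unique(visits$patkey)         # one entry per patient
Obsno <- split(1:n, visits$patkey)     # row numbers grouped by patient

set.seed(1)                            # only to make the sketch repeatable
j   <- sample(ids, length(ids), replace = TRUE)  # resample patients
obs <- unlist(Obsno[j], use.names = FALSE)       # all rows of each sampled patient

# One subscripting step replaces the loop: a patient drawn k times
# contributes all of his or her visits k times.
boot.visits <- visits[obs, ]
```

Because Obsno is indexed by patient ID, a repeated ID in j simply
repeats that patient's row numbers in obs, which is the
"one-to-many merge" asked about.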

See

@Article{fen96com,
author = {Feng, Ziding and {McLerran}, Dale and Grizzle, James},
title = {A comparison of statistical methods for clustered
data analysis with {Gaussian} error},
journal = {Statistics in Medicine},
year = 1996,
volume = 15,
pages = {1793-1806},
annote = {clustered data; cluster bootstrap; simulation setup;
moving blocks bootstrap; repeated measurement data;
longitudinal data; GEE; bootstrap performed
extremely well with small numbers of large clusters}
}

for an article showing the merits of cluster bootstrapping.
-------------------------------------------------------------------------------------------
Frank E Harrell Jr
Professor of Biostatistics and Statistics
Division of Biostatistics and Epidemiology
Department of Health Evaluation Sciences
University of Virginia School of Medicine
hesweb1.med.virginia.edu/biostatistics.html
-----Original Message-----
From: John Oehlert <joehlert@leland.stanford.edu>
To: s-news@wubios.wustl.edu <s-news@wubios.wustl.edu>
Date: Wednesday, September 16, 1998 11:44 AM
Subject: [S] How to do a "one-to-many merge" in SPlus?

>Hi All,
>
>I am bootstrapping a patient dataset containing approx. 400 patients each
>having anywhere from 2 to 9 "visits". Because the number of visits is not
>the same for all patients, the resulting bootstrap datasets have a varying #
>of visits, from 700 to just over 1000. I could not see how to use any of
>the canned bootstrap routines to sample patients with replacement and then
>reconstruct a visit dataset so I did it myself using a list of unique
>patient identifiers and the sample() function. This was a piece of cake.
>(However, if anyone can show me how to do this with one of the standard
>bootstrap routines I w/not complain!)
>
>My difficulty came when I tried to take this list of patients and
>reconstruct the bootstrap "visit" dataset. I could not get SPlus to take
>each patient in the new patient list and pull all visits for that patient
>from the visit file....and do this for as many times as the patient appears
>in the patient list. I finally gave up and crunched the problem using a for
>loop and rbind(). I know this sort of loop is very costly re: time and
>w/like to learn a more efficient way to do this task. I have included the
>loop structure below. Perhaps someone can tell me how a real SPlus wiz w/go
>about this.
>
>Thanks in advance.
>John
>
># B = the # of bootstrap replicates to create.;
>
> id.list_sort(unique(in.dsn[,]$patkey));
> size_length(id.list);
>
> replicates_matrix(nrow=size, ncol=B);
>
># 6 is the length of the output vector from max.knots;
> knots_matrix(nrow=B,ncol=6);
>
> for(r in 1:B) {
> print(paste("Starting B=",r));
> replicate_sort(sample(id.list, size=size, replace=T));
> new.dsn_matrix(nrow=1, ncol=dim(in.dsn)[2], NA);
> for(rep.number in 1:size) {
> new.pat_in.dsn[in.dsn[,"patkey"]==replicate[rep.number],]
> if(rep.number==1) new.dsn_new.pat else new.dsn_rbind(new.dsn,new.pat);
> }
> knots[r,]_max.knot(run.knots(new.dsn, MinKnot=Min.Knot, MaxKnot=Max.Knot));
> }
>
>Thanks again to all!
>j.
>
>
><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
>
>John Oehlert
>Statistical Programmer/Systems Analyst
>Division of Biostatistics
>HRP Redwood Building, Rm T100
>Stanford, California 94305-5405
>
>Voice: (650) 725-2925
>Fax: (650) 725-6951
>Email: joehlert@leland.stanford.edu
>
><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
>-----------------------------------------------------------------------
>This message was distributed by s-news@wubios.wustl.edu. To unsubscribe
>send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
>message: unsubscribe s-news
>
