[S] SUMMARY: Summarizing replicate data

Jonathan Monteleone (monte@ihug.co.nz)
Thu, 14 May 1998 10:15:41 +1200


Thanks, as usual, for the replies to my query 8-). Several people emailed
asking me to forward the replies to them, so I figured I would post a
summary.

ORIGINAL QUESTION
> I have data where I replicated each point 100 times. I would like to
> create a summary of simple statistics on each point, e.g. mean, sd, var.
> Any easy ways to do this?
>
> RAW DATA FILE EXAMPLE
> 1000 10 12
> 1000 10 8
> 1000 10 10
> etc...
> 5000 10 25
> 5000 10 28
> etc...
>
> I want an output file to look like the following
> 1000 10 mean sd var
> 5000 10 mean sd var
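Both replies below assume the raw file has already been read in as columns. Here is a minimal R sketch, assuming whitespace-separated values with no header line. The column names a, b and x are hypothetical, chosen to match the second reply, and the values are a few rows from the example above.

```r
# A few rows from the raw file above; with a real file, pass its path
# as the first argument instead of using text =
raw <- "1000 10 12
1000 10 8
1000 10 10
5000 10 25
5000 10 28"

# No header line, so supply (hypothetical) column names:
# a and b are the classification columns, x is the replicated response
dat <- read.table(text = raw, header = FALSE, col.names = c("a", "b", "x"))
str(dat)
```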
====================================================
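Using the function tapply will give you the means, standard deviations, and
variances; data.frame will then put the results together.

For example:
# EXAMPLE is the replicated response; RAW identifies each point
out <- data.frame(mean = tapply(data$EXAMPLE, data$RAW, mean))
out$variance <- tapply(data$EXAMPLE, data$RAW, var)
out$sd <- sqrt(out$variance)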

You may still need to rename things in the data frame to get exactly what
you want.
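To see what these three lines produce, here is a minimal sketch in R run on made-up data in the shape of the question. The data frame name data and the column names EXAMPLE and RAW follow the example above; the values are hypothetical. The last step, which moves the group labels out of the row names into a column, is an extra step that is not part of the reply.

```r
# Hypothetical sample data: RAW identifies each point (the first field of
# the raw file) and EXAMPLE holds the replicated response
data <- data.frame(
  RAW     = c(1000, 1000, 1000, 5000, 5000, 5000),
  EXAMPLE = c(12, 8, 10, 25, 28, 31)
)

# One row per distinct value of RAW; tapply sorts the groups
out <- data.frame(mean = tapply(data$EXAMPLE, data$RAW, mean))
out$variance <- tapply(data$EXAMPLE, data$RAW, var)
out$sd <- sqrt(out$variance)

# Extra step: move the group labels (stored as row names) into a column
out <- data.frame(RAW = as.numeric(rownames(out)), out, row.names = NULL)
print(out)
```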

Anne
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Anne E. York
National Marine Mammal Laboratory
Seattle WA 98115-0070 USA
e-mail: york@orca.akctr.noaa.gov
Voice: +1 206-526-4039
Fax: +1 206-526-6615
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
====================================================
Creating a table like the one you want is easy with aggregate(), but only
if the statistic you are tabulating is a scalar. For example, to create the
file with just the mean, assuming your classification variables are a and b
and your analysis variable is x, use

aggregate(x,list(a,b),mean)
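Here is a minimal R sketch of that call on made-up vectors shaped like the question's data (the values are hypothetical). In R, the grouping columns come back named Group.1 and Group.2 unless the list passed as the second argument has names.

```r
# Hypothetical vectors: a and b are the classification variables
# (the first two fields), x is the analysis variable
a <- c(1000, 1000, 1000, 5000, 5000, 5000)
b <- c(10, 10, 10, 10, 10, 10)
x <- c(12, 8, 10, 25, 28, 31)

# One row per (a, b) combination present in the data; the statistic
# is returned in a column named x
means <- aggregate(x, list(a, b), mean)
print(means)
```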

For a single analysis variable and multiple classification variables, tapply
returns a matrix. When the function returns multiple values, however, each
cell of that matrix holds a vector (the matrix has mode list), which makes it
awkward to use.
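The awkwardness is easy to see on a small example. This R sketch, on hypothetical values, compares tapply with a scalar-valued function and with one that returns three values; the argument name v is arbitrary.

```r
a <- c(1000, 1000, 1000, 5000, 5000, 5000)
b <- c(10, 10, 10, 10, 10, 10)
x <- c(12, 8, 10, 25, 28, 31)

# Scalar-valued function: an ordinary numeric matrix (2 x 1 here)
m <- tapply(x, list(a, b), mean)
print(m)

# Vector-valued function: still a 2 x 1 matrix, but of mode list; each
# cell holds a length-3 vector, so m2[1, 1] is a list, not a number
m2 <- tapply(x, list(a, b), function(v) c(mean(v), sqrt(var(v)), var(v)))
print(mode(m2))
print(m2[[1, 1]])  # mean, sd and var for a = 1000, b = 10
```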

If there is at least one observation for every combination of classification
variables, the following code will produce roughly what you want:

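# x: analysis variable; a, b: classification variables
# Each cell of z holds c(mean, sd, var) for one (a, b) combination
z <- tapply(x, list(a, b), function(v) c(mean(v), sqrt(var(v)), var(v)))
# Flatten the cells (column-major order) and lay them out one row per cell
vals <- unlist(z)
vals <- matrix(vals, byrow = TRUE, ncol = length(vals) / prod(dim(z)))
# expand.grid lists the combinations in the same order, first factor fastest
answer <- data.frame(do.call("expand.grid", dimnames(z)), vals)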

It's certainly not bulletproof, but it might be easier than repeated use
of aggregate or tapply.
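Here is a sketch of this recipe run end to end in R on hypothetical values. It then renames the columns and writes the file layout the question asked for (no header, one line per point). The column names and the use of tempfile() in place of a real output path are assumptions for illustration.

```r
# Hypothetical data in the shape of the question
a <- c(1000, 1000, 1000, 5000, 5000, 5000)
b <- c(10, 10, 10, 10, 10, 10)
x <- c(12, 8, 10, 25, 28, 31)

# The steps from the reply above
z <- tapply(x, list(a, b), function(v) c(mean(v), sqrt(var(v)), var(v)))
vals <- unlist(z)
vals <- matrix(vals, byrow = TRUE, ncol = length(vals) / prod(dim(z)))
answer <- data.frame(do.call("expand.grid", dimnames(z)), vals)

# Name the columns after the requested "a b mean sd var" layout
names(answer) <- c("a", "b", "mean", "sd", "var")
print(answer)

# Write one line per point with no header or row names;
# tempfile() stands in for the real output file name
outfile <- tempfile(fileext = ".txt")
write.table(answer, outfile, quote = FALSE, row.names = FALSE, col.names = FALSE)
readLines(outfile)
```

Note that expand.grid returns the a and b columns as factors built from the dimnames, so they are written out as their labels.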
- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spector@stat.berkeley.edu
====================================================
-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu.