[S] quantiles in presence of ties

Frank E Harrell Jr (fharrell@virginia.edu)
Thu, 9 Apr 1998 22:49:15 -0400


Thanks to Wei-Yin Loh, Arnold Dekkers, Matthew Zack, and especially
Charles Berry for setting me straight on how quantile() handles data
with many tied values. The choice lies between the "targeted order
statistic approach" and the inverse empirical distribution approach. I
have been able to adapt quantile() to handle sample weights and also to
implement an interpolated inverse CDF approach. In a few days I'll be
posting several functions that may be useful for handling weighted
data: wtd.mean, wtd.var, wtd.quantile, wtd.table, wtd.ecdf, wtd.rank,
wtd.loess.noiter. For example, wtd.rank is useful for computing the
Wilcoxon 2-sample statistic or the ROC area for very large datasets
that have been reduced somewhat using frequencies of unique
combinations of variables. It turns out that weighted ranks (or
unweighted ones for that matter) may be computed much more simply than
with rank(), by recognizing that the rank of value x is the cumulative
frequency (cumulative weight) strictly to the left of x plus 0.5 *
(frequency at x, plus one). This handles ties using midranks: for the
data 1, 2, 2, 3, the value 2 has one observation to its left and
frequency 2, so its rank is 1 + 0.5 * 3 = 2.5.
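
The cumulative-frequency idea for midranks can be sketched as follows.
This is an illustration in Python, not the S code for wtd.rank; the
names wtd_midranks and roc_area are hypothetical, and the weights are
assumed to be frequency weights (counts of identical observations).
The ROC area is obtained from the rank sum of the positive group via
the usual Wilcoxon/Mann-Whitney relationship.

```python
from collections import Counter
from itertools import accumulate


def wtd_midranks(values, weights=None):
    """Map each distinct value to its midrank (hypothetical helper).

    weights are assumed to be frequency weights; None means all ones.
    """
    if weights is None:
        weights = [1] * len(values)
    freq = Counter()
    for v, w in zip(values, weights):
        freq[v] += w                      # total frequency at each value
    distinct = sorted(freq)
    f = [freq[v] for v in distinct]
    cum = list(accumulate(f))             # cumulative frequency through each value
    # Frequency strictly to the left is cum - f; midrank = left + 0.5 * (f + 1).
    return {v: (c - fi) + 0.5 * (fi + 1) for v, c, fi in zip(distinct, cum, f)}


def roc_area(values, is_positive, weights=None):
    """ROC area from weighted midranks: (W - n1(n1+1)/2) / (n1 * n0)."""
    if weights is None:
        weights = [1] * len(values)
    ranks = wtd_midranks(values, weights)
    n1 = sum(w for w, pos in zip(weights, is_positive) if pos)
    n0 = sum(weights) - n1
    # W is the weighted sum of ranks in the positive group.
    rank_sum = sum(w * ranks[v]
                   for v, w, pos in zip(values, weights, is_positive) if pos)
    return (rank_sum - n1 * (n1 + 1) / 2) / (n1 * n0)


print(wtd_midranks([1, 2, 2, 3]))
print(roc_area([1, 2, 2, 3], [False, False, True, True]))
```

Only one sort of the distinct values and one cumulative sum are
needed, which is why the approach scales to large datasets that have
been collapsed to unique combinations with frequencies.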

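To make the two quantile approaches mentioned above concrete, here is
a rough Python sketch under stated assumptions. It assumes the
"targeted order statistic" approach interpolates linearly at position
1 + (n - 1) * p in the sorted data, and it shows a plain
(non-interpolated) inverse empirical distribution with optional
frequency weights. Both function names are hypothetical, and neither
is the adapted quantile() or wtd.quantile code described in this
message.

```python
import math


def quantile_order_stat(x, p):
    """Interpolate between the order statistics around 1 + (n - 1) * p."""
    s = sorted(x)
    h = (len(s) - 1) * p                  # zero-based fractional position
    lo = math.floor(h)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (h - lo) * (s[hi] - s[lo])


def quantile_inverse_ecdf(x, p, weights=None):
    """Smallest value whose cumulative weight fraction is at least p."""
    if weights is None:
        weights = [1] * len(x)
    pairs = sorted(zip(x, weights))
    total = sum(weights)
    cum = 0
    for v, w in pairs:
        cum += w                          # running cumulative weight
        if cum >= p * total:
            return v
    return pairs[-1][0]                   # guard against rounding at p = 1


data = [1, 1, 1, 2, 10]                   # made-up data with heavy ties
print(quantile_order_stat(data, 0.9))
print(quantile_inverse_ecdf(data, 0.9))
print(quantile_inverse_ecdf([1, 2, 10], 0.9, weights=[3, 1, 1]))
```

With many ties the two approaches can disagree noticeably: the first
interpolates between 2 and 10 for the 0.9 quantile, while the second
returns an observed data value.
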
---------------------------------------------------------------------------
Frank E Harrell Jr
Professor of Biostatistics and Statistics
Director, Division of Biostatistics and Epidemiology
Dept of Health Evaluation Sciences
University of Virginia School of Medicine
http://www.med.virginia.edu/medicine/clinical/hes/biostat.htm
