[S] Summary of Avoiding the Loop to Calculate Percentiles of All

Humbolt, Allen (HumboltA@kochind.com)
Thu, 22 Oct 1998 15:19:36 -0500


I thank Hong Ooi, Walt Dyar, Brian Ripley, Nick Ellis, and Remy vande Ven
for their responses.

To repeat my question:
Given data that begins and ends something like
> MyDataFrame[c(1:2, 2129:2130), ]
           Date    M1    M2    M3 ... M100
   1 05/01/1990 1.565 1.627 1.651 ...
   2 05/02/1990 1.575 1.634 1.650 ...
2129 10/19/1998 2.143 2.420 2.531 ...
2130 10/20/1998 2.202 2.483 2.590 ...
I wish to get the 95th percentile of every pairwise spread, like
     ID       P95
M1 - M2 0.4477100
M1 - M3 0.8436200
M2 - M3 0.4706499
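For concreteness, the quantity asked for can be computed with a plain double
loop over column pairs. The R sketch below only illustrates that definition;
it is not the "simple but slow" function mentioned later, which the post does
not show. The function name naiveP95 and the toy data frame are hypothetical,
and the spread is taken as the earlier column minus the later one, matching
RipleyoneiFn.

```r
# Hypothetical naive version: one quantile() call per pair of columns.
# Assumes column 1 holds dates and columns 2..ncol hold the numeric series.
naiveP95 <- function(datf, prob = 0.95) {
  vars <- names(datf)[-1]
  p <- length(vars)
  ids <- character(0)
  vals <- numeric(0)
  for (i in seq_len(p - 1)) {
    for (j in (i + 1):p) {
      spread <- datf[[vars[i]]] - datf[[vars[j]]]   # earlier minus later
      ids <- c(ids, paste(vars[i], "-", vars[j]))
      vals <- c(vals, quantile(spread, prob, na.rm = TRUE, names = FALSE))
    }
  }
  data.frame(ID = ids, P95 = vals)
}

# Toy data frame shaped like MyDataFrame, with 3 series.
toy <- data.frame(Date = c("05/01/1990", "05/02/1990", "05/03/1990", "05/04/1990"),
                  M1 = c(1, 2, 3, 4),
                  M2 = c(0, 0, 1, 1),
                  M3 = c(4, 3, 2, 1))
naiveP95(toy)
```

This makes p(p-1)/2 separate passes over the data, which is what becomes slow
at 100 variables.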
I added that my real problem had as many as 100 variables, not just 3. Some
questioned why I wanted to do this. The ultimate goal is a simple screening
technique to look at today's data and see if any pairwise spreads are
unusual compared to historical spreads. So the final application will
select only those rows from the above result where today's spread is higher
than some defined percentile of its historical values. [I'll probably pick
something like the 99th %ile; but it was convenient to use the 95th %ile on
some sample small data sets initially so I could see a few spreads being
flagged.]
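A minimal sketch of that screening step in R, under some assumptions: the
historical percentiles are a named vector with one entry per pair i < j in
the order (1,2), (1,3), ..., (1,p), (2,3), ..., which is the order RipleyAllFn
produces, and today's values arrive as a numeric vector in column order. The
helper flagSpreads and the example numbers are hypothetical.

```r
# Hypothetical screening helper (not from the original post).
# x:    today's values for the p series, in column order
# hist: historical percentiles for all pairs i < j, in the order
#       (1,2), (1,3), ..., (1,p), (2,3), ... as produced by RipleyAllFn
flagSpreads <- function(x, hist) {
  d <- outer(x, x, "-")          # d[i, j] = x[i] - x[j]
  today <- t(d)[lower.tri(d)]    # d[i, j] for j > i, taken i by i
  names(today) <- names(hist)
  today[today > hist]            # keep only the unusual spreads
}

x <- c(2.0, 1.0, 0.5)
hist <- c("V1 - V2" = 0.8, "V1 - V3" = 2.0, "V2 - V3" = 0.4)
flagSpreads(x, hist)
```

Here today's spreads are 1.0, 1.5 and 0.5, so the V1 - V2 and V2 - V3 pairs
are flagged.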

I received several solutions that worked for the 3-variable example, but
only Professor Ripley's actually worked for my real (and big) problem. It
avoids one of the two loops in the "simple but slow" function I posted
earlier. The other suggestions worked for a small number of variables but
ran out of memory as the number of variables grew. The solution below took
19 seconds for 22 variables and 294 seconds for 88 variables, and it seems
to accomplish my goal quite nicely. Thanks, Professor Ripley.

Allen Humbolt
Quantitative Analyst
Koch Industries, Inc.
HumboltA@kochind.com

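RipleyoneiFn <- function(i, datf)
{
    # Spreads of variable i (column i+1; column 1 is Date) against every
    # later variable. A is a matrix in S (a data frame in R); for the last i
    # only one column remains and A is a vector, hence as.matrix() below.
    A <- datf[, i+1] - datf[, (i+2):ncol(datf)]
    datfP95 <- apply(as.matrix(A), 2,
        function(z) as.vector(quantile(z, 0.95, na.rm = TRUE)))
    # Label each spread "Vi - Vj" by variable number
    datfID <- paste("V", i, " - ", "V", seq(i+1, ncol(datf)-1), sep = "")
    names(datfP95) <- datfID
    datfP95
}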

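RipleyAllFn <- function(datf)
{
    # Loop over the first variable of each pair; the variables occupy
    # columns 2..ncol(datf), so i runs from 1 to ncol(datf) - 2.
    # Starting from NULL also handles the 2-variable case correctly.
    res <- NULL
    for(i in 1:(ncol(datf) - 2)) res <- c(res, RipleyoneiFn(i, datf))
    res
}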

All.sdat <- RipleyAllFn(MyDataFrame)
# 18.76 sec for 22 variables; 293.72 sec for 88 variables
# End RipleyAllFn
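For contrast, the all-pairs-at-once style that ran out of memory can be
sketched in R as below. This is not any respondent's actual code, which is
not shown here; allPairsP95 is a hypothetical name, and combn() is an R
function that 1998-era S-PLUS may not have. It forms one spread matrix with
n rows and p(p-1)/2 columns, so memory grows with the square of p. Assuming
the 88-variable run also had 2130 rows, that is 2130 x 3828 = 8,153,640
doubles, about 65 MB, whereas one call to RipleyoneiFn holds at most
2130 x 87 = 185,310 spreads (about 1.5 MB).

```r
# Hypothetical all-pairs-at-once version, using R's combn().
# Builds an n x p(p-1)/2 matrix of spreads, so memory grows with p^2.
allPairsP95 <- function(datf, prob = 0.95) {
  m <- as.matrix(datf[, -1])             # drop the Date column
  pairs <- combn(ncol(m), 2)             # 2 x npairs matrix of (i, j), i < j
  spreads <- m[, pairs[1, ], drop = FALSE] - m[, pairs[2, ], drop = FALSE]
  p95 <- apply(spreads, 2, quantile, probs = prob, na.rm = TRUE, names = FALSE)
  names(p95) <- paste(colnames(m)[pairs[1, ]], "-", colnames(m)[pairs[2, ]])
  p95
}

# Size of the spreads matrix for 2130 rows and 88 series, in MB (10^6 bytes).
2130 * choose(88, 2) * 8 / 1e6    # about 65
```

The pair order from combn() matches RipleyAllFn, so on small data the two
give the same values; this version labels pairs by column name rather than
V1, V2, ....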
-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu.