Enrica Bellone found this rather intriguing effect. Is it a bug?
SHORT SUMMARY: ks.gof() computes the two-sample Kolmogorov-Smirnov
statistic incorrectly. The error arises in its call to the function ks2(),
whose result depends on the order of the two samples. A possible fix is at
the bottom of this message.
LONG BLURB:
> version
Version 3.4 Release 1 for Silicon Graphics Iris, IRIX 5.3 : 1996
> x <- c(1.1, 2.1, 3.1)
> y <- c(1.1, 2.1, 3.1, 4.1)
>
> ks.gof(x,y)
Two-Sample Kolmogorov-Smirnov Test
data: x and y
ks = 0.5, p-value = 0.6571
alternative hypothesis: cdf of x does not equal the
cdf of y for at least one sample point.
> ks.gof(y,x)
Two-Sample Kolmogorov-Smirnov Test
data: y and x
ks = 0.25, p-value = 1
alternative hypothesis: cdf of y does not equal the
cdf of x for at least one sample point.
It is easy to see that ks should be the same for ks.gof(x,y) and
ks.gof(y,x), because the test statistic in question is symmetric in the
two samples:
D_{m,n} = sup_x | F_m(x) - G_n(x) |
where F_m and G_n are the empirical cdfs of x and y respectively.
The discrepancy arises inside the function ks2():
> ks2(x,y,"two-sided")
[1] 0.5
> ks2(y,x,"two-sided")
[1] 0.25
The actual K-S distance (D_{m,n} as a test stat) in this example is 0.25.
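The 0.25 can be checked by hand. At the pooled points 1.1, 2.1, 3.1, 4.1 the ecdf of x takes the values 1/3, 2/3, 1, 1 and the ecdf of y takes 1/4, 1/2, 3/4, 1, so the largest gap is 1/4, at 3.1. A minimal sketch of the same check straight from the definition follows. The name ks.dist is made up for illustration and is not an S-PLUS function, and the approach is only meant for small samples, since it builds a length(x) by length(z) matrix.

```r
# Hypothetical helper (not part of S-PLUS): two-sided K-S distance
# computed directly from the definition of D_{m,n}.
ks.dist <- function(x, y) {
    # Pooled distinct values: the only points where either ecdf can jump
    z <- sort(unique(c(x, y)))
    # Column j of outer(x, z, "<=") flags the observations <= z[j], so the
    # column sums are counts and the ratios are the ecdf values at z
    Fx <- apply(outer(x, z, "<="), 2, sum) / length(x)
    Gy <- apply(outer(y, z, "<="), 2, sum) / length(y)
    max(abs(Fx - Gy))
}

x <- c(1.1, 2.1, 3.1)
y <- c(1.1, 2.1, 3.1, 4.1)
ks.dist(x, y)   # 0.25
ks.dist(y, x)   # 0.25, the argument order does not matter
```

Because the gap is taken in absolute value, swapping the arguments cannot change the result, which is the property the ks2() output above violates.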
Do MathSoft know about this? Do I have a strange version of ks.gof()? Do
other people see the same behaviour, and will it be fixed?
S D Byers
PS: Here is a fix that we are using for now. It seems to work, but we do
not claim infallibility.
function(x, y, alt.expanded) {
###
# Calculates the value of the KS statistic for two samples.
# Implements the procedure of Hollander and Wolfe (1973), Nonparametric
# Statistical Methods, pp. 224-226, using empirical distribution functions.
# Handles tied observations, within as well as between the samples.
#------------------------------------
# Input
#   x             one sample
#   y             other sample
#   alt.expanded  one of "two.sided", "greater", or "less";
#                 any other value is treated as two-sided
######################################################################
    nx <- length(x)
    ny <- length(y)
    # Pooled distinct values: the only points where either ecdf can jump
    z <- sort(unique(c(x, y)))
    # Number of observations at each pooled value. Counts are used rather
    # than 0/1 flags so that repeated values within a sample are not lost.
    x.count <- tabulate(match(x, z), length(z))
    y.count <- tabulate(match(y, z), length(z))
    # Empirical cdfs of x and y evaluated at the pooled values
    F.x <- cumsum(x.count)/nx
    F.y <- cumsum(y.count)/ny
    switch(alt.expanded,
        less = max(F.y - F.x),      # T-
        greater = max(F.x - F.y),   # T+
        max(abs(F.x - F.y))         # T
    )
}
-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu. To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message: unsubscribe s-news