[S] reasonable p-values for Fisher exact's test - WAS strange ...

Charles C. Berry (cberry@tajo.ucsd.edu)
Tue, 24 Mar 1998 16:38:18 -0800


Before this thread enters an infinite loop, a few observations:

First, class(fisher.test(etc) ) == "htest"

So print.htest() formats the results of fisher.test(). It prints the
p-value as follows:

cat("p-value =", format(round(x$p.value, 4)), "\n")

(on Version 3.4 Release 1 for Sun SPARC, SunOS 4.1.3_U1 : 1996)

So the reports that fisher.test() *seemed* to work OK only imply that
the p-value was right to the 4 decimal places that get printed.
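
To see how the rounding in print.htest() can mask a small error, here is
a minimal sketch in R syntax (assumed, not checked, to behave the same
way in S-PLUS). The value of p is the one fisher.test() returns for the
all-ones table later in this message.

```r
# p-value that fisher.test() returns for matrix(1,nr=2,nc=2) (see below)
p <- 0.9999998808

# What print.htest() shows: the value rounded to 4 decimal places
shown <- format(round(p, 4))
cat("p-value =", shown, "\n")    # prints: p-value = 1

# What is actually stored
print(p, digits = 10)            # [1] 0.9999998808
```

Any error smaller than about 5e-05 is invisible in the printed result.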

Also, note that fisher.test() uses an algorithm which allows R x C
tables. This isn't required in simple 2 x 2 tables (and it wouldn't be
too hard to put in a switch for such tables), but this is what gets
used.

Getting to the point:

This algorithm usually yields answers that differ numerically from the
exact hypergeometric probability, viz. the result of:

> fisher.test(matrix(c(0,2,2,2),nc=2))$p
[1] 0.4666666

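differs from the sum of the first and third of the following probabilities
(those of the tables no more probable than the one observed)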

> dhyper(0:2,2,4,2)
[1] 0.40000000 0.53333333 0.06666667

by an amount

> fisher.test(matrix(c(0,2,2,2),nc=2))$p-sum(dhyper(c(0,2),2,4,2))
[1] -2.78155e-08
>

And this isn't an isolated case. The following summaries are of numbers
that all equal zero under exact (and obvious) arithmetic:

> summary(sapply(1:20,function(x) fisher.test(matrix(c(1,1,x,x),nc=2))$p-1.0))
      Min.    1st Qu.     Median       Mean   3rd Qu.      Max.
-1.407e-05 -1.997e-06 -8.941e-08 -3.189e-07 1.192e-06 1.562e-05
> summary(sapply(1:20,function(x) fisher.test(matrix(c(2,2,x,x),nc=2))$p-1.0))
      Min.    1st Qu.    Median       Mean   3rd Qu.      Max.
-2.325e-05 -2.295e-06 2.384e-07 -7.927e-07 2.712e-06 7.868e-06

Only 1 of the 40 tables, c(1,1,5,5), gives exactly 0.0 as the result.

So, fisher.test() apparently uses an approximation which gives a correct
answer for the first 5 or 6 significant digits most of the time.
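
As a rough check on that figure, the sketch below (R syntax, assumed to
carry over to S-PLUS) turns the extreme errors from the two summaries
above into an approximate count of correct digits. Because the true
p-value is 1.0 in every case, -log10(|error|) serves as that count. The
name digits.ok is just an illustrative choice.

```r
# Min. and Max. copied from the two summary() outputs above
err <- c(-1.407e-05, 1.562e-05, -2.325e-05, 7.868e-06)

# Approximate number of correct digits when the true value is 1.0
digits.ok <- -log10(abs(err))
round(digits.ok, 1)
```

In the worst cases this comes to a little under 5 digits. The quartiles
and medians in the summaries are smaller in magnitude and so correspond
to more digits.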

Even though the table

matrix(1,nr=2,nc=2)

would obviously lead to a p-value of exactly 1.0, it seems of little
practical import that fisher.test() reports it as

> print(fisher.test(matrix(1,nr=2,nc=2))$p,digits=10)
[1] 0.9999998808
>

If this is a problem, then dhyper() can be used directly for 2 x 2
tables. Its results seem to be accurate to nearly machine precision.
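
A minimal sketch of that approach follows, in R syntax (assumed to carry
over to S-PLUS). The function exact.p.2x2() and its tol argument are
hypothetical names of my own, not part of any library. It sums the
hypergeometric probabilities of all tables that are no more probable
than the observed one, which is the usual two-sided definition. The
small relative tolerance guards against ties being split by rounding;
its value is an arbitrary choice.

```r
# Hypothetical helper (not a library function): two-sided Fisher exact
# p-value for a 2 x 2 table, computed directly from dhyper().
exact.p.2x2 <- function(tab, tol = 1e-7) {
    tab <- as.matrix(tab)
    if (!all(dim(tab) == c(2, 2))) stop("tab must be a 2 x 2 table")
    m <- sum(tab[1, ])                   # row 1 total
    n <- sum(tab[2, ])                   # row 2 total
    k <- sum(tab[, 1])                   # column 1 total
    support <- max(0, k - n):min(k, m)   # possible values of tab[1, 1]
    probs <- dhyper(support, m, n, k)    # probability of each possible table
    p.obs <- dhyper(tab[1, 1], m, n, k)  # probability of the observed table
    # Sum over tables no more probable than the observed one
    sum(probs[probs <= p.obs * (1 + tol)])
}

exact.p.2x2(matrix(c(0, 2, 2, 2), ncol = 2))   # 7/15, about 0.4667
exact.p.2x2(matrix(1, nrow = 2, ncol = 2))     # 1, up to rounding error
```

The first call uses the same margins as dhyper(0:2,2,4,2) above, so it
reproduces sum(dhyper(c(0,2),2,4,2)).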

-- 

Charles C. Berry
Dept of Family/Preventive Medicine, UC San Diego
La Jolla, San Diego 92093-0622
(619) 534-2098
E mailto:cberry@tajo.ucsd.edu
http://hacuna.ucsd.edu/members/ccb.html