[S] power problem in splus 4.5 - summary and response

Eran Bellin,M.D. (belliney@idt.net)
Wed, 22 Jul 1998 11:18:48 -0400

To the group:

I have included my initial question and two similar detailed answers
that I believe are similarly flawed. My thanks, of course, to the
authors for their effort.

When S-PLUS, or any program, calculates power, it uses the normal
distribution, not the t distribution. This is reasonable because for the
power calculation you are estimating the population means and the
population standard deviations.

After you have actually run the study, you are forced to use the
observed means and the standard deviation calculated from the sample.
Gosset created the t-test to address the issue that, in small samples,
both statistical significance and confidence intervals
reflect the reality of a t statistic, not a normal distribution.

Therefore, from my perspective, it is perfectly legitimate to use a
normal distribution for your power calculation where you are estimating
the population standard deviation. But, after you have run the study,
or run the simulation, you must test for statistical significance using
the t-distribution.
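The normal-approximation power that this argument describes can be sketched directly. The following is Python rather than S-PLUS (standard library only), with the usual two-sided 5% normal quantile (1.96) hardcoded; it is a check of the reasoning, not the S-PLUS implementation.

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def z_power(mean1, mean2, sd, n, z_crit=1.959964):
    """Two-sided power of a z-test with known sd and equal group sizes n."""
    se = sd * sqrt(2.0 / n)              # sd of the difference in means
    effect = abs(mean2 - mean1) / se
    return norm_cdf(effect - z_crit) + norm_cdf(-effect - z_crit)

# The power table's scenario: means 66 and 80, sd 4, n = 2 per group.
print(round(z_power(66, 80, 4, 2), 3))   # -> 0.938
```

Under the known-sd normal approximation, n = 2 per group actually gives about 94% power for this scenario, which is why the program rounds the exact sample size up to 2 for a nominal 80%.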

I have repeated my simulation experiments with sample sizes of 3, 4,
and 5, and you can clearly see that the power calculations are
meaningful. It seems that sample size 2 is uniquely unstable in the
power program of S-PLUS 4.5.

Eran Bellin, M.D.
Director Outcome Analysis and Decision Support
Montefiore Medical Center
Bronx, N.Y. 10467

Let us repeat the analysis, now requiring three members in each group:

> for(i in 1:1000) z[i] <- t.test(sample(girls, 3, rep=T),
    sample(boys, 3, rep=T))$p.value

> length(z[z<.05])
[1] 881

If we require three members in each group, we achieve nominal
significance 881 of 1,000 times. We have a power of 88.1%.

Similar analysis shows:

> for(i in 1:1000) z[i] <- t.test(sample(girls, 4, rep=T),
    sample(boys, 4, rep=T))$p.value

> length(z[z<.05])
[1] 985

For samples of size four we have a power of 98.5%.

> for(i in 1:1000) z[i] <- t.test(sample(girls, 5, rep=T),
    sample(boys, 5, rep=T))$p.value

> length(z[z<.05])
[1] 999

For samples of size five, we have a power of 99.9%.
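The simulations above can be reproduced outside S-PLUS. The sketch below is Python (standard library only); it pools the two-sample t statistic by hand and compares it against two-sided 5% critical t values for df = 2n - 2, hardcoded from standard t tables, to avoid needing a t CDF.

```python
import random
from math import sqrt

random.seed(42)

# Two-sided 5% critical t values for df = 2n - 2 (from standard t tables).
T_CRIT = {2: 4.303, 3: 2.776, 4: 2.447, 5: 2.306}

def sim_power(n, reps=10000, mu1=66.0, mu2=80.0, sd=4.0):
    """Fraction of pooled two-sample t-tests rejecting at alpha = .05."""
    crit = T_CRIT[n]
    hits = 0
    for _ in range(reps):
        x = [random.gauss(mu1, sd) for _ in range(n)]
        y = [random.gauss(mu2, sd) for _ in range(n)]
        mx, my = sum(x) / n, sum(y) / n
        sp2 = (sum((v - mx) ** 2 for v in x) +
               sum((v - my) ** 2 for v in y)) / (2 * n - 2)  # pooled variance
        t = (mx - my) / sqrt(sp2 * 2.0 / n)
        hits += abs(t) > crit
    return hits / reps

for n in (2, 3, 4, 5):
    print(n, round(sim_power(n), 3))
```

With this setup the simulated power comes out near 0.48 at n = 2 and near 0.88, 0.98, and 0.999 at n = 3, 4, 5, in line with the counts reported above.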


Initial question:
I think that there is something wrong with the power program.
I asked for the normal mean power calculation of:

*** Power Table ***
mean1 sd1 mean2 sd2 delta alpha power n1 n2
1 66 4 80 4 14 0.05 0.8 2 2

This implies that if you have a group with a mean of 80 and a standard
deviation of 4, compared to a group with a mean of 66 and a standard
deviation of 4, you only need 2 members from each to find a
statistically significant result 80% of the time at a .05 level.

Well, I then tested this by creating two vectors with these values,
then running t-tests on samples of 2 members from each at a time and
looking at the resultant p-values:
> boys<-rnorm(1000,80,4)
> girls<-rnorm(1000,66,4)

> z <- numeric(1000)
> for(i in 1:1000) z[i] <- t.test(sample(girls, 2, rep=T),
    sample(boys, 2, rep=T))$p.value

I then asked how many of these observations have a p-value of .05 or
less. The result:

> length(z[z<.05])
[1] 484
> length(z)
[1] 1000

Only 48.4% had a value of .05 or less. I would have expected 80%.

Why the difference?

Thank you in advance.

Eran Bellin, M.D.
Department Outcome Analysis and Decision Support
Montefiore Medical Center
Bronx, N.Y.


Re: [S] Flaw in power program - normal means
Wed, 22 Jul 1998 08:37:03 +0930
"Prof. Richard Jarrett" <rjarrett@stats.adelaide.edu.au>

I presume it is that the power program uses the approximation that
the sd is known.

If this is true, it bases its calculations on using a
z rather than a t. In your case you have a t on 2 df,
which has VERY different properties from the Normal distn
assumed by the program.

For groups of size 10 or more, the difference between the t and the z
will make almost no difference to the results.
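Jarrett's point shows up directly in the critical values. The sketch below (Python, with two-sided 5% t quantiles hardcoded from standard t tables) shows how quickly the t critical value approaches the normal's 1.96 as the degrees of freedom grow.

```python
# Two-sided 5% critical values: t (selected df, from standard t tables)
# versus the normal's 1.96.
t_crit = {2: 4.303, 4: 2.776, 8: 2.306, 18: 2.101, 30: 2.042}
z_crit = 1.960

for df, t in sorted(t_crit.items()):
    print(f"df={df:3d}  t={t:.3f}  t/z={t / z_crit:.2f}")
```

At df = 2 (two groups of 2) the t critical value is more than twice the normal's; by df = 18 (two groups of 10) it is within about 7%.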

Richard Jarrett
Prof Richard Jarrett | Phone: +61 8 8303 3034
Dept of Statistics | Fax: +61 8 8303 3696
Univ of Adelaide | email: rjarrett@stats.adelaide.edu.au
Adelaide 5005 Australia| Web: http://www.maths.adelaide.edu.au/Stats

[S] Re: Flaw in power program
Tue, 21 Jul 1998 16:33:49 -0700
Steve Allan <sallan@statsci.com>

Dr. Bellin,

The formula used to compute sample sizes is based on the quantiles of
the Normal distribution. So sample sizes less than about 10 will be
somewhat askew when using the t-test.

If you run the simulation using the Z statistic, you get about 95%
power. The exact sample size calculation (check the 'Options' page in
the dialog for exact sample sizes) returns 1.28, so with n=2 the power
is actually higher than 80% using a Z statistic.
ztest <- function() {
    x <- rnorm(2, 66, 4)
    y <- rnorm(2, 80, 4)
    # 4 is the sd of the difference in means: sqrt(4^2/2 + 4^2/2)
    z <- (mean(x) - mean(y)) / 4
    1 - pnorm(abs(z))    # one-sided p-value
}

> z <- numeric(1000)
> for(i in 1:1000) z[i] <- ztest()
> sum(z < 0.05)
[1] 969
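The quoted exact sample size of 1.28 follows from the standard known-sd formula n = 2((z_{1-a/2} + z_{power}) * sd / delta)^2; here is a Python sketch with the normal quantile constants hardcoded.

```python
z_975 = 1.959964   # qnorm(0.975)
z_80  = 0.841621   # qnorm(0.80)

# n per group for a two-sided known-sd z-test: sd 4, delta 14, 80% power.
n = 2 * ((z_975 + z_80) * 4.0 / 14.0) ** 2
print(round(n, 2))   # -> 1.28
```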

Conversely, if you select 'Min. Difference' in the dialog and enter
sample sizes of 10, you get an alternative of 85.012.

> boys <- rnorm(1000, 80, 4)
> girls <- rnorm(1000, 85.012, 4)
> for(i in 1:1000) z[i] <- t.test(sample(girls, 10, rep=T),
    sample(boys, 10, rep=T))$p.value
> sum(z < 0.05)
[1] 801
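The 85.012 minimum-difference value can likewise be reproduced, approximately, from the same known-sd formula solved for delta at n = 10 and 80% power; again a Python sketch with hardcoded normal quantiles.

```python
from math import sqrt

z_975 = 1.959964   # qnorm(0.975)
z_80  = 0.841621   # qnorm(0.80)

# Minimum detectable difference at n = 10 per group, sd 4, 80% power.
delta = (z_975 + z_80) * 4.0 * sqrt(2.0 / 10.0)
print(round(80 + delta, 3))   # close to the dialog's 85.012
```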

At a minimum, we should print a warning when the calculated sample
size is less than 10. I'll file a report on this.

Thank you for raising this point.


* Data Analysis Products Division
* MathSoft, Incorporated
* Email: sallan@statsci.com
* Phone: 206-283-8802
This message was distributed by s-news@wubios.wustl.edu. To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message: unsubscribe s-news
