Re: Omnibus F test

Greg Hancock (ghancock@wam.umd.edu)
Wed, 12 Mar 1997 11:17:28 -0500


On Wed, 12 Mar 1997, William B. Ware wrote:

> My understanding is that it depends on which test you are talking
> about... Tukey's HSD and Scheffe's procedure are one-step procedures and
> can (should?) be done without the omnibus F having to be significant.
> They are "a posteriori" tests, but in this case, "a posteriori" means
> "without prior knowledge", as in "without specific hypotheses." On the
> other hand, Fisher's Least Significant Difference test is a two-step
> procedure. It should _not_ be done without the omnibus F-statistic
> being significant.

Dr. Ware is quite right. The omnibus test is an unnecessary, and in fact
potentially detrimental, hurdle in performing planned tests unless it is a
formal part of the decision structure (as in Fisher's LSD). [And, if
you're the sort to be concerned about familywise Type I error, Fisher's
LSD is a viable option only for k=3 groups.]
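
For anyone who wants to see the k=3 remark in numbers, here is a rough
simulation sketch (mine, not from the paper quoted below). It assumes
equal-n normal data with 11 cases per group, puts one group 5 SDs away
from the rest so that only a partial null holds, and uses tabled .05
critical values rounded to table precision. The function name, the
shift, and the design are all illustrative choices.

```python
import random
from itertools import combinations

def lsd_familywise_error(k, n, f_crit, t_crit, shift=5.0, reps=20000, seed=1):
    """Estimate the familywise Type I error of Fisher's LSD under a partial null.

    Groups 0..k-2 share mean 0; the last group has mean `shift`, so only
    comparisons among the first k-1 groups are true nulls.
    """
    rng = random.Random(seed)
    true_null_pairs = list(combinations(range(k - 1), 2))
    errors = 0
    for _ in range(reps):
        groups = [[rng.gauss(shift if j == k - 1 else 0.0, 1.0) for _ in range(n)]
                  for j in range(k)]
        means = [sum(g) / n for g in groups]
        grand = sum(means) / k  # equal n, so the mean of the means is the grand mean
        ms_between = n * sum((m - grand) ** 2 for m in means) / (k - 1)
        ms_within = sum((x - m) ** 2
                        for g, m in zip(groups, means) for x in g) / (k * (n - 1))
        if ms_between / ms_within <= f_crit:
            continue  # step 1: omnibus F not significant, so LSD stops here
        se_diff = (2.0 * ms_within / n) ** 0.5
        # step 2: unadjusted t tests; a familywise error is any rejected true null
        if any(abs(means[i] - means[j]) / se_diff > t_crit
               for i, j in true_null_pairs):
            errors += 1
    return errors / reps

# Tabled .05 values for df = 30: F(2, 30) = 3.32, two-tailed t(30) = 2.042
fwer_k3 = lsd_familywise_error(k=3, n=11, f_crit=3.32, t_crit=2.042)
# Tabled .05 values for df = 40: F(3, 40) = 2.84, two-tailed t(40) = 2.021
fwer_k4 = lsd_familywise_error(k=4, n=11, f_crit=2.84, t_crit=2.021)
print(f"k=3: {fwer_k3:.3f}   k=4: {fwer_k4:.3f}")
```

With k=3 there is only one true null left once the odd group is out, so
the error rate should stay near .05. With k=4 the omnibus F is rejected
almost every time because of the odd group, and the three remaining true
nulls are then tested with no adjustment at all, so the familywise rate
should come out well above .05.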

An excerpt from a recent paper in Review of Educational Research (66(3),
269-306) discusses the problems:

"There are a number of problems associated with the requirement of an
omnibus test rejection prior to conducting multiple comparisons; we will
present four. First, and most simply, few research questions are directly
addressed by an omnibus test. In a well planned study, the researcher's
questions involve specific contrasts of group means; the omnibus test
addresses each question only tangentially. Some might argue that the
omnibus test is not present to answer questions; rather, it is there to
facilitate control over the rate of Type I error. This issue of control,
however, brings us to our second point -- the belief that an omnibus test
offers protection is not completely accurate. When the complete null
hypothesis is true, weak familywise Type I error control is facilitated by
the omnibus test; but, when the complete null is false and partial nulls
exist, the F-test does not maintain strong control over the familywise
error rate.

"A third point, which Games (1971) so elegantly demonstrated in his
figures, is that the F-test may not be completely consistent with the
results of a pairwise comparison approach. Consider, for example, a
researcher who is instructed to conduct Tukey's test only if an
alpha-level F-test rejects the complete null. It is possible for the
complete null to be rejected but for the widest ranging means not to
differ significantly. This is an example of what has been referred to as
incoherence (Gabriel, 1969) or incompatibility (Lehmann, 1957). On the
other hand, the complete null may be retained while the null associated
with the widest ranging means would have been rejected had the decision
structure allowed it to be tested. This has been referred to by Gabriel
(1969) as nonconsonance. One wonders if, in fact, a practitioner in this
situation would simply conduct the MCP contrary to the omnibus test's
recommendation. Strangely enough, such a seeming breach of multiple
comparison ethics would have largely positive statistical ramifications as
we discuss in our next and final point.

"The fourth argument against the traditional implementation of an
initial omnibus F-test stems from the fact that its well-intentioned but
unnecessary protection contributes to a decrease in power. The first test
in a pairwise MCP, such as that of the most disparate means in Tukey's
test, is a form of omnibus test all by itself, controlling the familywise
error rate at the alpha-level in the weak sense. Requiring a preliminary
omnibus F-test amounts to forcing a researcher to negotiate two hurdles to
proclaim the most disparate means significantly different, a task that the
range test accomplished at an acceptable alpha-level all by itself. If
these two tests were perfectly redundant, the results of both would be
identical and the omnibus test would represent neither friend nor foe;
probabilistically speaking, the joint probability of rejecting both would
be alpha when the complete null hypothesis was true. However, the two tests
are not completely redundant; as a result the joint probability of their
rejection is less than alpha. The F-protection therefore imposes
unnecessary conservatism (see Bernhardson, 1975, for a simulation of this
conservatism). For this reason, and those listed before, we agree with
Games' (1971) statement regarding the traditional implementation of a
preliminary omnibus F-test:

'There seems to be little point in applying the overall F test prior
to running c contrasts by procedures that set [the familywise
error rate] alpha.... If the c contrasts express the
experimental interest directly, they are justified whether the
overall F is significant or not and [familywise error rate] is
still controlled.' "
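
The third and fourth points in the excerpt are easy to check by
simulation. What follows is only a sketch under my own assumptions, not
the simulation from Bernhardson (1975): 4 groups of 11 normal cases, all
with the same mean (the complete null), and tabled .05 critical values
for df = 40 rounded to table precision. It counts how often the omnibus
F rejects, how often the widest-range test from Tukey's procedure
rejects, and how often they agree or disagree.

```python
import random

K, N = 4, 11      # assumed design: 4 groups of 11, so error df = K * (N - 1) = 40
F_CRIT = 2.84     # tabled .05 critical value of F(3, 40)
Q_CRIT = 3.79     # tabled .05 critical value of the studentized range q(4, 40)

def simulate(reps=20000, seed=1):
    """Rejection rates of the omnibus F and the widest-range test, complete null true."""
    rng = random.Random(seed)
    counts = {"F": 0, "range": 0, "both": 0, "F_only": 0, "range_only": 0}
    for _ in range(reps):
        groups = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(K)]
        means = [sum(g) / N for g in groups]
        grand = sum(means) / K  # equal n, so the mean of the means is the grand mean
        ms_between = N * sum((m - grand) ** 2 for m in means) / (K - 1)
        ms_within = sum((x - m) ** 2
                        for g, m in zip(groups, means) for x in g) / (K * (N - 1))
        f_sig = ms_between / ms_within > F_CRIT
        # studentized range statistic for the two most disparate means
        q_sig = (max(means) - min(means)) / (ms_within / N) ** 0.5 > Q_CRIT
        counts["F"] += f_sig
        counts["range"] += q_sig
        counts["both"] += f_sig and q_sig
        counts["F_only"] += f_sig and not q_sig
        counts["range_only"] += q_sig and not f_sig
    return {name: c / reps for name, c in counts.items()}

rates = simulate()
for name, rate in rates.items():
    print(f"{name:>10}: {rate:.4f}")
```

Each test taken alone should reject close to 5% of the time. The "both"
rate is what a researcher gets when the F test is required first, and it
should fall below .05, which is the unnecessary conservatism described
in the fourth point. The "F_only" and "range_only" rows are the two
kinds of disagreement described in the third point.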

Best wishes,
Greg Hancock

Gregory R. Hancock
Department of Educational Measurement,
Statistics, and Evaluation
1230 Benjamin Building
University of Maryland
College Park, MD 20742-1115

phone: (301) 405-3621 fax: (301) 314-9245 e-mail: ghancock@wam.umd.edu

Check out our graduate program in measurement, stats, and evaluation:
http://www.inform.umd.edu:8080/EdRes/Colleges/EDUC/.WWW/Depts/EDMS/