>5.1. Here is another simple example; try analyzing
>these data:
> A B Y
> 1 1 1
> 1 1 2
> 1 2 3
> 1 2 4
> 2 1 5
> 2 1 6
>The default results computed by SPSS are:
>                   Sum of            Mean               Sig
>Source             Squares    DF     Square      F      of F
>Main Effects       16.000      2      8.000    16.000   .025
>   A               16.000      1     16.000    32.000   .011
>   B                4.000      1      4.000     8.000   .066
>
>Is there really a marginally significant B main
>effect? Here are the cell means and marginal means:
>
>                 Factor B
>Factor A       1         2       Marg. Means
>           ---------------------
>   1     |    1.5       3.5    |    2.5
>   2     |    5.5     missing  |    5.5
>           ---------------------
>              3.5       3.5
>
>Why does SPSS report an almost significant
>effect for the set of identical marginal means for
>factor B?
From a modeling standpoint, there is very little mystery here, if any
at all. When B's main effect ALONE is included in the model, the OLS
solution shows that B has zero effect on Y. Simply looking at the
covariance between B and Y, which is exactly zero in this small data
set, shows why this is the case. If we only had factor B in the
design, then we would conclude that B has no influence on Y.
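To make the B-alone case concrete, here is a minimal sketch in plain Python using the six observations quoted above. The helper names (`mean`, `covariance`) are mine and purely illustrative; this is not what SPSS or any other package does internally. It only checks that the sample covariance between B and Y, and hence the OLS slope of Y on B alone, comes out at zero (up to floating-point rounding).

```python
# Data from the quoted example, one tuple per observation: (A, B, Y)
rows = [(1, 1, 1), (1, 1, 2), (1, 2, 3), (1, 2, 4), (2, 1, 5), (2, 1, 6)]
b = [r[1] for r in rows]
y = [r[2] for r in rows]


def mean(xs):
    return sum(xs) / len(xs)


def covariance(xs, ys):
    """Sample covariance with an n - 1 denominator."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (v - my) for x, v in zip(xs, ys)) / (len(xs) - 1)


cov_by = covariance(b, y)
# With a single predictor, the OLS slope is cov(B, Y) / var(B).
slope_b_alone = cov_by / covariance(b, b)

print("cov(B, Y):", cov_by)
print("slope of Y on B alone:", slope_b_alone)
```

Both printed values should be zero apart from rounding error, which is the same thing the identical marginal means for B (3.5 and 3.5) are telling us.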
However, factor B is not the only factor under consideration here.
Once A's main effect is included in the model ALONG WITH B's main
effect, the OLS solution shows that both A and B have nonzero
estimated effects on Y (in the SPSS output above, A is significant at
p = .011 and B is marginal at p = .066). Again, there is no mystery
as to why this would be the case. Given the data patterns of A, B,
and Y, once A is included in the model it necessarily soaks up all of
the variance contrasting y={1,2,3,4} vs. y={5,6}. With this specific
variance in Y accounted for by A, factor B captures additional
variance in Y contrasting y={1,2} vs. y={3,4}. From an OLS estimation
and main-effects modeling standpoint (i.e., no interactions included
in the model), there is indeed a B effect when A is included in the
model, and it is the one SPSS reports.
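The same point can be checked numerically. The sketch below, again in plain Python, fits four nested OLS models to the quoted data and compares residual sums of squares. The 0/1 indicator coding and the helper names (`solve`, `residual_ss`) are my own assumptions for illustration; I am not claiming this is how SPSS parameterizes the problem, only that the adjusted sums of squares do not depend on that choice.

```python
# Data from the quoted example, one tuple per observation: (A, B, Y)
rows = [(1, 1, 1), (1, 1, 2), (1, 2, 3), (1, 2, 4), (2, 1, 5), (2, 1, 6)]


def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def residual_ss(design, y):
    """Residual sum of squares from an OLS fit via the normal equations."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * v for row, v in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    return sum((v - f) ** 2 for v, f in zip(y, fitted))


y = [r[2] for r in rows]
# 0/1 indicators for the second level of each factor (illustrative coding)
a2 = [float(r[0] == 2) for r in rows]
b2 = [float(r[1] == 2) for r in rows]

rss_null = residual_ss([[1.0] for _ in rows], y)               # intercept only
rss_a = residual_ss([[1.0, a] for a in a2], y)                 # A alone
rss_b = residual_ss([[1.0, b] for b in b2], y)                 # B alone
rss_ab = residual_ss([[1.0, a, b] for a, b in zip(a2, b2)], y)  # A and B

ms_error = rss_ab / (len(rows) - 3)   # 3 parameters in the main-effects model
ss_b_alone = rss_null - rss_b         # B with nothing else in the model
ss_b_given_a = rss_a - rss_ab         # B adjusted for A
ss_a_given_b = rss_b - rss_ab         # A adjusted for B

print("SS(B alone):", ss_b_alone)
print("SS(B | A):", ss_b_given_a, " F:", ss_b_given_a / ms_error)
print("SS(A | B):", ss_a_given_b, " F:", ss_a_given_b / ms_error)
```

Up to rounding, this should reproduce the sums of squares and F ratios in the SPSS table quoted above: nothing for B by itself, but a sum of squares of 4 for B once A is already in the model.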
In this context, examining factor B's marginal distribution is
misleading in that (a) such an examination relates to the model
where B's main effect alone is included and (b) it masks the effect
of B on Y when A is included in the model.
Though I hesitate to do so because of the tone this debate has taken
over the past weeks, I feel compelled to interject a more subjective
comment at this point. First, it is certainly true that various
stat packages will have different default ways of handling problems.
There is nothing wrong with that. Any user who doesn't either read
the manual or examine the help screens when using a specific package
is, in my opinion, very foolish. On the other hand, students and
researchers alike, for good or for ill, expect that these defaults lie
somewhere close to what may be considered "the standard practice."
For unbalanced designs such as those being discussed above and in the
original note from Ms. Miller, Statistica does seem to be
in the minority in how it handles such designs. Further, it is not
clear to me that Statistica's default approach on these designs can
be characterized as simply "different." If the approach tends to
mask significant effects of factors on the response, then it should
be characterized as inferior to other well-known approaches that do
not mask those effects.
Scott R. Eliason
Department of Sociology
University of Iowa