Re: matched pairs vs. independent samples
RDAWSON@HUSKY1.STMARYS.CA
Wed, 8 May 1996 09:42:47 -0400
I deliberately broke this question down in my intro stats
course this year into two parts: experimental design (first, before
the idea of a hypothesis test was mentioned) and testing. I introduced
it via a lab in which the class had to find out informally (method
unspecified but feedback available from the instructor) whether people
threw a beanbag more accurately with their dominant hand or the other one.
The basic idea of pairing seemed almost instinctive to them, at
least as a way of preventing a chance concentration of horseshoe-pitchers
among the other-hand throwers from biasing the result. Some didn't catch on
immediately to the advantage of looking at the differences, but after
some discussion they seemed to appreciate the utility of that too. There
seemed to be a pretty good consensus that it was good to use pairing in
an experimental design.
(Another serendipitous lesson: Some groups ended up surprised to
find that their "other" throw was more accurate. After discussion, we
found that they had mostly thrown with their dominant hand first... so
the advantage of randomizing nuisance parameters [here, a hypothesized
learning effect] was also demonstrated.)
Now, as far as data analysis is concerned, it is trickier. What is
the cost if we make the wrong decision? And what is the appropriate
response in the following cases?
(a) Observations were blocked in the experimental design,
in good faith, by a variable that turned out to be irrelevant.
(b) Observations were blocked, carelessly, after the fact,
using an irrelevant variable.
(c) Observations were taken by somebody else; a glance suggests
that blocking by a certain variable will eliminate much variance.
(d) Observations were taken by somebody else; preliminary data
analysis shows that blocking will help to some extent.
(e) Observations were taken by somebody else; a negative
correlation is observed, implying that blocking will increase variance
and possibly hide a genuine effect.
(f) Observations were paired in the experimental design, but a
negative correlation is observed, implying that blocking will increase
variance and possibly hide a genuine effect.
In case (b), we would probably say "do it again properly" - in
fact, several posters earlier in this thread cited examples of careless
work with just this implication. But what about case (a), which differs
only in the experimenter's original intention? Should the data be unblocked,
possibly gaining crucial degrees of freedom and a lighter-tailed t? Or is
this data snooping?
Again, in cases (c) and (d) we have a situation in which pairing or
blocking *should* have been planned but wasn't. The person analysing the data
can reduce the variance by a data-driven change in plans: is this kosher? If
we phone the experimenter and say "if you had thought of pairing would you
have done it?" and they say "yes", is it kosher then?
Finally, in case (e) blocking (which was not explicitly planned)
appears as if it might be counterproductive. Can we use this knowledge to
*not* block? And what if we *did* specifically pair subjects in our original
design - are we allowed to ignore this if it seems that the pairing will
inflate our variance?
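For reference, the variance claim in cases (e) and (f) follows from a
standard identity. The notation is introduced here purely for illustration:
n pairs (X_i, Y_i), standard deviations sigma_X and sigma_Y, and
correlation rho between the members of a pair.

```latex
\operatorname{Var}(\bar X - \bar Y)
  = \frac{\sigma_X^2 + \sigma_Y^2 - 2\rho\,\sigma_X \sigma_Y}{n}
```

With rho > 0 the last term shrinks the variance of the differences, which
is the usual gain from pairing; with rho < 0 it adds to it. With rho = 0
the variance is the same either way, and the paired analysis simply has
n - 1 degrees of freedom where the unpaired one has 2n - 2.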
By the way, examples that I have given my classes of situations in
which pairing is possible but possibly counterproductive are:
- determining whether more tofu or steak is consumed by a population
- determining whether more cans of Coke
- popularity of two opposing politicians
Perhaps what is needed here is a simulation (or calculation)
to determine the overall type I error rate of the procedure

    if correlation > 0, do paired test at level alpha
    else do unpaired test at level alpha

My suspicion is that the error rate would be only slightly inflated above
alpha, and that power would be much improved in correlated cases. Anybody
know about this?
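
A minimal sketch of such a simulation follows, using only the Python
standard library. Everything specific in it is an assumption made for
illustration and is not taken from the thread: bivariate normal data with
unit variances, 10 pairs per experiment, a nominal two-sided 5% level (with
critical values hard-coded from standard t tables), and the function names
themselves. It estimates how often the "choose the test by the sign of the
sample correlation" procedure rejects, either under the null (shift = 0) or
under a genuine difference (shift > 0).

```python
import math
import random
import statistics

# Two-sided 5% critical values from standard t tables, hard-coded so that
# the sketch needs no third-party packages. They are tied to N_PAIRS = 10.
N_PAIRS = 10
T_CRIT_PAIRED = 2.262    # df = N_PAIRS - 1 = 9
T_CRIT_UNPAIRED = 2.101  # df = 2 * N_PAIRS - 2 = 18


def simulate_pairs(rho, shift, rng):
    """Draw N_PAIRS bivariate-normal pairs with correlation rho (|rho| < 1).

    Both margins have unit variance; the y values are shifted by `shift`.
    """
    xs, ys = [], []
    for _ in range(N_PAIRS):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        xs.append(z1)
        ys.append(shift + rho * z1 + math.sqrt(1 - rho * rho) * z2)
    return xs, ys


def adaptive_test_rejects(xs, ys):
    """Paired t-test if the sample correlation is positive, else pooled t-test."""
    n = len(xs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    vx, vy = statistics.variance(xs), statistics.variance(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    if cov > 0:  # the covariance has the same sign as the sample correlation
        var_diff = vx + vy - 2 * cov  # sample variance of the differences
        t = (mx - my) / math.sqrt(var_diff / n)
        return abs(t) > T_CRIT_PAIRED
    # Pooled two-sample standard error; this form assumes equal group sizes.
    t = (mx - my) / math.sqrt((vx + vy) / n)
    return abs(t) > T_CRIT_UNPAIRED


def rejection_rate(rho, shift=0.0, reps=4000, seed=1996):
    """Monte Carlo estimate of how often the adaptive procedure rejects."""
    rng = random.Random(seed)
    hits = sum(
        adaptive_test_rejects(*simulate_pairs(rho, shift, rng))
        for _ in range(reps)
    )
    return hits / reps


if __name__ == "__main__":
    for rho in (-0.5, 0.0, 0.5, 0.9):
        size = rejection_rate(rho, shift=0.0)
        power = rejection_rate(rho, shift=1.0)
        print(f"rho={rho:+.1f}  size~{size:.3f}  power~{power:.3f}")
```

Comparing the shift = 0 column with 0.05 gives a rough idea of the
inflation; running the same loop with the paired or unpaired test forced
would give the baselines against which to judge the power gain.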
-Robert Dawson