As I intended it, the research hypothesis is that temperature is
correlated with how often certain kinds of items are bought together in
pairs; specifically, that temperature is positively related to the
frequency with which items are bought together. Whether this is ALWAYS
the case seems to me to be another question (Dave Krantz interpreted me
to mean that all item-pairs were related to temperature), although all
pairs of items are of the same "kinds" and therefore would be expected
to have the same relationship with temperature.
I originally viewed the statistical testing as ending with each test of
correlation between each Y and Y0 (temperature). The research
hypothesis is then assessed only by looking at the number of pairs that
are in fact correlated (positively or negatively) with temperature and
discussing those that are. Then I could say, for example, that for a
certain pair of items there is correlation. Of course I would like to
somehow evaluate the number of significant correlations to test the
research hypothesis itself, and this is where the multiple comparisons
problem came from.
The data were obtained for all supermarkets (40 of them) in a given area over a given period of time. Therefore the results would apply only to supermarkets in this area. I also view the association values for the pairs of items as not being a sample – the set of supermarkets is the population in question. So I treat the association values as measured parameters of that population.
I have an association value for each pair of items for each supermarket
where they occurred as a pair, but some items were not purchased
together at all supermarkets, so the sample size for the association
values is small for some pairs. As I see it, the zeros do not figure
into the analysis because the items simply were not purchased at some
of the supermarkets. It seems to me that these zeros (missing data)
confound the multivariate analysis that Jim Handsfield suggested.
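To make the handling of the zeros concrete, here is a minimal sketch of correlating one pair's association values with temperature using only the supermarkets where the pair actually occurred. The function names, the use of None to mark a missing value, and the numbers are all hypothetical and only illustrate the idea; this is not the analysis that was actually run.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences (non-constant)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def pairwise_complete_r(assoc, temp):
    """Correlate one item pair's association values (Y) with temperature (Y0),
    dropping supermarkets where the pair was not observed (marked None).
    Returns (r, n_used); r is None when fewer than 3 supermarkets remain."""
    kept = [(a, t) for a, t in zip(assoc, temp) if a is not None]
    n = len(kept)
    if n < 3:
        return None, n
    a_vals, t_vals = zip(*kept)
    return pearson_r(a_vals, t_vals), n

# Hypothetical data for six supermarkets; None = pair never bought together there.
temp = [10.0, 12.0, 15.0, 18.0, 20.0, 25.0]
assoc = [0.10, None, 0.20, 0.25, None, 0.40]

r, n_used = pairwise_complete_r(assoc, temp)
print(f"r = {r:.3f} based on {n_used} supermarkets")
```

Note that each pair then ends up with its own sample size, which is part of why a single multivariate analysis over all pairs is awkward here.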
To use Dave Krantz's terms, the association value for each pair of
items is Y, and Y0 is temperature. The question is whether any of the
Ys is correlated with temperature (Y0), and whether the correlation is
meaningful. The null is thus tested for each available pair of items
against temperature, which is where the question of multiple comparisons
came in. As it happens, of all pairs, exactly 4.5% were significant
(were correlated with temperature) at alpha=0.05. The question is how
to determine whether these 5 are spurious, how I can control the
overall error rate to avoid this issue, or whether this is even an issue.
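As a rough way to see why 4.5% significant at alpha=0.05 is unremarkable, the sketch below computes how many significant results would be expected if every null were true, and the chance of seeing at least that many. The total number of pairs is not stated above, so m = 111 is a hypothetical value chosen only because 5 of 111 is about 4.5%. The binomial calculation also assumes the tests are independent, which is doubtful here since every test shares the same temperature variable, so treat it as a reference point rather than a formal test.

```python
from math import comb

def binom_upper_tail(k, m, alpha):
    """P(X >= k) for X ~ Binomial(m, alpha): the chance of k or more
    'significant' results among m independent tests when all nulls are true."""
    return sum(comb(m, i) * alpha**i * (1 - alpha) ** (m - i)
               for i in range(k, m + 1))

m = 111       # hypothetical number of item pairs tested (not given in the post)
k = 5         # number of significant correlations observed
alpha = 0.05  # per-test significance level

expected = m * alpha                      # significant results expected by chance
p_tail = binom_upper_tail(k, m, alpha)    # P(at least k by chance alone)
bonferroni_alpha = alpha / m              # per-test level for overall alpha

print(f"expected by chance: {expected:.2f}")
print(f"P(X >= {k}) under the global null: {p_tail:.3f}")
print(f"Bonferroni per-test alpha: {bonferroni_alpha:.6f}")
```

Under these assumptions the observed count is no larger than chance alone would produce, and a Bonferroni-style correction would require each individual test to meet a much stricter level.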
Now it was suggested by Dr. Parkhurst that I simply eliminate
significance tests. I take this to mean that I should just get
confidence intervals for the correlation coefficient of each Y with Y0 and look
at them closely. Then I could say, for example, that only 4.5% of
expected correlations were within a certain interval, and so it really
looks like the research hypothesis is not supported by the data. This
is not a definitive test, but it seems to be in the spirit of the
non-hypothesis test approach (am I correct here?). But it seems to me
that if multiple comparison is in fact a problem for significance
tests, then calculation of the interval is also affected by this
problem.
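The point that intervals inherit the multiplicity problem can be illustrated with a minimal sketch using the Fisher z-transformation for an approximate confidence interval on a correlation. The values r = 0.35, n = 40, and m = 111 are hypothetical, and the Bonferroni adjustment is just one common way to get simultaneous coverage, not necessarily what anyone on the list proposed.

```python
from math import atanh, tanh, sqrt
from statistics import NormalDist

def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for a Pearson correlation r from
    n observations, via the Fisher z-transformation (requires n > 3)."""
    z = atanh(r)                      # transform r to the z scale
    se = 1.0 / sqrt(n - 3)            # approximate standard error of z
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return tanh(z - crit * se), tanh(z + crit * se)

# Hypothetical values: one observed correlation, 40 supermarkets, 111 pairs.
r, n, m = 0.35, 40, 111

single = fisher_ci(r, n, level=0.95)                 # one interval on its own
simultaneous = fisher_ci(r, n, level=1 - 0.05 / m)   # Bonferroni-adjusted

print(f"95% interval for one pair:      ({single[0]:.3f}, {single[1]:.3f})")
print(f"Bonferroni-adjusted interval:   ({simultaneous[0]:.3f}, {simultaneous[1]:.3f})")
```

With these numbers the unadjusted interval excludes zero while the adjusted one does not, which is the same multiplicity effect seen with significance tests, only expressed as interval width.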
So the question is still whether multiple "comparison" is a problem.
Maybe the original question should have more generally asked when this
is a problem. Any responses to this question, as well as comments on
this weird analysis and my interpretation of the non-hypothesis testing
approach, would be appreciated.
Frank O'Hare
-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu. To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message: unsubscribe s-news