[S] Summary: log-linear models for matched-pairs data (long!) (fwd)

Robert Balshaw (rbalshaw@setosa.uwaterloo.ca)
Fri, 22 May 1998 10:13:38 -0400 (EDT)


<< My apologies if I've posted this before -- I certainly tried to, but
never saw it in s-news, so here it is again. >>

Last week, I asked for assistance with model fitting for matched
pairs data. I received several very helpful messages, which I summarize
below.

However, as is often the case, I have come to wonder if my question was
the right one. The models I asked for assistance with were models for the
association between the row ("C1") and column ("C2") variables in a K x K
table, with two "grouping" factors A and B.

I had hoped to fit models such as : "symmetry", "quasi-symmetry",
"conditional quasi-symmetry" (i.e., quasi-symmetry of C1 and C2 within
the A*B classifications), etc.

I now see that these models are analogous to various attempts to model the
within group covariance matrix in a multi-group MANOVA scenario.

And, as such, I now wonder about their utility.

The original research questions were of the form "Is there any evidence
of inflation or deflation in the response from C1 to C2? How is this
affected by the grouping factors? What is more important in determining
C2: C1 or the grouping factors?"

C1 and C2 were the responses to the same question, at two measurement
occasions. The response possibilities were the same at each occasion
and were definitely ordinal.

My current approach to the problem is more in line with ordinal logistic
regression (as I now think I should have been using all along, given the
nature of the third research question, above). I originally chose the
first approach because the researcher with whom I'm working started things
off that way. While I feel that some of the "finer" detail I may have
been able to extract using the conditional quasi-symmetry models, etc.,
may be lost, I'm certain that I understand better just what it is I am
modeling, and how.

As I have had little experience with categorical modeling prior to this,
it's been a worthwhile exercise for me. If anyone has any comments, I'd
be pleased to hear them. I'm sure I'm not the first to run through
this process, and I'm curious how others have handled things.

Thanks very much for any comments, and a special thanks to Charles Berry
<cberry@tajo.ucsd.edu> and Alan Zaslavsky <zaslavsk@hcp.med.harvard.edu>
for their helpful suggestions.

Rob

summary of main responses...

===== Charles' suggestions =====

<< After some back-and-forth, Charles provided me with a method to test
for "symmetry" and for "conditional symmetry". RB >>

> x.1 <- rep(1:4,4)
> x.2 <- rep(1:4,rep(4,4))
> dummies <- matrix(0,nr=16,nc=4)
> dummies[cbind(1:16,x.1)] <- 1
> dummies[cbind(1:16,x.2)] <- dummies[cbind(1:16,x.2)] + 1
> ij <- factor(ifelse(x.1==x.2,0,pmin(x.1,x.2)*max(x.1)+pmax(x.1,x.2)))
> a.frame <- data.frame(dummies,ij)
>
> model.matrix(~.-1,data=a.frame)
X1 X2 X3 X4 ij0 ij6 ij7 ij8 ij11 ij12 ij16
1 2 0 0 0 1 0 0 0 0 0 0
2 1 1 0 0 0 1 0 0 0 0 0
3 1 0 1 0 0 0 1 0 0 0 0
4 1 0 0 1 0 0 0 1 0 0 0
5 1 1 0 0 0 1 0 0 0 0 0
6 0 2 0 0 1 0 0 0 0 0 0
7 0 1 1 0 0 0 0 0 1 0 0
8 0 1 0 1 0 0 0 0 0 1 0
9 1 0 1 0 0 0 1 0 0 0 0
10 0 1 1 0 0 0 0 0 1 0 0
11 0 0 2 0 1 0 0 0 0 0 0
12 0 0 1 1 0 0 0 0 0 0 1
13 1 0 0 1 0 0 0 1 0 0 0
14 0 1 0 1 0 0 0 0 0 1 0
15 0 0 1 1 0 0 0 0 0 0 1
16 0 0 0 2 1 0 0 0 0 0 0
>

This (I hope) is right for eqn 10.7 for the 4 x 4 case. To extend this
to a third variable, you need to 'stack' 2 or more (via rbind, say) and
then cbind the third factor to the result to create the data.frame, and
you need to include the right interactions in the formula (between the
third factor and the terms if that is what you want)

===== Alan's suggestions =====

You can create factors for the effects corresponding to the model of
symmetry or quasi-symmetry and then fit poisson loglinear models in the
usual way using glm().

For example, first create a fake dataset.

my.frame_expand.grid(A=c("a","b"),B=c("c","d"),
C1=c("x","y","z"),C2=c("x","y","z"))
my.frame$counts_rpois(36,10)

Now create a factor representing the symmetric coding of C1 and C2.

my.frame$symm_paste(
pmin(as.numeric(my.frame$C1),as.numeric(my.frame$C2)),
pmax(as.numeric(my.frame$C1),as.numeric(my.frame$C2)),sep=",")

Now we fit a few models. All of them have C1,C2 independent of A,B, just
for simplicity:

First, symmetry:
model1_glm(counts~A+B+symm,family=poisson(),data=my.frame)

<< I'm not convinced that this is the symmetry model I was intending,
but I was grateful for the ideas! RB >>

Second, quasi-symmetry:
model2_glm(counts~A+B+symm+C1+C2,family=poisson(),data=my.frame)

Finally, the saturated model in C1, C2:
model3_glm(counts~A+B+C1*C2,family=poisson(),data=my.frame)

Take a look at model1 (others left to the reader!):

> model1
Call:
glm(formula = counts ~ A + B + symm, family = poisson(), data =
my.frame)

Coefficients:
(Intercept) A B symm1,2 symm1,3 symm2,2
symm2,3
1.695568 0.1191867 -0.1072429 0.4784888 0.5596153 0.7156185
0.6584589
symm3,3
0.4924738

Degrees of Freedom: 36 Total; 28 Residual
Residual Deviance: 30.48478

Then compare them with anova():

anova(model1,model2,model3)

giving the following results:
Analysis of Deviance Table

Response: counts

Terms Resid. Df Resid. Dev Test Df Deviance
1 A + B + symm 28 30.48478
2 A + B + symm + C1 + C2 26 30.06036 +C1 2 0.4244128
3 A + B + C1 * C2 25 30.05949 2 vs. 3 1 0.0008723

(of course with these random data we don't expect anything to be
significant). You have full flexibility about specifying which of
the effects in C1 and C2 (marginal, symmetric interaction, asymmetric
interaction) are interacted with the other factors (A and B).

For your trend over time (let's say toward higher categories in C), you
might do something like this, which is a strange sort of quasisymmetry
model
with a trend built in:

my.frame$Ctrend_as.numeric(my.frame$C2)-as.numeric(my.frame$C1)
model4_glm(counts~A+B+symm+Ctrend,family=poisson(),data=my.frame)

=========

My original post: On Sat, 9 May 1998, Robert Balshaw wrote:

> I am looking at some particular log-linear models based on matched pairs,
> such as are discussed in Chapter 10 of Agresti's "Categorical Data
> Analysis".
>
> I would like to be able to construct tests of hypotheses such as
> conditional symmetry and quasi-symmetry.
>
> These models can be expressed as restrictions on the parameters of the
> completely specified model for the frequencies in a contingency table, but
> I don't know how to fit these models in S-Plus. I'm hoping that someone
> has already tackled this problem.
>
> As an example, consider the I x J x K x K table on factors A, B, C1 and
> C2. Factors C1 and C2 represent a K x K cross-tabulation of
> subjects respondents to a K-category question at two times.
> How would one test (in S-Plus) whether the C1 by C2 table is symmetric,
> after conditioning on A and B, for example? Or, is there apparent
> "trend" in the response from time 1 to time 2.
>
> Any suggestions would be most welcome. I will summarize responses to the
> list.
>
> Thanks,
>
> Rob

-- Robert Balshaw Statistics & Actuarial Science --
-- rbalshaw@setosa.uwaterloo.ca University of Waterloo --
~~~

-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu. To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message: unsubscribe s-news