RE: interrater reliability

Carol Nickerson (caroln@gandalf.Berkeley.EDU)
Wed, 9 Nov 94 21:14:47 EST


Many thanks to the following persons who offered comments, suggestions,
computer programs, etc., to my recent posting re computing an
interrater reliability coefficient from a repeated measures analysis
of variance when there are missing data:

Jack Block <jblock@violet.berkeley.edu>
Joseph Carpenter <jcarpent@moose.uvm.edu>
John French <frenchj@iia.org>
Dave Krantz <dhk@paradox.psych.columbia.edu>
Paul A. Thompson <pat@po.cwru.edu>

* Hullo,
*
* A colleague has shown me some data for which her manuscript reviewer(s)/
* editor have requested an interrater reliability coefficient. I recommended
* one of the intraclass correlation coefficients discussed by Bartko (1966)
* and Shrout and Fleiss (1979) based on a repeated measures analysis of
* variance for which the data layout looks like this:
*
*                  Judge
*
*              1  2  3  4 ... K
*
* Target  1    X  X  X  X ... X
*         2    X  X  X  X ... X
*         3    X  X  X  X ... X
*         4    X  X  X  X ... X
*         .    .  .  .  . ... X
*         .    .  .  .  . ... X
*         .    .  .  .  . ... X
*         N    X  X  X  X ... X
*
* where X is a rating of a target by a judge.
*
* All well and fine, but now here's the quirk. The targets and the judges
* are the same persons. That is, each person in the group rated him/herself as
* well as all the other persons in the group (targets), so the rating on the
* diagonal is a self rating.
*
* The intent is to compute a mean rating across judges for each target.
* For theoretical reasons, it is desired to omit the target's self rating from
* this mean. Thus, the data layout is:
*
*                  Judge
*
*              1  2  3  4  5 ...
*
* Target  1    0  X  X  X  X
*         2    X  0  X  X  X
*         3    X  X  0  X  X
*         4    X  X  X  0  X
*         5    X  X  X  X  0
*         .
*         .
*         .
*
* where 0 is a missing score, missing because it is a self rating
* that needs to be omitted. Is it possible to do the repeated measures
* analysis of variance on this data set? I know about the recommendations
* of some ANOVA experts regarding constructing substitute scores for
* missing data -- is doing that appropriate in this case? The missing
* scores aren't random, obviously. Moreover, it seems to me that any
* interrater reliability coefficient should be based on the same data
* that are going to be aggregated into the means, and that won't be
* the case if substitute scores are used.
*
* Any ideas?
*
* Many thanks,
*
* Carol Nickerson
* caroln@stat.berkeley.edu
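For reference, Shrout and Fleiss (1979) build their two-way coefficients from the mean squares of a complete layout with n targets and k judges: BMS (between targets), JMS (between judges), and EMS (residual). Case 2 treats the judges as a random sample and Case 3 treats them as fixed; the (·,1) forms give the reliability of a single judge and the (·,k) forms the reliability of the mean of k judges:

```latex
\begin{aligned}
\mathrm{ICC}(2,1) &= \frac{\mathrm{BMS} - \mathrm{EMS}}{\mathrm{BMS} + (k-1)\,\mathrm{EMS} + k\,(\mathrm{JMS} - \mathrm{EMS})/n} \\
\mathrm{ICC}(2,k) &= \frac{\mathrm{BMS} - \mathrm{EMS}}{\mathrm{BMS} + (\mathrm{JMS} - \mathrm{EMS})/n} \\
\mathrm{ICC}(3,1) &= \frac{\mathrm{BMS} - \mathrm{EMS}}{\mathrm{BMS} + (k-1)\,\mathrm{EMS}} \\
\mathrm{ICC}(3,k) &= \frac{\mathrm{BMS} - \mathrm{EMS}}{\mathrm{BMS}}
\end{aligned}
```

These are the complete-data formulas. With the self ratings omitted, each target mean rests on N - 1 ratings rather than N, so using mean squares from the unbalanced design with k = N - 1 is an approximation, not something Shrout and Fleiss derived.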

Paul Thompson had the solution, announced with a cheery and amusing
"No problemo, dude" (subsequently amended to the more gender-appropriate
"dudette").

I had been using this SAS code to get the mean squares needed for
computing the interrater reliability:

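/* Repeated measures ANOVA: T1-T7 are the 7 levels of factor BTMS */
PROC GLM DATA = TEMP1;
   MODEL T1-T7 = / NOUNI;
   REPEATED BTMS 7;
RUN; QUIT;

/* Swap rows and columns so the other factor becomes the repeated one */
PROC TRANSPOSE DATA = TEMP1 PREFIX = J OUT = TEMP2;
RUN;

/* Repeated measures ANOVA: J1-J7 are the 7 levels of factor BJMS */
PROC GLM DATA = TEMP2;
   MODEL J1-J7 = / NOUNI;
   REPEATED BJMS 7;
RUN; QUIT;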

This does not work for my colleague's problem because PROC GLM discards
any record with one or more missing data points; every record has one
missing data point (the self rating), so all the data are discarded.
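A minimal Python sketch of why listwise deletion empties this data set, assuming the self ratings are stored as None on the diagonal of an N x N matrix. The function names are hypothetical, and the sketch mimics, rather than reproduces, PROC GLM's handling of missing values.

```python
from typing import List, Optional

Rating = Optional[float]


def listwise_complete_rows(matrix: List[List[Rating]]) -> List[List[Rating]]:
    """Keep only rows with no missing (None) entries, as listwise deletion does."""
    return [row for row in matrix if all(x is not None for x in row)]


def with_missing_diagonal(n: int, fill: float = 1.0) -> List[List[Rating]]:
    """Hypothetical n x n ratings matrix whose diagonal (self ratings) is missing."""
    return [[None if i == j else fill for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    data = with_missing_diagonal(7)
    # Every row lacks exactly one rating, so no row survives.
    print(len(listwise_complete_rows(data)))  # prints 0
```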

Paul suggested reorganizing the data like so:

Judge  Target  Rating
  1      1       0
  1      2       X
  1      3       X
  1      4       X
  1      5       X
  2      1       X
  2      2       0
  2      3       X
  2      4       X
  2      5       X
  3      1       X
  3      2       X
  3      3       0
  3      4       X
  3      5       X
 etc.

and using a program of the form:

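/* Additive two-way model on the long layout (no JUDGE*TARGET term) */
/* Self ratings must be omitted or coded missing (.), not 0          */
PROC GLM DATA = TEMP1;
   CLASS JUDGE TARGET;
   MODEL RATING = JUDGE TARGET;
RUN; QUIT;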

which works just fine.
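The same computation can be sketched outside SAS. This is a minimal Python sketch, assuming the wide matrix has targets as rows and judges as columns (the question's layout) with self ratings marked None. The hypothetical `to_long` builds Paul's judge/target/rating records, and `mean_squares` fits RATING = JUDGE TARGET by backfitting (a substitute for GLM's direct least-squares solution) and returns adjusted mean squares, which should match the Type III sums of squares in the PROC GLM output divided by their degrees of freedom. `icc_3_1` plugs them into the Shrout and Fleiss ICC(3,1) form with k = N - 1 ratings per target; that choice of k is an assumption, not something the thread established.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Record = Tuple[int, int, float]  # (judge, target, rating), as in the long layout


def to_long(by_target: Sequence[Sequence[Optional[float]]]) -> List[Record]:
    """Reshape a target-by-judge matrix into long records, dropping self ratings.

    by_target[t][j] is judge j+1's rating of target t+1; the diagonal is skipped
    whatever it holds (None, 0, ...), so self ratings never enter the fit.
    """
    n = len(by_target)
    return [
        (judge, target, by_target[target - 1][judge - 1])
        for judge in range(1, n + 1)
        for target in range(1, n + 1)
        if judge != target and by_target[target - 1][judge - 1] is not None
    ]


def _rss_one_way(records: List[Record], key: int) -> float:
    """Residual SS after fitting only the level means of one factor (0 = judge, 1 = target)."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec[2])
    return sum((y - sum(ys) / len(ys)) ** 2 for ys in groups.values() for y in ys)


def _rss_additive(records: List[Record], tol: float = 1e-12, max_iter: int = 10_000) -> float:
    """Residual SS of the least-squares fit rating = judge effect + target effect.

    Backfitting (alternating conditional means) converges to the least-squares
    solution of this unbalanced two-way additive model.
    """
    effects = (defaultdict(float), defaultdict(float))  # (judge effects, target effects)
    prev = rss = float("inf")
    for _ in range(max_iter):
        for own in (1, 0):  # update target effects, then judge effects
            other = 1 - own
            sums: Dict[int, float] = defaultdict(float)
            counts: Dict[int, int] = defaultdict(int)
            for rec in records:
                sums[rec[own]] += rec[2] - effects[other][rec[other]]
                counts[rec[own]] += 1
            for level, total in sums.items():
                effects[own][level] = total / counts[level]
        rss = sum((y - effects[0][j] - effects[1][t]) ** 2 for j, t, y in records)
        if prev - rss <= tol * max(rss, 1.0):
            break
        prev = rss
    return rss


def mean_squares(records: List[Record]) -> Dict[str, float]:
    """Adjusted mean squares for targets (BMS) and judges (JMS), plus error (EMS)."""
    judges = {j for j, _, _ in records}
    targets = {t for _, t, _ in records}
    rss_full = _rss_additive(records)
    return {
        "BMS": (_rss_one_way(records, key=0) - rss_full) / (len(targets) - 1),  # targets | judges
        "JMS": (_rss_one_way(records, key=1) - rss_full) / (len(judges) - 1),   # judges | targets
        "EMS": rss_full / (len(records) - len(judges) - len(targets) + 1),
    }


def icc_3_1(ms: Dict[str, float], k: float) -> float:
    """Shrout-Fleiss ICC(3,1) form; taking k = N - 1 here is an assumption."""
    return (ms["BMS"] - ms["EMS"]) / (ms["BMS"] + (k - 1) * ms["EMS"])
```

Backfitting only keeps the sketch free of dependencies; solving the normal equations with a linear-algebra library should give the same sums of squares.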

As a longtime SAS user, I am embarrassed that I did not figure
this out myself, but we all have those days...

"Dudette"

aka Carol Nickerson
caroln@stat.berkeley.edu