Re: [S] Comparing 2 vectors

Doug Luke (dluke@SLU.EDU)
Tue, 16 Jun 1998 11:23:01 -0500


Lance,

This is my first post to S-News, so I hope it comes through OK!

What you are trying to do is the general problem of comparing partitions
between cluster solutions. Although people often use simple
misclassification rates or Cohen's kappa, the best approach is to use
the adjusted Rand statistic, which was developed to do exactly what
you want and does not depend on how the clusters happen to be numbered
in each solution. For more information on the adjusted Rand, you should
refer to Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of
Classification, 2, 193-218.
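
For reference, the statistic can be written out as below. The notation is chosen for this note and may differ from the paper's: n_ij is the count in cell (i, j) of the cross-tabulation of the two partitions, a_i and b_j are its row and column totals, and n is the total number of objects. In terms of the function below, the first sum in the numerator is cellsum, the sums over a_i and b_j are rowsum and colsum, and the binomial in n is totsum.

```latex
\[
\mathrm{ARI} =
\frac{\displaystyle \sum_{i,j}\binom{n_{ij}}{2}
      - \Bigl[\sum_i \binom{a_i}{2}\,\sum_j \binom{b_j}{2}\Bigr] \Big/ \binom{n}{2}}
     {\displaystyle \tfrac{1}{2}\Bigl[\sum_i \binom{a_i}{2} + \sum_j \binom{b_j}{2}\Bigr]
      - \Bigl[\sum_i \binom{a_i}{2}\,\sum_j \binom{b_j}{2}\Bigr] \Big/ \binom{n}{2}}
\]
```

The subtracted term is the agreement expected by chance, so the statistic is 0 when the two partitions agree no better than chance and 1 when they are identical up to relabeling.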

The adjusted Rand statistic is not available in S-Plus, or in any other
stats package as far as I know. As luck would have it (!), I just wrote an
S-Plus function to calculate it. This is my first attempt at programming with
S-Plus, so I'm sure it's not the most elegant approach. It does get you
the right answer, however!

Here's my function:

**** cut here ****

rand1 <- function(x, y)
{
    # function to calculate the adjusted rand statistic
    # developed by Douglas Luke, Saint Louis University, dluke@slu.edu
    # based on Hubert & Arabie, 1985
    # Eqn. 5 on p. 198
    #
    # x and y are vectors containing the two partitions to be compared
    # (local assignment with <- is used throughout, so calling the
    # function does not create or overwrite any global objects)

    # first, get crosstabs
    ctab <- table(x, y)

    # now calculate 4 intermediary sums (each is a count of pairs, n*(n-1)/2)
    cellsum <- sum(ctab * (ctab - 1)/2)
    totsum <- sum(ctab) * (sum(ctab) - 1)/2

    # use matrix multiplication to get row and column marginal sums
    rows <- ctab %*% rep(1, ncol(ctab))
    rowsum <- sum(rows * (rows - 1)/2)
    cols <- rep(1, nrow(ctab)) %*% ctab
    colsum <- sum(cols * (cols - 1)/2)

    # now put them together; expected is the chance-agreement term
    expected <- rowsum * colsum/totsum
    adj.rand <- (cellsum - expected)/(0.5 * (rowsum + colsum) - expected)

    adj.rand
}

**** cut here ****
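
As a quick check, the two vectors from the original question can be run through a condensed restatement of the same calculation. This is only a sketch: the names choose2 and ari.sketch are made up for this example (they are not part of the function above), the marginals are taken with apply() instead of matrix multiplication, and the code was written with R-compatible syntax in mind, so treat its behavior in any particular S-Plus version as an assumption.

```r
# Illustrative helper (hypothetical name): number of pairs among n objects
choose2 <- function(n) n * (n - 1) / 2

# Condensed restatement of the adjusted Rand calculation, so that this
# example runs on its own (hypothetical name, same arithmetic as rand1)
ari.sketch <- function(x, y) {
    ctab <- table(x, y)
    cellsum <- sum(choose2(ctab))                 # pairs together in both partitions
    totsum <- choose2(sum(ctab))                  # all possible pairs
    rowsum <- sum(choose2(apply(ctab, 1, sum)))   # pairs together in x
    colsum <- sum(choose2(apply(ctab, 2, sum)))   # pairs together in y
    expected <- rowsum * colsum / totsum          # chance-agreement term
    (cellsum - expected) / (0.5 * (rowsum + colsum) - expected)
}

# The two cluster solutions from the quoted message, read column-wise
x <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
y <- c(3, 1, 3, 1, 1, 1, 2, 2, 2)

ari <- ari.sketch(x, y)       # (7 - 2.5) / (9.5 - 2.5) = 9/14

# Raw proportion of matching labels depends on the arbitrary numbering
agree <- mean(x == y)         # 1/9 with the labels as given

# Renumber y's clusters: old 1 -> 2, old 2 -> 3, old 3 -> 1
relabel <- c(2, 3, 1)
y2 <- relabel[y]
agree2 <- mean(x == y2)       # 8/9 after renumbering
ari2 <- ari.sketch(x, y2)     # same value as ari

print(c(ari = ari, ari2 = ari2, agree = agree, agree2 = agree2))
```

The raw agreement jumps from 1/9 to 8/9 purely because the clusters in y were renumbered, while the adjusted Rand value stays at 9/14 (about 0.64). That is the problem with counting matching labels directly, and the reason for comparing partitions through the cross-tabulation instead.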

At 05:41 PM 6/4/98 +0000, you wrote:
>I am trying to run some simulations with various clustering methods. I set up three groups,
>run the cluster methods, and save the cluster solution. How can I compare these 2 vectors of
>solutions?
>1 3
>1 1
>1 3
>2 1
>2 1
>2 1
>3 2
>3 2
>3 2
>
>I thought about running 'unique' on each vector which would give me
>
>X = 1 2 3
>Y = 3 1 2
>
>but I don't know where to go from there. I could compare X[1] with Y[1] and if they are
>the same set a counter to counter + 1 and get the proportion of correctly identified
>solutions but I don't know how to implement this in SPlus 4. Maybe someone knows of a better
>solution. I do see problems with this method.
>
>Lance
>
>-----------------------------------------------------------------------
>This message was distributed by s-news@wubios.wustl.edu. To unsubscribe
>send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
>message: unsubscribe s-news
>

--------------------------------------------------------------------
Douglas Luke                                     dluke@slu.edu
Saint Louis University School of Public Health   314-977-8108 office
3663 Lindell Blvd.                               314-977-8150 fax
St. Louis, MO 63108
--------------------------------------------------------------------