# Re: [S] Comparing 2 vectors

Doug Luke (dluke@SLU.EDU)
Tue, 16 Jun 1998 11:23:01 -0500

Lance,

This is my first post to S-News, so I hope it comes through OK!

What you are trying to do is the general problem of comparing partitions
between cluster solutions. Although people often use simple
misclassification rates or Cohen's kappa, the best approach is to use
the adjusted Rand statistic, which was developed for exactly this purpose
and does not depend on how the clusters happen to be numbered:

Hubert, L., & Arabie, P. (1985). Comparing partitions. *Journal of
Classification*, *2*, 193-218.
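For reference, here is the statistic in formula form. The notation is mine, not necessarily the paper's: in the cross-tabulation of the two partitions, $n_{ij}$ is the number of cases in cluster $i$ of the first solution and cluster $j$ of the second, $a_i$ and $b_j$ are the row and column totals, and $n$ is the total number of cases.

$$
\mathrm{ARI} = \frac{\sum_{i,j}\binom{n_{ij}}{2} - \left[\sum_i\binom{a_i}{2}\sum_j\binom{b_j}{2}\right] \Big/ \binom{n}{2}}
{\frac{1}{2}\left[\sum_i\binom{a_i}{2} + \sum_j\binom{b_j}{2}\right] - \left[\sum_i\binom{a_i}{2}\sum_j\binom{b_j}{2}\right] \Big/ \binom{n}{2}}
$$

The four sums correspond to `cellsum`, `rowsum`, `colsum` and `totsum` in the function below. The statistic equals 1 when the two partitions are identical up to relabeling and has expected value 0 when cases are assigned to clusters at random with the row and column totals held fixed.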

The Rand statistic is not available in S-Plus, or any other stats package
as far as I know. As luck would have it (!), I just developed an S-Plus
function to calculate it. This is my first attempt at programming with
S-Plus, so I'm sure it's not the most elegant approach. It does get you

Here's my function:

**** cut here ****

    rand1 <- function(x, y)
    {
        # function to calculate the adjusted Rand statistic
        # developed by Douglas Luke, Saint Louis University, dluke@slu.edu
        # based on Hubert & Arabie, 1985, Eqn. 5 on p. 198
        # x and y are vectors containing the two partitions to be compared
        # (all intermediate results are local to the function)

        # first, get crosstabs
        ctab <- table(x, y)

        # now calculate 4 intermediary sums
        cellsum <- sum(ctab * (ctab - 1)/2)
        totsum <- sum(ctab) * (sum(ctab) - 1)/2

        # use matrix multiplication to get row and column marginal sums
        rows <- ctab %*% rep(1, ncol(ctab))
        rowsum <- sum(rows * (rows - 1)/2)
        cols <- rep(1, nrow(ctab)) %*% ctab
        colsum <- sum(cols * (cols - 1)/2)

        # now put them together
        expected <- rowsum * colsum/totsum
        adj.rand <- (cellsum - expected)/(0.5 * (rowsum + colsum) - expected)

        # return the statistic
        adj.rand
    }

**** cut here ****
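As a cross-check, here is a minimal sketch of the same Hubert & Arabie formula written with `choose()` and `apply()` instead of the n(n-1)/2 arithmetic and matrix multiplication. It is written for R; I am assuming, but have not verified, that `choose()` behaves the same way in S-Plus 4. The name `adj.rand.choose` is hypothetical.

```r
# Hypothetical helper (not from the original post): the adjusted Rand
# statistic of Hubert & Arabie (1985), written with choose() and apply().
adj.rand.choose <- function(x, y)
{
    ctab <- table(x, y)                           # cross-tabulation n_ij
    n <- sum(ctab)                                # total number of cases
    cellsum <- sum(choose(as.vector(ctab), 2))    # pairs together in both
    rowsum <- sum(choose(apply(ctab, 1, sum), 2)) # pairs together in x
    colsum <- sum(choose(apply(ctab, 2, sum), 2)) # pairs together in y
    expected <- rowsum * colsum / choose(n, 2)    # expected cellsum given margins
    (cellsum - expected) / (0.5 * (rowsum + colsum) - expected)
}

# Lance's two cluster solutions from the quoted message
x <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
y <- c(3, 1, 3, 1, 1, 1, 2, 2, 2)
adj.rand.choose(x, y)                           # about 0.643 (exactly 4.5/7)
adj.rand.choose(c(1, 1, 2, 2), c(2, 2, 1, 1))   # 1: relabeling alone does not matter
```

For Lance's data there are 7 pairs that fall together within cells, 9 within rows, 10 within columns and 36 pairs in total, so the expected cell count is 9 * 10 / 36 = 2.5 and the statistic is (7 - 2.5) / (9.5 - 2.5) = 4.5/7.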

At 05:41 PM 6/4/98 +0000, you wrote:

>I am trying to run some simulations with various clustering methods. I
>set up three groups, run the cluster methods, and save the cluster
>solution. How can I compare these 2 vectors of solutions?
>
>     1 3
>     1 1
>     1 3
>     2 1
>     2 1
>     2 1
>     3 2
>     3 2
>     3 2
>
>I thought about running 'unique' on each vector, which would give me
>
>     X = 1 2 3
>     Y = 3 1 2
>
>but I don't know where to go from there. I could compare X[1] with Y[1]
>and, if they are the same, set a counter to counter + 1 and get the
>proportion of correctly identified solutions, but I don't know how to
>implement this in SPlus 4. Maybe someone knows of a better solution. I
>do see problems with this method.
>
>Lance
>
>-----------------------------------------------------------------------
>This message was distributed by s-news@wubios.wustl.edu. To unsubscribe
>send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
>message: unsubscribe s-news
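The counter idea in the quoted message can be sketched without an explicit loop by pairing the `unique()` labels with `match()`. This is a minimal R sketch; `match.prop` is a hypothetical name, and it assumes both solutions use the same number of clusters (otherwise the recoding produces NA). It also shows the problem Lance suspected: the label pairing depends on which case happens to come first in each cluster.

```r
# Hypothetical helper (not from the original posts): proportion of cases on
# which two cluster solutions agree after pairing labels by unique() order.
match.prop <- function(x, y)
{
    ux <- unique(x)                 # labels of x, in order of first appearance
    uy <- unique(y)                 # labels of y, in order of first appearance
    x.recoded <- uy[match(x, ux)]   # k-th label of x becomes k-th label of y
    mean(x.recoded == y)            # proportion of matching cases
}

x <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
y <- c(3, 1, 3, 1, 1, 1, 2, 2, 2)
match.prop(x, y)                    # 8/9: pairing 1->3, 2->1, 3->2

y2 <- y
y2[1] <- 1                          # change a single case
match.prop(x, y2)                   # 5/9: pairing is now 1->1, 2->3, 3->2
```

Changing one case moves the proportion from 8/9 to 5/9 because the label pairing itself changes, which is one reason a label-free measure such as the adjusted Rand statistic is preferable.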

--------------------------------------------------------------------

Douglas Luke dluke@slu.edu

Saint Louis University School of Public Health 314-977-8108 office

3663 Lindell Blvd. 314-977-8150 fax

St. Louis, MO 63108

--------------------------------------------------------------------