Re: counting duplicates

Douglas Bates (bates@stat.wisc.edu)
15 Jan 1998 09:29:36 -0600


Bruce McCullough <BMCCULLO@fcc.gov> writes:

> I have a character vector of 9000 names.

> Some names appear more than once, as
> many as thirty times.

In that case you should probably convert the character vector to a
factor. A factor is easier to manipulate and will probably occupy
less storage.

So start with your character vector, called Names, and form
  Names <- as.factor(Names)
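
As a rough illustration of what the conversion does, here is a minimal
sketch on a tiny made-up vector. The names x and f are hypothetical and
not part of the original problem. A factor stores each distinct string
once, as a level, plus one integer code per element.

```r
# Hypothetical toy data, for illustration only
x <- c("b", "a", "b")
f <- as.factor(x)

levels(f)      # the distinct values, sorted: "a" "b"
as.integer(f)  # one integer code per element: 2 1 2
```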

> I wish to create an associated numeric vector, also
> of length 9000 which indicates, for each name, the
> number of times the name appears in the list.

> If "John Smith" appears 30 times,
> then for each occurrence of "John Smith"
> the associated numeric vector should
> assume the value 30.

Use table() to get the counts:
  Counts <- table(Names)
Then use match() to find the position of each name in that table:
  Ind <- match(Names, names(Counts))
and set up your vector:
  NameCount <- Counts[Ind]

S> Names <- sample(LETTERS, 1000, replace = T)
S> Names[ 1:10 ]
[1] "C" "E" "T" "V" "E" "S" "U" "T" "O" "S"
S> Names <- as.factor(Names)
S> Counts <- table(Names)
S> Counts
  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U
 43 39 43 22 53 37 31 38 46 31 43 36 40 31 35 37 43 44 33 52 41
  V  W  X  Y  Z
 42 36 38 37 29
S> NameCount <- Counts[ match( Names, names(Counts) ) ]
S> NameCount[ 1:10 ]
  C  E  T  V  E  S  U  T  O  S
 43 53 52 42 53 33 41 52 35 33
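
The same three steps can also be tried on a short vector of names, which
makes the result easy to check by eye. This is only a sketch: the names
in it are made up for illustration, and the as.vector() call is an
optional extra, not part of the steps above, that drops the table
attributes so a plain numeric vector is left.

```r
# Made-up names, for illustration only
Names <- c("John Smith", "Ann Lee", "John Smith",
           "Bo Chan", "Ann Lee", "John Smith")
Names <- as.factor(Names)

Counts <- table(Names)               # one count per distinct name
Ind <- match(Names, names(Counts))   # position of each name in Counts
NameCount <- as.vector(Counts[Ind])  # plain vector, same length as Names

NameCount
```

Here "John Smith" occurs three times, so every position holding
"John Smith" in Names gets the value 3 in NameCount.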

-- 
Douglas Bates                            bates@stat.wisc.edu
Statistics Department                    608/262-2598
University of Wisconsin - Madison        http://www.stat.wisc.edu/~bates/