An Splus factor/codes 'gotcha'

Gunter, Bert (bert_gunter@merck.com)
Tue, 20 Jan 1998 09:01:27 -0500


To all:

In response to a query I made last week, Brian Ripley informed me of an
Splus 'gotcha' with codes() that I think is worth passing on. My
apologies if it is old news. Here's the behavior.

# Create a factor:
> x_factor(c('a','b','b','a'))
> x
[1] a b b a
# coding of the factor
> codes(x)
[1] 1 2 2 1

# Suppose we need to unclass the factor to make it just a vector of
# strings.
# Try
> x.1_levels(x)[codes(x)]
> attributes(x.1)
list() # great!
> x.1
[1] "a" "b" "b" "a"

# This seems to work OK. However, even simpler is:
> levels(x)[x]
[1] "a" "b" "b" "a"

# Now suppose we do not use the default levels attribute, which is sorted
# alphanumerically, but give it explicitly, as in:

> y_factor(c('a','b','b','a'),levels=c('b','a')) # Note that the levels
# attribute is not in alphanumeric order. Of course,
> y
[1] a b b a

# How does the above procedure to change y into a plain text vector
# work?

> y.1_levels(y)[codes(y)]
> y.1
[1] "b" "a" "a" "b" # It's backward!!

# The reason for this nasty behavior is the Splus 'gotcha' that codes()
# assigns codes in alphanumeric order, so that 1 corresponds to 'a', 2 to
# 'b', etc. This no longer gives the correct subscripting for the levels()
# vector when it is not sorted alphanumerically.
# However, note that

> levels(y)[y]
[1] "a" "b" "b" "a" # still works

# as does the clumsier construction:

> sort(levels(y))[codes(y)]
[1] "a" "b" "b" "a"

# In short, codes() only gives the correct subscripts into levels() when
# the levels vector is sorted! Compare it with the internal coding:

> codes(y)
[1] 1 2 2 1
> print.default(y)
[1] 2 1 1 2
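
For anyone who wants to see the mechanism outside Splus, here is a toy
model in Python. It is only a sketch of the behavior shown in the session
above, not Splus code: the function names (make_factor, sorted_codes, and
so on) are made up for illustration, and it assumes a factor is just a
levels vector plus 1-based internal codes that index into it.

```python
# Toy model (an assumption, not Splus itself): a factor is a list of levels
# plus 1-based internal codes that index into that list.

def make_factor(values, levels=None):
    """Return (levels, internal_codes); default levels are sorted."""
    if levels is None:
        levels = sorted(set(values))
    internal = [levels.index(v) + 1 for v in values]
    return list(levels), internal

def sorted_codes(levels, internal):
    """Mimic the codes() behavior described above: number the levels in
    alphanumeric order, regardless of their order in the levels vector."""
    ranked = sorted(levels)
    return [ranked.index(levels[i - 1]) + 1 for i in internal]

def labels_by_sorted_codes(levels, internal):
    """Analogue of levels(y)[codes(y)]: wrong when levels are unsorted."""
    return [levels[c - 1] for c in sorted_codes(levels, internal)]

def labels_by_internal(levels, internal):
    """Analogue of levels(y)[y]: subscripts with the internal codes."""
    return [levels[i - 1] for i in internal]

levels_y, internal_y = make_factor(["a", "b", "b", "a"], levels=["b", "a"])

print(internal_y)                                    # [2, 1, 1, 2]
print(sorted_codes(levels_y, internal_y))            # [1, 2, 2, 1]
print(labels_by_sorted_codes(levels_y, internal_y))  # ['b', 'a', 'a', 'b']
print(labels_by_internal(levels_y, internal_y))      # ['a', 'b', 'b', 'a']
```

The two code vectors agree only when the levels vector is already sorted,
which is why the x example above works and the y example does not.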

Cheers,

Bert Gunter
Biometrics Research
Merck Research Labs
P.O. Box 2000
Rahway, NJ 07065-0900
732-594-7765

"The business of the statistician is to catalyze the
scientific learning process." George E. P. Box