In response to a query I made last week, Brian Ripley informed me of an
S-PLUS 'gotcha' with codes() that I think is worth passing on. My
apologies if it is old news. Here's the behavior.
# Create a factor:
> x_factor(c('a','b','b','a'))
> x
[1] a b b a
# coding of the factor
> codes(x)
[1] 1 2 2 1
# Suppose we need to unclass the factor to make it just a vector of
# strings.
# Try:
> x.1_levels(x)[codes(x)]
> attributes(x.1)
list() # great!
> x.1
[1] "a" "b" "b" "a"
# This seems to work OK. However, even simpler is:
> levels(x)[x]
[1] "a" "b" "b" "a"
# Now suppose we do not use the default levels attribute, sorted
# alphanumerically, but give it explicitly, as in:
> y_factor(c('a','b','b','a'),levels=c('b','a'))
# Note that the levels attribute is not in alphanumeric order. Of course,
> y
[1] a b b a
# How does the above procedure to change y into a plain text vector
# work?
> y.1_levels(y)[codes(y)]
> y.1
[1] "b" "a" "a" "b" # It's backward!!
# The reason for this nasty behavior is the S-PLUS 'gotcha' that codes()
# assigns codes in alphanumeric order, so that 1 corresponds to 'a', 2 to
# 'b', etc. This no longer gives the correct subscripting for the levels()
# vector when it is not sorted alphanumerically.
# However, note that
> levels(y)[y]
[1] "a" "b" "b" "a" # still works
# as does the clumsier construction:
> sort(levels(y))[codes(y)]
[1] "a" "b" "b" "a"
# In short, codes() only extracts the codes properly when the levels
# vector is sorted!
> codes(y)
[1] 1 2 2 1
> print.default(y)
[1] 2 1 1 2
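
The difference between the two codings can be sketched as follows. This is only an illustration under stated assumptions, not something from the original exchange: it assumes that unclass() exposes the stored integer codes (the values print.default(y) shows above) and that match() against the sorted levels reproduces what codes() reports, as described above. The names stored and sorted.codes are made up for the example, and <- is used in place of _ so that the same lines should also run in R.

```r
# Sketch only: unclass(), as.integer() and match() are not used in the post.
y <- factor(c("a", "b", "b", "a"), levels = c("b", "a"))

# Stored codes: positions in levels(y) as given, i.e. in c("b", "a").
# as.integer() drops the levels attribute, leaving plain integers.
stored <- as.integer(unclass(y))

# Emulation of what codes() reports, per the description above:
# positions in the alphanumerically sorted levels.
sorted.codes <- match(levels(y), sort(levels(y)))[stored]

levels(y)[stored]              # "a" "b" "b" "a"  (correct)
sort(levels(y))[sorted.codes]  # "a" "b" "b" "a"  (correct)
levels(y)[sorted.codes]        # "b" "a" "a" "b"  (the mismatch)
```

The point of the sketch is that each set of codes is only a valid subscript into the ordering it was computed against: the stored codes go with levels(y), while the sorted codes go with sort(levels(y)).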
Cheers,
Bert Gunter
Biometrics Research
Merck Research Labs
P.O. Box 2000
Rahway, NJ 07065-0900
732-594-7765
"The business of the statistician is to catalyze the
scientific learning process." George E. P. Box