Based on information from Brian Ripley, Bert Gunter pointed out that
codes() always numbers the levels of a factor in alphanumeric order,
whatever the order of the levels attribute, which can catch a user
off-guard. As an aside, I always use as.character() to convert a factor
to a character vector. Using Bert's examples:
> x <- factor(c('a','b','b','a'))
> y <- factor(c('a','b','b','a'), levels = c('b', 'a'))
> x
[1] a b b a
> y
[1] a b b a
> levels(x)
[1] "a" "b"
> levels(y)
[1] "b" "a"
> codes(x)
[1] 1 2 2 1
> codes(y)
[1] 1 2 2 1
> as.character(x)
[1] "a" "b" "b" "a"
> as.character(y)
[1] "a" "b" "b" "a"
> all.equal(as.character(x), as.character(y))
[1] T
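A quick way to check that as.character() does not depend on the level
order is sketched below. The factor z and its level names are made up
for illustration, and the sketch assumes R-compatible behavior, so treat
it as a check to run rather than a guarantee.

```r
# Hypothetical example data: three levels given in a non-alphabetical order.
z <- factor(c("low", "high", "mid", "high"),
            levels = c("low", "mid", "high"))

# Both conversions go through the factor's own level order,
# so neither relies on the levels being sorted.
via.as.character <- as.character(z)
via.levels <- levels(z)[z]

# TRUE when the two conversions agree element by element.
print(identical(via.as.character, via.levels))
```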
Sincerely,
--Steve M.
Steven P. Millard, Ph.D.
Probability, Statistics & Information
TEL: 206-528-4877
FAX: 206-528-4802
E-mail: SMillard@ProbStatInfo.com
WEB: www.ProbStatInfo.com
7723 44th Avenue NE
Seattle, WA 98115-5117 USA
-----Original Message-----
From: Gunter, Bert [SMTP:bert_gunter@merck.com]
Sent: Tuesday, January 20, 1998 6:01 AM
To: 's-news'
Subject: An Splus factor/codes 'gotcha'
To all:
In response to a query I made last week, Brian Ripley informed me of an
Splus 'gotcha' with codes() that I think is worth passing on. My
apologies if it is old news. Here's the behavior.
# Create a factor:
> x <- factor(c('a','b','b','a'))
> x
[1] a b b a
# coding of the factor
> codes(x)
[1] 1 2 2 1
# Suppose we need to unclass the factor to make it just a vector of
# strings. Try
> x.1 <- levels(x)[codes(x)]
> attributes(x.1)
list() # great!
> x.1
[1] "a" "b" "b" "a"
# This seems to work OK. However, even simpler is:
> levels(x)[x]
[1] "a" "b" "b" "a"
# Now suppose we do not use the default levels attribute, sorted
# alphanumerically, but give it explicitly, as in:
> y <- factor(c('a','b','b','a'), levels=c('b','a'))
# Note that the levels attribute is not in alphanumeric order. Of course,
> y
[1] a b b a
# How does the above procedure to change y into a plain text vector
# work?
> y.1 <- levels(y)[codes(y)]
> y.1
[1] "b" "a" "a" "b" # It's backward!!
# The reason for this nasty behavior is the S-plus 'gotcha' that codes()
# assigns codes in alphanumeric order, so that 1 corresponds to 'a',
# 2 to 'b', etc. This no longer gives the correct subscripting for the
# levels() vector when it is not sorted alphanumerically.
# However, note that
> levels(y)[y]
[1] "a" "b" "b" "a" # still works
# as does the clumsier construction:
> sort(levels(y))[codes(y)]
[1] "a" "b" "b" "a"
# In short, codes() only yields valid subscripts for levels() when the
# levels vector is sorted! Compare it with the codes actually stored:
> codes(y)
[1] 1 2 2 1
> print.default(y)
[1] 2 1 1 2
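To make the difference concrete, here is a minimal sketch in
R-compatible S that avoids codes() altogether. The names stored, alpha,
right, wrong and fixed are illustrative only. Two things are assumptions
rather than anything stated above: that as.integer() returns the codes
as stored (it does in R), and that match() against the sorted levels is
a fair stand-in for what codes() returns.

```r
# Factor whose levels are deliberately not in alphanumeric order.
y <- factor(c("a", "b", "b", "a"), levels = c("b", "a"))

# Codes as stored: positions in levels(y), here "b" = 1 and "a" = 2.
stored <- as.integer(y)

# Stand-in for codes(): positions in the *sorted* levels instead.
alpha <- match(as.character(y), sort(levels(y)))

# Subscripting levels(y) is only safe with the stored codes.
right <- levels(y)[stored]        # recovers the original strings
wrong <- levels(y)[alpha]         # reversed, as in the y.1 example
fixed <- sort(levels(y))[alpha]   # sorted levels pair up with alpha

print(rbind(stored, alpha))
print(rbind(right, wrong, fixed))
```

Running the same steps on a factor with the default (sorted) levels
gives identical stored and alpha codes, which is why the problem goes
unnoticed until the levels are supplied explicitly.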
Cheers,
Bert Gunter
Biometrics Research
Merck Research Labs
P.O. Box 2000
Rahway, NJ 07065-0900
732-594-7765
"The business of the statistician is to catalyze the
scientific learning process." George E. P. Box