RE: An Splus factor/codes 'gotcha'

Steven Paul Millard (probstat@nwrain.com)
Wed, 21 Jan 1998 06:54:27 -0800


Hello,

Based on information from Brian Ripley, Bert Gunter pointed out that
codes() always assigns a factor's numeric codes in alphanumeric order
of the levels, whatever the order of the levels attribute, which can
catch a user off guard. As an aside, I always use as.character() to
convert a factor to a character vector. Using Bert's examples:

> x <- factor(c('a','b','b','a'))

> y <- factor(c('a','b','b','a'), levels = c('b', 'a'))

> x
[1] a b b a

> y
[1] a b b a

> levels(x)
[1] "a" "b"

> levels(y)
[1] "b" "a"

> codes(x)
[1] 1 2 2 1

> codes(y)
[1] 1 2 2 1

> as.character(x)
[1] "a" "b" "b" "a"

> as.character(y)
[1] "a" "b" "b" "a"

> all.equal(as.character(x), as.character(y))
[1] T
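
For anyone who needs integer subscripts rather than strings, the
following is a minimal sketch and not something taken from Bert's or
Brian's messages. It assumes that unclass() returns the stored integer
codes, i.e. the values that print.default(y) displays in Bert's message
below, and it avoids codes() altogether. The names stored, via.stored
and via.char are illustrative only.

```r
# Sketch only: assumes unclass() exposes the stored integer codes
# (the values print.default(y) shows), not the codes() result.
y <- factor(c('a', 'b', 'b', 'a'), levels = c('b', 'a'))

# The stored codes index levels(y) in its given order, sorted or not.
stored <- unclass(y)
via.stored <- levels(y)[stored]

# as.character() gives the same strings without touching the codes.
via.char <- as.character(y)

# Both routes should agree with each other.
all.equal(via.stored, via.char)
```

If the assumption holds, via.stored and via.char both come out in the
original order a, b, b, a, unlike levels(y)[codes(y)].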

Sincerely,

--Steve M.

Steven P. Millard, Ph.D.
Probability, Statistics & Information
7723 44th Avenue NE
Seattle, WA 98115-5117 USA
TEL: 206-528-4877
FAX: 206-528-4802
E-mail: SMillard@ProbStatInfo.com
WEB: www.ProbStatInfo.com

-----Original Message-----
From: Gunter, Bert [SMTP:bert_gunter@merck.com]
Sent: Tuesday, January 20, 1998 6:01 AM
To: 's-news'
Subject: An Splus factor/codes 'gotcha'

To all:

In response to a query I made last week, Brian Ripley informed me of an
Splus 'gotcha' with codes() that I think is worth passing on. My
apologies if it is old news. Here's the behavior.

# Create a factor:
> x_factor(c('a','b','b','a'))
> x
[1] a b b a
# coding of the factor
> codes(x)
[1] 1 2 2 1

# Suppose we need to unclass the factor to make it just a vector of
# strings. Try
> x.1_levels(x)[codes(x)]
> attributes(x.1)
list() # great!
> x.1
[1] "a" "b" "b" "a"

# This seems to work OK. However, even simpler is:
> levels(x)[x]
[1] "a" "b" "b" "a"

# Now suppose we do not use the default levels attribute sorted
# alphanumerically, but give it explicitly as in:

> y_factor(c('a','b','b','a'),levels=c('b','a'))
# Note that the levels attribute is not in alphanumeric order. Of course,
> y
[1] a b b a

# How does the above procedure to change y into a plain text vector
# work?

> y.1_levels(y)[codes(y)]
> y.1
[1] "b" "a" "a" "b" # It's backward!!

# The reason for this nasty behavior is the S-plus 'gotcha' that codes()
# assigns codes in alphanumeric order, so that 1 corresponds to 'a', 2 to
# 'b', etc. This no longer gives the correct subscripting for the levels()
# vector when it is not sorted alphanumerically. However, note that

> levels(y)[y]
[1] "a" "b" "b" "a" # still works

# as does the clumsier construction:

> sort(levels(y))[codes(y)]
[1] "a" "b" "b" "a"

# In short, codes only extracts the codes properly when the levels
# vector is sorted!

> codes(y)
[1] 1 2 2 1
> print.default(y)
[1] 2 1 1 2

Cheers,

Bert Gunter
Biometrics Research
Merck Research Labs
P.O. Box 2000
Rahway, NJ 07065-0900
732-594-7765

"The business of the statistician is to catalyze the
scientific learning process." George E. P. Box