[S] SUMMARY: reference category in factors (was contr.treatment

Victor Moreno (V.Moreno@ico.scs.es)
Wed, 27 May 1998 10:43:24 +0100


After some thought, this is a quick solution to changing the
reference category of a factor in a model. I have done minimal
testing and some bugs may remain, so please use with caution.

Thanks to Prof. Ripley, and S.D.Byers for answering previous query.

I have written four functions:

# contr.ref: calls contr.treatment with contrasts=F to generate all
# dummy categories and later eliminates the reference supplied by
# second argument "ref". Anyway, I don't think this can be used as
default contrasts option. It is just an auxiliary function.

contr.ref <-
function(x,ref=1)
{
if (ref>length(x)) {ref<-length(x);cat("\nref set to ",ref,"\n")} if
(ref<1) {ref<-1;cat("\nref set to ",ref,"\n")}
contr.treatment(x,F)[,-ref,drop=F] }

# factor.ref operates on a vector or a factor and uses second argument
# "ref" to assign a default reference category in models. It returns a
# factor with attributes contrasts based on contr.ref and a new
# attribute "ref" that keeps the position of the reference category.
# This is usefull for other functions to retrieve it if needed. Values
# of "ref" greater than the number of levels are correted.

factor.ref <-
function(var,ref=1)
{
var <- as.factor(var)
if ( ref >(l <-length(levels(var)) ) ) ref <- l
contrasts(var)<-contr.ref(1:l,ref)
attr(var,"ref")<-ref
var
}

# is.ref checks if attribute "ref" is present.

is.ref <-
function(x)
{
!is.null(attributes(x)$ref)
}

# ref returns reference category

ref <-
function(x)
{
attributes(x)$ref
}

# usage
#
# reference category can be assigned to a factor permanently for
# later use in formala
#
# x.ref <- factor.ref(x,2)
#
#
# it also can be used in model formula without permanent changes:
# examples:

summary(glm(x~ as.factor(y) ))$coe
Value Std. Error t value
(Intercept) 0.830508475 0.05428248 15.29975112
as.factor(y)2 -0.092190718 0.06761170 -1.36353201
as.factor(y)3 -0.071579903 0.06707316 -1.06719151
as.factor(y)4 -0.065356959 0.06529643 -1.00092703
as.factor(y)5 -0.056595431 0.06677062 -0.84760979
as.factor(y)6 0.004785643 0.07065314 0.06773433

summary(glm(x~ factor.ref(y,2) ))$coe
Value Std. Error t value
(Intercept) 0.73831776 0.04030824 18.3167954
factor.ref(y, 2)1 0.09219072 0.06761170 1.3635320
factor.ref(y, 2)3 0.02061081 0.05636466 0.3656691
factor.ref(y, 2)4 0.02683376 0.05423827 0.4947385
factor.ref(y, 2)5 0.03559529 0.05600430 0.6355813
factor.ref(y, 2)6 0.09697636 0.06058080 1.6007773

summary(glm(x~ factor.ref(y,3) ))$coe
Value Std. Error t value
(Intercept) 0.758928571 0.03939823 19.2630125
factor.ref(y, 3)1 0.071579903 0.06707316 1.0671915
factor.ref(y, 3)2 -0.020610814 0.05636466 -0.3656691
factor.ref(y, 3)4 0.006222944 0.05356544 0.1161746
factor.ref(y, 3)5 0.014984472 0.05535294 0.2707078
factor.ref(y, 3)6 0.076365546 0.05997916 1.2732014

Notice the end labels of the categories. The reference is missing.
With ref=1 the standard parameterization is used:

summary(glm(x~ factor.ref(y,1)) )$coe
Value Std. Error t value
(Intercept) 0.830508475 0.05428248 15.29975112
factor.ref(y, 1)2 -0.092190718 0.06761170 -1.36353201
factor.ref(y, 1)3 -0.071579903 0.06707316 -1.06719151
factor.ref(y, 1)4 -0.065356959 0.06529643 -1.00092703
factor.ref(y, 1)5 -0.056595431 0.06677062 -0.84760979
factor.ref(y, 1)6 0.004785643 0.07065314 0.06773433

Any improvements or corrections will be welcome.

=====================================================
Victor Moreno
SERC Tel: 34-93 260 78 12
Institut Catala d'Oncologia Fax: 34-93 260 77 87
-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu. To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message: unsubscribe s-news