[S] Fixed functions for (pseudo-)R^2

Terry Elrod (Terry.Elrod@UAlberta.ca)
Fri, 27 Feb 1998 13:08:57 -0700

The earlier posting of functions for (pseudo-)R^2 did not return the correct value if an lm was estimated using glm.
These functions correct this. They are also more conservative in that they check for and require that glm or gam
objects use the canonical links either for gaussian, binomial or poisson. However, these links may have been passed
to gausi() in order to estimate overdispersed versions of these models.

Apologies for the error. Will be more careful next time.

T. E.
Prof. Terry Elrod; 3-23 Fac. of Business; U. of Alberta; Edmonton AB; Canada T6G 2R6
email: Terry.Elrod@Ualberta.ca; tel: (403) 492-5884; fax: (403) 492-3325
Web page: http://www.ualberta.ca/~telrod/

Here are some functions for calculating (pseudo-)R^2, which may be of use to some.

Rsquared <- function(object)
# object is an lm, glm or gam object
# Rsquared() is implemented only for glm or gam objects with
# one of the three link/variance combinations:
# (1) log and mu (which is canonical for Poisson),
# (2) logit and mu(1-mu) (which is canonical for binomial), or
# (3) identity and constant (which is canonical for gaussian).
# However these link/variance pairs may have been passed to quasi()
# to allow for overdispersion.

Rsquared.lm <-
R2 <- summary.lm(o)$r.squared
names(R2) <- "Rsquared"

Rsquared.glm <-
{ def <- family(o)$family[-1] # character vector of link and variance
typ <- matrix(c("Logit: log(mu/(1 - mu))", "Log: log(mu)", "Identity: mu",
"Binomial: mu(1-mu)", "Identity: mu", "Constant: 1" ), 2, 3, byrow=T) #
# typ is matrix of supported link and variance combinations
ml <- match(def[1],typ[1,], nomatch=-1)
mv <- pmatch(def[2],typ[2,], nomatch=-1)
if ( (ml != mv) || (ml <1 ) )
stop("Implemented only for canonical links for gaussian, binomial or poisson
(with optional provision for overdispersion using quasi)")
if (ml == 3)
return(Rsquared.lm(o)) # gaussian family
# Remainder of code is for binomial and poisson families (perhaps with provision for overdispersion)
n <- length(o$residuals)
# number of observations
R2 <- (1 - exp((o$deviance - o$null.deviance)/n))/(1 - exp( - o$null.deviance/n))
names(R2) <- "pseudo.Rsquared"

Rsquared.lm simply returns the ordinary R^2 for an lm object. However, using the R^2 formula for glm or gam objects (with discrete families) gives a value with a theoretical lower bound of zero but upper bound that is less than one. Pseudo-R^2 divides the ordinary R^2 statistic by its theoretical upper bound to yield a statistic with bounds [0,1].

The formulae may be used on lm, glm and gam objects. (gam objects will call Rsquared.glm.) The formula for pseudo-R^2 is taken from G. S. Maddalla, Limited-dependent and Qualitative Variables in Econometrics, Cambridge: Cambridge Univ. Press, 1983. page 40, equation 2.50.

These functions have been tested with lm and with glm families poisson, binomial and gaussian.
This message was distributed by s-news@wubios.wustl.edu. To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message: unsubscribe s-news