# [S] Residual Deviance and log-likelihood in survreg

Therneau, Terry M., Ph.D. (therneau@mayo.edu)
Fri, 13 Feb 1998 08:17:57 -0600

Short answer: This is explained in the technical report "A Package for
Survival in S", which is available from statlib in survival4.doc.

Longer answer: It's a design flaw. At the time of writing, it seemed
desirable to make survreg inherit from the glm class, since it can be
viewed as an extension of glm models to censored data. Since glm is
organized around reporting the "deviance", survreg was too.
But there is a problem. The deviance is defined as 2*[loglik(saturated model)
- loglik(fitted model)] *scale, where the saturated model has one coef per
subject. For a Gaussian linear model with known variance sigma^2 this turns
out to be
 2 * sum_i [ ( -.5 log(2*pi) - log(sigma) - .5 (y_i - y_i )^2/sigma^2 )
           - ( -.5 log(2*pi) - log(sigma) - .5 (y_i - yhat_i)^2/sigma^2 ) ] * sigma^2
 = sum_i (y_i - yhat_i)^2

which is independent of sigma^2. The same algebra works out for binomial,
poisson, .... all the glm models. Two nice properties are that differences
in deviances are the same as 2* differences in loglik, so the usual
chisquare tests apply, and that for a good fit we have, roughly, that
E(residual deviance) = residual df.
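
As a quick numerical check of the cancellation described above, here is a
minimal sketch in Python (not S, and not part of the survival package). The
data, the fitted values, and the function names are made up for illustration.
It computes 2*[loglik(saturated) - loglik(fitted)]*sigma^2 for several values
of sigma and compares it with the residual sum of squares.

```python
import math

def gaussian_loglik(y, mu, sigma):
    """Sum of Gaussian log-densities of y around means mu, with known sigma."""
    return sum(
        -0.5 * math.log(2 * math.pi) - math.log(sigma)
        - 0.5 * (yi - mi) ** 2 / sigma ** 2
        for yi, mi in zip(y, mu)
    )

def gaussian_deviance(y, yhat, sigma):
    """2 * [loglik(saturated) - loglik(fitted)] * sigma^2."""
    ll_fit = gaussian_loglik(y, yhat, sigma)
    ll_sat = gaussian_loglik(y, y, sigma)  # saturated model: yhat_i = y_i
    return 2 * (ll_sat - ll_fit) * sigma ** 2

# Made-up responses and fitted values, purely for illustration
y = [1.0, 2.5, 2.0, 4.0]
yhat = [1.2, 2.0, 2.6, 3.7]
rss = sum((a - b) ** 2 for a, b in zip(y, yhat))

# The deviance should match the residual sum of squares for every sigma
for sigma in (0.5, 1.0, 3.0):
    print(sigma, gaussian_deviance(y, yhat, sigma), rss)
```

The log(2*pi) and log(sigma) terms appear in both logliks and drop out of the
difference, which is why the printed deviance does not change with sigma.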

In censored data this doesn't work out -- the "nuisance" parts of the
loglik don't neatly cancel, and the scale parameter is an integral part of
the likelihood that cannot be factored out. Survreg does the following:
  1. fit the data with covariates, estimating both coefs and scale
  2. fit the saturated model, with scale fixed at the value estimated in step 1
  3. define the deviance as 2*(the difference between the two logliks)

The problem is that if you first fit a model with 3 covariates, then one
with 4 covariates, the difference in printed deviances is NOT the proper
test for the 4 variable vs 3 variable model. In Splus version 3.4, Statsci
added an anova.survreg method which does the right thing by comparing logliks
rather than deviances. They also removed the worst behavior of print.survreg.
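
To see how badly the difference in deviances can go wrong, here is a toy
sketch in Python. It is not the survreg code. It assumes uncensored Gaussian
data, reads "scale fixed at the prior level" as "scale fixed at the estimate
from the covariate fit", and compares an intercept-only model with a
one-covariate model rather than 3 vs 4 covariates. The data and function
names are made up.

```python
import math

def fitted_loglik(y, yhat):
    """Gaussian loglik at fitted means with scale at its MLE; returns (loglik, sigma)."""
    n = len(y)
    rss = sum((a - b) ** 2 for a, b in zip(y, yhat))
    sigma = math.sqrt(rss / n)  # maximum likelihood estimate of the scale
    ll = -0.5 * n * math.log(2 * math.pi) - n * math.log(sigma) - 0.5 * rss / sigma ** 2
    return ll, sigma

def saturated_loglik(n, sigma):
    """Loglik of the saturated model (yhat_i = y_i) with scale held fixed."""
    return -0.5 * n * math.log(2 * math.pi) - n * math.log(sigma)

def recipe_deviance(y, yhat):
    """2 * [loglik(saturated, scale fixed at fitted value) - loglik(fitted)]."""
    ll, sigma = fitted_loglik(y, yhat)
    return 2 * (saturated_loglik(len(y), sigma) - ll)

# Made-up data with a clear linear trend
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 2.3, 2.8, 4.2, 4.9, 6.3]
n = len(y)
xbar, ybar = sum(x) / n, sum(y) / n

# Small model: intercept only
fit_small = [ybar] * n

# Larger model: intercept + slope, by closed-form least squares
slope = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sum((a - xbar) ** 2 for a in x)
fit_large = [ybar + slope * (a - xbar) for a in x]

dev_small = recipe_deviance(y, fit_small)
dev_large = recipe_deviance(y, fit_large)
lr_stat = 2 * (fitted_loglik(y, fit_large)[0] - fitted_loglik(y, fit_small)[0])

print("difference in deviances:", dev_small - dev_large)
print("2 * difference in loglik:", lr_stat)
```

Under these assumptions each deviance reduces to sum (y_i - yhat_i)^2 /
sigmahat^2, which equals n whatever the covariates, so the difference in
deviances is zero (up to rounding) while 2 * the difference in logliks is
clearly positive. With censoring the arithmetic is messier, but the mismatch
between the two quantities is the same kind of problem.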

The solution? One of these days, perhaps soon, the survreg code will
undergo a not-upward-compatible change. Hindsight being as good as it
is, I now realize that inheritance from glm was an incorrect design decision.
I don't like making a change that will break old code, but in this case
the printout has too many features that can mislead --- print.glm(x) where
x is a survreg object is particularly bad. The timing of the change is
driven by two things: I've known for some while that I need to do this but
can never seem to find the time, and Statsci has some important enhancements
that are ready to be added, with customers who want them. Whether we should
deprecate survreg and create a new function, replace it completely, create
a survreg.old, .... is being worked out.

Terry Therneau