Re: [S] AIC for survreg models

Prof Brian Ripley (ripley@stats.ox.ac.uk)
Mon, 16 Feb 1998 13:23:11 GMT


Paul Monaghan <pfm@scmp.scm.liv.ac.uk> wrote (after re-formatting):

> Thanks to Terry Therneau, Bill Venables and Brian Ripley who responded
> to my earlier query about 'residual deviance' for parametric survival
> modelling using the survreg function. I should have stated the version
> of Splus I use, as pointed out by Prof. Ripley, this is 3.3 for Unix.
>
> My main reason for considering deviance and log-likelihoods is an
> attempt to compare non-nested parametric models using the Akaike
> Information Criteria. For example, I wish to compare fits using the
> log-logistic, log-normal and Weibull models (the test of the
> exponential model versus the Weibull is equivalent to the hypothesis
> test that the shape parameter is 1, or in the survreg fit that
> Log(scale) is zero). The usual situation has AIC defined as
>
> -2 log L + 2p *
>
> where log L is the likelihood under the fitted model and p is the
> number of parameters in the model. GIven the replies to my query, I am
> concerned about how this may apply to survreg fits. As the Weibull,
> log-logistic and log-normal models all have 2 baseline parameters
> (shape and scale) using this definition and assuming that the same
> covariates are included in each model the 'best' fitting model using
> this criteria is that with the largest maximised log-likelihood.
>
> My question is that since the 'residual deviance' depends on the
> estimate of the scale parameter, does AIC hold for the comparison of
> survival models? Or is some correction to * based on the estimated
> scale parameter required.
>

At least in 3.4 it does work, and we have used it for that purpose (as
well as for comparisons with non-linear models fitted by other code).
I'm pretty sure 3.3 also works. My stepAIC function uses

extractAIC.survreg <- function(fit, ...)
{
n <- length(fit$residuals)
edf <- n - fit$df.residual
c(edf, -2 * fit$loglik[2] + 2 * edf)
}

Note that the loglikelihood for the fitted model is the _second_ component
of the loglik item (despite ?survreg.object). If you want to compare
different families you need to work out how many parameters are
fitted. I think sum(!fit$fixed) will do this for you, so in that case
you need

extractAIC.survreg <- function(fit, ...)
{
n <- length(fit$residuals)
edf <- n - fit$df.residual + sum(!fit$fixed)
c(edf, -2 * fit$loglik[2] + 2 * edf)
}

Whether AIC is a reasonable method for these models is another,
theoretical, question! I tend to use cross-validation on a measure
of fit I am really interested in (e.g. prob of survival for 5 years).

Brian

-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu. To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message: unsubscribe s-news