Re: [S] glm: why aren't these nested?

Prof Brian Ripley (ripley@stats.ox.ac.uk)
Thu, 19 Mar 1998 13:50:08 +0000 (GMT)


David Maxwell SS CEFAS wrote:
>
> I have a numeric variable, dose.val, with 4 different values. I fitted two models using glm:
>
> 1.
> sp1.glm <- glm(y~log(x)+factor(dose.val), family=Gamma(link=log),data=sp.dat)
>
> 2.
> sp2.glm <- glm(y~log(x)+dose.val, family=Gamma(link=log),data=sp.dat)
>
>
> I intuitively expected model 2 to be nested in model 1, i.e. that the deviance due to factor(dose.val) could be split into the deviance due to linear dose.val and the remaining deviance.
>
> But fitting dose.val as a number gives a slightly lower residual deviance than when it is fitted as a factor, see the anovas below.
>
> Please can someone explain this to me or provide a suitable reference.
>
> Other info:
> 10 of the 73 values for y are zero
> SPlus v3.3 windows 3.11
>
>
> Thanks for your time,
> David
>
> d.l.maxwell@cefas.co.uk
>
>
> > anova(sp1.glm)
> Analysis of Deviance Table
> Gamma model
> Response: y
> Terms added sequentially (first to last)
> Df Deviance Resid. Df Resid. Dev
> NULL 72 137.4406
> log(x) 1 8.2889 71 129.1517
> factor(dose.val) 3 54.9171 68 74.2346
> >
> >
> > anova(sp2.glm)
> Analysis of Deviance Table
> Gamma model
> Response: y
> Terms added sequentially (first to last)
> Df Deviance Resid. Df Resid. Dev
> NULL 72 137.4406
> log(x) 1 8.28890 71 129.1517
> dose.val 1 55.10674 70 74.0449
>
>
> > summary(factor(sp.dat$dose.val))
> 0.0002 0.2402 0.5102 1.0202
> 18 19 22 14
> >
>

By looking at the data, I have got to the bottom of this.

Gamma(link=log)$deviance is

function(mu, y, w, residuals = F)
{
nz <- y > 0
devi <- (y - mu)/mu
devi[nz] <- devi[nz] - log(y[nz]/mu[nz])
if(residuals)
sign(y - mu) * sqrt(2 * abs(devi) * w)
else 2 * sum(w * devi)
}

That is clearly incorrect if there are zero values of y: why drop the
term in log mu? Thus the computed deviance goes up!

sp4.glm <- glm(y~log(x)+dose.val,
family=Gamma(link=log),data=sp.dat,maxit=20,epsilon=1e-8,trace=T)

sp3.glm <- glm(y~log(x)+factor(dose.val),
family=gm,data=sp.dat,maxit=20,epsilon=1e-8,trace=T, start=sp4.glm$lin)
GLM linear loop 1: deviance = 73.8242
GLM linear loop 2: deviance = 74.0432
GLM linear loop 3: deviance = 74.1846
GLM linear loop 4: deviance = 74.2103
GLM linear loop 5: deviance = 74.2305
GLM linear loop 6: deviance = 74.2335
GLM linear loop 7: deviance = 74.2365
GLM linear loop 8: deviance = 74.2368
GLM linear loop 9: deviance = 74.2372
GLM linear loop 10: deviance = 74.2372
GLM linear loop 11: deviance = 74.2373
GLM linear loop 12: deviance = 74.2373
GLM linear loop 13: deviance = 74.2373
GLM linear loop 14: deviance = 74.2373

Fixing the error by

"gm"<-
structure(.Data = list(family = structure(.Data = c("Gamma", "Log:
log(mu)",
"Square: mu^2"), .Names = c("name", "link", "variance")), names =
"Log: log(mu)", link = function(mu)
log(mu), inverse = function(eta)
care.exp(eta), deriv = function(mu)
1/mu, initialize = expression({
if(!is.null(dimy <- dim(y))) {
if(dimy[2] > 1)
stop("multiple responses not allowed")
else y <- drop(y)
}
else y <- as.numeric(y)
mu <- y + 0.167 * (y == 0)
}
), variance = function(mu)
mu^2, deviance = function(mu, y, w, residuals = F)
{
nz <- y > 0
devi <- (y - mu)/mu + log(mu)
devi[nz] <- devi[nz] - log(y[nz])
if(residuals)
sign(y - mu) * sqrt(2 * abs(devi) * w)
else 2 * sum(w * devi)
}
, weight = expression(w)), class = "family")

solves the problem.

> sp3.glm <- glm(y~log(x)+dose.val+factor(dose.val), family=gm,
data=sp.dat,maxit=20,epsilon=1e-8,trace=T)
GLM linear loop 1: deviance = 104.5529
GLM linear loop 2: deviance = 63.0715
GLM linear loop 3: deviance = 48.0214
GLM linear loop 4: deviance = 46.9947
GLM linear loop 5: deviance = 46.9834
GLM linear loop 6: deviance = 46.9826
GLM linear loop 7: deviance = 46.9824
GLM linear loop 8: deviance = 46.9824
GLM linear loop 9: deviance = 46.9824
GLM linear loop 10: deviance = 46.9824
> anova(sp3.glm)
Analysis of Deviance Table

Gamma model

Response: y

Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev
NULL 72 132.0892
log(x) 1 11.33682 71 120.7524
dose.val 1 72.65849 70 48.0939
factor(dose.val) 2 1.11145 68 46.9824

Yet another bug in the 1991 statistical models functions swatted!
Could MathSoft please take note?

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu.  To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message:  unsubscribe s-news