[S] Problem with predict()

Therneau, Terry M., Ph.D. (therneau@mayo.edu)
Sun, 24 May 1998 21:02:40 -0500


Thomas Lumley has kindly pointed out a bug in the "prediction" part of
survexp for coxph models, when there is a factor variable for which the
new data contains a subset of the old data's levels. I thought I'd make
it work correctly, i.e., like predict.lm. That was before I looked at
what predict.lm does.

To mimic the usual case with new data sets, I'll define 3 separate
data frames, i.e., not subscripted one from another.

> y <- 1:5
> x1<- c(6,4,9,5,4)
> x2<- c('a', 'b', 'c', 'b', 'c')

> test1 <- data.frame(y=y, x1=x1, x2=x2)
> test2 <- data.frame(y=y[2:5], x1=x1[2:5], x2=x2[2:5])
> test3 <- data.frame(y=y, x1=x1, x2=factor(x2, levels=c('c','b','a')))

> fit <- lm(y ~ x1 + x2, test1)
> fit$contrast
$x2:
b c
a 0 0
b 1 0
c 0 1

> predict(fit, newdata=test1)
1 2 3 4 5
1 3.153846 3.230769 2.846154 4.769231
> predict(fit, newdata=test2)
1 2 3 4
1.615385 1.615385 1.307692 3.153846
> predict(fit, newdata=test3)
1 2 3 4 5
4.153846 3.153846 0.07692308 2.846154 1.615385

----------------------------------

Why doesn't the prediction make use of the contrast matrix? The
contrasts matrix is referenced in the predict.lm call to model.matrix()
that builds a new X matrix, but it appears that the internal C
code pays NO attention to the row labels of the contrasts.
It makes the assumption that the new data frame has the same number
of levels, in the same order, for every factor in the original data frame.
It does read the columns, which only fixes the problem of a change
in the global 'contrasts' option between the time of the lm() and
predict() calls.

This is a serious bug.

Predict.coxph and survexp.coxph are slightly more serious -- I forgot
to pass along the contrasts matrix. In fixing this, I now have to decide
whether to propogate the global S mistake. I favor not, but this will make
coxph not consistent with the rest of the package.

Opinions?

Terry Therneau (author of the survival routines)

PS: The above was run with options(contrasts='contr.treatment').

-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu. To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message: unsubscribe s-news