[S] S: Extended Cox Proportional Hazards Model

John Maindonald (john.maindonald@anu.edu.au)
Mon, 23 Nov 1998 13:51:59 +1100


Some time ago I put up a request for estimating a varying
proportional hazard in a Cox type model. The model takes
the form
h(t,x) = h0(t) exp(b1*x+b2*x*f2(t)+ ...)
Ken Hess supplied me with a function needed to set up the expanded
dataset needed to get the calculations started, and with a set of
notes on setting up the calculations. The function below is a
substantially modified version, which has not been extensively
tested. I am posting this at Frank Harrell's suggestion.
-----------------------------------------------------------------------

expand.breakpoints <-
function(dataset, index = "patnum", status = "status", tevent = "time",
Zvar = F, breakpoints = NULL)
{
# Expand <dataset> so that each individual has a row for each
# unique failure time that is <= his/her/its own failure time.
#
# Original version written by Kathy Russell, for
# Kenneth Hess, Department of Biomathematics,
# University of Texas M. D. Anderson Cancer Centre.
#
# Substantially modified by JHM, Nov 23, 1998.
#
# ERROR checking
onceonly <- paste("must be a character string matching exactly",
"one name in the first parameter, dataset")
# Require dataset to be of type data.frame
if((missing(dataset) || !is.data.frame(dataset)))
stop("\n Parameter, dataset, must be a data frame")
varset <- names(dataset)
covset <- varset
lvset <- 1:length(varset) # Require dataset to have unique names
if(any(duplicated(varset))) {
stop(paste("\n Parameter, dataset, must have uniquely defined",
"column names"))
}
# Require index to match exactly 1 dataset name
if(length((indexloc <- lvset[varset == index])) != 1)
stop(paste("\n Parameter, index,", onceonly))
covset <- covset[covset != index]
# Require status to match exactly 1 dataset name
if(length((statusloc <- lvset[varset == status])) != 1)
stop(paste("\n Parameter, status,", onceonly))
covset <- covset[covset != status]
# Require tevent to match exactly 1 dataset name
if(length((teventloc <- lvset[varset == tevent])) != 1)
stop(paste("\n Parameter, tevent,", onceonly))
covset <- covset[covset != tevent] # -----------------------------
# Form vector of breakpoints, if necessary
if(is.null(breakpoints)) {
times <- dataset[, tevent]
breakpoints <- sort(unique(times[dataset[, status] == 1]))
}
# Require breakpoints to be a vector of length >= 1
if((is.null(breakpoints)) || (!(is.numeric(
breakpoints))) || (!(is.vector(
breakpoints))) || (length(breakpoints) < 1)) stop(paste(
"\n Parameter, breakpoints, must be a numeric vector",
"with at least 1 element")) #*****************
#Begin
#*****************
n <- nrow(dataset)
temp.stop <- c(breakpoints, 0) # The 0 is a place-filler
if(breakpoints[1] > 0)
temp.start <- c(0, breakpoints)
else temp.start <- c(-1, breakpoints)
n.break <- length(breakpoints) + 1
t.event <- dataset[, tevent] # ---------------------------
## Begin calculate n.epochs
n.epochs <- sapply(1:n, function(m, breaks, times)
sum(breaks < times[m]) + 1, breaks = breakpoints, times = t.event)
# End n.epochs
id <- rep(1:n, n.epochs) # Index into supplied dataset
last.epoch <- cumsum(n.epochs)
epoch <- unlist(sapply(n.epochs, function(x)
1:x)) # Index into vectors of interval start & stop points
if(Zvar) {
Zmat <- diag(rep(1, n.break))[epoch, ]
Z.name <- paste("Z", seq(1, n.break), sep = "")
}
Tstop <- temp.stop[epoch]
Tstop[last.epoch] <- dataset[, tevent]
status2 <- rep(0, length(epoch))
status2[last.epoch] <- dataset[, status]
new.dataset <- data.frame(dataset[id, index, drop = F], temp.start[epoch],
Tstop, status2, epoch, dataset[id, covset, drop = F])
if(Zvar)
new.dataset <- data.frame(new.dataset, Zmat)
nam <- c(index, "Tstart", "Tstop", status, "epoch", covset)
if(Zvar)
nam <- c(nam, Z.name)
dimnames(new.dataset) <- list(1:length(epoch), nam)
return(new.dataset)
}

---------------------------------------------------------------------
# Example of Use of expand.breakpoints() to create a data frame]
# that can then be used to fit an Extended Cox Model:
# Uses S-PLUS built-in leukemia data frame. Note that this
# data set is too small to be realistic for this methodology.

# See S-PLUS4 Guide to Statistics, July 1997, pp.681-684
# and pp.691-692.

options(contrasts=c("contr.treatment","contr.poly"))
leukemia$patnum_1:23
attach(leukemia)
leuk.expand_expand.breakpoints(leukemia,
index="patnum", status="status", tevent="time", Zvar=F)
fit.ph <- coxph(Surv(Tstart,Tstop,status)~group,data=leuk.expand)
# May want summary(fit.ph) here
zph.ph <- cox.zph(fit.ph,'identity')
# Generates the 'identity' version of the Schoenfeld residuals
zph.ph # Invokes print.cox.zph(), which gives output from
# a test for violation of the proportional hazards
# assumption.
plot(zph.ph) # Invokes plot.cox.zph()

fit.lin.tci <- coxph(Surv(Tstart,Tstop,status)~group+group:Tstop,
data=leuk.expand)
# Attempt to fit a linear time-covariate interaction.
# The data are too limited to fit this here. The X matrix is
# deemed to be singular.

John Maindonald email : john.maindonald@anu.edu.au
Statistical Consulting Unit, phone : (6249)3998
c/o CMA, SMS, fax : (6249)5549
John Dedman Mathematical Sciences Building
Australian National University
Canberra ACT 0200
Australia

-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu. To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message: unsubscribe s-news