[S] Splus5.0 vs 3.4 (long)

Therneau, Terry M., Ph.D. (therneau@mayo.edu)
Mon, 2 Nov 1998 21:45:22 -0600

Richard Whitney asked about porting to Splus5.
Here are some issues that I encountered in converting the survival suite
to Splus 5.0. (This code is at level approx 4.9, with version 4 posted to
statlib and version 5 to be posted real soon now. The Splus code is at
approx 4.8). Daily usage issues will be left for another day.

All of the source code is managed in SCCS, so I completely ignored the
convertOldLibrary utility. I have been working on parts of this code
since 1986, and so am very unlikely to change horses as far as source code
systems go. Currently there are 14573 lines of S and C code.

A. Systematic changes
1. I had no calls to "log(x, base=something)" with an explicit base
argument, so no log() calls had to change to logb(), as is done by

2. Change
class(x) <- "coxph" (or 'survfit' or 'survreg' or ....)
and attr(x, 'class')<- 'coxph'

to oldClass(x) <- "coxph"

3. For methods with inheritance such as coxph.null (a Cox model with
only an offset term), change
class(x) <- c('coxph.null', 'coxph')
to oldClass(x) <- 'coxph.null'
and add the line
setOldClass(c('coxph.null', 'coxph'))
to the top of the source file containing the function coxph.fit(), i.e., the
one that used to set a multiple class. This sets the class when the function
is sourced into S.

4. I had several C routines that called S_alloc. Those routines had to
have the line
added as the first executable line of the subroutine, and the call to
S_alloc needs another (final) argument
The include file S.h will be needed in these files.

Problem: on a Sun system, S.h includes other .h files, one of which
defines the variable "time". Surprise surprise, that's a variable name
I'm fond of in several of my (survival) routines. I had to re-name the
variable. On Linux this wasn't a problem.
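
The clash can be reproduced with any header that already claims the name
"time". A minimal sketch, assuming the standard <time.h> stands in for
whatever S.h pulled in on the Sun system (that header chain is not shown
here); the routine name total_followup and the replacement name stime are
hypothetical and not taken from the survival code:

```c
#include <time.h>   /* stand-in for a header that already declares "time" */

/* After the include above, a file-scope declaration such as
 *     double *time;
 * no longer compiles, because the identifier is already taken.
 * Renaming the variable avoids the clash. */

/* Hypothetical routine: sum of the follow-up times, with the argument
 * renamed from "time" to "stime". */
double total_followup(int n, const double *stime)
{
    double total = 0.0;
    int i;

    for (i = 0; i < n; i++)
        total += stime[i];
    return total;
}
```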

5. The prototype for [ changed, so I had to change the first line of
"[.Surv" for instance from
"[.Surv" <- function(x, i, j, ...) {
to "[.Surv" <- function(x, ...) {
and refer to "..2" instead of "j" in the body of the function.


B. Systematic bugs (mine)
1. Splus5 is much better about retaining integers as integers; older
versions would convert them to doubles at the drop of a hat. Some of
my .C calls were lacking an "as.double(wt)" for certain arguments. In Splus3.4
the variable "wt" was always double by that time, in 5.0 not so.
The fix is to add the "as.double" that should have been there anyway, or to
use the setInterface routine. I must have had a dozen of these, case weights
being the most prevalent.
In the same vein, I had a few instances of
matrix(0, n1, n2),
inside a .C call. Splus3.2 generated a double, 5.0 an integer until I
changed the "0" to "0.0". Ditto for rep(0,n).
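
Why the missing as.double matters on the C side: a routine called through
.C receives raw pointers and cannot tell which storage mode S actually sent.
A minimal sketch, assuming a hypothetical routine wtsum (not part of the
survival code) and a hypothetical helper that shows what a double* argument
would see if it were handed integer storage:

```c
#include <string.h>

/* Hypothetical routine in the .C style: every argument arrives as a pointer. */
void wtsum(int *n, double *x, double *wt, double *result)
{
    double total = 0.0;
    int i;

    for (i = 0; i < *n; i++)
        total += x[i] * wt[i];
    *result = total;
}

/* Copy the first sizeof(double) bytes of an integer buffer into a double,
 * which is what reading integer storage through a double* amounts to.
 * The caller must supply at least sizeof(double) bytes of ints. */
double int_bytes_as_double(const int *ibuf)
{
    double d;

    memcpy(&d, ibuf, sizeof d);
    return d;
}
```

On the usual platforms with 4-byte ints and 8-byte IEEE doubles, integer
weights of 1 do not read back as 1.0 through the helper, so a routine like
wtsum would silently compute with garbage weights. Coercing with
as.double(wt) on the S side prevents this.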

2. I had to change one "missing(y)" to "is.null(y)" in predict.coxph.
My original code was technically wrong but worked in the old.

3. In one test suite program I had a logical factor "transplant" (the
famous Stanford heart transplant data), and a test
if (transplant==1)
It should have been
if (transplant==T)
It worked in the old, not in the new.


C. Bugs & problems (theirs)

1. Loading has changed, with some good and some bad about it. The call
now will both attach the directory "dir/.Data" and load the object file
"dir/S.so". I like this.
However, if the .o file contains routine names that are already a part
of Splus and are called internally, you will still be calling the old routines
and not the new, without warning! For instance, survreg.fit() calls the
C routine "survreg.o" via a .C interface, and "survreg.o" calls "rnewton.o".
When I loaded an S.so that contains both survreg and rnewton, the first is
replaced but not the second (both are part of standard Splus5).
Moral: use dyn.exists() to see if new routines you want to add are already

2. More importantly, if you are working on and changing a C routine, it
seems that
exit S
remake S.so
restart S
is the only safe way to ensure a new copy of everything; no more repeated
calls to dyn.load from a single session.

3. The construct
xx <- 1:nvar
for (i in xx) xx[i] <- max(x[,i])
blows up S. In the new code you cannot modify xx while it is
"still being used" in the for expression. This caught one routine.

4. Creation of a factor variable "on my own" no longer worked
levels(levs) <- labs
class(levs) <- "factor"
changed to
factor(levs, labels=labs)
In general, it wasn't a good idea to assume I knew the structure of an object.

5. There have been several transient bugs. Statsci is very responsive, but
I'm sure there are still bugs lurking about. Examples have been
approx() with the f= argument failed
attr(Surv, 'type') <- NULL gave a memory fault error

Terry T.

Terry M. Therneau, Ph.D.          (507) 284-3694
Head, Section of Biostatistics    (507) 284-9542 FAX
Mayo Clinic                       therneau.terry@mayo.edu
Rochester, Minn 55905