[S] Idiosyncrasies of tree(), cv.tree()

Hong Ooi (hong@zip.com.au)
Thu, 24 Sep 1998 14:36:30 GMT


Hello all,

I seem to have discovered a quirk in the tree-fitting functions in
Splus. Specifically, cv.tree() fails if the tree object passed to it was
originally fitted from a model frame, rather than from a formula and
dataset. (I'm using Splus 4 release 1, on NT 4, with the MASS and
treefix libraries.)

To elaborate: suppose I fit a tree, and then try cross-validating it:

>obj1.tree <- tree(y~x, data=foo)
>obj1.cv <- cv.tree(obj1.tree)

This works fine, but now if I try

>obj2 <- tree(y~x, data=foo, method="model.frame") # obj2 is a model frame
>obj2.tree <- tree(model=obj2) # produces the same model as above
>obj2.tree.cv <- cv.tree(obj2.tree)

This fails with the usual cryptic S error messages.

Digging around a bit further, I found the reason for this: cv.tree first
extracts the model frame from obj2.tree, using model.frame.tree().
model.frame.tree in turn calls tree() with the method argument appended
to the original call that created the tree object -- i.e., the call is

tree(model=obj2, method="model.frame")
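
To make the mechanism concrete, here is a toy sketch of "take the stored
call, append an argument, and evaluate it again". It is not the actual
source of model.frame.tree(): toy.fit and toy.model.frame are made-up
names, and I'm assuming the fitted object keeps its call in a component
named call. It is written so that it should also run in R.

```r
# Hypothetical stand-in for a fitting function; NOT the real tree().
toy.fit <- function(formula, data, method = "fit") {
    cl <- match.call()                    # the call as the user typed it
    mf <- model.frame(formula, data)
    if (method == "model.frame") return(mf)
    # A "fitted object" that remembers the call that created it.
    list(call = cl, n = nrow(mf))
}

# Hypothetical stand-in for model.frame.tree(): append
# method = "model.frame" to the stored call and evaluate it again.
toy.model.frame <- function(object) {
    cl <- object$call
    cl$method <- "model.frame"
    eval(cl)
}

foo <- data.frame(y = c(1.2, 0.7, 3.1, 2.4), x = c(1, 2, 3, 4))
fit <- toy.fit(y ~ x, data = foo)
# Re-runs toy.fit(formula = y ~ x, data = foo, method = "model.frame")
mf <- toy.model.frame(fit)
```

This only works if the fitting function honours method for every way it
can be called, which is exactly where tree() falls down.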

What should happen here is that the function should just return the same
model frame it's passed. However, it appears that tree() contains a
quirk: if it's passed a model argument, it ignores the method -- that
is, it fits the model again, and returns a tree object. This is happily
returned to model.frame.tree, which passes it on to cv.tree, which then
tries subscripting it as if it were a model frame, with messy results.
The offending code in tree looks like this:

if(is.null(model)) {
    ...
    ... do some stuff ...
    ...
    if(method == "model.frame") return(model)
}

so if model is not NULL, method is never checked, and the function goes
on its merry way and fits the tree. The one-liner solution would be to
move the if(method == "model.frame") check outside the if(is.null(model))
block, so that the model frame is always returned when it is asked for.
A better solution might be to have model.frame.tree check its arguments
so that tree isn't called unnecessarily.
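
The difference between the two placements of the check can be seen in a
toy reproduction of the control flow. Again, this is not the real tree()
source: buggy.fit, fixed.fit and the "toyfit" class are names I've made
up for illustration, and the "fit" is just a placeholder. It should run
in R as well.

```r
# Toy reproduction of the control flow only; NOT the real tree() source.
buggy.fit <- function(formula, data, model = NULL, method = "fit") {
    if (is.null(model)) {
        model <- model.frame(formula, data)
        # Only reached when no model frame was supplied.
        if (method == "model.frame") return(model)
    }
    structure(list(n = nrow(model)), class = "toyfit")  # placeholder fit
}

fixed.fit <- function(formula, data, model = NULL, method = "fit") {
    if (is.null(model)) {
        model <- model.frame(formula, data)
    }
    # Checked whether or not a model frame was supplied.
    if (method == "model.frame") return(model)
    structure(list(n = nrow(model)), class = "toyfit")  # placeholder fit
}

foo <- data.frame(y = c(1.2, 0.7, 3.1, 2.4), x = c(1, 2, 3, 4))
mf <- model.frame(y ~ x, foo)

bad  <- buggy.fit(model = mf, method = "model.frame")  # a fitted object
good <- fixed.fit(model = mf, method = "model.frame")  # the frame itself
```

With the check inside the block, the caller gets back a fitted object
where it expected a model frame, which is what trips up cv.tree.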

The situation where a function is expected to return the same argument
it's passed is probably pathological, but it appears to have happened in
this case.

Has anyone else experienced this behaviour?

(Before anyone asks, I'm using the model=foo approach because I'm
writing a function that in turn calls various fitting procedures --
tree() being one of them. My function handles the task of building the
model frame, before calling the fitting procedure. Since it's already
doing that, it would be a duplication of effort to pass the formula on
to tree and have it rebuild the model frame.)

-- 
Hong Ooi            | NRMA Research and Development
hong@zip.com.au     | Ph:  02-9292-1532
Sydney, Australia   | Fax: 02-9292-1509