[S] summary: predict.gam() with newdata=

John Thaden (jjthaden@life.uams.edu)
Tue, 08 Sep 1998 01:52:46 -0500

In May, I wrote both S-News and MathSoft customer support because
predict.gam() often gave me arcane error messages, such as . . .
Error in lm.fit.qr(x, y, singular.ok = ..1): Number of
observations in x and y not equal
or . . .
Error in safe.predict.gam(object, newdata, type,..:
Length of variable 1 is 1 != length of row names (40)
or . . .
Warning in x * w.factor: Length of longer object is
not a multiple of the length of the shorter object
Error in names<-: Invalid length for names attribute:
y = structure(..

Thanks to Brian Ripley, Rachel Fewster, Bill Venables, StatSci's Julie
Dryden, and an unnamed StatSci/MathSoft statistician, I've gained insight
into predict.gam() that I'll share:

A command

>predict.gam(object, newdata)

results in a call to safe.predict.gam(), which among other things uses
rbind.data.frame() to make a new data frame that includes both original
data and newdata (this is integral to the `safe' method of predict.gam). I
think most failures arise when safe.predict.gam() tries to retrieve the
original data. So far, I know of two such problems.

The first is a violation of scoping rules. To review them . . .

> When you try to reference a variable in a function, S looks first
> for the variable
> (1) in that function
> (2) NOT in any function that called that function
> (3) on frame 1
> (4) on frame 0
> (5) on permanent data bases.

So if the old data were created inside a function of yours that also calls
predict.gam(), then safe.predict.gam() won't see them. The solution is to
assign the old data to frame 1. A special case of this is for() loops; this
example is from Bill Venables via the archives:

>tmp <- -20:20
>old.data <- data.frame(x = tmp, y = tmp + tmp^2 + rnorm(tmp))
>new.x <- data.frame(x = 21)
>for(i in 1:3) {
>+ ff <- lm(y ~ poly(x, i), old.data)
>+ print(predict.gam(ff, new.x))
>+ }
Error in safe.predict.gam(object, newdata,..: Length of variable
2 is 1 != length of row names (41)

His solution:
> for(i in 1:3) {
>+ form <- substitute(y ~ poly(x, .i), list(.i = i))
>+ ff <- lm(form, old.data)
>+ print(predict.gam(ff, new.x))
>+ }
>[1] 161.64
>[1] 462.64
>[1] 463.56
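
Why this works, as far as I can tell: in the failing loop the fitted object
stores the formula y ~ poly(x, i) with the symbol i still in it, and
safe.predict.gam() later has to look i up from a place where the loop
variable is not visible. substitute() writes the current value of i into the
formula before lm() sees it, so nothing needs looking up afterwards. A small
sketch of just that step (the list name forms is mine, for illustration only):

```r
# Build the three formulas used in the loop, with the value of i
# written into each one in place of the placeholder .i
forms <- vector("list", 3)
for(i in 1:3) {
    forms[[i]] <- substitute(y ~ poly(x, .i), list(.i = i))
}
# The degree argument of poly() is now a number, not a symbol
degree.2 <- forms[[2]][[3]][[3]]
is.numeric(degree.2)
```

Printing forms[[2]] should show the degree as a literal value rather than
the name i.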

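For the more general case, where the old data are created inside your own
function, here is a minimal sketch of the frame 1 approach. It is S-PLUS
only (the frame= argument of assign() does not exist in other dialects), and
the function name fit.and.predict is hypothetical:

```s
fit.and.predict <- function(new.x) {
    tmp <- -20:20
    old.data <- data.frame(x = tmp, y = tmp + tmp^2 + rnorm(length(tmp)))
    # Copy old.data to frame 1 so that functions called further down,
    # including safe.predict.gam(), can find it by name
    assign("old.data", old.data, frame = 1)
    ff <- lm(y ~ poly(x, 2), old.data)
    predict.gam(ff, new.x)
}
fit.and.predict(data.frame(x = 21))
```

Frame 1 is cleared when the top-level expression finishes, so the copy does
not linger in your permanent data bases.
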
The second problem occurs when the gam(), glm(), or lm() command that
created the object passed to predict.gam(object, newdata) somehow changes
the dimensions of the original data. The case I am certain of is using
na.action=na.omit. The solution is to groom your original data before
creating the gam object:

>original.data <- na.omit(original.data)
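
To see the mismatch that I believe causes the trouble, here is a small
sketch with made-up numbers, using lm() so that it needs nothing beyond the
base functions. The names groomed.data, fit.omit and fit.groomed are mine:

```r
# Six rows, two of them incomplete
original.data <- data.frame(x = c(1, 2, NA, 4, 5, 6),
                            y = c(2.1, 3.9, 6.2, NA, 9.8, 12.1))

# Letting the fit drop rows: the fit uses 4 rows, the stored data have 6
fit.omit <- lm(y ~ x, data = original.data, na.action = na.omit)
n.original <- nrow(original.data)
n.used <- length(fitted(fit.omit))

# Grooming first: the fit and the data it refers to agree
groomed.data <- na.omit(original.data)
fit.groomed <- lm(y ~ x, data = groomed.data)
n.groomed <- nrow(groomed.data)
n.fitted <- length(fitted(fit.groomed))
```

After grooming, the data frame named in the call has exactly the rows that
went into the fit, which is what safe.predict.gam() appears to assume.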

I suspect that using the `subset=` argument, or subscripting the data frame
in the data= argument of gam(), glm(), or lm(), will cause the same
problem.

There may be other conditions that cause predict.gam() to dump, but scoping
problems and changes to the dimensions of the original data are two of them.

John Thaden, Ph.D., Instructor jjthaden@life.uams.edu
Department of Geriatrics (501) 661-1202 x 2986
University of Arkansas for Medical Sciences FAX: (501) 671-2510

mail & ship to: J. L. McClellan V.A. Medical Center
Research-151 (Room GB103, GC124)
4300 West 7th Street
Little Rock AR 72205 USA