[S] Re: Adjusted variable plots [Summary](Function)

John Thaden (jjthaden@life.uams.edu)
Mon, 16 Mar 1998 17:30:12 -0600


On March 6, 1998, I wrote:
> I want to look for high-influence observations in my dataset. The Cook's
distance
> is useful, but I'm interested in looking at influentials separately for
each coefficient
> of a multivariate regression. I've used adjusted variable plots (I think
they are also
> called partial-regression plots) in the past to find outliers and
leverage points. So
> for the k-th coefficient Beta sub-k, I would plot residuals of the
regression of outcome
> Y on all predictors X except the k-th, versus residuals of the regression
of the k-th X
> on all X's except the k-th. <snip>

The responses were thought-provoking indeed. Bill Venables contributed an
elegant single-liner

> plot(resid(update(lm.obj,cbind(-Disp.,Fuel)~.-Disp.)))

which works one-by-one, only for predictors that result in just one
coefficient (not poly(x, 2), for instance). Steve Millard suggested

> plot.gam(lm.obj, resid = T, rug = F, scale = 2)

for partial residual plots. These seem to not be adjusted variable plots,
however, since the least squares fit through the points do not give a slope
with the same value as the original coefficient in lm.obj. He also showed
a nice way to get Cook's Distance plots for terms. Another solution for
partial resiual plots was contributed by Claus Dethlefson.

Jim Robison-Cox adapted an existing program for added variable plots to
this purpose. There were a few problems, but it finally worked nicely, but
only for explicit models with all simple predictors.

My interest was to easily get plots for all terms, regardless of whether
they were multilevel factors, or polynomial terms, etc. Since nothing
surfaced, I wrote the function adjvarplot() [see below]. Also attached
below is a help file for it. This function works in S-PLUS 4.0, but will
need to have four lines modified for other versions.

Best regards,

John Thaden, Ph.D., Biochemist
Department of Geriatrics
University of Arkansas for Medical Sciences
Little Rock Arkansas USA

#########################################################

adjvarplot_function(lm.obj, do.factors = T, diagnostics = T, smoo = T, ...)
{
## Authored March 16, 1998 by John Thaden, University of Arkansas
# for Medical Sciences, Little Rock, AR, based on an algorithm of
# William Venables contributed to S-News. Thanks also to Jim
# Robison-Cox and Jens Oehlschlaegel.
## Generates adjusted variable plots with added least-squared
# lines and, if smoo = T, a smoothed curve. If diagnostics = T,
# then diagnostic plots in the style of plot.lm() and a summary are
# produced for the simple regression.
## Adjusted Variables: If the model of lm.obj is Y~Xi, i=1,2,...m,
# then the adjusted variable plot of the k-th X term is the plot of
# the residuals of lm(Y~X[,-k]) on the y-axis, versus on the x-axis
# the residuals of lm(X[,k]~X[,-k]). The regression line on this
# plot has slope equal to the k-th coefficient of the lm.obj.
## Reference: Chambers, Cleveland, Kleiner & Tukey (1983)
# Graphical Methods for Data Analysis (Boston: Duxbury Press).
#
# Check arguments.
if(!is.logical(diagnostics)) stop("Argument diagnostics= must be T or F.")
if(!is.logical(smoo)) stop("Argument smoo= must be T or F.")
# Note value of "subset=" if it exists, for inclusion under plot.
sst <- ifelse (is.null(sst <- lm.obj$call$subset), "",
paste("Subset =", paste(sst[c(2,1,3)], collapse = ""))) #
if(!is.logical(do.factors)) stop("Argument do.factors= must be T or F.")
# Ensure that x model.matrix and y vector exist. Use to make plot axis
labels.
lm.obj2 <- update(lm.obj, ~., x=T, y=T)
Xlabels <- paste("adjusted", dimnames(lm.obj2$x)[[2]])
Ylabel <- paste("adjusted", Ynam <- lm.obj$call$formula[[2]]) #
# Make expanded data.frame with columns for Y, and all coefs but (Intercept).
updata <- data.frame(lm.obj2$y, cbind(lm.obj2$x[ , -1]), check.names=T)
names(updata)[1] <- Ynam
M <- length( nams <- names(updata) )
N <- length( rows <- row.names(updata) ) #
# Generate an expanded formula from the data.frame
formla <- formula.data.frame(updata) #
# Note X columns that are not dummy variables. (Factors are now only 0's
and 1's.)
dumfunc <- function(x) return(any(is.na(match(x, c(0,1)))))
nodummy <- unlist(lapply(updata[,-1],dumfunc))
# Add 'weights" column to the data.frame if lm.obj regression was weighted.
if(!is.null(lm.obj2$weights)) {
updata <- data.frame(updata, lm.obj2$weights)
w.indx <- -(dim(updata)[[2]])
w <- names(updata)[-w.indx] <- lm.obj$call$weights
wcomment <- paste("Weighted by", w)
}
else {
w.indx <- 0
wcomment <- {}
} #
# Use the expanded data.frame and formula to update lm.obj, no subsetting.
lm.obj3 <- update(lm.obj2, formula = formla, data = updata,
y = F, subset = ) #
# Loop to generate adjusted variable plots and adjunct plots. Summary:
# 1. If current X is a dummy variable, note which observations are 1.
# 2. Create a formula with two response variables (Y and the current X)
# and all predictors except the current X.
# 3. Get adjusted variables (residuals of update() using the new formula).
# 4. Note names of six observations with most extreme adjusted x values.
# 5. Open and name an S-PLUS 4.0 graphsheet NOTE: THE FOUR LINES STARTING
# WITH 'g' MUST BE MODIFIED FOR USE WITH OTHER S-PLUS VERSIONS.
# 6. Make Adjusted Variable Plot, with weights and subset info in subtitle.
# 7. Add points (closed/open dots). Annotate 6 extrema. Draw smooth curve.
# 9. Do simple regression of adjusted Y on adjusted X and draw abline.
# 10. Use plot.lm to generate diagnostic plots for the simple regression.
# 11. Save and close the multi-paged graphsheet. SEE NOTE ABOVE.
#
Xselect <- ifelse(do.factors, 2:M, (2:M)[nodummy] )
for(i in Xselect) {
if(do.factors && !nodummy[i-1])
dum1 <- as.logical(lm.obj3$x[i])
else dum1 <- logical(N)
formla <- structure(
parse(text =
paste(
paste("cbind(", nams[i], ",", nams[1], ")", sep = ""),
paste(nams[c(-1, -i, w.indx)], collapse = "+"),
sep = "~"
)
)[[1]],
class = "formula"
)
adjstdXY <- resid(update(lm.obj3, formula = formla, x=F))
dimnames(adjstdXY) <- list(rows, c(Xlabels[i], Ylabel))
adjX3rema <- order(adjstdXY[,1])[c(1, 2, 3, N-2, N-1, N)]
graph <- paste("var", ifelse(i < 11, "0", ""), i-1, sep = "")
graphsheet(pages="every graph", Name = graph)
plot(adjstdXY, type = "n", sub = paste(wcomment, sst, sep = " "))
points(adjstdXY[!dum1, 1], adjstdXY[!dum1, 2], pch = 1)
points(adjstdXY[ dum1, 1], adjstdXY[ dum1, 2], pch = 16)
text(adjstdXY[adjX3rema, 1], adjstdXY[adjX3rema, 2], labels =
paste(c(rep("", 3), rep(" ", 3)), rows[adjX3rema],
c(rep(" ", 3), rep("", 3)), sep = ""))
if(nodummy[i-1] && smoo)
lines(lowess(adjstdXY[ , 1], adjstdXY[ , 2]), lty = 2)
YX <- data.frame(adjstdXY[,2:1])
formla <- formula.data.frame(YX)
if (w.indx != 0) YX <- data.frame(YX, updata[-w.indx])
lm.obj4 <- update(lm.obj3, formula = formla, data = YX, class = "lm")
abline(lm.obj4)
if(diagnostics == T)
plot.lm(lm.obj4, smooth = nodummy[i-1] && smoo, id.n = 6)
guiSave( "GraphSheet", Name = graph, NewName = graph,
FileName = paste("C:\\Program Files\\splus4\\proj\\",
graph, ".sgr", sep = ""))
guiClose("GraphSheet", graph)
}
}

#################################################################


Adjusted variable plots for variables in a regression model

DESCRIPTION:

Produces adjusted variable plots with least-squares fits for
variables associated with the coefficients in the original
regression model.

USAGE:
adjvarplot(lm.obj, do.factors=T, diagnostics=T, smoo=T, ...)

REQUIRED ARGUMENTS:

lm.obj
An lm object.

OPTIONAL ARGUMENTS:

do.factors
If TRUE, include graphs for variables that are factors, or
contain only 0's and 1's

diagnostics
If TRUE, include for the adjusted variable least-squares fit
various diagnostic plots, in the style of plot.lm()

smoo
If TRUE, include smooth curves on plots.

VALUE:
No value is returned.

SIDE EFFECTS:
Graphsheets are opened, plots are drawn, and the graphseets
are saved in the S_PROJ directory. One graphsheet is produced
per coefficient in lm.obj.

DETAILS:
Adjusted Variables: If the model of lm.obj is Y~Xi, (i=1,2,...m),
with each Xi being 'simple' in the sense of producing one coefficient,
then the adjusted variable plot of the k-th X term is the plot with
the residuals of lm(Y~X[,-k]) on the y-axis and the residuals of
lm(X[,k]~X[,-k]) on the x-axis. A least-squares line on this plot
has slope exactly equal to the k-th coefficient of the lm.obj.
adjvarplot() operates on 'simple' terms on the right side
of the lm.obj formula, i.e. those resulting in only one coefficient,
and the simple terms derived, e.g. by model.matrix, from each term
that produces more than one coefficient, such as poly(x, 2) or TREATMENT
if TREATMENT is a factor with more than two levels. In other words, an
adjusted variable plot is produced for each coefficient in the lm object,
except for the intercept. Because the function uses update() throughout,
arguments of lm.obj such as weights= , na.action= , etc. are carried
through to the plots.

REFERENCES:
Authored March 16, 1998 by John Thaden, University of Arkansas for Medical
Sciences, Little Rock, Arkansas, based on an update() usage suggested by
William Venables, and upon other ideas by Jim Robison-Cox, Jens
Oehlschlaegel,
Stephen Millard, and others on the S-News mailing list. v. 1.00.

Chambers, Cleveland, Kleiner & Tukey (1983) Graphical Methods for Data
Analysis (Boston: Duxbury Press).
WARNING:
This function was written mainly for myself and has not been tested under
many conditions, e.g. with objects that inherit from lm objects.

SEE ALSO:
plot.lm plot.gam

EXAMPLES:

-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu. To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message: unsubscribe s-news