RE: [S] Adjusted variables plots

Steven Paul Millard (probstat@nwrain.com)
Tue, 10 Mar 1998 10:46:44 -0800

Hello,

John Thaden asked about creating Cook's Distance plots for each
coefficient in a multiple regression model, and also about partial
residual plots.

1. You can use the function Cook.terms() to compute Cook's distances
for each coefficient. This function is available in S-PLUS version 4.0,
but there is no help file for it. It is explained on pp. 230-233 of
Chambers and Hastie ("Statistical Models in S"). It takes as its
argument the result of calling lm() or glm().

2. To create partial residual plots, you can call the function
plot.gam() explicitly for an "lm" or "glm" object. The partial
residuals that are plotted are actually the centered partial residuals.
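
As a rough illustration of what "centered" means here: the partial
residual for a term is that term's fitted contribution, centered to have
mean zero, plus the ordinary residual. The following is only a minimal
sketch, assuming predict() accepts type = "terms" for the fitted object;
partial.resid is a hypothetical helper name, not a built-in function.

```r
# Hypothetical helper (not part of S-PLUS): centered partial residuals
# for every term of an additive fit such as an "lm" object.
partial.resid <- function(fit)
{
    # One column per term; each column is centered to have mean zero
    terms.mat <- predict(fit, type = "terms")
    # The residual vector is recycled down each column of the matrix
    terms.mat + resid(fit)
}
```

For a fit such as lm(Fuel ~ Weight + Disp., data = fuel.frame), the
"Weight" column of the result should correspond to the points drawn
against Weight in the partial residual plot.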

You can get more information on these functions in the elusive
"Statistical Models in S" book by Chambers and Hastie, and also in the
"Statistical Models in S-PLUS" training manual (I'm not sure whether
MathSoft sells this manual; you may have to take the course to get
it).

Here is an example:

fuel.lm <- lm(Fuel ~ Weight + Disp., data = fuel.frame)

cooks.mat <- Cook.terms(fuel.lm)
graphsheet(page = T)
plot(cooks.mat[,"Weight"], type = "h", xlab = "Weight",
     ylab = "Cook's Distance")
plot(cooks.mat[,"Disp."], type = "h", xlab = "Disp.",
     ylab = "Cook's Distance")

graphsheet()
plot.gam(fuel.lm, resid = T, rug = F, scale = 2)

Note: the second call to graphsheet() is needed because otherwise
plot.gam() will erase the Cook's distance plots in the original
graphsheet (feature?).
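
The adjusted-variable (partial-regression) plot described in the
original message can also be built directly from two sets of residuals.
Here is a minimal sketch, assuming the predictors are held in a numeric
matrix; adj.var.plot is a hypothetical name, not an S-PLUS function. The
line drawn through the points has the same slope as the k-th coefficient
in the full regression.

```r
# Hypothetical helper (not part of S-PLUS): adjusted-variable plot for
# the k-th column of a numeric predictor matrix x against response y.
adj.var.plot <- function(y, x, k, ...)
{
    # Intercept column plus every predictor except the k-th
    others <- cbind(1, x[, -k, drop = F])
    # Residuals of y, and of the k-th predictor, after removing the others
    y.res <- lsfit(others, y, intercept = F)$residuals
    x.res <- lsfit(others, x[, k], intercept = F)$residuals
    plot(x.res, y.res, ...)
    # Least-squares line through the adjusted points
    abline(lsfit(x.res, y.res))
    invisible(list(x.res = x.res, y.res = y.res))
}
```

For the fuel example this might be called as
adj.var.plot(fuel.frame$Fuel, as.matrix(fuel.frame[, c("Weight", "Disp.")]), k = 1),
with xlab and ylab passed through to plot() as needed.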

Sincerely,

--Steve M.

_____________
| *** | Steven P. Millard, Ph.D.
| * |
| * * * | P robability, TEL: 206-528-4877
| * * * | S tatistics & FAX: 206-528-4802
| * | I nformation E-mail: SMillard@ProbStatInfo.com
| * | Web: www.ProbStatInfo.com
| *** | 7723 44th Avenue NE
|___________| Seattle, WA 98115-5117 USA

-----Original Message-----
Sent: Friday, March 06, 1998 12:30 PM
To: s-news@wubios.wustl.edu
Subject: [S] Adjusted variables plots

I want to look for high-influence observations in my dataset. Cook's
distance is useful, but I'm interested in looking at influential
observations separately for each coefficient of a multivariate
regression. I've used adjusted-variables plots (I think they are also
called partial-regression plots) in the past to find outliers and
leverage points. So for the k-th coefficient, Beta sub-k, I would plot
residuals of the regression of outcome Y on all predictors X except the
k-th, versus residuals of the regression of the k-th X on all X's
except the k-th.

Before I try to write a function to speed up this type of diagnostic
analysis, I'm wondering if such a function exists. Or can predict() be
used to this end? I've found nothing in the S-PLUS v. 4.0
documentation. Should I be looking in libraries contributed by others?

-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu. To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message: unsubscribe s-news
-----------------------------------------------------------------------