[S] Model selection using robust regression

Steven Paul Millard (probstat@nwrain.com)
Wed, 4 Mar 1998 18:50:14 -0800


Hello,

Last week I posted a query about how to perform model selection when
you use a robust regression method to fit models. Thanks to Brian
Cade, Kjetil Halvorsen, Brian Ripley, John Wallace, and Pat Burns for
responding.

* Brian Cade cites two papers, one by Machado and the other by Hurvich
and C-L. Tsai.

* Kjetil Halvorsen suggests a bootstrap approach based on the ideas
put forth in the book by J. S. Urban Hjorth (Computer Intensive
Statistical Methods).

* Brian Ripley cautions me to make a distinction between "robust" and
"resistant" techniques. He says a modification of AIC called NIC may
be useful, but it is based on asymptotic theory which is not known to
hold for ltsreg. He further states that, "Some robust regressions have
enough asymptotic theory to find standard errors of coefficients and
hence enable the use of Wald tests for backwards selection. That's
what I normally use. You can even use 'sandwich' estimators of the
covariance matrix to allow for the approximate nature of the model."

* John Wallace suggests looking at the robust() function. This
function can be used in setting the value of the "family" argument in a
call to glm() or gam().

* Pat Burns states, "An easy thing to do is to fit the whole model,
then pretend that the robustness weights that you get are the real
weights, then do the standard thing with the weighted data. Actually
it isn't all that easy to do if you have lots of possible explanatory
variables, because the weights are subject to change with different
explanatory variables."

The original question and replies are listed below.

Sincerely,

--Steve M.

_____________
| *** | Steven P. Millard, Ph.D.
| * |
| * * * | P robability, TEL: 206-528-4877
| * * * | S tatistics & FAX: 206-528-4802
| * | I nformation E-mail: SMillard@ProbStatInfo.com
| * | WEB: www.ProbStatInfo.com
| *** | 7723 44th Avenue NE
|___________| Seattle, WA 98115-5117 USA

Original Question:
-----------------------

Hello,

Could someone please point me towards some references that discuss
model selection when you use robust regression techniques (e.g.,
ltsreg() in S-PLUS). With linear models you can use partial F-tests or
the Cp statistic. With GLM and GAM models you can use the AIC
statistic. Could you also use a version of the AIC statistic with
robust regression methods?
Also, based on the answer to the first question, has anyone written any
functions in S-PLUS to compute statistics to compare models from robust
fits? For example, there are the functions anova.lm, anova.glm,
anova.gam, step.glm, and step.gam, but nothing similar for objects of
class "lts".
Thank you so much for your time.

************************************************************************

From Brian Cade:

J. A. F. Machado (1993. Econometric Theory 9:478-493) has a paper on
robust model selection and M-estimation. He uses the Schwarz
information criterion. C. M. Hurvich and C-L. Tsai (1990. Statistics
and Probability Letters 9:259-265) use AIC (and modifications) for
least absolute deviation (LAD) regression. I believe any justification
for LAD regression can be extended more generally to any regression
quantile (LAD is just the 0.5 regression quantile).
Brian Cade (BRD-USGS)
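For what it's worth, here is a rough sketch of the Hurvich-Tsai idea
(plain Python, simple straight-line model; an illustration of the
general approach, not their exact criterion). The LAD fit is
approximated by iteratively reweighted least squares, and the AIC
log-likelihood assumes Laplace errors, whose scale MLE is the mean
absolute residual:

```python
import math

def wls_line(x, y, w):
    # Weighted least-squares fit of y = a + b*x (closed form).
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    b = (sw * sxy - sx * sy) / (sw * sxx - sx * sx)
    a = (sy - b * sx) / sw
    return a, b

def lad_line(x, y, iters=100, eps=1e-8):
    # Approximate least-absolute-deviation fit of y = a + b*x by
    # iteratively reweighted least squares with weights 1/|residual|.
    a, b = wls_line(x, y, [1.0] * len(x))
    for _ in range(iters):
        w = [1.0 / max(abs(yi - a - b * xi), eps) for xi, yi in zip(x, y)]
        a, b = wls_line(x, y, w)
    return a, b

def lad_aic(x, y, n_params=2):
    # AIC for the LAD fit, treating the errors as Laplace: the MLE of
    # the Laplace scale is the mean absolute residual.
    a, b = lad_line(x, y)
    n = len(y)
    scale = sum(abs(yi - a - b * xi) for xi, yi in zip(x, y)) / n
    loglik = -n * (math.log(2.0 * scale) + 1.0)
    return 2.0 * n_params - 2.0 * loglik
```

Comparing lad_aic() across candidate models (each with its own
n_params) would then mimic the usual smaller-is-better AIC selection.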

************************************************************************

From Kjetil Halvorsen:

One reference with an interesting discussion of the use of the
bootstrap for model choice in regression is:
J. S. Urban Hjorth: Computer Intensive
Statistical Methods, Chapman & Hall.

The examples are with least squares, but the bootstrapping ideas
should also be useful for robust fits.
I found the book very interesting.
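One way the bootstrap idea might look in practice is sketched below
(plain Python; a rough illustration, not the exact scheme in Hjorth's
book): fit each candidate model to case-resampled data, score it on
the original data, and choose the model with the smaller average
prediction error. The least-squares fitters here are stand-ins for
whatever robust fitter you prefer:

```python
import random

def fit_mean(x, y):
    # Candidate model 1: intercept only.
    m = sum(y) / len(y)
    return lambda xi: m

def fit_line(x, y):
    # Candidate model 2: ordinary least-squares straight line.
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return lambda xi: a + b * xi

def boot_pred_error(x, y, fitter, n_boot=200):
    # Average squared prediction error on the original data of models
    # fitted to bootstrap (case) resamples.
    n = len(x)
    total = 0.0
    for _ in range(n_boot):
        idx = [random.randrange(n) for _ in range(n)]
        f = fitter([x[i] for i in idx], [y[i] for i in idx])
        total += sum((yi - f(xi)) ** 2 for xi, yi in zip(x, y)) / n
    return total / n_boot
```

Model choice is then just the minimum of boot_pred_error() over the
candidate fitters.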

************************************************************************

From Brian Ripley:

Please do not confuse robust regression techniques with ltsreg, a
resistant but inefficient estimation procedure. The answers to your
question differ somewhat between robust and resistant techniques.
- AIC presumes maximum likelihood estimation and the truth of the
model.
Some robust regression procedures are MLEs, but the whole point about
robustness is not to rely on the model being true. A modification of
AIC, NIC, (see my Pattern Recognition book) might be usable. But both
AIC and NIC rely on asymptotic theory which is at least not known to be
true for ltsreg, and is unlikely to be adequate.
- Some robust regressions have enough asymptotic theory to find
standard errors of coefficients and hence enable the use of Wald tests
for backwards selection. That's what I normally use. You can even use
'sandwich' estimators of the covariance matrix to allow for the
approximate nature of the model.
- A fundamental point: what do you want to select a model for? Most
model selection techniques are for good predictions by a fixed
criterion, but most robust regression studies are aiming at a good
explanation of (most of) the given data.
- ltsreg does not even minimize a criterion: it just tries to get near
the minimum. Thus comparing the fit criterion, even if we knew how to
do it, would not be practically feasible.
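The Wald-test route above can be sketched abstractly. Suppose the
robust fitter returns coefficients with asymptotic (possibly sandwich)
standard errors; backward selection then repeatedly drops the term with
the smallest |coefficient/SE| until every remaining term is
significant. The dict-of-(coef, se) input format below is made up for
illustration:

```python
def wald_backward(terms, z_crit=1.96):
    # terms: {name: (coefficient, standard_error)} as supplied by some
    # robust fit with asymptotic standard errors (hypothetical format).
    terms = dict(terms)
    while terms:
        # Find the least significant term by its Wald z-statistic.
        name = min(terms, key=lambda k: abs(terms[k][0] / terms[k][1]))
        coef, se = terms[name]
        if abs(coef / se) >= z_crit:
            break  # everything left passes the Wald test
        del terms[name]
        # NOTE: a real implementation refits the model here, since the
        # remaining coefficients and standard errors change after a drop.
    return sorted(terms)
```

The refit-after-each-drop step is the expensive part in practice; this
sketch only filters on the z-statistics from the full fit.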

> Also, based on the answer to the first question, has anyone written
> any functions in S-PLUS to compute statistics to compare models from
> robust fits? For example, there are the functions anova.lm, anova.glm,
> anova.gam, step.glm, and step.gam, but nothing similar for objects of
> class "lts".

Well, they may have, but I'd want to understand the theory first.

************************************************************************

From John Wallace:

Hi Steve,

Have you tried using the robust() function? See the help for examples
with glm and gam.
I have not tried it myself; I just knew it was there, so I would be
interested in how things work out.

************************************************************************

From Pat Burns:

I've not heard a good answer to this. An easy thing to do is to fit
the whole model, then pretend that the robustness weights that you get
are the real weights, then do the standard thing with the weighted
data. Actually it isn't all that easy to do if you have lots of
possible explanatory variables, because the weights are subject to
change with different explanatory variables.
If you get any good ideas, I'd be interested in hearing them.
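Pat's suggestion could be sketched concretely as follows (plain
Python, straight-line model, Huber weights; an illustration of the
idea, not anything he endorsed): fit the full model robustly, freeze
the final robustness weights, and compare nested models by an ordinary
weighted F-statistic computed with those fixed weights:

```python
import statistics

def wls_line(x, y, w):
    # Weighted least-squares fit of y = a + b*x (closed form).
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    b = (sw * sxy - sx * sy) / (sw * sxx - sx * sx)
    a = (sy - b * sx) / sw
    return a, b

def huber_fit(x, y, c=1.345, iters=50):
    # Huber M-estimate of y = a + b*x by IRLS; returns the
    # coefficients and the final robustness weights.
    w = [1.0] * len(x)
    for _ in range(iters):
        a, b = wls_line(x, y, w)
        r = [yi - a - b * xi for xi, yi in zip(x, y)]
        # MAD estimate of scale (guarded against an exact fit).
        s = max(1.4826 * statistics.median(abs(ri) for ri in r), 1e-12)
        w = [1.0 if abs(ri) <= c * s else c * s / abs(ri) for ri in r]
    return a, b, w

def fixed_weight_f(x, y):
    # Treat the robustness weights as known, then do the usual partial
    # F-test of slope = 0 with those weights held fixed.
    a, b, w = huber_fit(x, y)
    rss_full = sum(wi * (yi - a - b * xi) ** 2
                   for wi, xi, yi in zip(w, x, y))
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    rss_null = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y))
    return (rss_null - rss_full) / (rss_full / (len(x) - 2))
```

The caveat Pat raises shows up here directly: the weights come from the
full model, so every submodel comparison silently inherits them.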

-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu. To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message: unsubscribe s-news