Re: [S] stepwise discriminant analysis

Prof Brian D Ripley (ripley@stats.ox.ac.uk)
Mon, 10 Aug 1998 19:53:18 +0100 (BST)


On Mon, 10 Aug 1998, William M. Grove wrote:

> I see that step() and stepAIC() work on objects of the "lm" class (e.g.,
> returned from glm(), gam() ), according to the documentation. As such I
> assume it won't work with an unaltered form of lda() or fda().

> I don't want to write a whole stepwise discriminant function from scratch.
> (I want to take the output of such a function and bootstrap it using Shao's
> consistent bootstrapping model selection technique.)

I presume that you are talking about _linear_ discriminant analysis here.

> I have considered writing a wrapper function for, say, fda(), which coerces
> the class of the returned object to "lm" and sets "Deviance" to something
> useful like the classification error rate (on the training sample).

And how is that useful? The point of stepwise fitting is to do worse on
the training set but better on the test set. Stepwise linear discriminant
analysis does this by a series of test statistics on nested fits, and those
test statistics are not simply related to those for linear models. That is
a different process from trying to find a model with the smallest AIC.

> Does anyone know if I can then use this function to trick step() or
> stepAIC() into performing a stepwise discriminant analysis? (I realize I'd
> need to get the scale parameter for AIC treated as zero to get that part of
> the optimization criterion out of the picture.) Or will the whole thing
> blow up in my face owing to something I don't understand about how step()
> or stepAIC() is going to call the fda()-like function?

You can't trick step.(g)lm at all: that will fit a linear model. You
don't need to trick stepAIC, you just need to write extractAIC.lda.
But it is not clear to me what that should be (or I would have written it).

Actually, I think that stepwise logistic discrimination is a sounder
approach, and that is implemented by using stepAIC with my multinom package
(now part of nnet in V&R2). If you just have two classes, you can use
stepAIC on a glm fit. There are examples in V&R2 chapter 17 and the
on-line complements.
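
A minimal sketch of the two routes just mentioned, assuming R (or an S-PLUS
installation with the MASS and nnet libraries available). The birthwt and
iris data sets, the predictor subset, and all object names are illustrative
choices only; they are not taken from this thread or from V&R2.

```r
library(MASS)   # stepAIC(), plus the birthwt data used below
library(nnet)   # multinom()

# Two classes: fit a logistic regression with glm(), then let stepAIC()
# search backwards from the full model. The predictor subset is arbitrary
# (bwt itself is left out because it defines the response 'low').
bwt <- birthwt[, c("low", "age", "lwt", "smoke", "ht", "ui")]
full2 <- glm(low ~ ., family = binomial, data = bwt)
sel2 <- stepAIC(full2, trace = FALSE)
formula(sel2)

# More than two classes: multinomial logistic model, same selection call.
full3 <- multinom(Species ~ ., data = iris, trace = FALSE)
sel3 <- stepAIC(full3, trace = FALSE)
formula(sel3)
```

With no scope argument, stepAIC() only drops terms from the starting model,
so the selected fit should never have a larger AIC than the full one.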

I would think very hard before bootstrapping problems like this: I know of
no theory to justify the bootstrap in such a complicated problem, and I do
know of a number of closely related examples in which it is not valid.

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
