Re: [S] more powerful model without intercept?

Dave Krantz (dhk@paradox.psych.columbia.edu)
Wed, 18 Mar 1998 14:06:54 -0500


The two fits that Lutz Prechelt displayed in his query about
models without intercept are in fact identical fits of identical
linear models, with slightly different parameterizations.
In the first fit, the two levels of variable AP are parameterized
by an "intercept" and an "effect" of AP, 18.443 and +/- 0.088,
while in the second, fit2, the same two levels are represented
by the coefficients of two dummy variables, one for AP=="ALT" (18.531)
and one for AP=="PAT" (18.355), the values obtained by adding 0.088
to and subtracting it from the intercept 18.443.
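
A minimal sketch of the two parameterizations (the response y and
data frame d are placeholder names here, since the original data
are not shown):

    fit1 <- lm(y ~ AP, data = d)        # intercept plus AP effect
    fit2 <- lm(y ~ AP - 1, data = d)    # one coefficient per level of AP
    range(fitted(fit1) - fitted(fit2))  # essentially zero: identical fits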

In the situation Prechelt is dealing with, it seems clear from the
similarity of results at the two levels of AP that it would make
no sense to force the intercept to 0. And in fact, that will
not happen, as long as AP is kept in the model, even with the -1.

The reason for the difference in r-squared values is that the model
fit is being compared with two different "null" models: a model with
just an intercept (= grand mean) and nothing else, in the first case,
and a no-parameter model that simply fits all the data with 0, in
the second case. It is necessary to keep in mind that "r-squared"
is not a property of a single model fit; it is a COMPARISON of TWO
model fits, and the implicit "null" model matters. Obviously, by
making the "null" model bad enough, you can get r-squared as high
as you like.
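
To make the comparison explicit, here are the two r-squared values
computed by hand (placeholder names as above); the only difference
is which "null" sum of squares goes in the denominator:

    rss <- sum(residuals(fit1)^2)    # identical for fit1 and fit2
    1 - rss / sum((y - mean(y))^2)   # r-squared vs. the grand-mean null
    1 - rss / sum(y^2)               # r-squared vs. the y == 0 null,
                                     # reported when the intercept is dropped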

The practice of using y = 0 as the implicit comparison standard for
models fit without intercept is not confined to Splus; I've seen it
in every software package that I've used. It is sometimes quite
convenient (as when dealing with difference scores, where mean=0
is often a sensible comparison model), but mostly it badly misleads
people who don't keep in mind the nature of the comparison that is
involved in "r-squared". I would like to see this changed, not in
Splus particularly, but in statistical software generally. At the
cost of a bit of convenience, I would favor writing software that
forces analysts to specify both models any time they call for a comparison.
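
Nothing prevents an analyst from making both models explicit already;
for example (placeholder names as above, using the usual multi-model
form of anova on lm fits):

    fit0 <- lm(y ~ 1, data = d)   # the "null" model, stated explicitly
    anova(fit0, fit1)             # F test comparing the two named fits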

A somewhat similar problem arises in connection with the "t" statistics
that appear in the coefficients table produced by summary.lm(). Again,
what Splus does is typical of statistical software, but often misleading.
In particular, the implicit 1-df comparisons that are incorporated in
these t statistics often use a comparison model that makes no sense
scientifically. I don't have a good idea about how to change this,
and it might be much too hard to produce uniformly better output;
but it might help at least to have a warning that (unless deliberately
suppressed) gives inexperienced analysts something to worry about
and motivates them to get some advice before they interpret the
t statistics and accompanying p-values. The most radical change would
be to confine the output to standard errors and suppress the t and p
values in the output of summary.lm, forcing people to make explicit
comparisons of pairs of models both of which have some scientific
interest. Obviously, in Splus, any of us could do that privately,
for the sake of our own students, not to mention ourselves, but I'd
be interested in other views of this matter.
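
For what it's worth, a sketch of such a private replacement, assuming
only that summary.lm returns its coefficients matrix with the estimate
and standard error in the first two columns:

    coef.se <- function(object)
        summary(object)$coefficients[, 1:2]   # keep estimates and SEs,
                                              # drop the t and p columns
    coef.se(fit1)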

Dave Krantz (dhk@columbia.edu)