Re: [S] Multiple R-squared from lm

Dave Krantz (dhk@paradox.psych.columbia.edu)
Mon, 1 Jun 1998 09:15:33 -0400


Robin Reed inquires:

> Suppose we have a data.frame which contains the
> response y and a factor, x, with say 4 levels. Then
>
> y ~ . and y ~ -1 + .
>
> are different parametrisations of the same model...
>
> In SPLUS, (v3.3 for Windows and v4.5), calling summary
> gives the results that the 2 fits have different values for Multiple
> R-squared and the F-test for regression. (Other quantities such as s
> are the same.) This appears to be caused by the fact that SPLUS
> uses the formula for the no-intercept case when evaluating Multiple
> R-squared for the second model.
>
> What do people think of this behaviour? I much prefer no
> information to misleading information and so I believe it would be
> better if SPLUS output no values at all for these quantities
> in the no-intercept case.

This problem is not peculiar to Splus--it can be seen in one form
or another in every statistical package that I have used (although
not every package will show it in this particular case, since the
Splus syntax makes it particularly easy to do these two different
"analysis of variance" fits by regression methods).

The final models are indeed the same, though parameterized differently;
but they are being compared against different baseline models: y ~ .
produces a comparison with y ~ 1 while y ~ -1 + . produces a comparison
with y == 0 (which is not a valid formula but still expresses a possible
model, with no free parameters). Thus, df, r^2, and F are all different.
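
A small numerical sketch of the point above, in plain Python rather than S. The data and every variable name are made up for illustration. The full model is the same in both cases (one mean per factor level), so the residual sum of squares is shared; only the baseline changes.

```python
# Hypothetical data: response y for a factor x with 4 levels, 3 observations each.
groups = {
    "a": [4.0, 5.0, 6.0],
    "b": [7.0, 8.0, 9.0],
    "c": [1.0, 2.0, 3.0],
    "d": [10.0, 11.0, 12.0],
}

y = [v for vals in groups.values() for v in vals]
n = len(y)        # number of observations
k = len(groups)   # number of factor levels = parameters in the full model

# Either parameterization (y ~ . or y ~ -1 + .) fits the group means,
# so the residual sum of squares of the full model is the same.
rss_full = sum(
    (v - sum(vals) / len(vals)) ** 2 for vals in groups.values() for v in vals
)

# Baseline 1: y ~ 1 (grand mean + random error), one free parameter.
grand_mean = sum(y) / n
rss_mean = sum((v - grand_mean) ** 2 for v in y)

# Baseline 2: y == 0 (no free parameters).
rss_zero = sum(v ** 2 for v in y)


def compare(rss_null, df_null, rss_alt, df_alt):
    """Return (r^2, F) for the comparison of a null model with a fuller model."""
    r2 = 1.0 - rss_alt / rss_null
    f = ((rss_null - rss_alt) / (df_null - df_alt)) / (rss_alt / df_alt)
    return r2, f


r2_int, f_int = compare(rss_mean, n - 1, rss_full, n - k)    # as for y ~ .
r2_noint, f_noint = compare(rss_zero, n, rss_full, n - k)    # as for y ~ -1 + .

print("residual SS (both fits):", rss_full)
print("vs y ~ 1 :  r^2 =", r2_int, " F =", f_int, " on", k - 1, "and", n - k, "df")
print("vs y == 0:  r^2 =", r2_noint, " F =", f_noint, " on", k, "and", n - k, "df")
```

The residual sum of squares (and hence s) agrees between the two lines of output, while r^2, F, and the numerator df do not, because each line answers a different question.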

The crucial thing to keep in mind is that df, r^2, and F ALWAYS
involve comparisons of TWO models.
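
In symbols, this can be written as follows. The notation is introduced here only for illustration: RSS_0 and df_0 are the residual sum of squares and residual degrees of freedom of the null (smaller) model, and RSS_1 and df_1 are those of the fuller model.

```latex
R^2 = 1 - \frac{\mathrm{RSS}_1}{\mathrm{RSS}_0},
\qquad
F = \frac{(\mathrm{RSS}_0 - \mathrm{RSS}_1)/(\mathrm{df}_0 - \mathrm{df}_1)}
         {\mathrm{RSS}_1/\mathrm{df}_1}
```

Neither quantity can be evaluated without choosing the null model that supplies RSS_0 and df_0.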

Statistical software packages often make it appear that r^2 and F
have been calculated for a single model. They do so by inserting an
implicit "null" model with which the current model is compared.
In many cases, the implicit null model is just mean + random error,
with no explanatory variables. In some other cases, the model is
score = 0 + random error. Sometimes [as with "unique" r^2 values
or the t values produced by summary(lm(formula))$coefficients]
the comparison is between a general model and the same model deleting
one term. In the simple product-moment correlation, the r^2 compares
a linear fit with a horizontal line (mean + random error).
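
The last case can be checked numerically. The sketch below uses made-up data and plain Python (not S) to compute the squared product-moment correlation directly, and then again as a comparison between a least-squares line and a horizontal line at the mean.

```python
# Hypothetical paired data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

# Product-moment correlation computed directly.
r = sxy / (sxx * syy) ** 0.5

# Fuller model: least-squares line y = intercept + slope * x.
slope = sxy / sxx
intercept = mean_y - slope * mean_x
rss_line = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

# Null model: horizontal line at the mean (mean + random error).
rss_flat = syy

r2_comparison = 1.0 - rss_line / rss_flat

print("r^2 from the correlation:     ", r * r)
print("r^2 from the model comparison:", r2_comparison)
```

The two printed values agree up to rounding error, which is the sense in which even a simple correlation involves a pair of models.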

If one keeps in mind that r^2 and F always involve a comparison,
much grief can be avoided. Personally, I would prefer that
software be designed always to show the PAIR of models explicitly,
or even to prompt the user for the pair that should be compared.
The user needs to be thinking about whether BOTH models are of
scientific and/or practical importance. If not, then the comparison
has dubious value and needs to be interpreted with great caution
(or ignored). In the particular case of Reed's query, the model
y ~ 0 may or may not make any sense scientifically or practically.
For example, it sometimes does make sense when y consists of difference
scores between two other variables.

Dave

-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu.