Re: [S] Multiple R-squared from lm

Bill Venables (wvenable@attunga.stats.adelaide.edu.au)
Mon, 1 Jun 1998 22:55:38 +0930


Robin Reed writes:
> Suppose we have a data.frame which contains the response y and a
> factor, x, with say 4 levels. Then
>
> y ~ . and y ~ -1 + .
>
> are different parametrisations of the same model. The first has
> an intercept and 3 columns for the factor and the second has no
> intercept but 4 columns for the factor.
>
> In SPLUS, (v3.3 for Windows and v4.5), calling summary gives the
> results that the 2 fits have different values for Multiple
> R-squared and the F-test for regression. (Other quantities such
> as s are the same.) This appears to be caused by the fact that
> SPLUS uses the formula for the no-intercept case when evaluating
> Multiple R-squared for the second model. (For the particular
> dataset that I had, the value of R-squared moved from 0.66 to
> 0.98.)
>
> What do people think of this behaviour? I much prefer no
> information to misleading information and so I believe it would
> be better if SPLUS output no values at all for these quantities
> in the no-intercept case.
>
> Robin Reed

You will probably find the FDA requires it....

More seriously, it is a fairly glaring anomaly and should be
corrected. I have to say, though, that anyone who glances at a
multiple correlation coefficient and starts to draw conclusions
without fully taking in the details of what it is runs a
considerable risk.

In this context the multiple correlation coefficient is simply
another way of measuring the improvement one model affords over
another. It is only a convention that has the baseline model as
the "intercept only" model; in many cases the baseline of interest
would be larger than this, and just occasionally it would be
smaller. I guess my point is, if the multiple correlation
coefficient is ever given, it should always be in context with
the baseline and outer models clearly specified, together with
the sample size and model degrees of freedom.
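
As a rough illustration of the point above, the sketch below computes
the coefficient as the proportional reduction in residual sum of
squares from a baseline model to an outer model, for a one-factor
layout with 4 levels. It is written in Python rather than S, the data
values and the helper names (`rss`, `r_squared`) are invented for
illustration, and it is not Robin's dataset or the S-PLUS code. With a
single factor the fitted values are just the group means, so both
parametrisations share the same residuals and only the choice of
baseline changes the answer.

```python
# Illustrative data: response y grouped by a 4-level factor x (values invented).
groups = {
    "a": [10.1, 9.8, 10.4],
    "b": [11.0, 11.3, 10.9],
    "c": [9.5, 9.9, 9.7],
    "d": [10.8, 10.6, 11.1],
}

y = [v for vals in groups.values() for v in vals]


def rss(observed, fitted):
    """Residual sum of squares for a set of fitted values."""
    return sum((o - f) ** 2 for o, f in zip(observed, fitted))


def r_squared(rss_baseline, rss_outer):
    """Proportional reduction in RSS going from the baseline to the outer model."""
    return 1.0 - rss_outer / rss_baseline


# Outer model: one mean per factor level.
# Both y ~ x and y ~ -1 + x produce these same fitted values.
fitted_outer = [sum(vals) / len(vals) for vals in groups.values() for _ in vals]

# Baseline 1: intercept only, so every fitted value is the grand mean.
grand_mean = sum(y) / len(y)
fitted_intercept_only = [grand_mean] * len(y)

# Baseline 2: the empty model, so every fitted value is zero.
fitted_zero = [0.0] * len(y)

rss_outer = rss(y, fitted_outer)
r2_vs_intercept = r_squared(rss(y, fitted_intercept_only), rss_outer)
r2_vs_zero = r_squared(rss(y, fitted_zero), rss_outer)

print(f"R^2 against intercept-only baseline: {r2_vs_intercept:.4f}")
print(f"R^2 against zero baseline:           {r2_vs_zero:.4f}")
```

Because the responses sit far from zero, the zero baseline has a very
large residual sum of squares and the second figure comes out much
closer to 1 than the first, although the fitted model is identical.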

Rather than have it automatically generated from a summary
printout I would prefer to have it available only by a generic
function, where the user had the ability to specify both baseline
and outer models explicitly, or at least the duty to be fully
aware of what they were. Methods could be written for other than
linear regression models, if anyone were so inclined...
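
A minimal sketch of what such an interface might look like, again in
Python rather than S and purely for illustration. The names `Fit` and
`multiple_r_squared` are hypothetical, and the sketch assumes
least-squares fits summarised by their residual sum of squares and
residual degrees of freedom, with the baseline nested in the outer
model. The caller has to supply both models, and the result reports
them alongside the sample size and model degrees of freedom.

```python
from dataclasses import dataclass


@dataclass
class Fit:
    """Hypothetical summary of a least-squares fit."""
    label: str        # human-readable model description, e.g. "y ~ 1"
    rss: float        # residual sum of squares
    df_residual: int  # residual degrees of freedom
    n: int            # number of observations


def multiple_r_squared(baseline: Fit, outer: Fit) -> dict:
    """R^2 of `outer` relative to an explicitly chosen `baseline`.

    Assumes `baseline` is nested in `outer` and both were fitted to the
    same data; only simple consistency checks are made here.
    """
    if baseline.n != outer.n:
        raise ValueError("models were fitted to different numbers of observations")
    if baseline.df_residual <= outer.df_residual:
        raise ValueError("baseline must have fewer parameters than the outer model")
    if baseline.rss <= 0:
        raise ValueError("baseline fits perfectly; R^2 is undefined")
    return {
        "baseline": baseline.label,
        "outer": outer.label,
        "n": outer.n,
        # Extra parameters the outer model uses beyond the baseline.
        "df_model": baseline.df_residual - outer.df_residual,
        "r_squared": 1.0 - outer.rss / baseline.rss,
    }


# Usage with invented numbers: 12 observations, a 4-level factor.
intercept_only = Fit("y ~ 1", rss=4.0, df_residual=11, n=12)
one_factor = Fit("y ~ x", rss=1.0, df_residual=8, n=12)
result = multiple_r_squared(intercept_only, one_factor)
print(result)
```

Returning the labels, sample size and degrees of freedom together with
the coefficient is one way to keep the number from being quoted without
its context.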

Bill Venables.