[S] Summary: anova drops off variables

W. Keith Moser (4ester@compuserve.com)
Thu, 28 May 1998 07:03:48 -0400


S-Newsers:

I received several good replies to my question about why ANOVA (aov)
drops off variables. All pointed to collinearity between cmpt and the
"dropped" variables.

**Alan Zaslavsky wrote:
Try print.lm(arst.aov) to see the coefficients. I suspect that cmpt is a
factor that includes the levels of the variables that fell out of the
model. E.g., if you had a factor "black", a factor "white", and another
one, "color", with levels black, white, red, orange, green, then if you put
"color" in the model ahead of the other two factors, the latter two factors
are redundant and get dropped.

First, I tried print().

#pepa (used below) and arst (used in my initial post) are different
species, but the S-plus effects are the same.

> print(pepa.aov)
Call:
aov(formula = pepa ~ cmpt + basum + numgrowbrn + season + lastburn,
na.action = na.omit)

Terms:
                     cmpt     basum Residuals
 Sum of Squares   18.5088    4.9856  385.3390
Deg. of Freedom         5         1       143

Residual standard error: 1.641547
3 out of 10 effects not estimable
Estimated effects may be unbalanced

I wasn't sure this was telling me why there was a problem.

**John Wallace suggested:
The missing ones must be highly correlated or co-linear with cmpt. Try
using summary() on the aov() output.

So, I tried summary(). It did not provide different output from anova() in
this case.

> summary(pepa.aov)
           Df Sum of Sq  Mean Sq  F Value     Pr(F)
     cmpt   5   18.5088 3.701752 1.373727 0.2376347
    basum   1    4.9856 4.985609 1.850169 0.1759042
Residuals 143  385.3390 2.694678

**Brian Ripley wrote:
Those variables are aliased with cmpt, that is, they take only constant
values within each level of cmpt, at least when cmpt is not missing (as it
seems to be in 2 cases [actually, Prof. Ripley, the residual Df differ only
because the two models have different numbers of terms; in both, the term
and residual Df add up to 149]). Try print or summary, which will tell you
about aliasing, and look at alias too.

So, I tried alias (a function which I did not immediately find in the
manuals). The output confirmed Professor Ripley's (and everyone else's)
suggestions.

> alias(pepa.aov)
Model
pepa ~ cmpt + basum + numgrowbrn + season + lastburn

Complete
           (Intercept) cmpt1 cmpt2 cmpt3 cmpt4 cmpt5 basum
  lastburn           9    -3    -1     1     2    -1
    season           4    -6     2     1     1     1
numgrowbrn           5    -3     3    -2     1     1

Partial
            (I) c1 c2 c3 c4 c5  b
(Intercept)      1  1  7  3  3 -9
      cmpt1        -2 -1 -1 -1 -1
      cmpt2            1  1  1 -1
      cmpt3               2  2 -7
      cmpt4                  1 -3
      cmpt5                    -3
      basum

Notes:
$"Max. Abs. Corr.":
[1] 0.964
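
The same behaviour can be reproduced on a small scale. The following is a
minimal sketch, assuming R-style S syntax (not checked against S-plus 4.5)
and made-up toy data rather than the real plots: the names y and fit and the
level labels are hypothetical. Each toy compartment is burned in one season
only, so season is constant within cmpt and should be dropped as aliased.

```r
# Toy data (hypothetical, not the study data): 3 compartments, each burned
# in a single season, so season is constant within each level of cmpt.
set.seed(1)
cmpt   <- factor(rep(c("A", "B", "C"), each = 4))
season <- factor(rep(c("dormant", "growing", "growing"), each = 4))
y      <- rnorm(12)

fit <- aov(y ~ cmpt + season)

# season is missing from the fitted terms because it is aliased with cmpt.
print(fit)

# alias() lists season as a complete linear combination of the intercept
# and cmpt columns.
print(alias(fit))
```

Putting season ahead of cmpt in the formula would be expected to change
which term survives, since terms are fitted sequentially.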

The output is logical, in that cmpt is "compartment", a management unit
where different sequences of prescribed fire (fire is a "hot" topic in
Florida - Georgia right now) are practiced, resulting in different season
(season of burn), numgrowbrn (number of growing season burns) and lastburn
(years since last burn) characteristics. I had initially tried to use
cor() on these variables but, because they are categorical, I did not get
any answer (see below).

> cor(numgrowbrn, season)
Error in .C("S_Var2_NA",: There are 150 missing value(s) in x and/or y
passed to cor or var with na.method="fail". See the help file for other
options for handling missing values.
Dumped
Warning messages:
150 missing values generated coercing from character to numeric in:
as.double(y)

> cor(season, cmpt)
Error in .C("S_Var2_NA",: There are 300 missing value(s) in x and/or y
passed to cor or var with na.method="fail". See the help file for other
options for handling missing values.
Dumped
Warning messages:
1: 150 missing values generated coercing from character to numeric in:
as.double(x)
2: 150 missing values generated coercing from character to numeric in:
as.double(y)

The values were _not_ missing; as the warnings show, the NAs were generated
when the character values were coerced to numeric.
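
Because cor() needs numeric input, a cross-tabulation is one way to look at
how two categorical variables line up. The sketch below assumes R-style S
syntax and uses hypothetical toy values, not the study data; tab and
n.seasons are illustrative names.

```r
# Hypothetical toy data: each compartment was burned in one season only.
cmpt   <- c("A", "A", "B", "B", "C", "C")
season <- c("dormant", "dormant", "growing", "growing", "growing", "growing")

# Cross-tabulate instead of correlating: rows are cmpt, columns are season.
tab <- table(cmpt, season)
print(tab)

# Count the distinct seasons seen in each compartment. A count of 1 in
# every compartment means season is constant within cmpt.
n.seasons <- tapply(season, cmpt, function(s) length(unique(s)))
print(n.seasons)
```

A table with a single non-zero cell in every row is the pattern that leads
aov to drop the second variable once the first is in the model.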

One final question: When various regressions, anovas, etc. calculate Df and
pull out one of the categories in a variable [I can't think of the
technical term, sorry] (for example, the study had six cmpts, but the
various output tables list only five), which one do they take: the first
one or the last one? When I see the various output tables, I wonder which
cmpts or spp (species) they are referring to.
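
A possible starting point for this question: how the five columns for a
six-level factor are built depends on the contrast coding in force, not on a
fixed rule about the first or last category. The sketch below, in R-style S,
prints two standard codings. It rests on an assumption worth checking with
options()$contrasts: that S-plus uses Helmert contrasts for unordered
factors by default, in which case cmpt1 to cmpt5 are contrasts among levels
and do not stand for individual compartments.

```r
# Helmert coding for 6 levels: 5 columns, each contrasting one level with
# the levels before it; no single level is left out.
print(contr.helmert(6))

# Treatment coding for 6 levels: the first level is the baseline (its row
# is all zeros) and each column is an indicator for one later level.
print(contr.treatment(6))

# The coding currently in force for the session.
print(options()$contrasts)
```

The order of levels(cmpt) decides which compartment counts as "first"
under either coding.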

Thanks to all for your help.

W. Keith Moser, D.F.
Ecological Forestry Research Scientist
Tall Timbers Research Station
Route 1, Box 678
Tallahassee FL 32312-9712 USA
tel: +001 850 893-4153 ext 247
fax: +001 850 668-7781
email: 4ester@compuserve.com

-------------------------------------------------------------------
On Wed, 27 May 1998, W. Keith Moser wrote:

> S-Newsers
>
> I have a ridiculously simple question, but I cannot find the answer in the
> S-plus 4.5 documentation.
>
> I have a data set where I am examining the percent cover of particular
> species of plants. I have categorical variables (cmpt, numgrowbrn,
> lastburn, season) and numerical variables (basum).
>
> I ran two anovas:
>
> > arst.aov <- aov(arst ~ cmpt + basum + numgrowbrn + season + lastburn,
> na.action = na.omit)
> > anova(arst.aov)
> Analysis of Variance Table
>
> Response: arst
>
> Terms added sequentially (first to last)
>            Df Sum of Sq  Mean Sq  F Value     Pr(F)
>      cmpt   5   179.420 35.88396 1.639962 0.1531631
>     basum   1    48.415 48.41475 2.212641 0.1390859
> Residuals 143  3128.979 21.88098
> > arst.nocmpt.aov <- aov(arst ~ basum + numgrowbrn + season + lastburn,
> na.action = na.omit)
> > anova(arst.nocmpt.aov)
> Analysis of Variance Table
>
> Response: arst
>
> Terms added sequentially (first to last)
>             Df Sum of Sq  Mean Sq  F Value     Pr(F)
>      basum   1    16.799 16.79910 0.753463 0.3868164
> numgrowbrn   1    43.061 43.06106 1.931348 0.1667411
>     season   1    62.069 62.06851 2.783858 0.0973765
>   lastburn   1     1.986  1.98568 0.089061 0.7658021
>  Residuals 145  3232.900 22.29586
>
>
> QUESTION: Why did the first anova drop off numgrowbrn, season and lastburn?
-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu. To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message: unsubscribe s-news