Re: [S] S-plus dictionary of statistical terms

John Maindonald (john.maindonald@anu.edu.au)
Thu, 16 Apr 1998 09:47:27 +1000 (EST)


Christian Ritter wrote
> Does anyone of you know of an electronic dictionary of statistical
> terms (such as confidence intervals, two level factorial designs, p-values,
> R-squared's, Cp's, ...).

Some elements of such a dictionary are needed in any case, as
part of getting S-plus technical documentation and output
labelling into order. For example, we need:

1) consistent usage of the terms AIC and Cp, with a clear
definition. It is pretty confusing to read in the Splus4 Guide
to Statistics: "The Cp statistic (actually what is shown is the
AIC statistic ...)". Worse, this is not the usual Akaike form
of the AIC statistic except in the special case when the scale
parameter is 1. So I think it fair to ask StatSci to give a
reference for the particular version of AIC which they have
chosen to use.

2) In glm and related calculations, defendable and documented
defaults for the dispersion parameter (also called the "scale"
parameter, even though it is really a scale^2 parameter).
There are serious inconsistencies between different functions
in terminology, in defaults and in whether one can specify the
dispersion at all. For binomial and Poisson models, the default
may be a scale of 1 or (in predict.glm) the estimate based on
the Pearson statistic. step.glm seems to use the Pearson
statistic by default for what it calls the "scale" parameter,
while (at least in Splus 3.4) the default for drop1 is not
documented. In predict.glm one cannot specify a scale parameter
(there are of course easy ways around this).

3) attention to similar terminology, consistency and
documentation issues in other contexts, which have already been
documented in S-news.
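
Points 1) and 2) above can be made concrete with a small sketch.
It is written in Python rather than S, purely as an illustration,
and every function name in it is hypothetical. It assumes that the
statistic S-PLUS labels Cp or AIC has the form
deviance + 2 * dispersion * p; that form is an assumption made
here, not something confirmed from the StatSci documentation. The
counts and fitted means are made up.

```python
import math


def pearson_dispersion(y, mu, variance, n_params):
    """Pearson chi-square statistic divided by the residual degrees of freedom."""
    chi2 = sum((yi - mi) ** 2 / variance(mi) for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)


def poisson_deviance(y, mu):
    """Poisson deviance: 2 * sum(y * log(y / mu) - (y - mu))."""
    total = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += term - (yi - mi)
    return 2.0 * total


def akaike_form(deviance, n_params):
    # Akaike's AIC up to an additive constant, with the dispersion fixed
    # at 1: the deviance stands in for -2 * log-likelihood.
    return deviance + 2 * n_params


def scaled_form(deviance, n_params, dispersion):
    # ASSUMED form of the S-PLUS statistic: the penalty is multiplied
    # by the dispersion ("scale") value.
    return deviance + 2 * dispersion * n_params


# Made-up counts and fitted means from a hypothetical 2-parameter model.
y = [2, 3, 6, 7, 8, 9, 10, 12, 15, 20]
mu = [3.0, 4.0, 5.0, 6.0, 8.0, 9.0, 11.0, 12.0, 14.0, 20.0]
n_params = 2

dev = poisson_deviance(y, mu)
phi = pearson_dispersion(y, mu, variance=lambda m: m, n_params=n_params)

print("Pearson dispersion estimate:", phi)
print("Akaike form                :", akaike_form(dev, n_params))
print("scaled form, dispersion = 1:", scaled_form(dev, n_params, 1.0))
print("scaled form, Pearson value :", scaled_form(dev, n_params, phi))
```

Under these assumptions the two forms agree only when the
dispersion is taken to be 1, so which default a function uses
changes the number it reports for the same fitted model.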

Needed enhancements to existing analyses include:

1) direct control of the "standard error of estimate" parameter
in lm models. The inability to control it is likely to
encourage, in some models, use of inappropriate "error"
estimates.

2) automatic detection of models, or at the very least the
printing of a warning, where predict.gam should be used.
Notwithstanding the documentation of the dangers here, it is
easy for novices (or even more experienced users) to neglect or
forget or be confused about how widely this point applies. For
example, it applies to lme models when poly() etc. terms appear,
as I found to my sorrow.

3) Tables of means and standard errors in unbalanced multi-stratum
models; cf. V&R 2, p. 302.
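
The prediction trap in point 2) of this list can be illustrated
with a deliberately simplified sketch. It is in Python rather than
S, the function names are hypothetical, and a mean-centred
covariate stands in for poly(): both are transformations whose
definition depends on the data they are evaluated on. The sketch
does not reproduce what predict.gam does internally; it only shows
the kind of error that arises when the transformation is
re-evaluated on the new data.

```python
def fit_centred_line(x, y):
    """Least-squares fit of y on (x - mean(x)); the centring value is kept."""
    xbar = sum(x) / len(x)
    z = [xi - xbar for xi in x]
    intercept = sum(y) / len(y)
    slope = sum(zi * yi for zi, yi in zip(z, y)) / sum(zi * zi for zi in z)
    return intercept, slope, xbar


def predict_naive(fit, new_x):
    # The trap: the centring is recomputed from the new data, so the
    # coefficients are applied to a different basis than the one fitted.
    intercept, slope, _ = fit
    new_bar = sum(new_x) / len(new_x)
    return [intercept + slope * (xi - new_bar) for xi in new_x]


def predict_safe(fit, new_x):
    # The centring value stored at fitting time is reused.
    intercept, slope, xbar = fit
    return [intercept + slope * (xi - xbar) for xi in new_x]


# Made-up data lying exactly on the line y = 2x + 1.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 * xi + 1.0 for xi in x]
fit = fit_centred_line(x, y)

new_x = [6.0, 7.0]
safe = predict_safe(fit, new_x)
naive = predict_naive(fit, new_x)

print("true values      :", [2.0 * xi + 1.0 for xi in new_x])
print("safe prediction  :", safe)
print("naive prediction :", naive)
```

The naive version runs without any error or warning, which is
what makes this kind of mistake easy to overlook.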

I am very concerned that StatSci tidy up such matters before
racing ahead to add new analyses. I do not want to see further
inconsistencies and associated traps.

Finally, I do not relish the prospect of wasting time explaining
to users that Type III sums of squares have been added for no
good purpose. Will StatSci provide <any> examples where Type
III sums of squares are defendably helpful?

John Maindonald email : john.maindonald@anu.edu.au
Statistical Consulting Unit, phone : (6249)3998
c/o CMA, SMS, fax : (6249)5549
Australian National University
Canberra ACT 0200
Australia