Bill Venables
Wed, 1 Apr 1998 08:58:05 +0930

Frank E. Harrell, Jr. finishes his message with:
> When deciding on future directions for software all of the
> debates about statistics come alive. I know that many will
> criticize my point of view. I just wanted to give my $.02 worth
> from the standpoint of an applied biostatistician.

How could I possibly refuse an invitation like that? It would
almost be impolite.

> Here is my vote for what not to expend great efforts adding to
> S-Plus: exact methods. We have so many bigger things to worry
> about ... A classic saying of Tukey about exact solutions to
> the wrong problem comes to mind.

Sorry Frank, but I agree wholeheartedly.

> ... And I'm still not a fan of conditioning when marginal cell
> counts were not pre-specified by the experimental design (and
> mine never are).

During life people sometimes change their name, their nationality,
their religion, their spouse, their make of car... but they never
change their football team, and once a Bayesian, always a Bayesian.

> My second vote on what not to implement is type III sums of
> squares and F-tests, which are more problematic than most
> statisticians assume.

Frank, where were you when we needed you the most? The age of
purity and innocence is over, the SASification of S-PLUS has
begun: type III sums of squares are already in S-PLUS 4.5 beta!

> Here are my votes on what would be worth doing, not in any
> particular order:
> 1. Handle NAs in a smart way for all modeling functions....

I agree in principle, but I think I would say allow more built-in
options for how they are handled. It is a difficult question,
though. Many packages allow patently risky and dangerous options
for missing data simply on the grounds that "that's what the
customer wants". Maybe it is, but not this customer.

> 2. Sample size and power calculations for the normal-errors

Extremely important, but teaching people how to use them is
probably more so, and that is difficult.

Many packages offer easy access to sample size calculations, and
that can be very dangerous. A few weeks ago a client came to see
me because the automatic sample size calculator said she needed
only 2 subjects in each of her two samples! By contrast, a
colleague was called in to help a student who had already done a
large and complex survey with 600 in each of two samples, again
as suggested by the program. Any careful assessment beforehand
would have shown that 60 in each would have been more than
sufficient, as was all too clear by that stage.
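
To see how easily such a calculator can be led astray, here is a
minimal sketch of the usual normal-approximation formula for a
two-sided, two-sample comparison of means. It is not the method of
any particular package. The function name n_per_group, the defaults
(alpha = 0.05, power = 0.80) and the effect sizes tried are all
assumptions chosen for illustration, and they are not the inputs
behind the two anecdotes above.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects needed per group (hypothetical helper).

    Normal approximation for a two-sided, two-sample comparison of
    means with equal variances. effect_size is the assumed difference
    in means divided by the common standard deviation.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = z.inv_cdf(power)          # quantile for the target power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# The answer depends almost entirely on the assumed effect size.
for d in (3.0, 1.0, 0.5):
    print(d, n_per_group(d))
```

Under these assumptions an assumed effect of 3 standard deviations
gives 2 per group, 1 standard deviation gives 16, and 0.5 gives 63.
The formula will return an answer for whatever is typed in; deciding
what effect size and variance are defensible is the hard part.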

By the way Juergen Bock, a reader of this list, has a very easily
accessible and comprehensive book on sample size calculations.
The original German edition has now been gracefully translated
into English and should very soon be published. I suggest you
keep an eye out for it.

> 3. Continue to expand capabilities for random effects models,
> with various post-fit estimation, multi-level hierarchies, and
> other analytic capabilities. Some of this can be done by
> having an elegant interface with the WINBUGS Bayesian modeling
> package from Cambridge.

Both topics were high on the list I sent privately to Charles.

> 4. Bootstrap and multiple imputation methods for accounting for
> imputing missing values when making inferences.

I'd need to know more about this one.

> 5. Anything that helps with non-randomly missing serial data.

If anything can help, I doubt it will be mere software.

> 6. A world-class online help facility that allows users to
> navigate in many ways,

Again, long overdue, I agree.

> e.g., getting to a comprehensive set of examples of
> managing and recoding data.

This is likely to be a tough one, since one person's simple
illuminating example is to someone else a total enigma. (Of
course you could always ask for an online version of V&R...:-)


Bill Venables, Head, Dept of Statistics,    Tel.: +61 8 8303 5418
University of Adelaide,                     Fax.: +61 8 8303 3696
South AUSTRALIA.     5005.

