Re: [S] Regarding S-PLUS 5.0 Performance

Prof Brian Ripley (ripley@stats.ox.ac.uk)
Fri, 4 Dec 1998 10:59:58 +0000 (GMT)


> Date: Thu, 03 Dec 1998 06:59:51 -0600
> From: William Shannon <shannon@osler.wustl.edu>

> I felt the major deficiency of 3.4 was its inability to analyze large
> datasets. From the comments above it appears that 5.0 has reversed the
> problem -- it is now difficult to handle small datasets. Here is a

> The 5.0 requires 26 minutes to invert 10,000 3x3 matrices versus 3
> minutes with 3.4. There 'IS' a problem here that needs to be solved to
> make this product acceptable to people doing applied data analysis!

More precisely, it needs 26 minutes to do it THIS WAY. I am not aware
of any `applied data analysis' problem on small datasets that requires
such a calculation, but if there is one, there are faster ways on
5.0r3, at least. (BTW, I believe the posting of results from beta
software to be irresponsible.)
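
The post does not say which faster ways are meant, so the following is only an illustrative sketch, written in Python rather than S. It assumes the point is that a 3x3 inverse has a closed form (the adjugate divided by the determinant), so each small matrix can be inverted with a handful of multiplications instead of a call to a general-purpose solver. The names inv3, matmul3 and random_matrix are hypothetical helpers, and the test matrices are made diagonally dominant purely so that they are guaranteed nonsingular.

```python
import random


def inv3(m):
    """Invert a 3x3 matrix (a tuple of three row tuples) via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    # Cofactors of the first row; they also give the determinant.
    A = e * i - f * h
    B = f * g - d * i
    C = d * h - e * g
    det = a * A + b * B + c * C
    if det == 0:
        raise ZeroDivisionError("matrix is singular")
    # Inverse = transposed cofactor matrix divided by the determinant.
    return (
        (A / det, (c * h - b * i) / det, (b * f - c * e) / det),
        (B / det, (a * i - c * g) / det, (c * d - a * f) / det),
        (C / det, (b * g - a * h) / det, (a * e - b * d) / det),
    )


def matmul3(x, y):
    """Multiply two 3x3 matrices; used only to check the inverses."""
    return tuple(
        tuple(sum(x[r][k] * y[k][c] for k in range(3)) for c in range(3))
        for r in range(3)
    )


def random_matrix(rng):
    """Hypothetical test data: a strictly diagonally dominant 3x3 matrix."""
    rows = []
    for r in range(3):
        row = [rng.uniform(-1.0, 1.0) for _ in range(3)]
        row[r] += 4.0  # diagonal entry now exceeds the off-diagonal sum
        rows.append(tuple(row))
    return tuple(rows)


rng = random.Random(0)
matrices = [random_matrix(rng) for _ in range(10000)]
inverses = [inv3(m) for m in matrices]
```

Whether this resembles what is fast on 5.0r3 is an assumption; the sketch only shows that the per-matrix work for a 3x3 inverse is tiny, so the cost of such a job lies mostly in how the loop over matrices is organised.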

I ran a set of scripts for the V&R2 chapters last night, on my Sun
Ultra 1/170 with 64MB RAM (sufficient) and 200MB swap (ditto):

chapter     2     3     4     5     6     7    11    12    13    14
3.4r1     3.0   3.2   9.4   123  16.9    52   572   355   108    74
5.0r3     6.1   9.3   184  1201  77.2   266   840   522   141   322

I replaced step() by stepAIC() in all cases, as step() is seriously
broken in 5.0 and gives completely incorrect answers. The results from
chapters 5-14 are genuine applied data analysis of small datasets.

There appears to be a considerable speed penalty, especially when
running pure S code. Note though that almost all of the statistical
models code in 5.0 is still running in compatibility mode, and one
would expect a drop in performance (if not this much). Would one
really notice? Doing bootstrapping (ch05) and cross-validating
trees (ch14), yes, so we need to think about improving those. Otherwise
the results are already fast enough for me: remember that each chapter
runs many analyses. (Indeed they are much faster than 3.2 on a Sun
IPC, which is how we first did most of the examples in 1992/3.)
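
To put numbers on the speed penalty, the ratios of the 5.0r3 timings to the 3.4r1 timings in the table above can be computed directly. This is a small sketch in Python (not S), using only the figures from the table; the variable names are arbitrary. The per-chapter ratios run from about 1.3 (chapter 13) to about 19.6 (chapter 4), with chapter 5 near 9.8 and chapter 14 near 4.4, and the ratio of the summed timings is about 2.7.

```python
# Timings copied from the table above, in the units used there.
chapters = [2, 3, 4, 5, 6, 7, 11, 12, 13, 14]
t34 = [3.0, 3.2, 9.4, 123, 16.9, 52, 572, 355, 108, 74]      # 3.4r1
t50 = [6.1, 9.3, 184, 1201, 77.2, 266, 840, 522, 141, 322]   # 5.0r3

# Slowdown factor per chapter: 5.0r3 time divided by 3.4r1 time.
ratios = {ch: new / old for ch, old, new in zip(chapters, t34, t50)}

for ch in chapters:
    print(f"ch{ch:02d}: {ratios[ch]:.1f}x")

# Slowdown for running all ten chapter scripts one after another.
total_ratio = sum(t50) / sum(t34)
print(f"all chapters: {total_ratio:.1f}x")
```

The summed ratio is dominated by the long-running chapters (5, 11 and 12), which is why it is much smaller than the worst per-chapter figure.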

I expect I will be using 3.4 for some time-consuming calculations for
a long time to come, but 5.0r3 is perfectly adequate for my routine
work (unlike 5.0r2, which used much more memory).

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
