[S] Issues Regarding S-PLUS 5.0 Performance

Andrew Bruce (andrew@statsci.com)
Wed, 02 Dec 1998 13:06:56 -0800


Recently, there have been several queries and postings
about memory usage and performance in S-PLUS 5.0. This
posting is intended to address these queries, and it
covers the following topics:

1) Memory usage on start-up
2) General discussion of S-PLUS 5.0 (Release 3) performance
3) General programming tips

S-PLUS 5.0, Release 3 is now shipping to maintenance
customers in North America. In moving from Release 2
to Release 3, we significantly reduced memory usage
and increased the speed of several important
operations. Continuing to improve the performance of
S-PLUS 5.0 is a high priority for the development
team at MathSoft.

Memory Usage on Start-Up
------------------------

S-PLUS 5.0 appears to use more virtual memory on
start-up than S-PLUS 3.4 does, but this appearance is
misleading. As far as we can tell, there is relatively
little difference in the amount of memory actually
available at start-up to run your S-PLUS programs or
other applications.

If you use a utility such as "ps" or "top" on Solaris,
it reports that S-PLUS 5.0 requires 48 megabytes of
virtual memory (versus 8 megabytes for S-PLUS 3.4).
What these utilities don't tell you is that, of the 48
megabytes reported by Solaris, approximately 30
megabytes are in memory-mapped files that contain
data, functions and other information S-PLUS needs to
run. These files are memory-mapped to increase speed
and to reduce the amount of real RAM used. The 30
megabytes of memory-mapped files do not appreciably
reduce the amount of virtual memory available to other
applications, or for other uses in S-PLUS. In addition,
about 5 megabytes of the difference between 5.0 and 3.4
comes from large common blocks in some FORTRAN code
that is new to 5.0. These blocks are never paged in
unless that particular code is used, but they do count
against swap space.
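The effect described above can be reproduced on a
small scale with any memory-mapped file. The Python
sketch below uses the standard `mmap` module (nothing
from S-PLUS) and a hypothetical 1 MB scratch file.
Mapping the file adds its full size to the process's
virtual address space, yet the operating system reads
pages from disk only when they are touched. On typical
Unix systems, clean read-only pages like these are
backed by the file itself rather than by swap space.

```python
import mmap
import os
import tempfile

# Hypothetical 1 MB scratch file standing in for a mapped data file.
SIZE = 1024 * 1024
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.write(b"\x00" * SIZE)

    with open(path, "rb") as f:
        # Map the whole file (length 0) read-only. The mapping counts toward
        # the process's virtual size, but pages are loaded only on access.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            mapped_len = len(m)
            first_byte = m[0]  # touching a byte makes the OS load its page
finally:
    os.remove(path)

print(mapped_len, first_byte)  # 1048576 0
```

A tool such as "top" would report the full mapped size
for this process even though only the touched pages
ever occupy real RAM.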

Discussion of S-PLUS 5.0, Release 3 (5.0R3) Performance
-------------------------------------------------------

S-PLUS 5.0 offers a new object-oriented model based on
the latest-generation S language from Lucent
Technologies. The new language significantly improves
support for object-oriented programming in S-PLUS,
expanding the scope of built-in data classes, providing
explicit class structures with slots, supporting
virtual classes, and improving the inheritance model.
The new language also affects performance in the
following ways:

* Fewer copies of data objects are typically made
during a function call in S-PLUS 5.0 than in S-PLUS
3.4. This is due to reference counting, memory
mapping, and the "copy" argument in calls to C and
FORTRAN code. This generally reduces memory
requirements in S-PLUS 5.0.

* In 5.0R3, setting up a function call takes
additional time, and more functions require
time-consuming method invocation (since most
low-level functions in S-PLUS 5.0 are generic). This
generally increases evaluation time when operating
on small objects.
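Reference counting is one of the mechanisms named
above for avoiding copies. The Python sketch below
shows the general copy-on-write idea. It is not S-PLUS
source code, and the class names `_Buffer` and
`CowVector` are hypothetical. Assigning a vector only
increments a reference count on shared storage, and a
real copy is made only when shared data is modified.
For brevity, the sketch never decrements a count when
a handle is discarded.

```python
class _Buffer:
    """Shared storage plus a count of the vectors that refer to it."""

    def __init__(self, data):
        self.data = data
        self.refs = 0


class CowVector:
    """Hypothetical copy-on-write vector built on a reference-counted buffer."""

    copies_made = 0  # class-wide counter, so we can observe when copies happen

    def __init__(self, data=(), _buf=None):
        self._buf = _buf if _buf is not None else _Buffer(list(data))
        self._buf.refs += 1

    def share(self):
        """Like 'y <- x': a new handle to the same buffer, with no copy."""
        return CowVector(_buf=self._buf)

    def __getitem__(self, i):
        return self._buf.data[i]  # reading never copies

    def __setitem__(self, i, value):
        if self._buf.refs > 1:
            # Another vector still sees this buffer, so copy before writing.
            self._buf.refs -= 1
            self._buf = _Buffer(list(self._buf.data))
            self._buf.refs = 1
            CowVector.copies_made += 1
        self._buf.data[i] = value


x = CowVector([1, 2, 3])
y = x.share()  # no copy yet: x and y share one buffer
print(CowVector.copies_made, x[0], y[0])  # 0 1 1
y[0] = 99      # first write to shared data triggers exactly one copy
print(CowVector.copies_made, x[0], y[0])  # 1 1 99
y[1] = 42      # y now owns its buffer, so no further copy
print(CowVector.copies_made)  # 1
```

Because the copy is deferred until a write actually
happens, a function that only reads its arguments
never pays for a copy at all.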

Almost everything S-PLUS does is a function call, and
S-PLUS contains approximately 4,000 functions.
Generalizations about a system as complex as S-PLUS are
risky, but the following rules are useful:

* Functions applied to large data objects are generally
faster and use less memory in 5.0R3 than in S-PLUS 3.4.
For example, our benchmarks indicate that lm() can
handle data sets 2 to 3 times larger than in 3.4, and
that it is faster for data sets with more than about
5,000 to 10,000 observations.

* Functions applied to small data objects are generally
slower in 5.0R3 than in S-PLUS 3.4. In addition,
looping is generally slower and uses more memory when
there are many expressions and small data objects
within the loop. This is due both to the fixed
overhead of each function call, which dominates when
the work done per call is small, and to the fact that
5.0R3 does not perform the same kind of memory
compaction within the loop (we are investigating this
as part of our development effort). Much of the
difference in memory usage can be removed by
encapsulating the contents of the loop within a
single function call.
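The trade-off in the two rules above can be made
concrete with a toy cost model. The sketch below is an
illustrative assumption, not a measurement of S-PLUS.
It charges a hypothetical fixed `overhead` for every
function call and a hypothetical `per_element` cost
for every element processed. It then compares a loop
that makes one call per element with a single
vectorized call over the whole vector.

```python
def loop_cost(n, overhead, per_element):
    """Cost of n calls, each handling one element (toy model, arbitrary units)."""
    return n * (overhead + per_element)


def vectorized_cost(n, overhead, per_element):
    """Cost of one call handling all n elements (toy model, arbitrary units)."""
    return overhead + n * per_element


# Hypothetical numbers: the newer system triples the per-call overhead.
N = 1000
OLD_OVERHEAD, NEW_OVERHEAD, PER_ELEMENT = 1, 3, 1

loop_old = loop_cost(N, OLD_OVERHEAD, PER_ELEMENT)
loop_new = loop_cost(N, NEW_OVERHEAD, PER_ELEMENT)
vec_old = vectorized_cost(N, OLD_OVERHEAD, PER_ELEMENT)
vec_new = vectorized_cost(N, NEW_OVERHEAD, PER_ELEMENT)

print(loop_old, loop_new)  # 2000 4000
print(vec_old, vec_new)    # 1001 1003
```

In this toy model, tripling the per-call overhead
doubles the cost of the element-by-element loop but
changes the vectorized call by less than 1%. This is
why per-call overhead matters most for code that makes
many calls on small objects.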

General Programming Tips
------------------------

Our experience with S-PLUS 5.0 indicates that the
following general programming tips are even more
important in 5.0 than they were in 3.4:

(1) Use vectorized computations.
(2) Use apply, lapply, and related functions for
looping computations wherever possible.
(3) Encapsulate the body of a loop in a single function
call.

==================================================================
Andrew G. Bruce E-mail: andrew@statsci.com
Senior Product Marketing Manager Tel: (206) 283-8802 x 248
MathSoft, Inc. Fax: (206) 283-8691
1700 Westlake Ave. N, Suite 500 Cell: (206) 310-9994
Seattle, WA 98109.9891
==================================================================
-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu. To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message: unsubscribe s-news