Re: Statistica and Dr. Velleman

James H Steiger (steiger@unixg.ubc.ca)
Thu, 25 Aug 94 06:34:38 EDT


One man's truth can be another man's lie. In this posting, I
deal specifically with allegations by Paul Velleman about
comparative reviews by StatSoft.

I know both StatSoft and Systat well. In fact, I may well be
the ONLY individual who has seen source code from both
companies. I developed EzPATH, which was distributed for more
than 4 years by Systat. I have developed SEPATH, which will be
part of Statistica Version 5.0.

I am not unbiased. But I believe my views are infinitely more
balanced than those of Velleman, which have numerous, easily
documented inaccuracies. I will document these inaccuracies,
and leave you with an inescapable conclusion that Velleman is
really simply venting his frustrations at a successful
competitor, in a way that is slanderous and intellectually
dishonest.

StatSoft's reviews are not "negative advertising." They are
comparative reviews. There is a distinction. StatSoft believes
their products are more advanced than those of their
competitors, and wants their prospective users to be fully
informed of precisely how and why Statistica is superior to its
competitors.

These comparisons contain much valuable information,
information you will be hard-pressed to find in ANY OTHER
SOURCE. It is information that a prospective software purchaser
really needs to know. Not knowing about serious design
deficiencies in a product can be a particularly expensive kind
of ignorance.

I'll be blunt. The standards for "reviews" appearing in
software magazines are a disgrace. I know numerous examples of
MAJOR bugs in well-known software. Without exception, this
software was reviewed by major software magazines, and, to my
knowledge, NOT ONE magazine ever reported these bugs. Yet, in
most cases, I found the bugs within one week after installing
the product.

The magazines are shills, nothing more. They cater to the "incumbency
edge." Already established software tends to get reviewed favorably,
possibly because the marketers are already advertising in the
magazine.

First, I will debunk the misinformation Paul Velleman has been
propagating rather freely on this forum. When you are through
reading it, I am confident you will have a rather different
perspective on "negative advertising."

To begin with, Velleman and Al Best misrepresent the tone of
StatSoft's reviews. I have the reviews on my desk this moment.
I agree with Steve Machol's posting. Steve found them direct
and honest, and I agree. Throughout, the tone is extremely
reserved and factual. They are brutally honest, which evidently
disturbs Mr. Velleman. Mr. Velleman would rather that nobody
tell prospective users what Data Desk cannot do that they might
think it can do. How genteel of him.

StatSoft wants its users to know what Data Desk cannot do that
Statistica can do. Any user would know that this information
may be asymmetric, i.e., StatSoft is not likely to emphasize
what Data Desk can do that Statistica cannot do. But StatSoft
is entirely up-front in addressing this issue. I quote the
first paragraph from their brochure.

"This comparative review...has been prepared by StatSoft, Inc.
Although each of the programs discussed in this comparison
offers some specific advantages (some of which are mentioned in
the respective sections below), this comparison focuses mainly
on the weaknesses and limitations of these programs; weaknesses
that the developers at StatSoft consider critical and tried to
avoid while designing CSS:STATISTICA."

The review goes on to urge users to try all products out and test
StatSoft's assertions!

"All customers are strongly encouraged to ask their dealers to
allow them to thoroughly test all programs under consideration.
Customers are also advised to perform the comparisons described
in this text and to consider the unique features of each of the
programs, so that their final decision is fully informed."

In other words, StatSoft wants its users to be fully informed.
Moreover, it offers a 14-day money back guarantee on its (non
copy-protected) software. Does Data Desk offer such a guarantee?

Velleman writes:

>I merely noted that Statsoft does tell buyers that competing
>products are bad and does so with pamphlets whose tone is
>unprofessional and whose contents are often false. I appealed
>to readers on the net to look at the materials for themselves
>and judge them accordingly. As it happens, I do believe that
>half-truths, innuendo, and lies can sway the opinions of buyers
>who have not seen the products described,

I ask you, do the above paragraphs I quoted look
"unprofessional?" Or is Velleman maybe distorting things a bit?
StatSoft clearly states that the products have both strengths
and weaknesses! Where does it call them "bad"?

The notion that maybe Velleman has a distorted view just won't
go away. Let's examine his attempts to deal with StatSoft's
criticisms. All of Velleman's arguments fall apart upon
moderately close examination, something which David Krantz, Al
Best, and cohorts (who were long on defamatory opinions, and
short on accuracy) were apparently unequipped to perform.

Velleman writes:

>Here is one from the first paragraph on Data Desk: StatSoft (no
>author signs this) writes (the following paragraph is a direct
>quote including punctuation: Data Desk's ... statistical data
>management and graphics options are so limited that, in our
>view, that program cannot be used for general data analysis
>applications. Its manual suggests that "some users" will use it
>only "in conjunction with a traditional statistics package"
>(Quickstart Guide, p. 1). It appears to us that this is an
>understatement: One can hardly imagine a user whose statistical
>data analysis needs would all be satisfied by what is offered
>in Data Desk, and who will not need another statistics
>program. (end quotation) The tone is clear. The interesting
>detail is that the word "only" was (correctly) omitted from the
>quotation. In fact, what the Quickstart Guide (the one they
>quote is 3 years out of date) said was that "Some users may
>wish to use Data Desk in conjunction with a traditional
>statistics package". The comment was in the context of a
>discussion of how exploratory methods work well with
>traditional methods and how Data Desk offers special facilities
>to make it easy to move data among programs and even across
>platforms.

StatSoft believes that Data Desk is not comprehensive enough to
serve as a general data analysis package. They provide lots of
supporting information. Velleman quibbles about the word
"only." It is an unimportant distinction. Try reading the
sentences without Velleman's bafflegab. If users wish to use
Data Desk "in conjunction with" or "only in conjunction with"
another package, it STILL means that they need to use another
package! In other words, Data Desk cannot meet all their needs!
Why else would they be using another package?

Velleman apparently disagrees with StatSoft's interpretation.
But why? StatSoft's review goes on to give numerous examples of
capabilities not in Data Desk. For example, the review states
Data Desk

"does not offer any multivariate statistics (e.g., MANOVA),
discriminant function, or canonical correlation statistics."

Is this true? If so, it sounds like many users WOULD need to
augment Data Desk with another package. Indeed, virtually EVERY
research clinical psychologist would need another program. Such
users need MANOVA!

On to the next example:

>Another example (to get one with computing content) from
>paragraph 3:
>
>"There is no data transformation language included in Data Desk
>(even simple formulas cannot be executed repeatedly in loops)."
>
>This one is simple; it is an outright lie. All versions of Data
>Desk from 1.0 forward have had a full expression language whose
>results are "hot", updating automatically whenever an
>underlying data value is changed, and propagating the updates
>as appropriate. There is no need for loops because the
>expression language deals with data vectors.

This is actually an example of outright evasion and distortion
by Velleman. There is no other reasonable explanation, unless
the man is totally ignorant of modern statistical packages.

Velleman misleads the users by confusing a DATA TRANSFORMATION
LANGUAGE with a simple formula-based data transformation
capability. These are distinct capabilities. One is much more
advanced than the other.

For example, SYSTAT has both. What do I mean by a formula-based
data transformation capability? If you are in the data
spreadsheet in SYSTAT and you have variables named x, y, and w,
you can say:

Let y = x^3,

or

Let y = x^2 + w^2.

Variable y will be created if it does not exist, and will take
on the desired values. Systat has a full expression language
for doing such simple data transformations. Its commands are
hot, but not persistent. That is, they are executed when given,
but if you later change an x value, the y value is not
recomputed, as it would be in a spreadsheet like Excel.

This is a real design flaw in Systat, as anyone who has used it
can tell you, because you lose track of what led to what.

Statistica has a similar but more sophisticated capability,
one that sounds very similar to that of Data Desk. Unlike
Systat's, its transforms remain active and are embedded in the
data file, where they can be tracked clearly for future
reference. The user has control of the recalculation process.
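
To make the difference between a one-shot and a persistent
transform concrete, here is a minimal sketch in Python. It is not
the syntax of Systat, Statistica, or Data Desk, and the names
one_shot and PersistentTransform are invented purely for
illustration.

```python
# Illustrative only: plain Python, not the syntax of any package above.
# The names one_shot and PersistentTransform are hypothetical.

def one_shot(x, w):
    """Compute y = x^2 + w^2 once; the formula is not kept afterwards."""
    return [xi ** 2 + wi ** 2 for xi, wi in zip(x, w)]


class PersistentTransform:
    """Stores the formula alongside the data so y can be recomputed."""

    def __init__(self, formula):
        self.formula = formula  # kept, so it can be inspected later

    def recalculate(self, x, w):
        return [self.formula(xi, wi) for xi, wi in zip(x, w)]


x = [1.0, 2.0, 3.0]
w = [0.0, 1.0, 2.0]

y_once = one_shot(x, w)
stored = PersistentTransform(lambda xi, wi: xi ** 2 + wi ** 2)
y_live = stored.recalculate(x, w)

x[0] = 10.0                         # edit a data value afterwards
# y_once is now stale: it was computed from the old x[0].
y_live = stored.recalculate(x, w)   # user-triggered recalculation
```

In the one-shot case nothing records how y was derived, which is
the "losing track of what led to what" problem described above.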

So it sounds like, at a superficial level, Systat, Statistica,
and Data Desk all have a simple transformation language for
converting (or creating) one variable to be a function of the
others, while in the data editing mode.

However, there are OTHER kinds of transformations that require
a true transformation language with looping constructs,
if-then, etc., especially when "lagging" variables. That is
why SYSTAT's DATA module has a SYSTAT BASIC language
COMPLETELY DISTINCT from its simple transformation language,
and why Statistica has MML which is COMPLETELY DISTINCT from
the transformation syntax in its data spreadsheets.
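
As a rough illustration of the row-by-row logic I have in mind,
here is a sketch in Python (not SYSTAT BASIC or MML syntax; the
function name and the smoothing example are my own assumptions).
Each output row depends on the previous row, and the first row
needs an if-then special case.

```python
# Hypothetical helper in plain Python, for illustration only.

def lag_and_smooth(x, alpha=0.5):
    """Return (lagged, smoothed) series built one row at a time.

    lagged[i]   = x[i-1], or None for the first row (no predecessor).
    smoothed[i] = alpha * x[i] + (1 - alpha) * smoothed[i-1], which
                  depends on the previous *result*, so rows must be
                  computed in order.
    """
    lagged = []
    smoothed = []
    for i, value in enumerate(x):
        if i == 0:
            # First row: nothing to lag from, so start the series here.
            lagged.append(None)
            smoothed.append(value)
        else:
            lagged.append(x[i - 1])
            smoothed.append(alpha * value + (1 - alpha) * smoothed[i - 1])
    return lagged, smoothed


lagged, smoothed = lag_and_smooth([4.0, 8.0, 2.0])
```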

Statistica and Systat have two kinds of capability, while Data
Desk has one.

Dr. Velleman, are we seriously supposed to believe you did not
know this and that this distinction is unimportant?

Either Velleman fails to understand the distinction between a
simple capability (simple transformations on vectors of
observations) and a full transformation language with looping
constructs, OR (heaven forbid) he was trying to deliberately
mislead the readers of this forum.

On to Velleman's final example.

>
>...and one with statistics content:

>"The Data Desk implementation of multiple regression is limited
>to the point that it does not support any stepwise procedures
>(which are the major exploratory methods of regression
>analysis)."
>
>In this case, I have even published on the question. Contrary
>to StatSoft's claims, traditional stepwise regression methods
>are very poor exploratory tools. (See Henderson and Velleman,
>"Building Multiple Regression Models Interactively,"
>Biometrics, 37, June 1981 391-411.) Traditional stepwise
>methods often do not make good exploratory tools because they
>are sensitive to precisely the kinds of anomalies that we often
>explore data to discover (such as needed transformations and
>outliers.) In fact, Data Desk supports a guided stepwise
>procedure in which diagnostic plots can be watched as each
>variable is added to the model, supporting the kind of model
>construction described by Henderson and Velleman.

So, indeed it appears Data Desk DOES NOT implement stepwise
regression procedures of the kind implemented by other
programs. It DOES implement its own peculiar procedure, which
Velleman feels is superior, but which no one else (correct me if I
am wrong, Paul) has placed in a major package.
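
For readers who have not used them, the sketch below shows roughly
what a traditional forward stepwise procedure does. It is a
simplified Python illustration, not the code of any package
discussed here: it handles forward entry only (no removal step),
the function name is hypothetical, and the F-to-enter threshold of
4.0 is an arbitrary assumption.

```python
# Simplified forward selection by F-to-enter; illustration only.

def forward_stepwise(columns, y, f_to_enter=4.0):
    """columns: dict of name -> list of values. Returns names in entry order."""
    n = len(y)

    def centered(v):
        m = sum(v) / len(v)
        return [vi - m for vi in v]

    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))

    resid = centered(y)                      # residual of intercept-only model
    cands = {k: centered(v) for k, v in columns.items()}
    selected = []
    while cands:
        rss = dot(resid, resid)
        best, best_gain = None, 0.0
        for name, c in cands.items():
            cc = dot(c, c)
            if cc < 1e-12:
                continue                     # collinear with entered variables
            gain = dot(c, resid) ** 2 / cc   # drop in RSS if entered next
            if gain > best_gain:
                best, best_gain = name, gain
        df = n - len(selected) - 2           # residual df after entry
        if best is None or df <= 0:
            break
        rss_new = rss - best_gain
        if rss_new <= 1e-12:
            f_stat = float("inf")
        else:
            f_stat = best_gain / (rss_new / df)
        if f_stat < f_to_enter:
            break                            # no candidate meets the criterion
        c = cands.pop(best)
        cc = dot(c, c)
        b = dot(c, resid) / cc
        resid = [r - b * ci for r, ci in zip(resid, c)]
        # Remove the entered variable's contribution from the rest.
        for name in list(cands):
            g = dot(cands[name], c) / cc
            cands[name] = [vi - g * ci for vi, ci in zip(cands[name], c)]
        selected.append(best)
    return selected


x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
y = [2.1, 4.1, 5.9, 7.9, 9.9, 11.9, 14.1, 16.1]   # roughly 2 * x1
selected = forward_stepwise({"x1": x1, "x2": x2}, y)
```

The procedure is entirely automatic: variables enter on a numeric
criterion alone, with no diagnostic plots consulted along the way.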

I applaud Velleman for being courageous in implementing what he
feels is a good procedure, while leaving out more traditional
procedures. (I also leave out procedures I feel are
inappropriate.)

However, it seems clear to me StatSoft was probably referring
to the same stepwise procedures everyone else has when they
criticized Velleman. Indeed, he does not implement any of the
commonly recognized stepwise procedures. I agree StatSoft
should augment their criticism, perhaps adding the word
"traditional," with an explanatory note.

>These are not isolated examples. I can cite many more on
>Data Desk alone, and JMP, StatView, SuperANOVA, Systat, and
>SPSS all come in for similar treatment.

So far, you have only convinced me that you cannot deal with honest
criticism. I suspect your "many more" examples are equally weak.

>Anyone who has studied propaganda writing or is acquainted
>with Joseph McCarthy's methods can recognize the technique.
>One simply piles lie upon lie so that the reader concludes
>that with all this smoke there must be a fire somewhere.

So far, Paul, it appears you are the one trying to pile up
misleading statements.

>I repeat, these are not isolated examples; misstatements,
>half-truths, and outright lies can be found throughout the
>document. A similar document smears packages on DOS, and yet
>another does Windows (sample comment: "SPSS for Windows, in our
>view, cannot be considered to be a true Windows application.")

Paul, SPSS for Windows CANNOT be considered a true Windows
application. Do you know what a true Windows application is?
Have you ever programmed for Windows?

StatSoft is quite correct to distinguish between a "ported DOS"
application like SPSS, and a true Windows application like
their own. True Windows applications support MDI, DDE, OLE,
etc. They are much more difficult to develop and program than
"ported DOS" applications, that simply open a couple of windows
and dump output to them, in teletype mode, without using any of
the advanced features of Windows. I have demonstrated
Statistica in my department to several users of SPSS for
Windows. They are literally stunned by the difference in
technology. There are many SPSS users who have been misled by
the "incumbency edge." They still think the package is state of
the art. StatSoft's comparative review goes on to list numerous
ways that SPSS fails to take advantage of Windows' capabilities.
This is useful information.

>Indeed, if these documents are to be believed, all of the users
>of statistics software other than Statistica must be utter
>fools to have missed these gross failures.

No, Paul, they just haven't tried Statistica yet.

Velleman's mischaracterization of a perfectly legitimate (and in
my view EXTREMELY USEFUL) marketing technique as somehow
dirtying the squeaky-clean atmosphere of academia is laughable
in its naivete. His attempt to seize the high moral ground,
while engaging in vicious slander of an extremely innovative
software company is nothing more than a standard political
technique, used by Nixon, among others. It is cheap, and
blatantly dishonest. If a politician said "Although my opponent
is a child molester, I plan to take the high road in this
campaign," we would burst into laughter. Velleman's tactics are
equally transparent.

Paul Velleman wrote:

>I repeat: professionals do not try to pump themselves up by
>denigrating their professional colleagues. They certainly do
>not tell lies about their colleagues, and if they do they are
>censured. StatSoft has shown by their publications that they
>do not bring a professional approach to their work.
>
>-- Paul Velleman

Velleman accuses StatSoft of lying in numerous places, then
assures everyone that this is not the key issue. He is making
it a key issue, and the notion that someone is lying in these
postings may haunt him in the future (keep posted, readers).

Actually, professionals frequently denigrate their colleagues,
and frequently get ahead by doing so. One of the most effective
ways to promote yourself in academics is to find serious flaws
in work that was previously considered "leading edge." Indeed,
criticizing the work of others is a fundamental scientific
activity.

Velleman himself, in a communication I quoted above, stated
that traditional methods of stepwise regression are faulty.
This sounds suspiciously like negative advertising to me. I
will quote it again.

>traditional stepwise regression methods are very poor
>exploratory tools. (See Henderson and Velleman, "Building
>Multiple Regression Models Interactively," Biometrics, 37, June
>1981 391-411.) Traditional stepwise methods often do not make
>good exploratory tools because they are sensitive to precisely
>the kinds of anomalies that we often explore data to discover
>(such as needed transformations and outliers.) In fact, Data
>Desk supports a guided stepwise procedure in which diagnostic
>plots can be watched as each variable is added to the model,
>supporting the kind of model construction described by
>Henderson and Velleman.

Velleman states that these other methods are BAD. God forbid!
Someone's feelings might be hurt!! How sleazy. Why didn't he
just say HIS METHOD WAS GOOD???

By now, hopefully, readers of this forum are starting to grasp
how cynical, misguided, and intellectually weak Velleman's
postings are.

Perhaps Velleman's time would be better spent revising his
software to make it more competitive with Statistica. I, for
one, have heard enough of his slanderous diatribes.

steiger@unixg.ubc.ca

James H. Steiger
Dept. of Psychology
University of British Columbia
Vancouver, B.C., Canada V6T 1Z4
