Re: [S] is 'mclust' non-deterministic???

Lutz Prechelt
Thu, 05 Mar 1998 11:12:53 +0100

> I did not see any replies to your inquiry about mclust.

Indeed. Yours is the first reply I have received.

> I don't
> use mclust, but I do know that many clustering algorithms use
> randomization to resolve ties in the distance functions between
> 2 points or clusters.

I thought about this, but then I wondered why there is no
option to turn the randomization off. A clustering procedure
is supposed to compute some kind of mapping, and a
nondeterministic mapping is very awkward to build further programs on.
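
To illustrate what I mean, here is a minimal sketch in Python (not S,
and not mclust's code; the function names are made up for this example).
It assumes scalar data and plain absolute distance. With tied distances,
a random choice among the tied pairs can differ from run to run, while
always taking the first tied pair gives the same answer every time.

```python
import random

def tied_closest_pairs(points):
    """Return every index pair (i, j), i < j, at the minimum pairwise distance."""
    best = None
    ties = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = abs(points[i] - points[j])
            if best is None or d < best:
                best, ties = d, [(i, j)]   # new minimum: restart the tie list
            elif d == best:
                ties.append((i, j))        # same distance: record the tie
    return ties

def pick_pair(points, rng=None):
    """Pick the pair to merge.

    rng is None  -> deterministic: the tied pair with the lowest indices.
    rng supplied -> random choice among the tied pairs.
    """
    ties = tied_closest_pairs(points)
    if rng is None:
        return ties[0]
    return rng.choice(ties)

data = [1.0, 2.0, 3.0, 10.0]
print(tied_closest_pairs(data))  # [(0, 1), (1, 2)]
print(pick_pair(data))           # (0, 1)
```

Passing a seeded generator, e.g. pick_pair(data, random.Random(42)),
would also make the random variant repeatable, which is the kind of
option I was hoping for.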

Thanks for your help; I am now more confident in what I am doing.

In any case, it also turned out that mclust is so slow on
larger amounts of data that I can hardly use it at all.
I have a fast machine, yet mclust (spherical) takes about
ten minutes on only 1711 scalar data points (with lots
of ties).
Maybe someone who knows more about mclust could
comment on that? I always thought that hierarchical agglomeration
ought to be a relatively simple procedure.
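
For reference, this is roughly the procedure I have in mind, as a
Python sketch. It is only an assumption about what "simple" would look
like: it uses plain single linkage on scalar data as a stand-in and
makes no claim about how mclust itself works. The name agglomerate is
hypothetical. Ties are resolved deterministically by keeping the first
closest pair found.

```python
def agglomerate(points):
    """Naive single-linkage agglomeration on scalar data.

    Returns the merge history as (left_cluster, right_cluster, distance).
    """
    clusters = [[p] for p in points]      # start with one cluster per point
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage: distance between the closest members
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                # strict '<' keeps the first tied pair, so no randomness
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]   # merge j into i
        del clusters[j]
    return merges

for left, right, dist in agglomerate([1.0, 2.0, 3.0, 10.0]):
    print(left, right, dist)
# [1.0] [2.0] 1.0
# [1.0, 2.0] [3.0] 1.0
# [1.0, 2.0, 3.0] [10.0] 7.0
```

Even this naive version rescans all pairs after every merge, so its
cost grows roughly with the cube of the number of points. That may
account for part of the slowness, but I cannot say whether it is the
reason in mclust's case.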

> If you have some equal distances among your
> data, you can then get different results from different runs.
> If this causes large fluctuations in your clusterings, this is a
> sign that your data isn't clustering well and that your clusters
> are artifacts of the algorithm and not real.
> Sincerely,
> John Van Ness
> Professor, UT-Dallas

Thanks a lot for your helpful answer.


