Abstract of Research
My Ph.D. work is an essential part of a MITACS (Mathematics for
Information Technology and Complex Systems) project Neural networks for pattern
recognition, storage and retrieval, whose goal is to apply and enhance the
information processing capabilities of neural networks, both through the
development of specialized, task-oriented artificial neural networks and
through the study of general properties of artificial and biological neural
networks. In particular, my thesis work focuses on determining efficient means
of finding patterns in large high-dimensional data sets and verifying that the
patterns found are the best representation of the structure in the data. Recent
theoretical results have shown that in high dimensional space, the distance
between every pair of points is almost the same for a wide variety of data
distributions and distance functions. Under such circumstances, it even makes
no sense to talk proximity or clustering in the original full space of all
dimensions. This problem is well known as the curse of dimensionality. Most
clustering methods do not work efficiently for data sets in such high
dimensional spaces because of the inherent sparsity of data. For such data
sets, it is not feasible to find interesting clusters in the original full
space of all dimensions, but pruning off dimensions in advance, as most feature
selection procedures do, may lead to significant loss of information and thus
render the clustering results unreliable.
Therefore, we are facing a feasibility-reliability dilemma in clustering data sets of high dimensionality: on the one hand it is feasible to find clusters only in lower dimensional subspaces due to the sparsity of the data in full space, on the other hand the clustering results by pruning off dimensions in advance are not reliable due to the loss of information. This feasibility-reliability dilemma motivated the concept of projected clustering (Aggarwal et al, 1999) whose central goal is to find projected clusters, each of which consists of a subset C of data points together with a subset D of dimensions such that the points in C are closely correlated in the subspace of dimensions D. For data sets in high dimensional spaces, projected clustering can lead to significant improvement in the quality of clustering. A fast projected clustering algorithm PROCLUS was proposed by Aggarwal et al in 1999. Unfortunately, PROCLUS requires the number of clusters and the average dimension as input parameters. This seems to impose significant challenge for a user. Moreover, our simulations show PROCLUS is sensitive to the choice of these input parameters. In fact, it has been one of the most difficult problems in cluster analysis to determine the number of natural clusters.
On other hand, ART neural networks (Carpenter & Grossberg)
have been shown to be very effective in self-organizing stable recognition
codes in real time in response to arbitrary sequences of input patterns. ART
architecture does not assume the numbers of clusters in advance and allows the
user to control the degree of similarity of patterns placed in the same
cluster. However, ART architecture has to be modified to perform the task of
projected clustering. In particular, ART focuses on the similarity of patterns
in full space and therefore have the difficulty to find projected clusters in
high dimensional data sets due to the inherent sparsity of the data points in
the full space. Besides, preprocessing of the input patterns to an ART system
is nontrivial. In particular, for analog input patterns, a user faces the
problem of meeting the data normalization requirements of ART modules, and
simultaneously preserving the original projected clusters in subspaces. Another
problem is that the subspace of dimensions associated with each projected cluster
cannot be identified in advance. One may attempt to find clusters in all
possible subspaces and then to compare the results to obtain an optimal
partition of the data set, but this is practically not feasible as the number
of all possible subspaces is intractably large for a data set with high
dimension.
Therefore, we have been developing a new neural network
architecture Projective Adaptive Resonance Theory (PART) in order to provide a
solution to this feasibility-reliability dilemma in clustering data sets in
high dimensional spaces. PART is based on ART but with a major modification.
The main new development in PART is the introduction of a selective output
signaling mechanism which allows a generated signal at a node in the input
layer to be transmitted to a node in the clustering layer only when the
corresponding top-down weight is similar to the generated signal. Like ART, the
vigilance conditions in PART control the degree of similarity of patterns in
the same cluster, but the similarity measurement in PART is closely related to
subspaces involved. In particular, the degree of similarity of patterns for a
committed node is controlled by both vigilance parameter and distance parameter
which control the size of dimensions of the projected subspaces and the degree
of similarity in a specific dimension involved, respectively. These two
vigilance and distance parameters are the only required input parameters for
PART algorithm, and our simulations on high dimensional synthetic data show
that the clustering results obtained by PART algorithm are very robust to the
choice of these input parameters. In particular, we observed that PART
algorithm, with a wide range of input parameters, reproduced the original input
clusters efficiently and effectively.
The proposed PART neural network model is described by a large
scale singularly perturbed system of differential equations. We have completely
described the dynamics of PART model when the signal functions are special step
functions, which is the foundation of PART algorithm. In particular, the
winner-take-all paradigm and the learning formulae used in PART algorithm are
derived from the model equations through rigorous proof. We also show that in
some ideal situations which arise in many applications, PART does reproduce the
original input cluster structures.