Abstract of Research

 

My Ph.D. work is an essential part of a MITACS (Mathematics for Information Technology and Complex Systems) project Neural networks for pattern recognition, storage and retrieval, whose goal is to apply and enhance the information processing capabilities of neural networks, both through the development of specialized, task-oriented artificial neural networks and through the study of general properties of artificial and biological neural networks. In particular, my thesis work focuses on determining efficient means of finding patterns in large high-dimensional data sets and verifying that the patterns found are the best representation of the structure in the data. Recent theoretical results have shown that in high dimensional space, the distance between every pair of points is almost the same for a wide variety of data distributions and distance functions. Under such circumstances, it even makes no sense to talk proximity or clustering in the original full space of all dimensions. This problem is well known as the curse of dimensionality. Most clustering methods do not work efficiently for data sets in such high dimensional spaces because of the inherent sparsity of data. For such data sets, it is not feasible to find interesting clusters in the original full space of all dimensions, but pruning off dimensions in advance, as most feature selection procedures do, may lead to significant loss of information and thus render the clustering results unreliable.

 

Therefore, we are facing a feasibility-reliability dilemma in clustering data sets of high dimensionality: on the one hand it is feasible to find clusters only in lower dimensional subspaces due to the sparsity of the data in full space, on the other hand the clustering results by pruning off dimensions in advance are not reliable due to the loss of information. This feasibility-reliability dilemma motivated the concept of projected clustering (Aggarwal et al, 1999) whose central goal is to find projected clusters, each of which consists of a subset C of data points together with a subset D of dimensions such that the points in C are closely correlated in the subspace of dimensions D. For data sets in high dimensional spaces, projected clustering can lead to significant improvement in the quality of clustering. A fast projected clustering algorithm PROCLUS was proposed by Aggarwal et al in 1999. Unfortunately, PROCLUS requires the number of clusters and the average dimension as input parameters. This seems to impose significant challenge for a user. Moreover, our simulations show PROCLUS is sensitive to the choice of these input parameters. In fact, it has been one of the most difficult problems in cluster analysis to determine the number of natural clusters.

 

On other hand, ART neural networks (Carpenter & Grossberg) have been shown to be very effective in self-organizing stable recognition codes in real time in response to arbitrary sequences of input patterns. ART architecture does not assume the numbers of clusters in advance and allows the user to control the degree of similarity of patterns placed in the same cluster. However, ART architecture has to be modified to perform the task of projected clustering. In particular, ART focuses on the similarity of patterns in full space and therefore have the difficulty to find projected clusters in high dimensional data sets due to the inherent sparsity of the data points in the full space. Besides, preprocessing of the input patterns to an ART system is nontrivial. In particular, for analog input patterns, a user faces the problem of meeting the data normalization requirements of ART modules, and simultaneously preserving the original projected clusters in subspaces. Another problem is that the subspace of dimensions associated with each projected cluster cannot be identified in advance. One may attempt to find clusters in all possible subspaces and then to compare the results to obtain an optimal partition of the data set, but this is practically not feasible as the number of all possible subspaces is intractably large for a data set with high dimension.

 

Therefore, we have been developing a new neural network architecture Projective Adaptive Resonance Theory (PART) in order to provide a solution to this feasibility-reliability dilemma in clustering data sets in high dimensional spaces. PART is based on ART but with a major modification. The main new development in PART is the introduction of a selective output signaling mechanism which allows a generated signal at a node in the input layer to be transmitted to a node in the clustering layer only when the corresponding top-down weight is similar to the generated signal. Like ART, the vigilance conditions in PART control the degree of similarity of patterns in the same cluster, but the similarity measurement in PART is closely related to subspaces involved. In particular, the degree of similarity of patterns for a committed node is controlled by both vigilance parameter and distance parameter which control the size of dimensions of the projected subspaces and the degree of similarity in a specific dimension involved, respectively. These two vigilance and distance parameters are the only required input parameters for PART algorithm, and our simulations on high dimensional synthetic data show that the clustering results obtained by PART algorithm are very robust to the choice of these input parameters. In particular, we observed that PART algorithm, with a wide range of input parameters, reproduced the original input clusters efficiently and effectively.

 

The proposed PART neural network model is described by a large scale singularly perturbed system of differential equations. We have completely described the dynamics of PART model when the signal functions are special step functions, which is the foundation of PART algorithm. In particular, the winner-take-all paradigm and the learning formulae used in PART algorithm are derived from the model equations through rigorous proof. We also show that in some ideal situations which arise in many applications, PART does reproduce the original input cluster structures.