[S] Combining Image and cluster-tree plots side by side

Jae K. Lee (jaeklee@miner.nci.nih.gov)
Thu, 21 May 1998 11:01:37 -0400


We are working on data mining of a huge database (10+ GB) for anticancer
molecular drug discovery. We often generate a color-based image plot of a
matrix (currently 300 X 4000, but we hope to handle much bigger ones, such
as 4000 X 10000) so that we can effectively screen a large number of drugs
and a large amount of genomic information.

We use Splus heavily, mostly on SGI Unix because of memory and speed
problems, even though we also have the new PC-based Splus 4.0 and 4.5.
We have found that by using CLUSTER-TREE and IMAGE plots side by side, we
can extract a lot of interesting information from such a large matrix.
However, there are several technical difficulties:

1. Since we can't read off the labels of such a large number of entries,
and since we often find interesting local areas in the image plot, we
would like to be able to zoom in on, or restrict the plot to, an area of
the image chosen by eye or by some heuristic check, and then regenerate
the restricted image and cluster-tree plots side by side. (I know that
some other graphical tools, such as AVS and Noesys, provide this
capability, but Splus does not.)
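The "zoom in" step can be prototyped outside Splus. The sketch below is in
Python, and everything in it is a hypothetical illustration rather than
anything Splus provides: the function names prefix_sums, hottest_window and
zoom, and the choice of heuristic. As a stand-in for "some heuristic
checking", it uses a 2-D summed-area table to find the h X w window with the
largest total value. It then cuts out that window while keeping its row and
column labels, so the smaller matrix can be passed back to the image and
clustering routines and its labels can actually be read.

```python
def prefix_sums(m):
    """2-D summed-area table: P[i][j] is the sum of m[0:i][0:j]."""
    n, p = len(m), len(m[0])
    P = [[0.0] * (p + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(p):
            P[i + 1][j + 1] = m[i][j] + P[i][j + 1] + P[i + 1][j] - P[i][j]
    return P


def hottest_window(m, h, w):
    """Return (row_start, col_start) of the h x w window with the largest sum.

    Hypothetical heuristic: "interesting" is taken to mean "large total value".
    Each window sum costs O(1) thanks to the summed-area table.
    """
    if h < 1 or w < 1 or h > len(m) or w > len(m[0]):
        raise ValueError("window does not fit in the matrix")
    P = prefix_sums(m)
    best = None
    for i in range(len(m) - h + 1):
        for j in range(len(m[0]) - w + 1):
            s = P[i + h][j + w] - P[i][j + w] - P[i + h][j] + P[i][j]
            if best is None or s > best[0]:
                best = (s, i, j)
    return best[1], best[2]


def zoom(m, row_labels, col_labels, r0, c0, h, w):
    """Restrict the matrix to an h x w window, keeping its labels attached."""
    sub = [row[c0:c0 + w] for row in m[r0:r0 + h]]
    return sub, row_labels[r0:r0 + h], col_labels[c0:c0 + w]


# Usage: a 4 x 4 matrix with a "hot" 2 x 2 block in the middle.
m = [[0, 0, 0, 0],
     [0, 5, 6, 0],
     [0, 7, 8, 0],
     [0, 0, 0, 0]]
r0, c0 = hottest_window(m, 2, 2)
sub, rows, cols = zoom(m, ["r0", "r1", "r2", "r3"],
                       ["c0", "c1", "c2", "c3"], r0, c0, 2, 2)
```

If the window is chosen by eye instead, only the start indices passed to
zoom change.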

2. We want to match the sizes of the image and cluster-tree plots so that
we can directly read off which entries correspond, column by column or row
by row, between the image and the cluster tree (here we would like a way
that works even if only for small matrices, such as 200 X 300).
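One way to get the row-by-row correspondence in item 2 is to draw the image
with its rows permuted into the dendrogram's leaf order, so that leaf k of
the tree and row k of the image sit at the same position along a shared
axis. A minimal sketch in Python, assuming naive average-linkage
agglomerative clustering with Euclidean distance (a stand-in for whatever
clustering routine is actually used; leaf_order and reorder_rows are
hypothetical names):

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def leaf_order(rows):
    """Naive average-linkage clustering; returns row indices in leaf order.

    Each cluster is stored as the left-to-right list of its leaves, so
    merging two clusters just concatenates them. Roughly O(n^3): only
    meant to illustrate the idea on small matrices.
    """
    if not rows:
        return []
    d = [[euclidean(r, s) for s in rows] for r in rows]
    clusters = [[i] for i in range(len(rows))]

    def avg_dist(c1, c2):
        return sum(d[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters (ties keep the first pair found).
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = avg_dist(clusters[a], clusters[b])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]  # left child, then right child
        del clusters[b]                     # b > a, so index a stays valid
        clusters[a] = merged
    return clusters[0]


def reorder_rows(matrix, labels, order):
    """Permute matrix rows and their labels into dendrogram leaf order."""
    return [matrix[i] for i in order], [labels[i] for i in order]


# Usage: rows 0 and 2 are close, as are rows 1 and 3.
matrix = [[1.0, 2.0], [10.0, 11.0], [1.1, 2.1], [10.2, 11.1]]
order = leaf_order(matrix)
ordered_rows, ordered_labels = reorder_rows(
    matrix, ["drugA", "drugB", "drugC", "drugD"], order)
```

With the rows in this order, the two plots would also need the same axis
range (for example 0.5 to n + 0.5 for n rows) and the same height for the
leaves to line up with the image rows; columns can be handled the same way
by clustering the transposed matrix. For thousands of rows the naive search
above is far too slow, and a compiled implementation would be needed.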

As may be seen, these problems are not simple, and there may be no
existing way in Splus to handle our needs. We plan to invest quite a lot
in this development, and I would like to ask for help from our genius
Splus news group. First, if you know of any way in Splus to handle these
things, please let me know; it will be more than appreciated. Second, if
no present Splus function can handle them, we are willing to develop
further refined algorithms for our needs. If you can provide me with some
basic code for the Splus image and cluster-analysis functions (preferably
in C or C++; I found one clustering algorithm in StatLib written in
Fortran), I will post our development back to a public place as soon as
we can. Any other help with our study will also be appreciated. Thanks.


Jae Lee

Jae K. Lee, Ph.D.                         Tel:  (301) 496-9572
National Cancer Institute                 Email: jaeklee@miner.nci.nih.gov
National Institutes of Health              
Bldg. 37, Room 5D-02
9000 Rockville Pike
Bethesda, MD 20892                        Fax:  (301) 402-0752
This message was distributed by s-news@wubios.wustl.edu.