Most respondents recommended some variation of:
    pmat <- do.call("paste", as.data.frame(X))
    uniqueX <- X[!duplicated(pmat), ]
which worked for my application and was very fast (my matrix X is 16,000
by 15). There were some cautions about rounding error, etc., so this may
not be entirely fail-safe.
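For readers who want to try the idiom, here is a minimal sketch on a small made-up matrix. It also counts how many times each retained row occurs. The names keep and counts are introduced only for this illustration, and the code assumes R semantics (TRUE/FALSE, table() indexed by name); it is not taken from any of the replies.

```r
# Toy matrix with repeated rows (made-up data for illustration only).
X <- matrix(c(1, 2,
              3, 4,
              1, 2,
              5, 6,
              3, 4,
              1, 2), ncol = 2, byrow = TRUE)

# Build one string key per row by pasting the columns together.
pmat <- do.call("paste", as.data.frame(X))

# Keep the first occurrence of each key.
keep <- !duplicated(pmat)
uniqueX <- X[keep, , drop = FALSE]

# Number of times each retained row occurs in X, in the order of uniqueX.
counts <- as.vector(table(pmat)[pmat[keep]])

print(cbind(uniqueX, Count = counts))
```

Because the rows are compared as character strings, two numeric rows count as duplicates whenever their printed representations agree, which is where the rounding caution comes from.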
Phil Spector provided a nicely documented function that answered both of
my questions. That is, it returned a matrix of the unique rows of X, and
it provided a count of the number of duplicates each row represents. His
e-mail message is attached below.
Thanks again to all who have helped,
Steve Edland
Alzheimer's Disease Research Center
University of Washington, 354691
Seattle, WA 98195
---------- Forwarded message ----------
Date: Tue, 17 Feb 1998 10:09:52 -0800 (PST)
From: Phil Spector <spector@stat.berkeley.edu>
To: Steve Edland <edland@u.washington.edu>
Subject: Re: [S] unique() for matrices?
On Tue, 17 Feb 1998, Steve Edland wrote:
>
> I need a function like unique(), but for matrices. That is, you pass it
> a matrix X and it returns a matrix of the unique rows of X. Ideally the
> function would also return a vector with the number of duplicates
> represented by each row of the returned matrix.
>
> Does anybody have an efficient algorithm for this?
>
> Thanks for any help,
> Steve Edland
> Alzheimer's Disease Research Center
> University of Washington, 354691
> Seattle, WA 98195
> -----------------------------------------------------------------------
> This message was distributed by s-news@wubios.wustl.edu. To unsubscribe
> send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
> message: unsubscribe s-news
>
Here's a function called "table.mat" that I originally wrote, and which
has been revised by Scott Chasalow. It may be part of a large package
of his which is available through statlib:
"table.mat" <- function(mat, order.rows = T)
{
    # DATE INSTALLED: 30 Nov 1993   LAST REVISED: 5 Dec 1994
    # AUTHOR: Phil Spector (spector@stat.berkeley.edu)
    # REVISED BY: Scott D. Chasalow (sasssc@scri.sari.ac.uk)
    #
    # PURPOSE: Count occurrences of rows in a data.frame or matrix
    # ARGUMENTS:
    #   mat: a matrix, data frame, or vector
    #   order.rows: a logical value. If true, rows of result are sorted.
    # VALUE: A data frame with a column for each column in
    #   data.frame(mat), and a final column of counts appended;
    #   rows are the UNIQUE rows of mat. Similar result,
    #   but as a multi-way array, may be obtained with the
    #   call, do.call("table",as.data.frame(mat)).
    #
    # ***NOTE***
    #   This function MAY fail to work correctly if any elements of
    #   mat are character strings containing white space!
    #   A possible fix, using an argument, "sep", may be found in
    #   function match.mat().
    # SEE ALSO:
    #   match.mat, unique.mat
    #
    nms <- NULL
    if(!is.data.frame(mat)) {
        nms <- dimnames(mat)[[2]]
        mat <- as.data.frame(mat)
    }
    if(any(sapply(mat, is.matrix))) {
        mat <- as.data.frame(as.matrix(mat), optional = T)
        if(!is.null(nms))
            names(mat) <- nms
    }
    pmat <- do.call("paste", mat)
    which <- !duplicated(pmat)
    mat.use <- mat[which, , drop = F]
    mat.tab <- table(pmat)
    mat.use$Count <- mat.tab[match(pmat[which], names(mat.tab))]
    if(order.rows)
        mat.use <- mat.use[do.call("order", mat.use), ]
    row.names(mat.use) <- paste(1:dim(mat.use)[1])
    mat.use
}
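The white-space caution in the header comments can be seen with a small made-up example. The sketch below assumes R semantics and uses an arbitrary separator ("\r") chosen only for illustration. It is not the code of match.mat(), which is not shown in this message, and the names m, key.default and key.sep are hypothetical.

```r
# Two distinct rows of made-up character data containing white space.
m <- matrix(c("a b", "c",
              "a",   "b c"), ncol = 2, byrow = TRUE)
df <- as.data.frame(m, stringsAsFactors = FALSE)

# With paste()'s default separator (a single space), both rows
# collapse to the same key, so they would be treated as duplicates.
key.default <- do.call("paste", df)

# Passing a separator that does not occur in the data keeps the
# keys distinct. "\r" is an arbitrary choice for this sketch.
key.sep <- do.call("paste", c(df, sep = "\r"))

print(duplicated(key.default))
print(duplicated(key.sep))
```

Any separator works as long as it cannot appear inside an element of the matrix.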
I hope you find it useful.
- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spector@stat.berkeley.edu