Re: [S] Newbie question: Extracting rows of a matrix with positive elements

Dave Krantz (dhk@paradox.psych.columbia.edu)
Sat, 21 Feb 1998 07:15:27 -0500


The first question is, how does one extract those rows of a matrix
that have all positive elements?

One method that works, for a matrix y, is

y[(((y > 0) - 1) %*% rep(1, ncol(y))) == 0, ]
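For anyone who wants to trace that one-liner step by step without an S interpreter, here is a rough Python translation. It is only a sketch: it assumes the matrix is stored as a list of equal-length rows of numbers, and the function name all_positive_rows is hypothetical.

```python
def all_positive_rows(y):
    """Return the rows of y whose elements are all > 0.

    Sketch of y[(((y > 0) - 1) %*% rep(1, ncol(y))) == 0, ] from S,
    assuming y is a list of equal-length rows (not an S matrix).
    """
    ncol = len(y[0]) if y else 0
    ones = [1] * ncol                      # rep(1, ncol(y))
    selected = []
    for row in y:
        shifted = [int(v > 0) - 1 for v in row]              # (y > 0) - 1: 0 or -1
        row_sum = sum(a * b for a, b in zip(shifted, ones))  # row %*% ones
        if row_sum == 0:                   # all entries were 0, i.e. all positive
            selected.append(row)
    return selected


y = [[1, 2, 3], [4, -5, 6], [0.5, 7, 8]]
print(all_positive_rows(y))  # [[1, 2, 3], [0.5, 7, 8]]
```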

Perhaps someone else will come up with an easier way.

The second question is: how is a "newbie" supposed to figure that
out, and where can one find it? In response, I will say that I have
been using S since about 1981 (long before Splus) and I don't recall
ever having to select the all-positive rows of a matrix; therefore,
I'm not sure that this is the sort of basic question whose answer
should be readily available in documentation. I have, however, had to
select rows based on many different criteria, so devising a way to
select rows efficiently is a familiar problem, and one that SHOULD be
covered clearly in the documentation.

To develop the method used above, I first reasoned that we want
a selection vector w that could be used in an expression of the form

y[w, ]

with the requirement that w[i] == T iff row i of y is all-positive.
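The role of such a selection vector can be mimicked in Python. This is an illustrative sketch only: the matrix is assumed to be a list of rows, w a list of booleans with one entry per row, and select_rows is a hypothetical helper standing in for y[w, ].

```python
def select_rows(y, w):
    """Keep row i of y exactly when w[i] is True (a stand-in for y[w, ]).

    Assumption of this sketch: w must have one entry per row of y.
    """
    if len(w) != len(y):
        raise ValueError("w must have one entry per row of y")
    return [row for row, keep in zip(y, w) if keep]


y = [[1, 2], [-1, 3], [4, 5]]
w = [True, False, True]
print(select_rows(y, w))  # [[1, 2], [4, 5]]
```

With the selection logic isolated this way, any row criterion reduces to computing the right w.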

Working from the opposite end, we can test a whole matrix for
positivity by forming

y > 0

which produces a matrix of T and F values. Thus, one way to get
the desired vector w would be to take the product of each row
of (y > 0): T and F are treated as 1 and 0, so the product is 1 iff
every value in the row is T. This can be computed with apply() and
prod(), i.e.,

w <- apply(y > 0, 1, prod)

This produces a vector of 0s and 1s, which can be coerced to Fs and Ts
by as.logical(). Thus, a possible solution to the original problem is

y[as.logical(apply(y > 0, 1, prod)), ]
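A Python sketch of this row-product approach, under the same assumption that the matrix is a list of equal-length rows; the function name all_positive_rows_prod is hypothetical, and math.prod (Python 3.8+) plays the part of prod().

```python
from math import prod


def all_positive_rows_prod(y):
    """Select all-positive rows via the product of each row's 0/1 tests.

    Sketch of y[as.logical(apply(y > 0, 1, prod)), ], assuming y is a
    list of equal-length rows rather than an S matrix.
    """
    w = [prod(int(v > 0) for v in row) for row in y]        # apply(y > 0, 1, prod)
    return [row for row, flag in zip(y, w) if bool(flag)]   # as.logical(w)


y = [[1, 2, 3], [4, -5, 6], [0.5, 7, 8]]
print(all_positive_rows_prod(y))  # [[1, 2, 3], [0.5, 7, 8]]
```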

This will work well much of the time, but apply() can be slow when
the number of rows is large, and the slowdown is compounded when the
computation itself is repeated within a large loop. So I always look
for ways to use purely linear (matrix) operations to achieve the
desired result. It occurred to me that if we subtract 1 from the T and
F values in (y > 0), we get a matrix that is all 0 in the desired rows,
with a mixture of 0 and -1 in other rows, so I can use row sum == 0
rather than row product == 1 as the selection criterion. Row sums can
be obtained efficiently by multiplying the matrix by a vector of 1s of
length ncol(y), i.e., rep(1, ncol(y)). This led to the solution given
earlier, which ought to be quite fast even for very large matrices.
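A small worked example of the row-sum criterion, using a made-up 2-by-2 matrix (the numbers are purely illustrative):

```latex
y = \begin{pmatrix} 2 & -1 \\ 3 & 4 \end{pmatrix}, \quad
(y > 0) = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}, \quad
(y > 0) - 1 = \begin{pmatrix} 0 & -1 \\ 0 & 0 \end{pmatrix}

\bigl((y > 0) - 1\bigr) \begin{pmatrix} 1 \\ 1 \end{pmatrix}
  = \begin{pmatrix} -1 \\ 0 \end{pmatrix}
```

Only the second row has a row sum of 0, and it is exactly the row whose elements are all positive, so the comparison with 0 selects it and drops the first row.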

It has taken me a while to write this message, which takes such a
didactic tone because the "newbie" label pushed some buttons;
but once one has enough experience with this sort of problem,
one can invent the basic solutions shown here (first using prod(),
then using matrix multiplication) quite rapidly. Then, of course,
one has to test the ideas to make sure they give the desired
results! I am seldom completely sure of anything until I try it.
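In the spirit of trying it, here is a hedged Python sketch of one way to check that the row-product and row-sum criteria agree: compare them on many small random matrices that include zeros and negatives. The function names, the value range, and the trial count are arbitrary choices for illustration, not anything from S itself.

```python
import random
from math import prod


def by_row_product(y):
    # Keep rows whose 0/1 indicators multiply to 1 (the apply()/prod() idea).
    return [row for row in y if prod(int(v > 0) for v in row) == 1]


def by_row_sum(y):
    # Keep rows whose (indicator - 1) values sum to 0 (the row-sum idea).
    return [row for row in y if sum(int(v > 0) - 1 for v in row) == 0]


def agree_on_random_matrices(trials=1000, seed=1):
    """Return True if both criteria pick the same rows on every random trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        nrow, ncol = rng.randint(1, 6), rng.randint(1, 4)
        # Include 0 so the boundary between positive and non-positive is exercised.
        y = [[rng.choice([-2, -1, 0, 1, 2, 3]) for _ in range(ncol)]
             for _ in range(nrow)]
        if by_row_product(y) != by_row_sum(y):
            return False
    return True


print(agree_on_random_matrices())  # True
```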

Dave Krantz
-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu.