Re: [S] Large (sparse) matrix is killing me

Arman Maghbouleh (arman@csli.stanford.edu)
Sun, 25 Oct 1998 19:45:44 -0800 (PST)


Last week I asked about existing facilities for handling large sparse
matrices. Alan Zaslavsky kindly pointed me to the data structures he
had used in his plr library for polytomous logistic regression
(available in Statlib).

His data structures are:
> data: *nonzero* values from the sparse matrix, arranged in row-major order
> col.index: the *column* index corresponding to each element of data
> last: the index of the position in the data and col.index vectors corresponding
> to the last nonzero element of each row.
>
> For example, if the matrix is
>
> 35 0 0
> 0 0 91
> 42 11 0
> 0 81 0
>
> then data=(35,91,42,11,81), col.index=(1,3,1,2,2), last=(1,2,4,5)
>
> (Note that the use of row-major ordering is because the typical use of
> this is for calculating X'X or X'WX or X'Y, so we work on one row at a
> time.)
>
> The benefit of this, besides storage efficiency, is that as we work
> on each row we use col.index to pull out nonzero elements, never
> doing any calculations on the zero elements. So the operation count
> of the outer product calculation for one row goes from (# columns)^2
> to (# nonzero entries)^2.

I needed a convenient way to build the matrices one row at a time and
only needed to access only one row at a time, so I ended up
implementing sparse matrices as a list of sparse vectors with each
element of the list corresponding to one row. I also defined sparse
vectors as a list of two elements: vector of values and corresponding
vector of indices.

With a few auxiliary functions I can now do all I need.

## create and multiply large sparse vectors:
## tempvec <- vector.sparse(values=1:3, indices=c(1,4,6), len=10^20)
## tempvec2 <- vector.sparse(1:3,c(1,4,9), 10^20)
## length(tempvec2) == 1e+20
## tempvec%*%tempvec2 == 5
## create sparse matrices from sparse vectors:
## tempMat <- matrix.sparse.row(tempvec, tempvec2)
## dim(tempMat) == c(2,1e+20)
## access rows of sparse matrices in a natural way
## tempMat[1,]%*%tempMat[2,] == 5
## tempMat[3,] <- tempvec
## dim(tempMat) == c(3,1e+20)
## just about nothing else is implemented except nice printing of the above.

This didn't seem to be such a hot topic for this list so I won't use
up bandwidth by posting the code--unless more than three people
ask me for it.

-Arman.
-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu. To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message: unsubscribe s-news