# Re: [S] Large (sparse) matrix is killing me

Arman Maghbouleh (arman@csli.stanford.edu)
Sun, 25 Oct 1998 19:45:44 -0800 (PST)

matrices. Alan Zaslavsky kindly pointed me to the data structures he
had used in his plr library for polytomous logistic regression
(available in Statlib).

His data structures are:
> data: *nonzero* values from the sparse matrix, arranged in row-major order
> col.index: the *column* index corresponding to each element of data
> last: the index of the position in the data and col.index vectors corresponding
> to the last nonzero element of each row.
>
> For example, if the matrix is
>
> 35 0 0
> 0 0 91
> 42 11 0
> 0 81 0
>
> then data=(35,91,42,11,81), col.index=(1,3,1,2,2), last=(1,2,4,5)
>
> (Note that the use of row-major ordering is because the typical use of
> this is for calculating X'X or X'WX or X'Y, so we work on one row at a
> time.)
>
> The benefit of this, besides storage efficiency, is that as we work
> on each row we use col.index to pull out nonzero elements, never
> doing any calculations on the zero elements. So the operation count
> of the outer product calculation for one row goes from (# columns)^2
> to (# nonzero entries)^2.

I needed a convenient way to build the matrices one row at a time and
only needed to access only one row at a time, so I ended up
implementing sparse matrices as a list of sparse vectors with each
element of the list corresponding to one row. I also defined sparse
vectors as a list of two elements: vector of values and corresponding
vector of indices.

With a few auxiliary functions I can now do all I need.

## create and multiply large sparse vectors:
## tempvec <- vector.sparse(values=1:3, indices=c(1,4,6), len=10^20)
## tempvec2 <- vector.sparse(1:3,c(1,4,9), 10^20)
## length(tempvec2) == 1e+20
## tempvec%*%tempvec2 == 5
## create sparse matrices from sparse vectors:
## tempMat <- matrix.sparse.row(tempvec, tempvec2)
## dim(tempMat) == c(2,1e+20)
## access rows of sparse matrices in a natural way
## tempMat[1,]%*%tempMat[2,] == 5
## tempMat[3,] <- tempvec
## dim(tempMat) == c(3,1e+20)
## just about nothing else is implemented except nice printing of the above.

This didn't seem to be such a hot topic for this list so I won't use
up bandwidth by posting the code--unless more than three people