By the way, as the respondents correctly inferred I am interested in generating the sampled classes, not counts.
Dave Krantz' solution:
To restate the problem, suppose P is an n x k matrix with non-negative
entries whose rows sum to 1; how can one generate a sample of size n,
where the ith element has value 1, 2, ..., k with probabilities given
by the ith row of P?
First convert P to cumulative probabilities within each row.
An efficient way (without external loops) is:
Q <- matrix(cumsum(t(P)) - rep(0:(n-1), rep(k,n)), byrow=T, ncol=k)
Next, generate runif(n) and compare it with each column of Q, summing
the rows of the comparison matrix (and adding 1):
R <- 1 + (Q < runif(n)) %*% rep(1, k)
R is the result you want. This can easily be adapted to give samples
of specified sizes greater than 1 for each multinomial: if Z is the
vector of n sample sizes, replace "Q < runif(n)" by
Q[rep(1:n, Z), ] < runif(sum(Z))
and then make R into a list by using split() (if Z is nonconstant) or
into an n x Z[1] matrix (if Z is constant).
Duncan Murdoch's solution (note: yet another application of the
wonderful and fast approx() function!)
Here's a function I've just written that might be efficient. It
samples from a discrete distribution on the integers 0:(n-1) by the
inverse cdf method. It doesn't do the totals to give you multinomial
counts; if you want those, you'll have to use table() on the result.
rdiscrete <- function(n, p)
{
if(length(n) == 1) {
cp <- c(0, cumsum(p/sum(p)))
x <- 0:length(p)
approx(cp, x, runif(n), "constant")$y
}
else {
p <- as.matrix(p)
if(length(n) != nrow(as.matrix(p)))
stop("Length of n must match rows of p")
cp <- cbind(rep(0, length(n)), t(apply(p, 1, function(
row)
cumsum(row)/sum(row))))
x <- matrix(0:ncol(p), length(n), ncol(cp), byrow = T)
offsets <- matrix(0:(length(n) - 1), length(n),
ncol(cp)
)
approx(as.vector(cp + offsets), as.vector(x),
runif(sum(
n)) + rep(0:(length(n) - 1), n), "constant")$y
}
}
If n is a scalar, it generates n random values from the distribution
defined by a normalized version of p. If n is a vector and p is a
matrix with nrow(p) = length(n), then it generates n[i] values from
the distribution given by row i of p.
The way it handles the matrix case is to concatenate all of the
cumulative sums of each row into a long vector with offsets to make it
an increasing vector. For example, if p is the matrix
0.1 0.5 0.4
0.2 0.2 0.6
then the cumulative sum would be
0.0 0.1 0.6 1.0
1.0 1.2 1.4 2.0
with corresponding values
0 1 2 2
0 1 2 2
An runif value is generated, the appropriate offset for that row is
added, and approx() is used to look up the simulated value.
-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu. To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message: unsubscribe s-news