Re: [S] Products of pairs + something completely different

Steve Bousquin (sbous@lamar.colostate.edu)
Tue, 31 Mar 1998 09:30:43 -0700


Dr. Venables may be expecting too much if he is expecting my questions
always to make sense! The question was (a) not properly formulated (I
wanted a vector of the outer products, which some repondants
anticipated); and (b) probably not the correct question for what I
wanted to do. I would have to classify my programming ability in S-PLUS
as rank beginner. I am also not a statistician. I have written my
first S-PLUS functions in the last few weeks with the help of this
amazing newsgroup (I have had as many as 30 gracious responses to a
single question).

It appears, though, that I set Dr. Venables off on an interesting
diversion. To satisfy his curiousity on my question (and perhaps,
hopefully, to intitate another diversion):

The question was related to a misguided attempt to write a function that
would calculate the quantity below. It is the variance of the mean
for an adaptive cluster sample, and my resulting function (not included
here) did not produce sensible results. My guess is that I did not
address the double summation correctly in my S-PLUS function, although I
have not had time to work it out.

Here is the quantity I was trying to calculate. I would welcome any
advice on how to do this in S-PLUS. The yk*yk' term is what prompted my
post. My own functions to calculate the alpha k and alpha k k' terms
work
fine and could be called in the final function:

var(mu) = (1/M^2*N^2) sigma(k=1 to K) sigma(k'=1 to K)
((yk*yk')/(alpha k k')) * ((alpha k k'/alpha k * alpha k')-1)

where K is the number of networks (see below), k is a network, alpha k
k' is the probability that a primary unit intercepts both networks k and
k', yk is the sum of the y-values in the kth network, M is the number of
possible transects in the population, and N is the number of secondary
units (see below) in a transect (Thompson 1996). The second letters in
my notation (on y and alpha) are subscripts in the original.

(There are 2 kinds of sampling units in adaptive cluster sampling. The
primary unit in this case is a strip transect. The secondary unit is a
plot which has the same dimension on a side as the width of the
transect. A series of plots is extended from the transect when the
object of interest is intercepted by the transect. Plots are put down
adjacent to one another until an
adjacent one does not contain the object of interest. A network is the
collection of plots resulting from one interception).

Thompson, S. K., and G. A. F. Seber.1996. Adaptive sampling.
Wiley-Interscience, New York.

Bill Venables wrote:
>
> Scott Chasalow writes:
> > On Friday, March 27, 1998 (Steve Bousquin) asks
> > >
> > > Is there a simple way to obtain the products of all possible
> > > pairs of values that are in a data frame column?
> > >
> > > Steve Bousquin
> >
> > ....
> > > z <- 1:10
> > > which <- combn2(n = length(z))
> > > z[which[,1]] * z[which[,2]]
> > [1] 2 3 4 5 6 7 8 9 10 6 8 10 12 14 16 18 20 12 15
> > 18 21 24 27 30 20
> > [26] 24 28 32 36 40 30 35 40 45 50 42 48 54 60 56 63 70 72 80
> > 90
> > ...
>
> Of course I can't resist an obfuscated, if still obvious one-liner:
>
> > z <- 1:10
> > (function(x) x[lower.tri(x)])(outer(z,z))
> [1] 2 3 4 5 6 7 8 9 10 6 8 10 12 14 16 18 20 12 15 18 21 24
> [23] 27 30 20 24 28 32 36 40 30 35 40 45 50 42 48 54 60 56 63 70 72 80
> [45] 90
>
> What really intrigued me about this question, though, was why
> anybody would ever need to do it. Then it occurred to me that
> perhaps the question was not properly expressed (why the
> reference to a data frame?) and perhaps what Steve B. was really
> after was a way of forming all possible products of variables in
> a data frame, as you might need, for example, for a second degree
> regression.
>
> In any case that is a much more interesting question, and the
> annoying thing about it is that you can *nearly* do it very
> trivially, but not quite...
>
> For example, if you have a data frame, dat, with variables y, x1,
> x2, x3, and x4, all quantitative. Then the model fit
>
> fm <- lm(y ~ (x1 + x2 + x3 + x4)^2, data = dat)
>
> generates a quadratic regression in all variables, apart from the
> second degree powers, I(x1^2), I(x2^2), &c which need to be
> explicitly generated and added. Why?
>
> It always seemed very odd to me that the ^ operator in linear
> model formulae could not be used directly for such a common,
> useful and entirely natural operation. (In fact by some sort of
> special indulgence it *does* work with one variable. A term like
> + x1^2 is silently promoted to mean + I(x1^2). If only they had
> not stopped at one variable!)
>
> The simple rule could be that all possible products of the
> required degree be generated and the factor powers be demoted to
> first degree, either when the formula is parsed by terms() if the
> class of the variables is then known, or later when the model
> matrix is constructed. In fact I don't think it would seriously
> break any existing code if the rules concerning the ^ operator
> were now changed to what is obviously the true, logical and
> divinely inspired convention...
>
> Charles R. if you are on the lookout for nice little enhancements
> to offer to users, please take note.
>
> Bill.
>
> -----------------------------------------------------------------------
> This message was distributed by s-news@wubios.wustl.edu. To unsubscribe
> send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
> message: unsubscribe s-news

-- 

============================================== Steve Bousquin

Colorado State University Department of Rangeland Ecosystem Science/GDPE NR209 Fort Collins, CO 80523 USA

sbous@lamar.colostate.edu ============================================== ----------------------------------------------------------------------- This message was distributed by s-news@wubios.wustl.edu. To unsubscribe send e-mail to s-news-request@wubios.wustl.edu with the BODY of the message: unsubscribe s-news