re: [S] Products of pairs + something completely different

Bill Venables (wvenable@attunga.stats.adelaide.edu.au)
Tue, 31 Mar 1998 09:16:40 +0930


Scott Chasalow writes:
> On Friday, March 27, 1998 (Steve Bousquin) asks
> >
> > Is there a simple way to obtain the products of all possible
> > pairs of values that are in a data frame column?
> >
> > Steve Bousquin
>
> ....
> > z <- 1:10
> > which <- combn2(n = length(z))
> > z[which[,1]] * z[which[,2]]
> [1] 2 3 4 5 6 7 8 9 10 6 8 10 12 14 16 18 20 12 15 18 21 24 27 30 20
> [26] 24 28 32 36 40 30 35 40 45 50 42 48 54 60 56 63 70 72 80 90
> ...

Of course I can't resist an obfuscated, if still obvious one-liner:

> z <- 1:10
> (function(x) x[lower.tri(x)])(outer(z,z))
[1] 2 3 4 5 6 7 8 9 10 6 8 10 12 14 16 18 20 12 15 18 21 24
[23] 27 30 20 24 28 32 36 40 30 35 40 45 50 42 48 54 60 56 63 70 72 80
[45] 90
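
For anyone who would rather see the steps, here is a minimal unrolled sketch of the same computation. It assumes R (a dialect of S); the names p and prods are illustrative and do not come from the thread.

```r
z <- 1:10

# 10 x 10 matrix whose (i, j) entry is z[i] * z[j]
p <- outer(z, z)

# lower.tri() marks the entries strictly below the diagonal, so indexing
# with it keeps each unordered pair once and drops the squares z[i]^2.
# The values come out column by column.
prods <- p[lower.tri(p)]

length(prods)   # one product per pair: choose(10, 2)
```

Using upper.tri() instead would give the same set of products in a different order.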

What really intrigued me about this question, though, was why
anybody would ever need to do it. Then it occurred to me that
perhaps the question was not properly expressed (why the
reference to a data frame?) and perhaps what Steve B. was really
after was a way of forming all possible products of variables in
a data frame, as you might need, for example, for a second degree
regression.

In any case that is a much more interesting question, and the
annoying thing about it is that you can *nearly* do it very
trivially, but not quite...

For example, suppose you have a data frame, dat, with variables y,
x1, x2, x3 and x4, all quantitative. Then the model fit

fm <- lm(y ~ (x1 + x2 + x3 + x4)^2, data = dat)

generates a quadratic regression in all variables, apart from the
second degree powers, I(x1^2), I(x2^2), &c which need to be
explicitly generated and added. Why?
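
To make the gap concrete, the following sketch fits both versions and compares the terms. It assumes R; the data frame is simulated purely for illustration, and fm2 is a hypothetical name for the fit with the squares added by hand.

```r
# Hypothetical data: four quantitative predictors and a response.
set.seed(1)
dat <- data.frame(x1 = rnorm(50), x2 = rnorm(50),
                  x3 = rnorm(50), x4 = rnorm(50))
dat$y <- 1 + dat$x1 - dat$x2^2 + dat$x3 * dat$x4 + rnorm(50)

# Main effects and pairwise products only:
# intercept + 4 linear + 6 cross-product terms.
fm <- lm(y ~ (x1 + x2 + x3 + x4)^2, data = dat)

# Full quadratic: the four squared terms are added explicitly.
fm2 <- lm(y ~ (x1 + x2 + x3 + x4)^2 +
            I(x1^2) + I(x2^2) + I(x3^2) + I(x4^2), data = dat)

names(coef(fm))    # no squared terms appear here
names(coef(fm2))   # squares present alongside the cross products
```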

It always seemed very odd to me that the ^ operator in linear
model formulae could not be used directly for such a common,
useful and entirely natural operation. (In fact by some sort of
special indulgence it *does* work with one variable. A term like
+ x1^2 is silently promoted to mean + I(x1^2). If only they had
not stopped at one variable!)

The simple rule could be that all possible products of the
required degree be generated and the factor powers be demoted to
first degree, either when the formula is parsed by terms() if the
class of the variables is then known, or later when the model
matrix is constructed. In fact I don't think it would seriously
break any existing code if the rules concerning the ^ operator
were now changed to what is obviously the true, logical and
divinely inspired convention...
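
Until something like that exists, the convention can be approximated for quantitative variables by building the formula as text. The sketch below assumes R; quad_formula is a hypothetical helper, not an existing function, and it makes no attempt at the demotion of factor powers described above.

```r
# Hypothetical helper: build  response ~ (v1 + v2 + ...)^2 + I(v1^2) + ...
# for a character vector of quantitative variable names.
quad_formula <- function(response, vars) {
  crossed <- paste0("(", paste(vars, collapse = " + "), ")^2")
  squares <- paste0("I(", vars, "^2)", collapse = " + ")
  as.formula(paste(response, "~", crossed, "+", squares))
}

f <- quad_formula("y", c("x1", "x2", "x3", "x4"))
f

# The expanded terms: 4 linear, 4 squared and 6 cross-product terms.
attr(terms(f), "term.labels")
```

The result can be passed straight to lm(), as in lm(f, data = dat).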

Charles R. if you are on the lookout for nice little enhancements
to offer to users, please take note.

Bill.
