[S] model of proportion: summary of responses

Lorenz Gygax (lgygax@access.unizh.ch)
Wed, 2 Dec 1998 07:51:17 +0100 (MET)

Dear all,

I was asked to summarize to the list. Many thanks to Prof Brian Ripley,
James Pratt and Brian Cade who offered ideas and help!

I was interested in how to correctly model proportions with a "standard"
method. I had proposed the following two approaches:

Firstly, I fed the data into glm (Y ~ X, family= binomial).
Secondly, I transformed Y into Z = log (Y/(1-Y)). Then I fitted a line by
lm (Z ~ X).
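
To make the second approach concrete, here is a minimal sketch in Python
(not the S code used for the analysis). The data are invented, and `logit`
and `fit_line` are names chosen only for this illustration; `fit_line` is
plain least squares and stands in for lm (Z ~ X).

```python
import math

def logit(y):
    # Parentheses matter: log(y / (1 - y)), not log(y/1 - y).
    return math.log(y / (1.0 - y))

def fit_line(x, z):
    """Ordinary least squares for z = a + b*x; returns (a, b)."""
    n = len(x)
    xbar = sum(x) / n
    zbar = sum(z) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxz = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z))
    b = sxz / sxx
    a = zbar - b * xbar
    return a, b

# Invented proportions, all strictly between 0 and 1.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.10, 0.20, 0.45, 0.70, 0.85]

z = [logit(yi) for yi in y]
a, b = fit_line(x, z)
print("intercept:", a, "slope:", b)
```

The fit is on the logit scale, so a fitted proportion at X is
1/(1 + exp(-(a + b*X))).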

Brian RIPLEY (ripley@stats.ox.ac.uk) pointed out that glm (..., family=
binomial) does not expect zeros and ones in the response: "The
`handbook' is wrong. It converts the response to a proportion."

For my two propositions he clarifies: "Other way round. Actually, in your
case the second is least squares and the first is weighted least squares
with weights 1/(p*(1-p))".

Which one is more correct? "Depends on how the proportions were measured".

And he advises me: "I would use glm with a quasi model logit link, and an
appropriate variance function (and you will need to work out what
appropriate is)."

This will also help if there are many 0s and 1s: "log(Y/(1-Y)) will be
Inf or -Inf, and this will not work. A quasi glm() model will work".
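
A small sketch of why the transformed response breaks down at the
boundaries. It is written in Python, where log(0) raises an error instead
of returning -Inf as in S, so the helper `empirical_logit` (a name made up
for this example) spells out the limiting values by hand.

```python
import math

def empirical_logit(y):
    # Mimics the S result: -Inf at y = 0 and Inf at y = 1.
    # Plain Python would raise an exception for these inputs.
    if y <= 0.0:
        return -math.inf
    if y >= 1.0:
        return math.inf
    return math.log(y / (1.0 - y))

# Invented response containing both boundary values.
y = [0.0, 0.5, 1.0]
z = [empirical_logit(yi) for yi in y]

# Least squares needs sums over z; with infinite entries they are undefined.
total = sum(z)
print(z, total, math.isnan(total))
```

Any least-squares fit of Z on X inherits this problem, because the sums it
needs are no longer finite numbers.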

James PRATT (jamesp@MOCR.OAPI.COM) proposed the same and pointed out a
reference (which can also be found in VR2):

"Have you tried quasilikelihood? Logistic regression does assume a
binomial distribution for the errors. With quasilikelihood, need only
define the variance function, but need not define a distribution. In
McCullagh & Nelder 2nd edition, they give an example where response is the
percentage of a leaf's area affected by blotch. This is example 9.2.4 on
page 328. (In the 1st ed., the example is 8.6.1 on page 173).

They first use a scaled version of the binomial variance (sigma*mu*(1-mu))
as the assumed variance function (with the logit link function for the
mean). After some residual plots, they settle on using mu^2*(1-mu)^2, because
the scaled binomial variance function is too large at the extremes of 0 and 1.

McCullagh & Nelder 'Generalized Linear Models' Chapman and Hall 1989 (2nd
ed), 1983 (1st ed)."
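
The idea of fixing only a link and a variance function can be sketched with
textbook iteratively reweighted least squares. This is a Python
illustration under stated assumptions, not the S glm code and not the
McCullagh & Nelder analysis: one predictor plus intercept, a logit link,
invented data, and the hypothetical names `quasi_logit_fit`,
`binomial_type` and `squared_type`. The dispersion factor is left out
because it does not change the coefficient estimates.

```python
import math

def quasi_logit_fit(x, y, variance, n_iter=50):
    """Fit logit(mu) = a + b*x by iteratively reweighted least squares.

    variance(mu) is the assumed variance function V(mu), up to a constant
    dispersion factor that does not affect the estimates of a and b.
    """
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        sw = swx = swxx = swz = swxz = 0.0
        for xi, yi in zip(x, y):
            eta = a + b * xi
            mu = 1.0 / (1.0 + math.exp(-eta))
            dmu = mu * (1.0 - mu)          # d mu / d eta for the logit link
            w = dmu * dmu / variance(mu)   # working weight
            z = eta + (yi - mu) / dmu      # working response
            sw += w
            swx += w * xi
            swxx += w * xi * xi
            swz += w * z
            swxz += w * xi * z
        # Solve the 2x2 weighted normal equations for (a, b).
        det = sw * swxx - swx * swx
        a = (swxx * swz - swx * swxz) / det
        b = (sw * swxz - swx * swz) / det
    return a, b

def binomial_type(mu):
    # Binomial-shaped variance, mu*(1-mu), scale factor omitted.
    return mu * (1.0 - mu)

def squared_type(mu):
    # The alternative mentioned above: mu^2*(1-mu)^2.
    return (mu * (1.0 - mu)) ** 2

# Invented proportions for illustration only.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.05, 0.20, 0.50, 0.75, 0.95]

fit_binomial = quasi_logit_fit(x, y, binomial_type)
fit_squared = quasi_logit_fit(x, y, squared_type)
print("V = mu(1-mu):        ", fit_binomial)
print("V = mu^2(1-mu)^2:    ", fit_squared)
```

With the logit link, the working weight is (mu*(1-mu))^2 / V(mu), so the
second variance function gives every observation the same working weight,
while the first one downweights observations with fitted values near 0 or 1.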

Brian CADE (Brian_Cade@usgs.gov) tends towards using a somewhat more
specialized method for dealing with compositional data:

"Perhaps you might want to work with the logratio approach advocated by
Aitchison for compositional data even though your testing and estimating
are focused on only 1 component. Briefly, the logratio approach for a
2-part composition (call them y and z and say y is the component of
interest) would involve either of the following 2 transformations (maximal
invariants I believe):

log(y/z) or log(y/(y*z)^(1/2)). The denominator in the first formula is
one of the individual components whereas in the second formula the
denominator is the geometric mean of the two components."
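
The two transformations can be written out as a short Python sketch. The
function names `logratio_other` and `logratio_geomean` and the example
values are invented for this illustration.

```python
import math

def logratio_other(y, z):
    # Log of the component of interest over the other component.
    return math.log(y / z)

def logratio_geomean(y, z):
    # Log of the component of interest over the geometric mean of both.
    return math.log(y / math.sqrt(y * z))

# Invented 2-part composition with y + z = 1.
y, z = 0.3, 0.7

r1 = logratio_other(y, z)
r2 = logratio_geomean(y, z)
print(r1, r2)
```

For a 2-part composition the second transformation is exactly half of the
first, since log(y/(y*z)^(1/2)) = (1/2)*log(y/z). When z = 1 - y, the first
is the same as log(y/(1-y)).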
