Re: Summary of Robust Regression Algorithms

Charles C. Berry (cberry@tajo.ucsd.edu)
Wed, 07 Jan 1998 17:23:01 -0800


Kent E. Holsinger wrote:
>
> >>>>> "Brian" == Prof Brian Ripley <ripley@stats.ox.ac.uk> writes:
>
> Brian> My best example of this not knowing the literature is the
> Brian> Hauck-Donner (1977) phenomenon: a small t-value in a
> Brian> logistic regression indicates either an insignificant OR a
> Brian> very significant effect, but step.glm assumes the first,
> Brian> and I bet few users of glm() stop to think.
>
> All right I confess. This is a new one for me. Could someone explain
> the Hauck-Donner effect to me? I understand that the t-values from
> glm() are a Wald approximation and may not be terribly reliable, but I
> don't understand how a small t-value could indicate "either an
> insignificant OR a very significant effect."

Here is an example.

Consider the dataset:

         y   x
  [1,]   0 -15
  [2,]   0 -15
  [3,]   0 -15
  [4,]   0 -15
  [5,]   0 -15
  [6,]   0 -15
  .
  .
  .
 [99,]   0  -1
[100,]   0   1
[101,]   1  -1
[102,]   1   1
  .
  .
  .
[199,]   1  15
[200,]   1  15

Try fitting this with:

summary(glm(y ~ x,
            family = binomial,
            control = glm.control(maxit = 25)))

Notice the coefficients and standard errors:

            Value Std. Error t value
(Intercept)  0.00       1.02    0.00
          x  0.57       0.33    1.75

The t-value gives a two-tailed p-value of about .08.

However, intuition should suggest that the 'p-value' is much too large.

And the likelihood ratio test would support that intuition. The
chi-square statistic (the difference of the two deviances below) on 1
degree of freedom is >270:

Null Deviance: 277.2589 on 199 degrees of freedom

Residual Deviance: 5.941614 on 198 degrees of freedom

The Wald test uses the Fisher Information, the curvature of the
log-likelihood at the MLE, and here that curvature isn't very big.
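The whole contrast can be reproduced numerically. Here is a numpy sketch
(not the original S code; the post elides most of the data rows, so the
exact x spacing below is my own reconstruction of a similar nearly
separated dataset): it fits the logistic regression by Newton-Raphson
and prints both the Wald t for the slope and the deviance-based
chi-square.

```python
import numpy as np

# A nearly separated dataset in the spirit of the one above: 99 failures
# spread over [-15, -1], 99 successes over [1, 15], plus the one
# overlapping pair (y=0 at x=+1, y=1 at x=-1).  The exact x values are
# an assumption, not taken from the post.
x = np.concatenate([np.linspace(-15, -1, 99), [1.0, -1.0],
                    np.linspace(1, 15, 99)])
y = np.concatenate([np.zeros(100), np.ones(100)])
X = np.column_stack([np.ones_like(x), x])   # intercept and slope columns

def loglik(beta):
    # log-likelihood of the logistic regression
    eta = X @ beta
    return float(np.sum(y * eta - np.log1p(np.exp(eta))))

# Newton-Raphson with step halving (plain Newton can overshoot on
# nearly separated data).
beta = np.zeros(2)
for _ in range(50):
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    W = p * (1.0 - p)                       # Fisher weights p.i*(1-p.i)
    info = X.T @ (W[:, None] * X)           # Fisher Information
    step = np.linalg.solve(info, X.T @ (y - p))
    while loglik(beta + step) < loglik(beta):
        step = step / 2.0
    beta = beta + step

se = np.sqrt(np.diag(np.linalg.inv(info)))
t_wald = beta / se                          # Wald statistics

# Deviances for the likelihood-ratio comparison.
dev_resid = -2.0 * loglik(beta)
p0 = y.mean()
dev_null = -2.0 * float(np.sum(y * np.log(p0) + (1 - y) * np.log(1 - p0)))
lrt = dev_null - dev_resid

print("slope = %.2f, s.e. = %.2f, Wald t = %.2f" % (beta[1], se[1], t_wald[1]))
print("LRT chi-square on 1 df = %.1f" % lrt)
```

The exact numbers differ from the S output above because the dataset is
only a reconstruction, but the pattern is the same: a modest Wald t next
to an enormous likelihood-ratio chi-square.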

It may help your intuition to notice that p.i*(1-p.i) shows up in the
Fisher Information (where p.i <- predict(..., type="resp") is the
predicted probability of success), that sum(p.i*(1-p.i)) == 0.959692,
and that 0.922601 of this is due to observations 99:102.

So the Fisher Information gets essentially no contribution from
observations whose classification is nearly certain.
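A quick arithmetic illustration of that weight (mine, not from the S
output): p*(1-p) peaks at p = 0.5 and collapses as the predicted
probability approaches 0 or 1.

```python
# Fisher weight p*(1-p) at a few predicted probabilities: an observation
# classified with near certainty (p near 0 or 1) contributes almost
# nothing to the Information.
for p in (0.5, 0.9, 0.99, 0.999):
    print("p = %5.3f   weight = %.6f" % (p, p * (1 - p)))
```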

And when the effects are really large, you expect most observations to
be classified with near certainty.

-- 

Charles C. Berry                         (619) 534-2098
                                         Dept of Family/Preventive Medicine
E mailto:cberry@tajo.ucsd.edu            UC San Diego
http://hacuna.ucsd.edu/members/ccb.html  La Jolla, San Diego 92093-0622