There is a description in V&R2, pp. 237-8, given below. I guess I was
teasing people to look up the Hauck-Donner phenomenon in our index.
(I seem to remember this was new to my co-author too, so you were in
good company. This is why it is such a good example of a fact which
would be useful to know but hardly anyone does. Don't ask me how I
knew: I only know that I first saw this in about 1980.)
There is a little-known phenomenon for binomial GLMs that was pointed
out by Hauck & Donner (1977: JASA 72:851-3). The standard errors and
t values derive from the Wald approximation to the log-likelihood,
obtained by expanding the log-likelihood in a second-order Taylor
expansion at the maximum likelihood estimates. If there are some
\hat\beta_i which are large, the curvature of the log-likelihood at
\hat{\vec{\beta}} can be much less than near \beta_i = 0, and so the
Wald approximation underestimates the change in log-likelihood on
setting \beta_i = 0. This happens in such a way that as |\hat\beta_i|
\to \infty, the t statistic tends to zero. Thus highly significant
coefficients according to the likelihood ratio test may have
non-significant t ratios.
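As a rough illustration of the passage above, here is a minimal sketch for the simplest case I can think of: a single binomial sample with y successes in n trials, \beta = logit(p), and a test of \beta = 0. The function names `wald_z` and `lr_stat` are made up for this example, and Python is used only to keep the sketch self-contained; this is not the glm code discussed in the text.

```python
import math

def wald_z(n, y):
    """Wald z for H0: beta = 0, where beta = logit(p); requires 0 < y < n."""
    p = y / n
    beta_hat = math.log(p / (1.0 - p))
    # Standard error from the curvature of the log-likelihood AT beta_hat.
    se = 1.0 / math.sqrt(n * p * (1.0 - p))
    return beta_hat / se

def lr_stat(n, y):
    """Likelihood ratio statistic for H0: beta = 0 (p = 1/2); requires 0 < y < n."""
    p = y / n
    return 2.0 * (y * math.log(p / 0.5) + (n - y) * math.log((1.0 - p) / 0.5))

n = 100
for y in (60, 80, 90, 95, 99):
    beta_hat = math.log(y / (n - y))
    print(f"y={y:3d}  beta_hat={beta_hat:6.3f}  "
          f"Wald z={wald_z(n, y):6.3f}  LR={lr_stat(n, y):7.2f}")
```

Under these assumptions the likelihood ratio statistic keeps growing as y approaches n, while the Wald z rises and then falls again, because the curvature n p (1 - p) at the estimate shrinks faster than \hat\beta grows.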
To expand a little, if |t| is small it can EITHER mean that the Taylor
expansion works and hence the likelihood ratio statistic is small OR
that |\hat\beta_i| is very large, the approximation is poor and the
likelihood ratio statistic is large. (I was using `significant' as
meaning practically important.) But we can only tell if |\hat\beta_i|
is large by looking at the curvature at \beta_i=0, not at
\hat\beta_i. This really does happen: from later on in V&R2:
There is one fairly common circumstance in which both convergence
problems and the Hauck-Donner phenomenon (and trouble with
\sfn{step}) can occur. This is when the fitted probabilities
are extremely close to zero or one. Consider a medical diagnosis
problem with thousands of cases and around fifty binary
explanatory variables (which may arise from coding fewer
categorical factors); one of these indicators is rarely true but
always indicates that the disease is present. Then the
fitted probabilities of cases with that indicator should be one,
which can only be achieved by taking \hat\beta_i = \infty.
The result from \sfn{glm} will be warnings and an estimated
coefficient of around +/- 10 [and an insignificant t value].
That was based on a real-life example, which prompted me to write what
is now stepAIC. Once I had that to try, I found lots of examples.
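The diagnosis scenario quoted above can be mimicked on a toy scale. The sketch below is an assumption-laden stand-in, not the real example: the counts are invented, there is one indicator rather than fifty, and `fit` runs a fixed number of Newton steps instead of using glm's stopping rule. All function names are hypothetical, and Python is used only to keep it self-contained.

```python
import math

def expit(eta):
    """Logistic function, written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)

def score_and_info(groups, b0, b1):
    """Score vector and 2x2 information for logit(p) = b0 + b1 * x."""
    s0 = s1 = i00 = i01 = i11 = 0.0
    for x, n, y in groups:
        eta = b0 + b1 * x
        p, q = expit(eta), expit(-eta)  # q = 1 - p, computed stably
        r, w = y - n * p, n * p * q
        s0 += r
        s1 += r * x
        i00 += w
        i01 += w * x
        i11 += w * x * x
    return (s0, s1), (i00, i01, i11)

def fit(groups, n_iter=15):
    """Run a fixed number of Newton steps; return (b0, b1, se_b1)."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        (s0, s1), (i00, i01, i11) = score_and_info(groups, b0, b1)
        det = i00 * i11 - i01 * i01
        b0 += (i11 * s0 - i01 * s1) / det
        b1 += (i00 * s1 - i01 * s0) / det
    # Standard error of b1 from the inverse information at the final iterate.
    _, (i00, i01, i11) = score_and_info(groups, b0, b1)
    se_b1 = math.sqrt(i00 / (i00 * i11 - i01 * i01))
    return b0, b1, se_b1

def loglik(groups, prob):
    """Binomial log-likelihood; prob maps x to a fitted probability."""
    total = 0.0
    for x, n, y in groups:
        p = prob(x)
        if y > 0:
            total += y * math.log(p)
        if n - y > 0:
            total += (n - y) * math.log(1.0 - p)
    return total

# Invented counts: (indicator, cases, diseased). The indicator is rarely
# true, but every case that has it is diseased.
groups = [(0, 1000, 300), (1, 10, 10)]
b0, b1, se_b1 = fit(groups)
z = b1 / se_b1
p_null = sum(y for _, _, y in groups) / sum(n for _, n, _ in groups)
lr = 2.0 * (loglik(groups, lambda x: expit(b0 + b1 * x))
            - loglik(groups, lambda x: p_null))
print(f"b1={b1:.2f}  se={se_b1:.1f}  Wald z={z:.4f}  LR={lr:.2f}")
```

With these invented counts the coefficient of the indicator just keeps drifting upwards with each step, its standard error becomes enormous and the Wald ratio is negligible, yet the likelihood ratio statistic for dropping the indicator is large on one degree of freedom.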
Brian Ripley