[S] RE: Interpreting Logistic regression - CORRECTION to Wald Statistic

andrew_white@hmsa.com
Tue, 27 Oct 1998 10:06:29 -0800


I have read through the recent snews message on interpreting logistic
regression and found it very cogent. However I also found an error in
designation of the Wald statistic based on t-values appearing in
summary(glm.object.fit).

Please note one error in the comments that Renaud Lancelot provided in
[snews subject "Re: [S] Logistic regression problems in Standard Version",
from: siftikhar sikder, date: october 18, 1998].

In interpreting the significance of the individual coefficients using the
t-values supplied in the summary(glm.object.fit) function, you must treat
the SQUARE of these t-values as asymptotically Chi-Squared with 1 df, so you
can test it against, e.g., the critical value of 3.84 for the 5% significance
level.

Do NOT apply the t-value straight as the Chi-Squared value. The t-value
reflects a Wald statistic indirectly, and there are prior S-News messages on
the meaning of Wald statistics, as well as on how to be VERY CAUTIOUS about
Wald statistics in logistic regression (due to the obscure "Hauck-Donner
effect") when the Beta coefficients are very large. But the t-value straight
off is an approximation to a z-score, and is better treated by SQUARING it to
approximate a Chi-Square value.
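
As a quick check of the point above (a sketch of mine, not taken from the
messages cited here): a squared standard normal variate is Chi-Squared with
1 df, so squaring the t-value and comparing it with 3.84 gives the same
decision as comparing |t| with 1.96. The t-values are copied from the V5 and
V9 rows of the summary output quoted further down; the variable names are
only illustrative.

```r
# t-values for V5 and V9, copied from the quoted summary output
t.val <- c(V5 = -1.761971, V9 = 2.697375)

# 5% critical values: Chi-Squared with 1 df, and two-sided normal
crit.chisq <- qchisq(0.95, 1)    # about 3.84
crit.norm  <- qnorm(0.975)       # about 1.96; its square equals crit.chisq

# Wald statistic = squared t-value
wald <- t.val^2

# Both rules give the same decision for each coefficient
reject.chisq <- wald > crit.chisq
reject.norm  <- abs(t.val) > crit.norm

# The two p-value formulas agree as well
p.chisq <- 1 - pchisq(wald, 1)
p.norm  <- 2 * (1 - pnorm(abs(t.val)))
```

With these two values, V5 falls short of the 5% critical value and V9
exceeds it under either rule.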

See the messages from Kent Holsinger on Logit regression dated 8 May 1998 as
well as the Trevor Hastie messages dated 24 April 1997 in the s-news
archives; also see the text by Ludwig Fahrmeir & Gerhard Tutz, Multivariate
Statistical Modeling Based on Generalized Linear Models, Springer-Verlag,
1994, pp.45-48; specifically the middle of page 46: "... the Wald statistic is
the square of the 't-value' the standardized estimate ..." and the bottom: "In
particular p-values corresponding to the squared t-values ... of effects ..
are computed from the Chi-Squared (df 1) distribution." (pp.46-47).

The upshot is to test a t-value as a Wald statistic by squaring it and
comparing it to the Chi-Square critical value for the chosen significance
level. To calculate the p-value directly, the S-Plus code is (correcting
Renaud's code in your snews summary):
1-pchisq(abs(coef(summary(my.model))[,3]^2),1)
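
To see what the correction does in practice, here is a minimal sketch that
applies both formulas to the t-values printed in the quoted summary output
below. The fitted object itself is not available, so the t-values are typed
in by hand; the names t.val, wald, p.wald and p.wrong are my own.

```r
# t-values copied from the "t value" column of the quoted output
t.val <- c(-12.636943, -6.228668, 16.025269, -35.323243, -1.761971,
           20.151448, 24.781779, 12.463846, 2.697375, -1.244359, -5.554037)
names(t.val) <- c("(Intercept)", paste("V", 2:11, sep = ""))

# Corrected: square the t-value, then refer it to Chi-Squared with 1 df
wald   <- t.val^2
p.wald <- 1 - pchisq(wald, 1)

# Uncorrected: the t-value itself referred to Chi-Squared with 1 df
p.wrong <- 1 - pchisq(abs(t.val), 1)

# Side-by-side comparison
round(cbind(t.val, wald, p.wald, p.wrong), 4)
```

On these numbers the difference matters for V9, for instance: the corrected
p-value is below 0.05 while the uncorrected one is above it.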

With regards and Aloha from Hawaii
Andy White

-----Original Message-----
From: owner-s-news@wubios.wustl.edu
[mailto:owner-s-news@wubios.wustl.edu]On Behalf Of Iftikhar U. Sikder
Sent: Monday, October 19, 1998 3:43 AM
To: s-news@wubios.wustl.edu
Subject: Re: [S] Logistic regression problems in Standard Version

Thanks to Renaud Lancelot for helping to explain the result of
logistic regression report.

> I am new in S+ and using S+ 4.5 Standard version. I am sorry if my
> questions sound too naive. I am trying to develop a logistic
> regression. The sample data contain a large number of records (more
> than 28,000) derived from a GIS model. The report file of the Logistic
> Regression analysis is attached in this mail. I would like to
> request if someone could help me understand and explain the result.
> In addition, I have a couple of questions.

Deviance Residuals:
       Min         1Q      Median        3Q      Max
 -3.069086 -0.3280827 -0.02042211 0.6201363 3.144234

***
gives you some indication of the dispersion of the residuals. You have to
plot the residuals to assess that (use resid())
***

Coefficients:
                    Value    Std. Error    t value
(Intercept) -8.0842219589 0.63972925231 -12.636943
V2          -0.0210170268 0.00337424122  -6.228668
V3           0.2662767744 0.01661605618  16.025269
V4          -0.0011611856 0.00003287313 -35.323243
V5          -0.0100953342 0.00572956844  -1.761971
V6           0.2424987778 0.01203381433  20.151448
V7           0.0484523502 0.00195516028  24.781779
V8           0.0078187176 0.00062731179  12.463846
V9           0.0098965947 0.00366897305   2.697375
V10         -0.0001520750 0.00012221146  -1.244359
V11         -0.0009061981 0.00016316028  -5.554037
***
gives you the parameter values as well as the Wald statistics (t value),
each of which is simply the ratio of the parameter to its std. error (e.g.
-8.0842... / 0.6397... = -12.6369). To get the p value, compare this
to a Chi2 with 1 df. You can slightly modify the summary.glm function
to do that or use the matrix returned from coef(summary(my.model)),
e.g. 1 - pchisq(abs(coef(summary(my.model))[,3]), 1).

NB: the returned values for the std. error and t value are computed
for dispersion parameter = 1. You have to check that this is
acceptable, using summary(my.model, disp=0)

The correlation matrix gives you the correlation of coefficients. If
you don't want this to appear (boring with large models), use
summary(my.model, cor=0)

***

Terms added sequentially (first to last)
     Df Deviance Resid. Df Resid. Dev
NULL                 28414   39192.14
V2    1  5761.24     28413   33430.90
V3    1    69.72     28412   33361.18
V4    1 10823.09     28411   22538.09
V5    1   293.19     28410   22244.90
V6    1  1482.68     28409   20762.23
V7    1   734.58     28408   20027.65
V8    1   167.32     28407   19860.32
V9    1    10.29     28406   19850.03
V10   1     1.19     28405   19848.84
V11   1    35.12     28404   19813.72

gives you the likelihood ratio statistics, i.e. the deviance
reduction resulting from the introduction of each term in the model,
in a sequential way. This means that the results depend on the order
in which the terms are introduced. To get the statistics and the p
value, use anova(my.model, test="Chisq"). That's also the way you can
compare any two or more nested models. Careful: you have to check that
the Wald test and the LR test give broadly similar results (they always
differ, and the LR test is considered (much) more reliable). If that is
not the case, it means that you have problems in your data:
collinearities, clustering and other amusements.
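
When Pr(Chi) is not printed, the p values can be worked out by hand from the
table above. The following is only a sketch under that assumption: the
deviance reductions are typed in from the "Deviance" column, each with 1 df,
and the names dev.red and p.lr are mine.

```r
# Deviance reductions copied from the sequential analysis of deviance table
dev.red <- c(5761.24, 69.72, 10823.09, 293.19, 1482.68,
             734.58, 167.32, 10.29, 1.19, 35.12)
names(dev.red) <- paste("V", 2:11, sep = "")

# Each term uses 1 df in this table
df <- rep(1, length(dev.red))

# LR test: refer each deviance reduction to Chi-Squared with its df
p.lr <- 1 - pchisq(dev.red, df)
round(p.lr, 4)

# Only the last term (V11) is adjusted for all the others, as the Wald
# tests are, so it is the one row directly comparable to a squared t-value
c(LR = 35.12, Wald = (-5.554037)^2)
```

On these numbers V10 is the only term whose sequential LR test is not
significant at 5%, and for V11 the LR statistic and the squared t-value are
of similar size.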

You should carefully read the Chambers & Hastie book (statistical
models in S) as well as the VR book (modern applied statistics with
S-Plus).

>
> (1) What is the maximum number of records that can be used?

I don't think there is any maximum (except from the RAM/speed of your
computer)...

> (2) The ANOVA table does not produce Pr(chi) in standard 4.5 version. How
> can I explain the result of the analysis of deviance for the sequential
> addition of each variable? Or simply, how to identify the
> critical/significant variables?

see above: with the standard version, see "Compare models" option and
check one model with the "Chi squared" statistics

> (3) How to get the pseudo R-square value and what value of pseudo
> R-square could be considered ' satisfactory '.

That's a big question! There is no ideal answer. For technical details
and computations of pseudo R-square, see Myers R.H., Montgomery D.C.,
1997. A tutorial on generalized linear models. Journal of Quality
Technology, 29 (3): 274-291. The pseudo R-square value can easily be
extracted from the model:

  R2.like <- (my.model$null.deviance - my.model$deviance)/my.model$null.deviance

but I don't know how this works with the standard version. Maybe you
have to compute it from the output, which is not a big deal!
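
Computed by hand from the printed output, that would look roughly like this
(a sketch: the two deviances are read off the "Resid. Dev" column of the
table above, and null.dev and resid.dev are names I made up):

```r
# Null deviance: "NULL" row of the analysis of deviance table
null.dev <- 39192.14

# Residual deviance of the full model: last row (V11)
resid.dev <- 19813.72

# Proportion of the null deviance accounted for by the model
R2.like <- (null.dev - resid.dev) / null.dev
round(R2.like, 3)
```

For this model the value comes out at about 0.49.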

> (4) How is it possible to create box plots of the predictors and
> the binary response using the standard version.

What do you mean? Box plots of the predicted probabilities, or box plots
for each term of the model?

>
> (5) Has anyone used step wise logistic regression in standard version?

No but you have to be very careful. The step.glm function is known to
give strange results (sometimes...).

My advice: get the PROFESSIONAL version and use the MASS library,
where you can find Brian Ripley's stepAIC which is really great for
stepwise logistic regression.

Hope this helps and best regards,

Renaud

>
> Thanks
>
> I.Sikder
>
> --------------------------------------------------------------------
> -- This message was distributed by s-news@wubios.wustl.edu. To
> unsubscribe send e-mail to s-news-request@wubios.wustl.edu with the
> BODY of the message: unsubscribe s-news

--
Renaud Lancelot
ISRA-LNERV
BP 2057 Dakar-Hann
Senegal

tel (221) 832 49 02
fax (221) 821 18 79
email renaud.lancelot@cirad.fr

Iftikhar U. Sikder
Research Associate
International Centre for Integrated Mountain Development (ICIMOD)
GPO Box: 3226 Kathmandu, Nepal
Phone: 977-1-525313 (O)
Phone: 977-1-536141 (R)
Fax : 977-1-524509/536747

Disclaimer: Any opinions given here are my own and not necessarily those
of the ICIMOD
