Re: [S] Logistic regression problems in Standard Version

Iftikhar U. Sikder (ifti@icimod.org.np)
Mon, 19 Oct 1998 10:42:33 +0000


Thanks to Renaud Lancelot for helping to explain the results of the
logistic regression report.

> I am new to S+ and am using the S+ 4.5 Standard version. I am sorry if my
> questions sound too naive. I am trying to develop a logistic
> regression. The sample data contain a large number of records (more
> than 28,000) derived from a GIS model. The report file of the logistic
> regression analysis is attached to this mail. I would like to
> ask if someone could help me understand and explain the result.
> In addition, I have a couple of questions.

Deviance Residuals:
      Min         1Q      Median        3Q      Max
-3.069086 -0.3280827 -0.02042211 0.6201363 3.144234
***
gives you some indication of the dispersion of the residuals. To assess
it properly, you have to plot the residuals (use resid()).
***
Coefficients:
                    Value    Std. Error    t value
(Intercept) -8.0842219589 0.63972925231 -12.636943
V2          -0.0210170268 0.00337424122  -6.228668
V3           0.2662767744 0.01661605618  16.025269
V4          -0.0011611856 0.00003287313 -35.323243
V5          -0.0100953342 0.00572956844  -1.761971
V6           0.2424987778 0.01203381433  20.151448
V7           0.0484523502 0.00195516028  24.781779
V8           0.0078187176 0.00062731179  12.463846
V9           0.0098965947 0.00366897305   2.697375
V10         -0.0001520750 0.00012221146  -1.244359
V11         -0.0009061981 0.00016316028  -5.554037
***
gives you the parameter values as well as the Wald statistics (t value),
each of which is simply the ratio of the parameter to its std. error (e.g.
-8.0842... / 0.6397... = -12.6369). To get the p value, compare the
square of this ratio to a Chi2 with 1 df. You can slightly modify the
summary.glm function to do that or use the matrix returned by
coef(summary(my.model)),
e.g. 1 - pchisq(coef(summary(my.model))[,3]^2, 1).
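For readers without S-PLUS at hand, the Wald computation can be sketched in Python with the standard library only. This is an illustration, not the summary.glm code: the names coefs and wald_p_value are made up for the example, the numbers are copied from the Coefficients table above, and it relies on the fact that for 1 df the upper tail of a Chi2 at z^2 equals erfc(|z| / sqrt(2)).

```python
import math

# (estimate, std. error) pairs copied from the Coefficients table above.
# Only a few rows are shown; the dictionary name is illustrative.
coefs = {
    "(Intercept)": (-8.0842219589, 0.63972925231),
    "V5": (-0.0100953342, 0.00572956844),
    "V9": (0.0098965947, 0.00366897305),
    "V10": (-0.0001520750, 0.00012221146),
}


def wald_p_value(estimate, std_error):
    """Return (z, p) for the Wald statistic z = estimate / std_error.

    z**2 is referred to a chi-squared distribution with 1 df; for 1 df
    the upper tail P(X > z**2) is exactly erfc(|z| / sqrt(2)).
    """
    z = estimate / std_error
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


wald = {name: wald_p_value(b, se) for name, (b, se) in coefs.items()}
for name, (z, p) in wald.items():
    print(f"{name:12s} z = {z:10.4f}   p = {p:.4g}")
```

With these figures, V5 and V10 come out above the usual 0.05 threshold while V9 and the intercept come out below it, always under the assumption that dispersion = 1 is acceptable.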

NB: the returned values for the std. error and t value are computed
for dispersion parameter = 1. You have to check that this is
acceptable, using summary(my.model, disp=0)

The correlation matrix gives you the correlation of coefficients. If
you don't want this to appear (boring with large models), use
summary(my.model, cor=0)

***

Terms added sequentially (first to last)
     Df Deviance Resid. Df Resid. Dev
NULL                 28414   39192.14
V2    1  5761.24     28413   33430.90
V3    1    69.72     28412   33361.18
V4    1 10823.09     28411   22538.09
V5    1   293.19     28410   22244.90
V6    1  1482.68     28409   20762.23
V7    1   734.58     28408   20027.65
V8    1   167.32     28407   19860.32
V9    1    10.29     28406   19850.03
V10   1     1.19     28405   19848.84
V11   1    35.12     28404   19813.72

gives you the likelihood ratio statistics, i.e. the deviance
reduction resulting from the introduction of each term in the model,
in a sequential way. This means that the results depend on the order
in which the terms are introduced. To get the statistics and the p
values, use anova(my.model, test="Chisq"). That is also how you can
compare any two or more nested models. Careful: you have to check that
the Wald test and the LR test give broadly similar results (they always
differ somewhat, and the LR test is considered (much) more
reliable). If they do not, it means that you have problems in
your data: collinearities, clustering and other amusements.
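If the Pr(Chi) column is not available, the p value for each sequential term can be worked out by hand from the Deviance column, since each term here uses 1 df. A minimal Python sketch (standard library only; the names deviance_drop and chisq1_upper_tail are illustrative, and the figures are copied from the table above):

```python
import math

# Deviance reduction for each term, in order of introduction,
# copied from the sequential analysis of deviance table (1 df each).
deviance_drop = [
    ("V2", 5761.24), ("V3", 69.72), ("V4", 10823.09), ("V5", 293.19),
    ("V6", 1482.68), ("V7", 734.58), ("V8", 167.32), ("V9", 10.29),
    ("V10", 1.19), ("V11", 35.12),
]


def chisq1_upper_tail(x):
    """P(X > x) for a chi-squared variable with 1 df: erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


# Likelihood ratio p value for each term, given the terms entered before it.
lr_p = {term: chisq1_upper_tail(dev) for term, dev in deviance_drop}
for term, dev in deviance_drop:
    print(f"{term:4s} deviance = {dev:9.2f}   p = {lr_p[term]:.4g}")
```

Very large deviance reductions simply print a p value of 0 because of floating-point underflow. As noted above, these values refer to this particular order of introduction; they would change if the terms were entered in another order.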

You should carefully read the Chambers & Hastie book (Statistical
Models in S) as well as the V&R book (Venables & Ripley, Modern
Applied Statistics with S-PLUS).

>
> (1) What is the maximum number of records that can be used?

I don't think there is any maximum (apart from the RAM/speed of your
computer)...

> (2) The ANOVA table does not produce Pr(chi) in standard 4.5 version.
> How can I explain the result of the analysis of deviance for the
> sequential addition of each variable? Or simply, how to identify the
> critical/significant variables?

see above: with the standard version, see "Compare models" option and
check one model with the "Chi squared" statistics

> (3) How to get the pseudo R-square value and what value of pseudo
> R-square could be considered 'satisfactory'.

That's a big question! There is no ideal answer. For technical details
and computations of pseudo R-square, see Myers R.H., Montgomery D.C.,
1997. A tutorial on generalized linear models. Journal of Quality
Technology, 29 (3): 274-291. The pseudo R-square value can easily be
extracted from the model:

    R2.like <- (my.model$null.deviance - my.model$deviance)/my.model$null.deviance

but I don't know whether this works with the Standard version. Maybe you
have to compute it from the output, which is not a big deal!
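Computing it from the output amounts to one line of arithmetic. A small Python sketch, assuming the null deviance is the "NULL" row and the residual deviance is the last row (after V11) of the sequential table; the variable names are illustrative:

```python
# Deviances read off the sequential analysis of deviance table.
null_deviance = 39192.14      # "NULL" row, Resid. Dev
residual_deviance = 19813.72  # last row (after V11), Resid. Dev

# Likelihood-based pseudo R-square: proportion of the null deviance
# accounted for by the fitted model.
r2_like = (null_deviance - residual_deviance) / null_deviance
print(round(r2_like, 3))  # prints 0.494
```

So the model accounts for roughly half of the null deviance; whether that is 'satisfactory' remains the open question discussed above.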

> (4) How is it possible to create box plots of the predictors and
> the binary response using the standard version?

What do you mean? Box plots of the predicted probabilities, or box
plots for each term of the model?

>
> (5) Has anyone used stepwise logistic regression in the standard
> version?

No, but you have to be very careful. The step.glm function is known to
give strange results (sometimes...).

My advice: get the PROFESSIONAL version and use the MASS library,
where you can find Brian Ripley's stepAIC which is really great for
stepwise logistic regression.

Hope this helps and best regards,

Renaud

>
> Thanks
>
> I.Sikder
>
> --------------------------------------------------------------------
> -- This message was distributed by s-news@wubios.wustl.edu. To
> unsubscribe send e-mail to s-news-request@wubios.wustl.edu with the
> BODY of the message: unsubscribe s-news

-- 
Renaud Lancelot
ISRA-LNERV
BP 2057 Dakar-Hann
Senegal

tel (221) 832 49 02
fax (221) 821 18 79
email renaud.lancelot@cirad.fr

Iftikhar U. Sikder
Research Associate
International Centre for Integrated Mountain Development (ICIMOD)
GPO Box: 3226
Kathmandu, Nepal
Phone: 977-1-525313 (O)
Phone: 977-1-536141 (R)
Fax : 977-1-524509/536747

Disclaimer: Any opinions given here are my own and not necessarily
those of the ICIMOD