# Re: [S] Type I error in linear mixed effects model in S

Douglas Bates (bates@stat.wisc.edu)
28 May 1998 21:10:13 -0500

Chul Ahn <chulahn@heart.med.uth.tmc.edu> writes:

> I performed a simulation study using the lme (linear mixed effects routine)
> of S-plus. The simulated data set is based on the split-plot design, which
> is widely used in clinical trials. The form of the data is given below.
>
> However, the type I error is too large in lme. I did not examine the power
> of the lme routine since I was really suprised to see the large type I
> error. Maybe, I made a mistake in the model statement of lme.
> I ran 500 data sets using the following form of data.
> dsn subject group scr1 scr2 scr3 scr4 scr5 scr6 scr7 scr8
> scr9
> 1 1 1 26.0 25.0 24.0 25.7 27.1 26.3 22.8 29.0
> 31.0
> 1 2 1 25.0 23.5 23.1 24.5 26.8 27.1 25.9 28.1
> 28.8
> ...
> 1 50 1 29.0 28.8 25.3 26.7 29.0 24.6 26.7 27.2
> 27.0
> 1 1 2 31.0 24.3 24.8 25.0 26.3 27.1 25.6 26.3
> 24.9
> ...
> 1 50 2 24.5 23.6 25.8 26.3 27.4 25.4 26.5 27.1
> 28.0
> ...
> 500 50 2 25.2 25.7 25.8 25.3 24.0 26.1 26.1 24.9
> 25.8
>
> Here dsn=1 means the data set #1 and dsn=500 means the data set #500. scr1
> is the measure at time 1, scri is the measure at time i. The data is a
> balanced data without any missing values.
>
> For simulation, I converted the above data set into the standard form in
> page 284 of S-plus4 statistics guide.
> (scr1-- scr9 were converted into the variable score. A newly created
> variable 'time' takes the values 1 through 9.)
>
> I used the following S-Plus lme model statements, which are almost
> identical to the models in page 292 of S-Plus4.
>
> (1)
> S>lme(fixed=score ~time*group, random= ~time, cluster= ~subject, data=data)
>
> The empirical type I error of the group effect is 31.8% and the empirical
> type I error of the interaction effect time*group is 9.6%.

I think it is confusing to refer to the "type I error rate" of an
estimation method. Based on other correspondence with you, I assume
what you mean is that you have simulated data according to a null
model, (i.e. no group effect and no group*time interaction but with a
random effect for subject and with an effect for time), fit the model
you describe above including the group effect and the group*time
interaction, then examined the p-values corresponding to these extra
terms in the output. These p-values are provided by the summary
function.

When you say the "type I error of the group effect is 31.8%" I assume
that you are quoting the proportion of the simulations for which the
p-value for this term is less than 5%. It would be helpful to be more

Can you confirm that this is indeed what you meant?

Looking at your data I am not sure it was simulated according to the
model I described. There is very little variability from subject to
subject beyond the variability within subjects. Also, if one plots
the observations from the 6 subjects shown here, there does not seem
to be a trend over time. If there is no time trend then the data
could be analyzed with a simple one-way anova. The F-ratio for a
fixed-effects anova of the response by subject is 0.909 - about what
would be expected if there were no subject effects at all. This leads
me to suspect that have simulated all the data from the same normal
distribution. In that case the distribution of the test statistics
for the fixed effects from lme would not be expected to be correct
because the model being simulated is not the null model for the test.

Did you indeed simulate a hierarchical model with a random effect for
subject and an independent random disturbance term for each
observation or did you only simulate one level of variability? If
it is the second case then your "empirical type I error rates" are
meaningless.

> I think the empirical type I errors are too high. Would you please give me
> the suggestions of the model formulation statements? If my model
> formulation statements are correct, I think we have to be really careful in
> the use of the lme programs in S-Plus. We calculated the empirical type I
> errors using other programs. Some are pretty good. The others are not good.
> However, they are much better than the empirical type I errors of lme.

Could you be more specific about the "other programs", about what you
```--