# Re: [S] QUERY: missing value imputation and transcan/impute

Frank E Harrell Jr (fharrell@virginia.edu)
Tue, 3 Mar 1998 15:32:42 -0500

-----Original Message-----
From: Jens Oehlschlaegel <oehl@psyres-stuttgart.de>
To: S+ list <s-news@wubios.wustl.edu>
Date: Tuesday, March 03, 1998 3:22 PM
Subject: [S] QUERY: missing value imputation and transcan/impute

>
>I have one statistical and one technical question on missing value
>imputation, can anyone help with that?
>
>1) If imputing missing predictor values, shall I make use of the
>criterion as well?
>
>2) Is there a way to invoke transcan/impute (Harrell's library Hmisc) to
>make probabilistic predictions rather than deterministic ones?

The only easy way is to use, e.g., age.i <- impute(age,'random'), which replaces NAs
with random draws (sampled with replacement) from the non-missing values. This works well
when age is unrelated to the other predictors.
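
For readers without S at hand, the idea behind random-draw imputation can be sketched in a few lines of Python. This is only an illustration of the technique, not the Hmisc implementation: the function name `impute_random` and the sample ages are hypothetical, and `None` stands in for NA.

```python
import random

def impute_random(values, rng):
    """Fill each missing entry (None) with a draw, with replacement, from the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to draw from")
    return [rng.choice(observed) if v is None else v for v in values]

rng = random.Random(0)                         # fixed seed so the sketch is repeatable
age = [34, None, 51, 47, None, 29, 62, None]   # made-up ages; None plays the role of NA
age_i = impute_random(age, rng)
print(age_i)
```

Because the draws ignore every other variable, the filled-in values carry no information about how age relates to the other predictors, which is why this approach suits only the unrelated case.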

>
>
>Here is an explanation why I ask:
>
>Let's assume we have a model z~x+y and that all xyz be positively
>correlated, but we don't know how much.
>
>My basic understanding of missing value imputation is (assuming missing
>at random) that missing value imputation should reproduce an unbiased
>estimate of variances and covariances of xyz.

If you go to extra effort to properly estimate them! See the section on imputation using
the Hmisc library in the document by Alzola & Harrell on our web page for an example
in which the bootstrap is used to estimate the covariances correctly by putting imputation
inside the bootstrap loop. This is not computationally feasible for transcan (customized
imputation models) but I'm working on it.

>
>(a: fixed value)
>Substituting missing x by mean of non-missing(x) should lead to downward
>biased estimates of var(xx) and subsequently of var(xy) and also of
>var(xz)
>
>(b: random value)
>Substituting missing x with randomly chosen values of non-missing(x)
>should estimate var(xx) unbiased, but should underestimate var(xy) and
>also var(xz)
>
>(c: transcan)
>I tried substituting missing x with transcan/impute using y: imputed x
>were a function of y *without* any disturbance term, i.e. while the
>non-missing data may have CORxy=0.5 the substituted values have CORxy=1
>(or close to). This obviously overestimates var(xy) and, if the criterion z
>has been used to impute x's, also overestimates var(xz).
>
>So Transcan seems to perform a deterministic imputation using the
>conditional *expectation* of x given all other variables. Maybe I'm
>wrong, but I wish to have the missing values probabilistically imputed by
>values randomly drawn from the conditional *distribution* of x given y and
>(because of a. and b.) given z. So here are variations of my questions
>again:
>
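
The three cases (a) to (c) above can be checked numerically with a small simulation. The sketch below is in Python with simulated data; the sample size, the 30% missingness rate, and the linear relation between x and y are assumptions chosen for illustration. It prints the variance of x and its covariance with y under each scheme so they can be compared with the observed cases.

```python
import random
import statistics

def cov(u, v):
    """Sample covariance of two equal-length lists."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def corr(u, v):
    return cov(u, v) / (statistics.stdev(u) * statistics.stdev(v))

rng = random.Random(2)
n = 5000
y = [rng.gauss(0, 1) for _ in range(n)]
x = [0.5 * yi + rng.gauss(0, 1) for yi in y]           # x and y positively correlated
missing = [rng.random() < 0.3 for _ in range(n)]        # x missing completely at random

x_obs = [xi for xi, m in zip(x, missing) if not m]
y_obs = [yi for yi, m in zip(y, missing) if not m]
y_mis = [yi for yi, m in zip(y, missing) if m]

# (a) fixed value: mean of the observed x
mean_x = statistics.fmean(x_obs)
x_a = [mean_x if m else xi for xi, m in zip(x, missing)]

# (b) random value: draw with replacement from the observed x
x_b = [rng.choice(x_obs) if m else xi for xi, m in zip(x, missing)]

# (c) conditional expectation: fitted value from regressing x on y, no disturbance term
slope = cov(y_obs, x_obs) / statistics.variance(y_obs)
intercept = mean_x - slope * statistics.fmean(y_obs)
x_c = [intercept + slope * yi if m else xi for xi, yi, m in zip(x, y, missing)]
fitted_mis = [intercept + slope * yi for yi in y_mis]

print("observed cases   var(x) =", statistics.variance(x_obs), " cov(x,y) =", cov(x_obs, y_obs))
for label, xs in (("(a) mean", x_a), ("(b) random", x_b), ("(c) expectation", x_c)):
    print(label, " var(x) =", statistics.variance(xs), " cov(x,y) =", cov(xs, y))
print("correlation among the imputed (c) pairs =", corr(fitted_mis, y_mis))
```

In case (c) the imputed points lie exactly on the fitted line, so their correlation with y is 1 in absolute value regardless of the correlation in the observed data.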

I think that the main reason people impute individual realizations rather than expected
values is that they are using multiple imputation to get covariance matrices. You need
this kind of variation to make multiple imputation work. If using the bootstrap you can
impute using estimates of expected values and still get the right variances. I think
there may be a slight advantage to imputing "best" in place of "random" estimates but
don't have any formal justification yet.

>1) If imputing missing predictor values, shall I not make use of the
>criterion as well?

I don't understand this question.

>
>2) Is there a way to invoke transcan/impute to make probabilistic
>predictions rather than deterministic ones? Could I add a disturbance term
>on my own? How? What is the state-of-the-art method in S+ for imputing
>missing values?

I could add an option to predict.transcan to impute individual predicted values by taking
draws from the residuals instead of using expected values, but I'm not convinced it's
needed if you use the bootstrap to get covariances, as mentioned above.
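
To make the idea of "draws from the residuals" concrete, here is a minimal Python sketch of one way such an imputation could work: fit x on y in the complete cases, then add a randomly drawn complete-case residual to each fitted value. It is not a predict.transcan option and not Hmisc code; the function name `impute_with_residual_draws`, the single-predictor linear model, and the simulated data are all assumptions for illustration.

```python
import random
import statistics

def fit_line(u, v):
    """Least-squares intercept and slope for v regressed on u."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    slope = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / sum((a - mu) ** 2 for a in u)
    return mv - slope * mu, slope

def impute_with_residual_draws(rows, rng):
    """Fill missing x (None) with fitted value plus a residual drawn from the complete cases."""
    complete = [(x, y) for x, y in rows if x is not None]
    a, b = fit_line([y for _, y in complete], [x for x, _ in complete])
    residuals = [x - (a + b * y) for x, y in complete]
    return [(a + b * y + rng.choice(residuals) if x is None else x, y) for x, y in rows]

rng = random.Random(3)
rows = []
for _ in range(500):                                      # simulated data, for illustration only
    y = rng.gauss(0, 1)
    x = 0.5 * y + rng.gauss(0, 1)
    rows.append((None if rng.random() < 0.3 else x, y))   # about 30% of x missing at random

completed = impute_with_residual_draws(rows, rng)
print(sum(1 for x, _ in rows if x is None), "values imputed")
```

Drawing from the empirical residuals avoids assuming a particular error distribution, at the cost of treating the residual spread as the same at every value of y.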

Good luck - these are good questions -Frank

---------------------------------------------------------------------------
Frank E Harrell Jr
Professor of Biostatistics and Statistics
Director, Division of Biostatistics and Epidemiology
Dept of Health Evaluation Sciences
University of Virginia School of Medicine
http://www.med.virginia.edu/medicine/clinical/hes/biostat.htm

>
>
>Thank you for any help.
>Best regards
>
>
>Jens Oehlschlaegel
>--
>Jens Oehlschlaegel-Akiyoshi
>Psychologist/Statistician
>Project TR-EAT + COST Action B6
>oehl@psyres-stuttgart.de
>+49 711 6781-408 (phone)
>+49 711 6876902 (fax)
>
>Center for Psychotherapy Research
>Christian-Belser-Strasse 79a
>D-70597 Stuttgart Germany
>--------------------------------------------------
>(general disclaimer)
>
>
>
>-----------------------------------------------------------------------
>This message was distributed by s-news@wubios.wustl.edu. To unsubscribe
>send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
>message: unsubscribe s-news
>
