# [S] QUERY: missing value imputation and transcan/impute

Jens Oehlschlaegel (oehl@Psyres-Stuttgart.DE)
Tue, 3 Mar 1998 21:15:01 +0100 (MET)

I have one statistical and one technical question on missing value
imputation. Can anyone help with them?

1) If imputing missing predictor values, shall I make use of the
criterion as well?

2) Is there a way to invoke transcan/impute (from Harrell's Hmisc library)
to make probabilistic predictions rather than deterministic ones?

Here is an explanation why I ask:

Let's assume we have a model z ~ x + y and that x, y and z are all
positively correlated, but we don't know how strongly.

My basic understanding (assuming the values are missing at random) is
that missing value imputation should yield unbiased estimates of the
variances and covariances of x, y and z.

(a: fixed value)
Substituting the mean of the non-missing x for each missing x should lead
to downward biased estimates of var(x) and subsequently of cov(x,y) and
also of cov(x,z).

(b: random value)
Substituting missing x with randomly chosen values of the non-missing x
should estimate var(x) without bias, but should underestimate cov(x,y) and
also cov(x,z).
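
To make (a) and (b) concrete, here is a minimal simulation sketch in Python (standard library only, not S-PLUS and not related to transcan). The data-generating model, the 40% missingness rate and all names are assumptions chosen purely for illustration: x and y share a common factor so that cor(x,y) is about 0.5, and x is deleted completely at random.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    """Sample covariance (n - 1 denominator); cov(u, u) is the variance."""
    mu, mv = mean(u), mean(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)

rng = random.Random(1)
n = 20000

# Hypothetical data: a shared factor f gives var(x) near 2 and cov(x,y) near 1.
f = [rng.gauss(0, 1) for _ in range(n)]
x = [fi + rng.gauss(0, 1) for fi in f]
y = [fi + rng.gauss(0, 1) for fi in f]

# Delete about 40% of x completely at random (an assumption of this sketch).
missing = [rng.random() < 0.4 for _ in range(n)]
observed_x = [xi for xi, m in zip(x, missing) if not m]

# (a) fixed value: each missing x becomes the mean of the observed x.
x_bar = mean(observed_x)
x_a = [x_bar if m else xi for xi, m in zip(x, missing)]

# (b) random value: each missing x becomes a random draw from the observed x.
x_b = [rng.choice(observed_x) if m else xi for xi, m in zip(x, missing)]

for label, xs in [("complete data", x), ("(a) mean", x_a), ("(b) random draw", x_b)]:
    print(f"{label:16s} var(x) = {cov(xs, xs):.2f}   cov(x,y) = {cov(xs, y):.2f}")
```

Under these assumptions, (a) should shrink both var(x) and cov(x,y), while (b) should roughly preserve var(x) but still shrink cov(x,y), because the drawn values are unrelated to y.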

(c: transcan)
I tried substituting missing x with transcan/impute using y: the imputed x
were a function of y *without* any disturbance term, i.e. while the
non-missing data may have cor(x,y)=0.5, the substituted values have
cor(x,y)=1 (or close to it). This obviously overestimates the correlation
of x with y and, if the criterion z has been used to impute the x's, also
that of x with z.
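
The effect described in (c) can be reproduced without transcan. The sketch below is an assumption-laden stand-in: it uses a plain least-squares line fitted on the complete cases in place of whatever transformation transcan would estimate, on simulated data with cor(x,y) near 0.5. It only illustrates what a deterministic conditional-mean imputation does to the correlation.

```python
import random
from math import sqrt

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    """Sample covariance (n - 1 denominator); cov(u, u) is the variance."""
    mu, mv = mean(u), mean(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)

def cor(u, v):
    return cov(u, v) / sqrt(cov(u, u) * cov(v, v))

rng = random.Random(2)
n = 20000

# Hypothetical data with cor(x,y) near 0.5; about 40% of x deleted at random.
f = [rng.gauss(0, 1) for _ in range(n)]
x = [fi + rng.gauss(0, 1) for fi in f]
y = [fi + rng.gauss(0, 1) for fi in f]
missing = [rng.random() < 0.4 for _ in range(n)]

# Least-squares line x = a + b*y, fitted on the complete cases only.
x_obs = [xi for xi, m in zip(x, missing) if not m]
y_obs = [yi for yi, m in zip(y, missing) if not m]
b = cov(x_obs, y_obs) / cov(y_obs, y_obs)
a = mean(x_obs) - b * mean(y_obs)

# Deterministic imputation: the fitted value, with no disturbance term.
x_det = [a + b * yi if m else xi for xi, yi, m in zip(x, y, missing)]

# Correlation among the imputed cases alone, and over the filled-in data.
imp_x = [xi for xi, m in zip(x_det, missing) if m]
imp_y = [yi for yi, m in zip(y, missing) if m]
print(f"cor(x,y), complete cases : {cor(x_obs, y_obs):.2f}")
print(f"cor(x,y), imputed cases  : {cor(imp_x, imp_y):.2f}")
print(f"cor(x,y), filled-in data : {cor(x_det, y):.2f}")
print(f"var(x) true vs filled-in : {cov(x, x):.2f} vs {cov(x_det, x_det):.2f}")
```

In this setup the imputed cases lie exactly on the fitted line, so var(x) in the filled-in data is too small and the overall correlation comes out too large.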

So transcan seems to perform a deterministic imputation using the
conditional *expectation* of x given all other variables. Maybe I'm
wrong, but I wish to have the missing values imputed probabilistically,
by values randomly drawn from the conditional *distribution* of x given y
and (because of (a) and (b)) given z. So here are variations of my
questions again:

1) If imputing missing predictor values, shall I not make use of the
criterion as well?

2) Is there a way to invoke transcan/impute to make probabilistic
predictions rather than deterministic ones? Could I add a disturbance term
on my own, and if so, how? What is the state-of-the-art method in S+ for
imputing missing values?
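
To show what "adding a disturbance term" in question 2 could mean, here is one possible sketch in Python (standard library only). It is not a transcan/impute feature and makes several assumptions: a linear regression of x on y alone, values missing completely at random, and disturbances drawn by resampling the residuals of the complete cases. It is a single imputation and ignores the uncertainty in the fitted coefficients.

```python
import random
from math import sqrt

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    """Sample covariance (n - 1 denominator); cov(u, u) is the variance."""
    mu, mv = mean(u), mean(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)

def cor(u, v):
    return cov(u, v) / sqrt(cov(u, u) * cov(v, v))

rng = random.Random(3)
n = 20000

# Hypothetical data with cor(x,y) near 0.5; about 40% of x deleted at random.
f = [rng.gauss(0, 1) for _ in range(n)]
x = [fi + rng.gauss(0, 1) for fi in f]
y = [fi + rng.gauss(0, 1) for fi in f]
missing = [rng.random() < 0.4 for _ in range(n)]

# Fit x = a + b*y on the complete cases and keep the residuals.
x_obs = [xi for xi, m in zip(x, missing) if not m]
y_obs = [yi for yi, m in zip(y, missing) if not m]
b = cov(x_obs, y_obs) / cov(y_obs, y_obs)
a = mean(x_obs) - b * mean(y_obs)
residuals = [xi - (a + b * yi) for xi, yi in zip(x_obs, y_obs)]

# Deterministic: conditional mean only.
x_det = [a + b * yi if m else xi for xi, yi, m in zip(x, y, missing)]

# Probabilistic: conditional mean plus a randomly drawn observed residual.
x_sto = [a + b * yi + rng.choice(residuals) if m else xi
         for xi, yi, m in zip(x, y, missing)]

for label, xs in [("complete data", x), ("deterministic", x_det), ("with disturbance", x_sto)]:
    print(f"{label:17s} var(x) = {cov(xs, xs):.2f}   cor(x,y) = {cor(xs, y):.2f}")
```

Under these assumptions the version with a disturbance term should give var(x) and cor(x,y) close to the complete-data values, whereas the deterministic version understates var(x) and overstates cor(x,y). Drawing from a normal distribution with the residual standard deviation would be an alternative to resampling residuals.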

Thank you for any help.
Best regards

Jens Oehlschlaegel

```
--
Jens Oehlschlaegel-Akiyoshi
Psychologist/Statistician
Project TR-EAT + COST Action B6                  F.rankfurt
oehl@psyres-stuttgart.de                         A.ttention
+49 711 6781-408 (phone)                         I.nventory
+49 711 6876902  (fax)                           R .-----.
                                                  / ----- \
Center for Psychotherapy Research                | | 0 0 | |
Christian-Belser-Strasse 79a                     | |  ?  | |
D-70597 Stuttgart Germany                         \ ----- /
-------------------------------------------------- '-----' -
(general disclaimer)                             it's better

```