1) When imputing missing predictor values, should I make use of the
criterion as well?
2) Is there a way to invoke transcan/impute (Harrell's Hmisc library) to
make probabilistic predictions rather than deterministic ones?
Here is an explanation of why I ask:
Let's assume we have a model z~x+y and that x, y, and z are all positively
correlated, but we don't know how strongly.
My basic understanding of missing value imputation is (assuming missing
at random) that the imputed data should yield unbiased estimates of the
variances and covariances of x, y, and z.
(a: fixed value)
Substituting missing x by the mean of the non-missing x should lead to
downward-biased estimates of var(x) and consequently of cov(x,y) and
cov(x,z).
(b: random value)
Substituting missing x with randomly chosen values of the non-missing x
should estimate var(x) without bias, but should underestimate cov(x,y)
and cov(x,z).
(c: transcan)
I tried substituting missing x with transcan/impute using y: the imputed x
were a function of y *without* any disturbance term, i.e. while the
non-missing data may have CORxy=0.5, the substituted values have CORxy=1
(or close to it). This obviously overstates the association between x and
y and, if the criterion z has been used to impute the x's, also that
between x and z.
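The three cases (a) to (c) can be checked with a small simulation. The
following is only a sketch in Python (not S+), under assumptions that are
mine: standard normal x and y with correlation 0.5, half of the x values
missing completely at random, and a plain linear regression of x on y
standing in for transcan, whose actual algorithm is more elaborate. All
function names are hypothetical.

```python
import random
import statistics as st


def cov(a, b):
    """Sample covariance of two equally long sequences."""
    ma, mb = st.fmean(a), st.fmean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


def simulate(n=20000, rho=0.5, p_missing=0.5, seed=1):
    """Return {method: (var(x), cov(x, y))} after imputing missing x."""
    rng = random.Random(seed)
    # Assumed data model: x and y standard normal with correlation rho.
    y = [rng.gauss(0, 1) for _ in range(n)]
    x = [rho * yi + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for yi in y]
    # Missing completely at random, independent of x and y.
    missing = [rng.random() < p_missing for _ in range(n)]

    obs_x = [xi for xi, m in zip(x, missing) if not m]
    obs_y = [yi for yi, m in zip(y, missing) if not m]

    # Least-squares line of x on y, fitted on the complete cases only.
    slope = cov(obs_x, obs_y) / st.variance(obs_y)
    intercept = st.fmean(obs_x) - slope * st.fmean(obs_y)
    mean_x = st.fmean(obs_x)

    fills = {
        "a_mean": lambda yi: mean_x,                        # (a) fixed value
        "b_random": lambda yi: rng.choice(obs_x),           # (b) random observed x
        "c_regression": lambda yi: intercept + slope * yi,  # (c) E[x | y], no noise
    }

    results = {"complete_data": (st.variance(x), cov(x, y))}
    for name, fill in fills.items():
        x_imp = [fill(yi) if m else xi for xi, yi, m in zip(x, y, missing)]
        results[name] = (st.variance(x_imp), cov(x_imp, y))
    return results


if __name__ == "__main__":
    for method, (v, c) in simulate().items():
        print(f"{method:14s} var(x)={v:.3f}  cov(x,y)={c:.3f}")
```

Under these assumptions I would expect (a) to shrink both var(x) and
cov(x,y), (b) to keep var(x) near 1 while shrinking cov(x,y), and (c) to
keep cov(x,y) near 0.5 while shrinking var(x), which is what inflates the
correlation between x and y.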
So transcan seems to perform a deterministic imputation using the
conditional *expectation* of x given all other variables. Maybe I'm
wrong, but I wish to have the missing values imputed probabilistically,
by values randomly drawn from the conditional *distribution* of x given y
and (because of a and b) given z. So here are variations of my questions
again:
1) When imputing missing predictor values, should I not make use of the
criterion as well?
2) Is there a way to invoke transcan/impute to make probabilistic
predictions rather than deterministic ones? Could I add a disturbance term
on my own? How? What is the state-of-the-art method in S+ for imputing
missing values?
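To make the "disturbance term" idea concrete, here is a minimal sketch of
what I have in mind, again in Python rather than S+. It is not transcan's
method: it assumes a simple linear regression of x on y only (z is left
out for brevity), and it adds to each predicted value a residual drawn at
random from the complete cases. The function names are hypothetical.

```python
import random
import statistics as st


def cov(a, b):
    """Sample covariance of two equally long sequences."""
    ma, mb = st.fmean(a), st.fmean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


def impute_with_disturbance(x, y, rng):
    """Fill None entries of x with a regression prediction plus a residual.

    The line x = intercept + slope * y is fitted on the complete cases;
    each missing x gets its predicted value plus one residual resampled
    from those complete cases.
    """
    observed = [(xi, yi) for xi, yi in zip(x, y) if xi is not None]
    obs_x = [xi for xi, _ in observed]
    obs_y = [yi for _, yi in observed]

    slope = cov(obs_x, obs_y) / st.variance(obs_y)
    intercept = st.fmean(obs_x) - slope * st.fmean(obs_y)
    residuals = [xi - (intercept + slope * yi) for xi, yi in observed]

    return [
        xi if xi is not None
        else intercept + slope * yi + rng.choice(residuals)
        for xi, yi in zip(x, y)
    ]


def demo(n=20000, rho=0.5, p_missing=0.5, seed=2):
    """Simulate correlated data, delete x at random, impute, summarise."""
    rng = random.Random(seed)
    y = [rng.gauss(0, 1) for _ in range(n)]
    x_full = [rho * yi + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for yi in y]
    x = [None if rng.random() < p_missing else xi for xi in x_full]
    x_imp = impute_with_disturbance(x, y, rng)
    return st.variance(x_imp), cov(x_imp, y)


if __name__ == "__main__":
    v, c = demo()
    print(f"var(x)={v:.3f}  cov(x,y)={c:.3f}")
```

In this sketch both var(x) and cov(x,y) of the completed data should stay
close to their complete-data values. It still treats the fitted
regression coefficients as known, so it would understate the uncertainty
of anything estimated from the imputed data.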
Thank you for any help.
Best regards
Jens Oehlschlaegel
