Before considering multiple imputation, I really need to figure out what
the missingness pattern is. I've never confronted a dataset quite this
sparse before (some measurements are missing on as many as 90 of 151 cases
while other variables are missing on only a few cases), and the number of
permutations of cases with missing and non-missing variables is quite large.
I'd like to know (1) which cases have no missing variables (the easy one);
(2) which cases are missing only variable 1, only variable 2, etc.
(relatively trivial to do); and (3) which cases are missing only variables
1 & 2, 1 & 3, 1 & 4, etc. (now things get a bit more complicated). In other
words,
there will come a point where I will want to know what the "optimal" data
set might look like - the set with the largest number of variables and
cases. I'll also be interested in the ordered set of less optimal subsets
ranging from most to least complete.
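Since you said S or otherwise: here is one way the bookkeeping above might be sketched in Python/NumPy (the data here are hypothetical, 151 cases by 6 variables with values knocked out at random just for illustration). It tabulates every distinct missingness pattern with its frequency, and then, for each subset of variables, counts the cases complete on that subset, ranking subsets by cases times variables as one stand-in for your "optimal" criterion. The exhaustive subset search is 2^p patterns, so it is only feasible for a modest number of variables.

```python
from itertools import combinations

import numpy as np

# Hypothetical data: 151 cases x 6 variables, NaN marks a missing value.
rng = np.random.default_rng(0)
X = rng.normal(size=(151, 6))
X[rng.random(X.shape) < 0.3] = np.nan

miss = np.isnan(X)

# (1) Cases with no missing variables.
complete = ~miss.any(axis=1)

# (2)-(3) Tabulate every distinct missingness pattern and its frequency.
patterns, counts = np.unique(miss, axis=0, return_counts=True)
for pat, n in sorted(zip(patterns.tolist(), counts.tolist()),
                     key=lambda t: -t[1]):
    print("".join("X" if m else "." for m in pat), n)  # X = missing

# "Optimal" subsets: for each subset of variables, count the cases that
# are complete on it, and score the subset by cases * variables.
best = []
p = X.shape[1]
for k in range(p, 0, -1):
    for cols in combinations(range(p), k):
        n = int((~miss[:, cols].any(axis=1)).sum())
        best.append((cols, n, n * k))
best.sort(key=lambda t: -t[2])  # most to least "complete" by this score
```

The ranked `best` list then gives the ordered set of subsets you describe, from the largest rectangular complete data set downward; substituting a different score in place of `n * k` changes the notion of optimality without changing the enumeration.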
Only after I get this information can I make any rational decision about
what and how much needs imputation.
I've never had to deal with a problem like this before. Any suggestions [S
or otherwise] on how to approach it efficiently would be welcomed.
Thanks.
Dr. Marc R. Feldesman
email: feldesmanm@pdx.edu
email: feldesman@ibm.net
fax: 503-725-3905
"Don't know where I'm goin'
Don't like where I've been
There may be no exit
But, hell I'm goin' in" Jimmy Buffett
Powered by: Monstrochoerus - the 300 MHz Pentium II
-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu. To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message: unsubscribe s-news