
 

Handling Missing Data

CASE STUDY - GENESIS

 

GENESIS: Generalized System for Imputation Simulations

GENESIS is a system for running simulation studies of imputation for missing data. The user provides a data set representing the "true" population. Each iteration of the simulation selects a simple random sample without replacement, generates nonresponse according to a specified response mechanism, imputes the missing values, and estimates the population mean of the variable of interest. The user selects the response rate, the imputation method, and the number of iterations.

The simulation produces several tables and estimates that can be used to analyze the impact of nonresponse. For example, GENESIS provides the Monte Carlo relative bias of the point estimate as well as the Monte Carlo mean squared error of the estimator; both are described in the section on Monte Carlo results.
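As a rough illustration of the simulation loop described above, the following Python sketch draws repeated simple random samples without replacement, generates nonresponse, imputes, and records the estimated mean. It is not GENESIS code: the function name run_simulation, the synthetic gamma-distributed population, the use of MCAR nonresponse, and the choice of mean imputation are all assumptions made to keep the example short.

```python
import numpy as np

def run_simulation(population, n, response_prob, iterations, seed=0):
    """Hypothetical sketch: SRSWOR, MCAR nonresponse, mean imputation."""
    rng = np.random.default_rng(seed)
    population = np.asarray(population, dtype=float)
    estimates = np.empty(iterations)
    for k in range(iterations):
        # Simple random sample without replacement of size n.
        sample = rng.choice(population, size=n, replace=False)
        # MCAR: every unit responds independently with the same probability.
        responds = rng.random(n) < response_prob
        if not responds.any():
            # Assumption: force one respondent so imputation is possible.
            responds[rng.integers(n)] = True
        # Mean imputation: missing values are replaced by the respondent mean.
        completed = np.where(responds, sample, sample[responds].mean())
        # Imputed estimator of the population mean for this iteration.
        estimates[k] = completed.mean()
    return estimates

# Synthetic "true" population (an assumption, not the NPHS dummy file).
population = np.random.default_rng(1).gamma(shape=2.0, scale=10.0, size=10_000)
estimates = run_simulation(population, n=200, response_prob=0.7, iterations=1000)
print(estimates.mean(), population.mean())
```

Under MCAR with mean imputation the average of the estimates should sit close to the true population mean; other mechanisms and imputation methods can be swapped in to see when that stops being true.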

 

 

STEPS

Step One

Install GENESIS

 

 

Step Two

Select the data set. You can use the NPHS dummy files provided or your own data set.

Note: the data set must be a SAS© Release 8.2 data set and must be complete, because the system itself generates item nonresponse according to the chosen response mechanism.

 

 

Step Three

Select a variable of interest, for example the HUI. This is the variable for which nonresponse will be generated.

 

Step Four

Select the auxiliary variables: one variable if you are using ratio imputation, or up to four variables if you are using regression or nearest neighbour imputation.

 

Step Five 

Click on the non-response tab.

 

Step Six

Choose a response mechanism:

  1. MCAR (missing completely at random)

  2. MAR (missing at random) - choose up to four auxiliary variables on which the probability of response will depend. Here α is the lower bound for the probability of response, with 0 < α < 1; this ensures that the probability of response is always at least α.

  3. NMAR (not missing at random) - the probability of response depends on the variable of interest
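The three mechanisms differ only in what the response probability is allowed to depend on. The Python sketch below makes that concrete. The functional form (a linear rescaling of the driving variable onto the interval from alpha to 1) is purely an assumption for illustration, as are the function name response_probabilities, the single auxiliary variable x, and the default values; the formulas GENESIS actually uses are not given here. The sketch also applies the lower bound to NMAR, which is a further assumption.

```python
import numpy as np

def response_probabilities(y, x, mechanism, alpha=0.2, p=0.7):
    """Hypothetical response probabilities for each unit (illustrative forms only)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if mechanism == "MCAR":
        # Same probability for every unit, unrelated to any variable.
        return np.full(y.shape, p)
    if mechanism == "MAR":
        driver = x  # depends on an auxiliary variable only
    elif mechanism == "NMAR":
        driver = y  # depends on the variable of interest itself
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    # Rescale the driver to [0, 1], then map it onto [alpha, 1] so that
    # no unit has a response probability below alpha.
    span = driver.max() - driver.min()
    scaled = (driver - driver.min()) / span if span > 0 else np.zeros_like(driver)
    return alpha + (1.0 - alpha) * scaled

rng = np.random.default_rng(0)
x = rng.normal(50, 10, size=1000)          # auxiliary variable
y = 2 * x + rng.normal(0, 5, size=1000)    # variable of interest
p_mar = response_probabilities(y, x, "MAR", alpha=0.2)
responds = rng.random(1000) < p_mar        # True where the unit responds
```

Note that this sketch does not calibrate the MAR or NMAR probabilities to a target overall response rate, which the tool lets the user choose in the next step.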

Step Seven

Choose a probability of response (between 0 and 1).

 

Step Eight

Select the imputation method (see Table 1 in exercise 2):

a.    Mean

b.    Ratio

c.    Hot Deck

d.    Nearest Neighbour

e.    Regression
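For readers without Table 1 at hand, the five methods can be sketched in Python as follows. These are common textbook versions and may differ in detail from the definitions in Table 1 or from what GENESIS implements. The function name impute is hypothetical, the hot deck shown is a random hot deck, and a single auxiliary variable x is assumed even though regression and nearest neighbour imputation can use up to four.

```python
import numpy as np

def impute(y, responds, method, x=None, rng=None):
    """Return a completed copy of y; `responds` flags the observed values."""
    y = np.asarray(y, dtype=float)
    responds = np.asarray(responds, dtype=bool)
    if x is not None:
        x = np.asarray(x, dtype=float)
    missing = ~responds
    completed = y.copy()
    if method == "mean":
        # Every missing value gets the respondent mean.
        completed[missing] = y[responds].mean()
    elif method == "ratio":
        # y* = x * (sum of respondent y) / (sum of respondent x)
        completed[missing] = x[missing] * y[responds].sum() / x[responds].sum()
    elif method == "hot_deck":
        # Random hot deck: copy the value of a randomly chosen respondent.
        if rng is None:
            rng = np.random.default_rng()
        completed[missing] = rng.choice(y[responds], size=missing.sum())
    elif method == "nearest_neighbour":
        # Donor is the respondent whose x is closest to the recipient's x.
        distances = np.abs(x[missing][:, None] - x[responds][None, :])
        completed[missing] = y[responds][distances.argmin(axis=1)]
    elif method == "regression":
        # Least-squares line fitted on respondents, predicted for the rest.
        slope, intercept = np.polyfit(x[responds], y[responds], deg=1)
        completed[missing] = intercept + slope * x[missing]
    else:
        raise ValueError(f"unknown method: {method}")
    return completed

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
responds = [True, True, False, True, True]   # the third value is missing
for method in ("mean", "ratio", "nearest_neighbour", "regression"):
    print(method, impute(y, responds, method, x=x)[2])
```

Mean and hot deck imputation ignore x entirely, which is why they tend to behave differently from the other three methods when response depends on an auxiliary variable.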

Step Nine

Select the sample size (try several values to see the effect of sample size).

 

Step Ten

Select the number of iterations (5000 should be sufficient).
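One informal way to judge whether a given number of iterations is enough is the Monte Carlo standard error of the averaged estimate, which shrinks in proportion to one over the square root of the number of iterations. The Python sketch below shows the calculation; the function name mc_standard_error and the normally distributed stand-in estimates are assumptions for illustration, not output from GENESIS.

```python
import numpy as np

def mc_standard_error(estimates):
    """Standard error of the Monte Carlo average of the estimates."""
    estimates = np.asarray(estimates, dtype=float)
    return estimates.std(ddof=1) / np.sqrt(estimates.size)

# Stand-in for the per-iteration estimates a simulation would produce.
rng = np.random.default_rng(0)
fake_estimates = rng.normal(loc=20.0, scale=1.0, size=5000)
for k in (100, 1000, 5000):
    print(k, mc_standard_error(fake_estimates[:k]))
```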

 

Step Eleven

It is not necessary to consider the variance estimation methods.

 

Step Twelve

Run the simulation.

 

Step Thirteen

Click on RESULTS and then on the Bias Info button.

 

 

Step Fourteen 

Click on Variance.

 

 

 

Step Fifteen

Click on HISTOGRAMS and then on Imputed Estimator.

 

 

 

Notation 

Variable of interest: y

 

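Population mean, for a population U of size N:

$$\bar{Y} = \frac{1}{N} \sum_{i \in U} y_i$$

Imputed estimator, for a sample s of size n:

$$\bar{y}_I = \frac{1}{n} \sum_{i \in s} \tilde{y}_i$$

Population variance (see Step Fourteen above):

$$S^2 = \frac{1}{N-1} \sum_{i \in U} (y_i - \bar{Y})^2$$

Sample variance after imputation (see Step Fourteen above):

$$s_I^2 = \frac{1}{n-1} \sum_{i \in s} (\tilde{y}_i - \bar{y}_I)^2$$

where $\tilde{y}_i = y_i$ if unit i responded and $\tilde{y}_i = y_i^*$ otherwise, and where $y_i^*$ is the imputed value for missing $y_i$ (see Table 1 in exercise 2).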

 

 

Simulation: Monte Carlo results

 

See Steps Thirteen and Fourteen above.

 

Monte Carlo Relative Bias of the imputed estimator (MCRelBias)

 

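Let $\bar{y}_I^{(k)}$ denote the imputed estimator obtained in iteration k (k = 1, ..., K), let $\bar{Y}$ denote the population mean, and let

$$E_{MC}(\bar{y}_I) = \frac{1}{K} \sum_{k=1}^{K} \bar{y}_I^{(k)},$$

then

$$\text{MCRelBias} = \frac{E_{MC}(\bar{y}_I) - \bar{Y}}{\bar{Y}}$$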

 

Monte Carlo variance after imputation

 

 

Monte Carlo MSE of the imputed estimator (MCMSE)

 

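Let $\bar{y}_I^{(k)}$ denote the imputed estimator obtained in iteration k (k = 1, ..., K) and $\bar{Y}$ the population mean, then

$$\text{MCMSE} = \frac{1}{K} \sum_{k=1}^{K} \left( \bar{y}_I^{(k)} - \bar{Y} \right)^2$$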
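Given the estimates stored from each iteration, the Monte Carlo summaries can be computed in a few lines. The Python sketch below assumes the usual definitions: relative bias as the average estimate minus the true mean, divided by the true mean, and MSE as the average squared distance from the true mean. The function name monte_carlo_summary and the toy numbers are hypothetical. The variance returned here is the spread of the estimator across iterations, which is not necessarily the quantity GENESIS reports as the Monte Carlo variance after imputation.

```python
import numpy as np

def monte_carlo_summary(estimates, true_mean):
    """Summaries of an estimator over Monte Carlo iterations."""
    estimates = np.asarray(estimates, dtype=float)
    mc_mean = estimates.mean()
    return {
        # (average estimate - true value) / true value
        "MCRelBias": (mc_mean - true_mean) / true_mean,
        # Spread of the estimates around their own average (divisor K).
        "MCVarOfEstimator": estimates.var(),
        # Average squared distance from the true value.
        "MCMSE": np.mean((estimates - true_mean) ** 2),
    }

summary = monte_carlo_summary([9.0, 10.0, 11.0, 12.0], true_mean=10.0)
print(summary)
```

With these definitions the MSE equals the variance of the estimator plus the squared bias, which gives a quick consistency check on any set of results.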