class statsmodels.imputation.mice.MICEData(data, perturbation_method='gaussian', k_pmm=20, history_callback=None)
Wrap a data set to allow missing data handling with MICE.
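Parameters:
data (pandas DataFrame): The data set to wrap.
perturbation_method (str): The default perturbation method, either 'gaussian' or 'boot'.
k_pmm (int): The number of nearest neighbors to use during predictive mean matching.
history_callback (function): A function called with the MICEData object as its sole argument.
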
Draw 20 imputations from a data set called data and save them in separate files with the filename pattern dataXX.csv. The variables other than x1 are imputed using linear models fit with OLS, with mean structures containing main effects of all other variables in data. The variable named x1 has a conditional mean structure that includes an additional term for x2^2.
>>> from statsmodels.imputation import mice
>>> imp = mice.MICEData(data)
>>> imp.set_imputer('x1', formula='x2 + np.square(x2) + x3')
>>> for j in range(20):
...     imp.update_all()
...     imp.data.to_csv('data%02d.csv' % j)
Impute using default models, using the MICEData object as an iterator.
>>> imp = mice.MICEData(data)
>>> j = 0
>>> for data in imp:
...     imp.data.to_csv('data%02d.csv' % j)
...     j += 1
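The snippets above assume that a data frame named data already exists. As a minimal self-contained sketch, the following builds a synthetic data set and runs the same workflow, keeping the imputed data sets in memory instead of writing CSV files. The column values, sample size, missingness pattern, and the choice of 5 imputations are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd
from statsmodels.imputation import mice

# Synthetic data (illustrative assumption): x1 depends on x2, x2^2 and x3.
rng = np.random.default_rng(0)
n = 200
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x1 = 1.0 + x2 + 0.5 * x2**2 - x3 + rng.normal(scale=0.5, size=n)
data = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

# Delete some values of x1 and x3 completely at random.
data.loc[rng.choice(n, size=30, replace=False), "x1"] = np.nan
data.loc[rng.choice(n, size=20, replace=False), "x3"] = np.nan

imp = mice.MICEData(data)
# Give x1 a mean structure with an extra quadratic term in x2.
imp.set_imputer("x1", formula="x2 + np.square(x2) + x3")

# Each call to update_all() runs one MICE cycle; keep a copy of each result.
imputed = []
for _ in range(5):
    imp.update_all()
    imputed.append(imp.data.copy())
```

Copying imp.data on each pass matters here because the sketch assumes the object keeps updating the same data frame in place between cycles.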
Allowed perturbation methods are 'gaussian' (the model parameters are set to a draw from the Gaussian approximation to the posterior distribution) and 'boot' (the model parameters are set to the estimated values obtained when fitting a bootstrapped version of the data set).
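To make the difference between the two perturbation methods concrete, here is a rough NumPy sketch for a single linear imputation model. It is not the statsmodels implementation: the helper names ols, perturb_gaussian, and perturb_boot are hypothetical, and the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical complete-case data for one imputation model: y ~ 1 + x.
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)


def ols(X, y):
    """Return OLS coefficients and their estimated covariance matrix."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, cov


def perturb_gaussian(X, y, rng):
    """'gaussian' idea: draw parameters from N(beta_hat, cov_hat)."""
    beta, cov = ols(X, y)
    return rng.multivariate_normal(beta, cov)


def perturb_boot(X, y, rng):
    """'boot' idea: refit on rows resampled with replacement."""
    idx = rng.integers(0, len(y), size=len(y))
    beta, _ = ols(X[idx], y[idx])
    return beta


beta_gauss = perturb_gaussian(X, y, rng)
beta_boot = perturb_boot(X, y, rng)
```

Both functions return a parameter vector that differs slightly from the point estimate, which is what lets successive imputations reflect uncertainty in the fitted model.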
history_callback can be implemented to have side effects such as saving the current imputed data set to disk.
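A small sketch of a callback follows. It assumes that the callback receives the MICEData object as its only argument and that its return values are collected in the history attribute of that object, which is not stated in the text above. The function name record_mean_y and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd
from statsmodels.imputation import mice

# Synthetic two-column data set with missing values in y (illustrative only).
rng = np.random.default_rng(1)
n = 150
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)
df = pd.DataFrame({"x": x, "y": y})
df.loc[rng.choice(n, size=25, replace=False), "y"] = np.nan


def record_mean_y(mice_data):
    # Return a summary of the current imputed data set. A callback with
    # side effects could instead call mice_data.data.to_csv(...) here.
    return float(mice_data.data["y"].mean())


imp = mice.MICEData(df, history_callback=record_mean_y)
for _ in range(3):
    imp.update_all()

# Assumed: return values of the callback accumulate in imp.history.
history = list(imp.history)
```

Tracking a summary like this across cycles is one simple way to check whether the imputations have stabilized.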
Methods:
get_fitting_data(vname)  Return the data needed to fit a model for imputation.
get_split_data(vname)  Return endog and exog for imputation of a given variable.
impute(vname)
impute_pmm(vname)  Use predictive mean matching to impute missing values.
next_sample()  Returns the next imputed dataset in the imputation process.
perturb_params(vname)
plot_bivariate(col1_name, col2_name[, …])  Plot observed and imputed values for two variables.
plot_fit_obs(col_name[, lowess_args, …])  Plot fitted versus imputed or observed values as a scatterplot.
plot_imputed_hist(col_name[, ax, …])  Display imputed values for one variable as a histogram.
plot_missing_pattern([ax, row_order, …])  Generate an image showing the missing data pattern.
set_imputer(endog_name[, formula, …])  Specify the imputation process for a single variable.
update(vname)  Impute missing values for a single variable.
update_all([n_iter])  Perform a specified number of MICE iterations.
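The predictive mean matching step used by impute_pmm, with k_pmm controlling the number of candidate donors, can be illustrated with a generic NumPy sketch. This shows the general technique rather than the statsmodels code: pmm_impute is a hypothetical helper, and the predicted means are made-up numbers.

```python
import numpy as np


def pmm_impute(pred_obs, y_obs, pred_miss, k, rng):
    """For each missing case, pick one of the k observed cases whose
    predicted mean is closest, and borrow that case's observed value."""
    pred_obs = np.asarray(pred_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    imputed = np.empty(len(pred_miss))
    for i, p in enumerate(pred_miss):
        # Indices of the k observed cases with the nearest predicted means.
        donors = np.argsort(np.abs(pred_obs - p))[:k]
        imputed[i] = y_obs[rng.choice(donors)]
    return imputed


rng = np.random.default_rng(0)
pred_obs = np.array([0.1, 0.9, 2.1, 2.9, 4.2])  # predicted means, observed cases
y_obs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])     # observed values
pred_miss = np.array([1.0, 3.0])                # predicted means, missing cases

values = pmm_impute(pred_obs, y_obs, pred_miss, k=2, rng=rng)
```

Because every imputed value is copied from an observed case, the imputations always stay within the range of values actually seen in the data.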
© 2009–2012 Statsmodels Developers
© 2006–2008 Scipy Developers
© 2006 Jonathan E. Taylor
Licensed under the 3-clause BSD License.
http://www.statsmodels.org/stable/generated/statsmodels.imputation.mice.MICEData.html