class statsmodels.imputation.mice.MICEData(data, perturbation_method='gaussian', k_pmm=20, history_callback=None)
Wrap a data set to allow missing data handling with MICE.
Draw 20 imputations from a data set called data and save them in separate files with filename pattern dataXX.csv. The variables other than x1 are imputed using linear models fit with OLS, with mean structures containing main effects of all other variables in data. The variable named x1 has a conditional mean structure that includes an additional term for x2^2.
>>> from statsmodels.imputation import mice
>>> imp = mice.MICEData(data)
>>> imp.set_imputer('x1', formula='x2 + np.square(x2) + x3')
>>> for j in range(20):
...     imp.update_all()
...     imp.data.to_csv('data%02d.csv' % j)
Impute using default models, using the MICEData object as an iterator.
>>> imp = mice.MICEData(data)
>>> j = 0
>>> for data in imp:
...     imp.data.to_csv('data%02d.csv' % j)
...     j += 1
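The two examples above assume an existing data frame called data. As a self-contained sketch, the block below builds a small synthetic data frame (the column values, sample size, and missingness rate are assumptions made only for illustration), sets a custom formula for x1, and collects a fixed number of imputed data sets in memory with next_sample() rather than writing them to disk.

```python
import numpy as np
import pandas as pd
from statsmodels.imputation import mice

# Synthetic data, invented for illustration only.
rng = np.random.default_rng(0)
n = 200
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x1 = x2 + x2**2 + x3 + rng.normal(size=n)
data = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

# Make roughly 10% of x1 and x2 missing; x3 stays fully observed.
data.loc[rng.random(n) < 0.1, "x1"] = np.nan
data.loc[rng.random(n) < 0.1, "x2"] = np.nan

imp = mice.MICEData(data)
# x1 gets an extra squared term; x2 keeps the default model.
imp.set_imputer("x1", formula="x2 + np.square(x2) + x3")

imputed = []
for j in range(5):
    # next_sample() returns the next imputed data set; store a copy
    # so that later updates cannot alter the frames already collected.
    imputed.append(imp.next_sample().copy())
```

Each element of imputed is a complete data frame with the same shape as data; replacing the list append with a to_csv call gives the file-per-imputation pattern shown above.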
Allowed perturbation methods are ‘gaussian’ (the model parameters are set to a draw from the Gaussian approximation to the posterior distribution), and ‘boot’ (the model parameters are set to the estimated values obtained when fitting a bootstrapped version of the data set).
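As a rough illustration of the two perturbation methods, the sketch below runs the same synthetic data set (invented here, not part of the statsmodels documentation) through MICEData once with 'gaussian' and once with 'boot', then keeps the imputed values of x1 for comparison. Seeding NumPy's global generator is an assumption about how to make the run repeatable.

```python
import numpy as np
import pandas as pd
from statsmodels.imputation import mice

# Assumption: seeding NumPy's global generator makes the MICE draws repeatable.
np.random.seed(0)

# Synthetic data, invented for illustration only.
rng = np.random.default_rng(1)
n = 300
x2 = rng.normal(size=n)
x1 = 1.0 + 2.0 * x2 + rng.normal(size=n)
data = pd.DataFrame({"x1": x1, "x2": x2})
data.loc[rng.random(n) < 0.2, "x1"] = np.nan
missing = data["x1"].isnull().to_numpy()

results = {}
for method in ("gaussian", "boot"):
    imp = mice.MICEData(data, perturbation_method=method)
    imp.update_all(5)  # five MICE iterations
    # Keep only the values that were filled in for x1.
    results[method] = imp.data["x1"].to_numpy()[missing]

for method, values in results.items():
    print(method, "mean of imputed x1:", values.mean())
```

Both runs fill in the same cells; only the way the model parameters are perturbed before each imputation differs.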
history_callback can be implemented to have side effects such as saving the current imputed data set to disk.
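A minimal sketch of a callback follows. It assumes, as the statsmodels docstring describes, that the callback receives the MICEData object as its only argument and that its return value is appended to the object's history attribute. The function name record_means and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd
from statsmodels.imputation import mice

# Synthetic data, invented for illustration only.
rng = np.random.default_rng(2)
n = 200
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x1 = x2 - x3 + rng.normal(size=n)
data = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})
data.loc[rng.random(n) < 0.15, "x1"] = np.nan
data.loc[rng.random(n) < 0.15, "x2"] = np.nan


def record_means(mice_data):
    # Hypothetical callback: summarize the current imputed data set.
    # A callback with side effects could call mice_data.data.to_csv(...) here.
    return mice_data.data.mean()


imp = mice.MICEData(data, history_callback=record_means)
for _ in range(4):
    imp.update_all()

# Assumed behavior: one stored return value per update_all() call.
history = pd.DataFrame(imp.history)
print(history)
```

Returning a small summary instead of the full data frame keeps the stored history light while still letting you monitor how the imputations evolve.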
| Method | Description |
|---|---|
| get_fitting_data(vname) | Return the data needed to fit a model for imputation. |
| get_split_data(vname) | Return endog and exog for imputation of a given variable. |
| impute(vname) | |
| impute_pmm(vname) | Use predictive mean matching to impute missing values. |
| next_sample() | Returns the next imputed dataset in the imputation process. |
| perturb_params(vname) | |
| plot_bivariate(col1_name, col2_name[, …]) | Plot observed and imputed values for two variables. |
| plot_fit_obs(col_name[, lowess_args, …]) | Plot fitted versus imputed or observed values as a scatterplot. |
| plot_imputed_hist(col_name[, ax, …]) | Display imputed values for one variable as a histogram. |
| plot_missing_pattern([ax, row_order, …]) | Generate an image showing the missing data pattern. |
| set_imputer(endog_name[, formula, …]) | Specify the imputation process for a single variable. |
| update(vname) | Impute missing values for a single variable. |
| update_all([n_iter]) | Perform a specified number of MICE iterations. |
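To make the difference between update and update_all concrete, the sketch below (on synthetic data invented for illustration) updates a single variable and checks which columns changed, then runs two full iterations over all variables.

```python
import numpy as np
import pandas as pd
from statsmodels.imputation import mice

# Synthetic data, invented for illustration only.
rng = np.random.default_rng(3)
n = 200
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x1 = 0.5 * x2 + x3 + rng.normal(size=n)
data = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})
data.loc[rng.random(n) < 0.2, "x1"] = np.nan
data.loc[rng.random(n) < 0.2, "x2"] = np.nan

imp = mice.MICEData(data)
before = imp.data.copy()  # snapshot of the working data before any update

# update() imputes missing values for one variable only.
imp.update("x1")
x2_unchanged = imp.data["x2"].equals(before["x2"])

# Observed x1 values are never overwritten; only missing cells are filled.
x1_observed = data["x1"].notnull().to_numpy()
observed_kept = np.allclose(
    imp.data["x1"].to_numpy()[x1_observed],
    data["x1"].to_numpy()[x1_observed],
)

# update_all() performs the requested number of MICE iterations over all variables.
imp.update_all(2)
print(x2_unchanged, observed_kept)
```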
© 2009–2012 Statsmodels Developers
© 2006–2008 Scipy Developers
© 2006 Jonathan E. Taylor
Licensed under the 3-clause BSD License.
http://www.statsmodels.org/stable/generated/statsmodels.imputation.mice.MICEData.html