class statsmodels.genmod.generalized_estimating_equations.GEE(endog, exog, groups, time=None, family=None, cov_struct=None, missing='none', offset=None, exposure=None, dep_data=None, constraint=None, update_dep=True, weights=None, **kwargs)
Estimation of marginal regression models using Generalized Estimating Equations (GEE).
GEE can be used to fit Generalized Linear Models (GLMs) when the data have a grouped structure, and the observations are possibly correlated within groups but not between groups.
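For reference, the estimating equations that GEE solves are usually written as below. The notation is the standard one from Liang and Zeger (1986), not something defined on this page. For group i out of K groups, y_i is the response vector, mu_i = g^{-1}(X_i beta) is its mean under link g, D_i is the derivative of mu_i with respect to beta, A_i is the diagonal matrix of variance-function values, R_i(alpha) is the working correlation matrix, and phi is the scale parameter.

```latex
\sum_{i=1}^{K} D_i^{\top} V_i^{-1} \left( y_i - \mu_i \right) = 0,
\qquad
V_i = \phi \, A_i^{1/2} R_i(\alpha) A_i^{1/2},
\qquad
D_i = \frac{\partial \mu_i}{\partial \beta^{\top}}
```

With R_i(alpha) = I (independence working dependence), these reduce to the ordinary GLM score equations, so GEE then returns the usual GLM point estimates.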
Only the following combinations of family and link make sense:
Family | Links |
Gaussian | ident, log, pow |
inv Gaussian | ident, log, pow |
binomial | ident, log, logit, probit, cloglog, pow, opow, loglog, logc |
Poisson | ident, log, pow |
neg binomial | ident, log, pow, nbinom |
gamma | ident, log, pow |
Not all of these link functions are currently available.
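The family/link table above can also be kept as a small lookup structure, for example to validate a configuration before building a model. This is a hypothetical helper, not part of statsmodels: the keys and link names are the informal labels from the table rather than statsmodels class names, and, as noted, some of these links may not be available.

```python
# Hypothetical helper (not part of statsmodels): the family/link table
# encoded as a dictionary of sets, plus a function to check a combination.
VALID_LINKS = {
    "gaussian": {"ident", "log", "pow"},
    "inverse_gaussian": {"ident", "log", "pow"},
    "binomial": {"ident", "log", "logit", "probit", "cloglog",
                 "pow", "opow", "loglog", "logc"},
    "poisson": {"ident", "log", "pow"},
    "negative_binomial": {"ident", "log", "pow", "nbinom"},
    "gamma": {"ident", "log", "pow"},
}


def makes_sense(family: str, link: str) -> bool:
    """Return True if the table lists `link` as sensible for `family`."""
    # Unknown families map to an empty set, so every link is rejected.
    return link in VALID_LINKS.get(family, set())


print(makes_sense("binomial", "probit"))  # True
print(makes_sense("poisson", "logit"))    # False
```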
endog and exog are stored as references, not copies: if the data passed in are already arrays and those arrays are later modified in place, the model's endog and exog change too.
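The aliasing behaviour is ordinary NumPy behaviour. The sketch below assumes, without confirmation from this page, that the model keeps its inputs through an np.asarray-style conversion, which returns the same array object (no copy) when given a float ndarray. The name stored is a hypothetical stand-in for whatever the model holds internally.

```python
import numpy as np

# np.asarray returns the *same* object for an ndarray of a compatible dtype,
# so anything holding the result sees later in-place changes to the input.
endog = np.array([1.0, 2.0, 3.0])
stored = np.asarray(endog)          # stand-in for the model's internal copy (assumption)
endog[0] = 99.0                     # in-place change to the caller's array
print(stored[0])                    # 99.0: the stored reference changed as well

# Passing an explicit copy decouples the stored data from the caller's array.
endog = np.array([1.0, 2.0, 3.0])
stored_copy = np.asarray(endog.copy())
endog[0] = 99.0
print(stored_copy[0])               # 1.0: unaffected by the later change
```

Passing endog.copy() and exog.copy() is a simple way to avoid surprises if the arrays will be modified after the model is built.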
The “robust” covariance type is the standard “sandwich estimator” (see Liang and Zeger (1986)). It is the default here and in most other packages. The “naive” estimator typically gives smaller standard errors, but is only correct if the working correlation structure is correctly specified. The “bias reduced” estimator of Mancl and DeRouen (Biometrics, 2001) reduces the downward bias of the robust estimator.
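To make the difference between the “naive” and “robust” estimators concrete, here is a minimal NumPy sketch for the simplest case only: Gaussian family, identity link and independence working dependence, where the GEE point estimate coincides with ordinary least squares. It is not the statsmodels implementation, applies no small-sample correction, and does not cover the bias-reduced estimator. The function name sandwich_covariances and the simulated data are hypothetical.

```python
import numpy as np


def sandwich_covariances(X, y, groups):
    """Naive and robust ("sandwich") covariance of the GEE estimate.

    Sketch for the simplest case only: Gaussian family, identity link and an
    independence working correlation, where the GEE estimate equals OLS.
    No small-sample correction is applied.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    n, p = X.shape

    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # GEE estimate = OLS here
    resid = y - X @ beta
    bread_inv = np.linalg.inv(X.T @ X)             # (sum_i X_i' X_i)^{-1}

    # Meat: sum over groups of X_i' r_i r_i' X_i, so within-group residual
    # correlation is estimated from the data instead of being assumed.
    meat = np.zeros((p, p))
    for g in np.unique(groups):
        mask = groups == g
        score_g = X[mask].T @ resid[mask]          # X_i' r_i, shape (p,)
        meat += np.outer(score_g, score_g)

    robust = bread_inv @ meat @ bread_inv
    scale = resid @ resid / (n - p)                # residual sum of squares / (N - p)
    naive = scale * bread_inv                      # valid only if R_i = I is correct
    return beta, naive, robust


rng = np.random.default_rng(0)
groups = np.repeat(np.arange(50), 4)               # 50 groups of size 4
X = np.column_stack([np.ones(200), rng.normal(size=200)])
group_effect = rng.normal(size=50)[groups]         # shared effect -> within-group correlation
y = X @ np.array([1.0, 0.5]) + group_effect + rng.normal(size=200)

beta, naive, robust = sandwich_covariances(X, y, groups)
print("params:   ", beta)
print("naive SE: ", np.sqrt(np.diag(naive)))
print("robust SE:", np.sqrt(np.diag(robust)))
```

The naive covariance trusts the working structure (here R_i = I) and so ignores the shared group effect in the simulated data, while the robust covariance picks it up from the residuals. In recent statsmodels versions the estimator is chosen with the cov_type argument of GEE.fit (“robust”, “naive” or “bias_reduced”).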
The robust covariance provided here follows Liang and Zeger (1986) and agrees with R’s gee implementation. To obtain the robust standard errors reported in Stata, multiply by sqrt(N / (N - g)), where N is the total sample size, and g is the average group size.
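The Stata conversion is simple arithmetic. The plain-Python sketch below computes the factor from a list of group labels; the helper name stata_scaling_factor and the standard errors are hypothetical. Because the average group size is g = N / K for K groups, the factor simplifies to sqrt(K / (K - 1)).

```python
import math
from collections import Counter


def stata_scaling_factor(groups):
    """Return sqrt(N / (N - g)), where N is the total sample size and g is
    the average group size."""
    n = len(groups)
    sizes = list(Counter(groups).values())   # number of observations per group
    g = sum(sizes) / len(sizes)              # average group size, equal to N / K
    return math.sqrt(n / (n - g))


# N = 10 observations in K = 3 groups, so g = 10 / 3 and the factor is
# sqrt(10 / (10 - 10 / 3)) = sqrt(1.5) = sqrt(K / (K - 1)).
groups = ["a", "a", "a", "b", "b", "b", "c", "c", "c", "c"]
factor = stata_scaling_factor(groups)
robust_se = [0.20, 0.05]                     # hypothetical robust standard errors
stata_se = [se * factor for se in robust_se]
print(round(factor, 4))                      # 1.2247
```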
Logistic regression with autoregressive working dependence:
>>> import statsmodels.api as sm
>>> # endog (response), exog (design matrix) and group (cluster labels) must already be defined
>>> family = sm.families.Binomial()
>>> va = sm.cov_struct.Autoregressive()
>>> model = sm.GEE(endog, exog, group, family=family, cov_struct=va)
>>> result = model.fit()
>>> print(result.summary())
Use formulas to fit a Poisson GLM with an independence working dependence structure:
>>> import statsmodels.api as sm
>>> # data: a pandas DataFrame with columns y, age, trt, base and subject
>>> fam = sm.families.Poisson()
>>> ind = sm.cov_struct.Independence()
>>> model = sm.GEE.from_formula("y ~ age + trt + base", "subject", data,
...                             cov_struct=ind, family=fam)
>>> result = model.fit()
>>> print(result.summary())
Equivalent, using the formula API:
>>> import statsmodels.api as sm
>>> import statsmodels.formula.api as smf
>>> # data: a pandas DataFrame with columns y, age, trt, base and subject
>>> fam = sm.families.Poisson()
>>> ind = sm.cov_struct.Independence()
>>> model = smf.gee("y ~ age + trt + base", "subject", data,
...                 cov_struct=ind, family=fam)
>>> result = model.fit()
>>> print(result.summary())
Methods:
cluster_list (array) | Returns array split into subarrays corresponding to the cluster structure. |
estimate_scale () | Returns an estimate of the scale parameter at the current parameter value. |
fit ([maxiter, ctol, start_params, …]) | Fits a marginal regression model using generalized estimating equations (GEE). |
from_formula (formula, groups, data[, …]) | Create a Model from a formula and dataframe. |
mean_deriv (exog, lin_pred) | Derivative of the expected endog with respect to the parameters. |
mean_deriv_exog (exog, params[, offset_exposure]) | Derivative of the expected endog with respect to exog. |
predict (params[, exog, offset, exposure, linear]) | Return predicted values for a marginal regression model fit using GEE. |
update_cached_means (mean_params) | cached_means should always contain the most recent calculation of the group-wise mean vectors. |
Attributes:
cached_means | The most recently computed group-wise mean vectors (see update_cached_means). |
endog_names | Names of endogenous variables |
exog_names | Names of exogenous variables |
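cluster_list and mean_deriv are described only briefly in the table above. The NumPy sketch below illustrates the ideas behind them. split_by_cluster and mean_deriv_log_link are hypothetical helpers, not statsmodels functions; the group ordering (sorted labels) is a choice made here, and the derivative is shown only for a log link such as the one commonly used with the Poisson family.

```python
import numpy as np


def split_by_cluster(array, groups):
    """Hypothetical illustration of the idea behind cluster_list:
    return one subarray per group, ordered by sorted group label."""
    array = np.asarray(array)
    groups = np.asarray(groups)
    return [array[groups == label] for label in np.unique(groups)]


def mean_deriv_log_link(exog, lin_pred):
    """Derivative of the mean with respect to the parameters for a log link.

    mu = exp(X beta), so d mu / d beta = mu[:, None] * X: each row of exog is
    scaled by the derivative of the inverse link at its linear predictor.
    """
    exog = np.asarray(exog, dtype=float)
    mu = np.exp(np.asarray(lin_pred, dtype=float))
    return exog * mu[:, None]


parts = split_by_cluster([10, 11, 12, 13, 14], [2, 1, 2, 1, 3])
print([p.tolist() for p in parts])   # [[11, 13], [10, 12], [14]]

exog = np.array([[1.0, 2.0], [1.0, 0.0]])
print(mean_deriv_log_link(exog, np.log([2.0, 1.0])))
```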
© 2009–2012 Statsmodels Developers
© 2006–2008 Scipy Developers
© 2006 Jonathan E. Taylor
Licensed under the 3-clause BSD License.
http://www.statsmodels.org/stable/generated/statsmodels.genmod.generalized_estimating_equations.GEE.html