Linear mixed effects (LME) models are used for regression analyses involving dependent data. Such data arise in longitudinal and other study designs in which multiple observations are made on each subject. Some specific linear mixed effects models are random intercepts models, random slopes models, and variance components models.
The Statsmodels implementation of LME is primarily group-based, meaning that random effects must be independently realized for responses in different groups. There are two types of random effects in our implementation of mixed models: (i) random coefficients (possibly vectors) that have an unknown covariance matrix, and (ii) random coefficients that are independent draws from a common univariate distribution. For both (i) and (ii), the random effects influence the conditional mean of a group through their matrix/vector product with a group-specific design matrix.
A simple example of random coefficients, as in (i) above, is:

\[Y_{ij} = \beta_0 + \beta_1 X_{ij} + \gamma_{0i} + \gamma_{1i} X_{ij} + \epsilon_{ij}\]
Here, \(Y_{ij}\) is the \(j\)th measured response for subject \(i\), and \(X_{ij}\) is a covariate for this response. The “fixed effects parameters” \(\beta_0\) and \(\beta_1\) are shared by all subjects, and the errors \(\epsilon_{ij}\) are independent of everything else and identically distributed (with mean zero). The “random effects parameters” \(\gamma_{0i}\) and \(\gamma_{1i}\) follow a bivariate distribution with mean zero, described by three parameters: \({\rm var}(\gamma_{0i})\), \({\rm var}(\gamma_{1i})\), and \({\rm cov}(\gamma_{0i}, \gamma_{1i})\). There is also a parameter for \({\rm var}(\epsilon_{ij})\).
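As a rough illustration of the random coefficients model described here, the following sketch simulates data with a random intercept and a random slope per subject and fits it with `re_formula`. The column names (`y`, `x`, `subject`), the sample sizes, and the true parameter values are all assumptions made for this example.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_subjects, n_obs = 40, 8  # illustrative sizes, not from the documentation

# Simulate Y_ij = b0 + b1*X_ij + g0_i + g1_i*X_ij + e_ij with made-up true values.
subject = np.repeat(np.arange(n_subjects), n_obs)
x = rng.normal(size=n_subjects * n_obs)
g0 = rng.normal(scale=1.0, size=n_subjects)  # random intercepts, one per subject
g1 = rng.normal(scale=0.5, size=n_subjects)  # random slopes, one per subject
y = 2.0 + 1.5 * x + g0[subject] + g1[subject] * x + rng.normal(size=x.size)
df = pd.DataFrame({"y": y, "x": x, "subject": subject})

# re_formula="~x" requests a random intercept and a random slope for x in each group.
model = smf.mixedlm("y ~ x", df, groups=df["subject"], re_formula="~x")
result = model.fit()

print(result.fe_params)  # estimates of beta_0 and beta_1
print(result.cov_re)     # estimated 2x2 covariance matrix of (gamma_0i, gamma_1i)
print(result.scale)      # estimated var(epsilon_ij)
```

The three free entries of `cov_re` correspond to the two variances and the covariance of the random effects, and `scale` corresponds to the error variance.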
A simple example of variance components, as in (ii) above, is:

\[Y_{ijk} = \beta_0 + \eta_{1i} + \eta_{2j} + \epsilon_{ijk}\]
Here, \(Y_{ijk}\) is the \(k\)th measured response under conditions \(i, j\). The only “mean structure parameter” is \(\beta_0\). The \(\eta_{1i}\) are independent and identically distributed with zero mean and variance \(\tau_1^2\), and the \(\eta_{2j}\) are independent and identically distributed with zero mean and variance \(\tau_2^2\).
Statsmodels MixedLM handles most non-crossed random effects models, and some crossed models. To include crossed random effects in a model, it is necessary to treat the entire dataset as a single group. The variance components arguments to the model can then be used to define models with various combinations of crossed and non-crossed random effects.
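The single-group approach to crossed random effects might look like the sketch below. It simulates a fully crossed two-factor design, places every row in one group, and describes each factor through `vc_formula`. The factor names `a` and `b`, the constant `group` column, and all numeric values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_a, n_b, n_rep = 10, 8, 2  # illustrative sizes

# Fully crossed design: every level of factor a occurs with every level of factor b.
a, b, _ = np.meshgrid(np.arange(n_a), np.arange(n_b), np.arange(n_rep), indexing="ij")
a, b = a.ravel(), b.ravel()
eta1 = rng.normal(scale=1.0, size=n_a)  # one random effect per level of a
eta2 = rng.normal(scale=2.0, size=n_b)  # one random effect per level of b
y = 5.0 + eta1[a] + eta2[b] + rng.normal(size=a.size)
df = pd.DataFrame({"y": y, "a": a, "b": b})

# Treat the entire dataset as a single group so the two factors can be crossed.
df["group"] = 1

# One variance component per factor; "0 + C(...)" builds an indicator column per level.
vc = {"a": "0 + C(a)", "b": "0 + C(b)"}
model = smf.mixedlm("y ~ 1", df, groups=df["group"], vc_formula=vc)
result = model.fit()

print(result.vcomp)  # estimated variances of the two components
print(result.scale)  # estimated residual variance
```

Because the whole dataset forms one group, the fit works with a covariance matrix whose dimension equals the number of observations, so this approach is best suited to moderately sized data.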
The Statsmodels LME framework currently supports post-estimation inference via Wald tests and confidence intervals on the coefficients, profile likelihood analysis, likelihood ratio testing, and AIC.
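A minimal sketch of some of these post-estimation tools, on simulated data, could look as follows. It shows Wald confidence intervals, AIC, and a likelihood ratio test for one fixed effect. The data, the column names, and the choice of a one-degree-of-freedom chi-square reference distribution are assumptions of this example.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(2)
subject = np.repeat(np.arange(30), 10)
x = rng.normal(size=subject.size)
y = 1.0 + 0.8 * x + rng.normal(size=30)[subject] + rng.normal(size=subject.size)
df = pd.DataFrame({"y": y, "x": x, "subject": subject})

# Fit by ML (reml=False) so that log-likelihoods and AIC are comparable
# between models that differ in their fixed effects.
full = smf.mixedlm("y ~ x", df, groups=df["subject"]).fit(reml=False)
null = smf.mixedlm("y ~ 1", df, groups=df["subject"]).fit(reml=False)

print(full.conf_int())     # Wald confidence intervals for the parameters
print(full.aic, null.aic)  # smaller AIC indicates the preferred model

# Likelihood ratio test for the coefficient of x (one extra parameter in the full model).
lr_stat = 2 * (full.llf - null.llf)
p_value = stats.chi2.sf(lr_stat, df=1)
print(lr_stat, p_value)
```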
    In [1]: import statsmodels.api as sm

    In [2]: import statsmodels.formula.api as smf

    In [3]: data = sm.datasets.get_rdataset("dietox", "geepack").data

    In [4]: md = smf.mixedlm("Weight ~ Time", data, groups=data["Pig"])

    In [5]: mdf = md.fit()

    In [6]: print(mdf.summary())
             Mixed Linear Model Regression Results
    ========================================================
    Model:            MixedLM Dependent Variable: Weight
    No. Observations: 861     Method:             REML
    No. Groups:       72      Scale:              11.3669
    Min. group size:  11      Likelihood:         -2404.7753
    Max. group size:  12      Converged:          Yes
    Mean group size:  12.0
    --------------------------------------------------------
                 Coef.  Std.Err.    z    P>|z| [0.025 0.975]
    --------------------------------------------------------
    Intercept    15.724    0.788  19.952 0.000 14.179 17.268
    Time          6.943    0.033 207.939 0.000  6.877  7.008
    Group Var    40.394    2.149
    ========================================================
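Beyond the printed summary, individual pieces of a fit can be read from the results object. The sketch below uses a synthetic stand-in for the pig growth data so that it runs without downloading anything. The column names mirror the session above, but the values are simulated and the numbers it prints will not match the table.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n_pigs, n_weeks = 20, 12  # assumed sizes for the synthetic data

# Synthetic growth curves: a common slope plus a pig-specific random intercept.
pig = np.repeat(np.arange(n_pigs), n_weeks)
time = np.tile(np.arange(1, n_weeks + 1), n_pigs)
pig_effect = rng.normal(scale=6.0, size=n_pigs)
weight = 16.0 + 7.0 * time + pig_effect[pig] + rng.normal(scale=3.0, size=pig.size)
data = pd.DataFrame({"Weight": weight, "Time": time, "Pig": pig})

mdf = smf.mixedlm("Weight ~ Time", data, groups=data["Pig"]).fit()

print(mdf.fe_params)  # fixed effects: Intercept and Time
print(mdf.cov_re)     # random intercept variance ("Group Var" in the summary)
print(mdf.scale)      # residual variance ("Scale" in the summary)

# Predicted random effects, stored as a dict keyed by group label.
re = mdf.random_effects
first_label = next(iter(re))
print(first_label, re[first_label])

# Fitted values combine the fixed effects with the predicted random effects.
fitted = np.asarray(mdf.fittedvalues)
print(fitted[:5])
```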
Detailed examples can be found here
There are some notebook examples on the Wiki: Wiki notebooks for MixedLM
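The data are partitioned into disjoint groups. The probability model for group \(i\) is:

\[Y = X\beta + Z\gamma + Q_1\eta_1 + \cdots + Q_k\eta_k + \epsilon\]

where

\(n_i\) is the number of observations in group \(i\);
\(Y\) is an \(n_i\)-dimensional response vector;
\(X\) is an \(n_i \times k_{fe}\) design matrix for the fixed effects;
\(\beta\) is a \(k_{fe}\)-dimensional vector of fixed effects coefficients;
\(Z\) is an \(n_i \times k_{re}\) design matrix for the random effects;
\(\gamma\) is a \(k_{re}\)-dimensional random vector with mean 0 and covariance matrix \(\Psi\), realized independently for each group;
\(Q_j\) is an \(n_i \times q_j\) design matrix for the \(j\)th variance component;
\(\eta_j\) is a \(q_j\)-dimensional random vector of independent and identically distributed values with mean 0 and variance \(\tau_j^2\);
\(\epsilon\) is an \(n_i\)-dimensional vector of independent and identically distributed errors with mean 0 and variance \(\sigma^2\), independent both within and between groups.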
\(Y\), \(X\), \(\{Q_j\}\), and \(Z\) must be entirely observed. \(\beta\), \(\Psi\), and \(\sigma^2\) are estimated using ML or REML estimation. \(\gamma\), \(\{\eta_j\}\), and \(\epsilon\) are random and therefore define the probability model.
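To make the roles of these quantities concrete, the following NumPy sketch draws one group's response, assuming the group model has the form \(Y = X\beta + Z\gamma + Q_1\eta_1 + \epsilon\) with a single variance component. The dimensions, the parameter values, and the use of normal distributions are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n_i, k_fe, k_re, q_1 = 6, 2, 2, 3  # illustrative dimensions for one group

# Observed quantities: the design matrices.
X = np.column_stack([np.ones(n_i), rng.normal(size=n_i)])  # fixed effects design
Z = X.copy()                                               # random intercept and slope
Q1 = np.eye(q_1)[rng.integers(0, q_1, size=n_i)]           # level indicators for one variance component

# Parameters that would be estimated by ML or REML (values are made up here).
beta = np.array([1.0, 0.5])
Psi = np.array([[1.0, 0.3], [0.3, 0.5]])
tau2_1 = 0.8
sigma2 = 0.25

# Random quantities: drawn once for this group.
gamma = rng.multivariate_normal(np.zeros(k_re), Psi)
eta1 = rng.normal(scale=np.sqrt(tau2_1), size=q_1)
eps = rng.normal(scale=np.sqrt(sigma2), size=n_i)

Y = X @ beta + Z @ gamma + Q1 @ eta1 + eps

# Implied marginal moments of Y given the design matrices.
marginal_mean = X @ beta
marginal_cov = Z @ Psi @ Z.T + tau2_1 * (Q1 @ Q1.T) + sigma2 * np.eye(n_i)
print(Y)
print(marginal_cov)
```

The marginal covariance shows how the random terms induce dependence among the observations within a group even though the errors themselves are independent.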
The marginal mean structure is \(E[Y|X,Z] = X\beta\). If only the marginal mean structure is of interest, GEE is a good alternative to mixed models.
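Notation:

cov_re is the random effects covariance matrix (referred to above as \(\Psi\)) and scale is the (scalar) error variance. There is also a single estimated variance parameter \(\tau_j^2\) for each variance component. For a single group, the marginal covariance matrix of endog given exog is scale*I + Z * cov_re * Z', where Z is the design matrix for the random effects in one group.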
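The primary reference for the implementation details is:

M. J. Lindstrom and D. M. Bates (1988). "Newton-Raphson and EM Algorithms for Linear Mixed-Effects Models for Repeated-Measures Data." Journal of the American Statistical Association, 83(404), 1014-1022.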
See also this more recent document:
All the likelihood, gradient, and Hessian calculations closely follow Lindstrom and Bates.
The following two documents are written more from the perspective of users:
The model class is:
MixedLM(endog, exog, groups[, exog_re, …]) | An object specifying a linear mixed effects model.
The result class is:
MixedLMResults(model, params, cov_params) | Class to contain results of fitting a linear mixed effects model.
© 2009–2012 Statsmodels Developers
© 2006–2008 Scipy Developers
© 2006 Jonathan E. Taylor
Licensed under the 3-clause BSD License.
http://www.statsmodels.org/stable/mixed_linear.html