classmethod MixedLM.from_formula(formula, data, re_formula=None, vc_formula=None, subset=None, use_sparse=False, missing='none', *args, **kwargs)
Create a Model from a formula and dataframe.
Returns: model
Return type: Model instance
data must define __getitem__ with the keys in the formula terms, e.g., a numpy structured or rec array, a dictionary, or a pandas DataFrame. args and kwargs are passed on to the model instantiation.
If the variance component is intended to produce random intercepts for disjoint subsets of a group, specified by string labels or a categorical data value, always use ‘0 +’ in the formula so that no overall intercept is included.
If the variance components specify random slopes and you do not also want a random group-level intercept in the model, then use ‘0 +’ in the formula to exclude the intercept.
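To make the effect of '0 +' concrete, the following minimal sketch builds the design matrix for a categorical variable with and without it. It calls patsy directly, on the assumption that patsy is the formula engine in use; the tiny data frame and its labels are hypothetical.

```python
import pandas as pd
from patsy import dmatrix

# Hypothetical data: four students in three classrooms.
df = pd.DataFrame({"classroom": ["a", "a", "b", "c"]})

# Default formula: an overall intercept plus contrasts for the remaining levels.
with_intercept = dmatrix("C(classroom)", df)

# '0 +' drops the intercept, leaving one indicator column per classroom.
without_intercept = dmatrix("0 + C(classroom)", df)

print(with_intercept.design_info.column_names)
print(without_intercept.design_info.column_names)
```

Without the intercept, each classroom gets its own indicator column, which is the form needed when each column is meant to carry an independent random intercept.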
The variance components formulas are processed separately for each group. If a variable is categorical, the results are not affected by whether the group labels are distinct or re-used across the top-level groups.
Suppose we have data from an educational study with students nested in classrooms nested in schools. The students take a test, and we want to relate the test scores to the students' ages, while accounting for the effects of classrooms and schools. The school will be the top-level group, and the classroom is a nested group that is specified as a variance component. Note that the schools may have different numbers of classrooms, and the classroom labels may (but need not) be different across the schools.
>>> vc = {'classroom': '0 + C(classroom)'}
>>> MixedLM.from_formula('test_score ~ age', vc_formula=vc,
...                      re_formula='1', groups='school', data=data)
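The example above assumes that a suitable `data` object already exists. As a self-contained sketch, the code below simulates a data set with the same column names and fits the model. The number of schools, classrooms and students, the effect sizes, and the random seed are all illustrative assumptions, not part of the original example.

```python
import numpy as np
import pandas as pd
from statsmodels.regression.mixed_linear_model import MixedLM

rng = np.random.default_rng(0)

# Assumed layout: 20 schools, 3 classrooms per school, 10 students per classroom.
n_schools, n_classrooms, n_students = 20, 3, 10
school = np.repeat(np.arange(n_schools), n_classrooms * n_students)
# Classroom labels 0..2 are re-used within every school.
classroom = np.tile(np.repeat(np.arange(n_classrooms), n_students), n_schools)
n = school.size

# Simulated random effects (assumed standard deviations).
school_effect = rng.normal(scale=2.0, size=n_schools)[school]
classroom_effect = rng.normal(scale=1.0, size=(n_schools, n_classrooms))[school, classroom]

age = rng.uniform(10, 12, size=n)
test_score = 50 + 3 * age + school_effect + classroom_effect + rng.normal(size=n)

data = pd.DataFrame({
    "school": school,
    "classroom": classroom,
    "age": age,
    "test_score": test_score,
})

# Random intercept per school, plus a classroom variance component.
vc = {"classroom": "0 + C(classroom)"}
model = MixedLM.from_formula("test_score ~ age", data=data, groups="school",
                             re_formula="1", vc_formula=vc)
result = model.fit()
print(result.summary())
```

In the fitted result, `result.cov_re` holds the covariance of the random effects from `re_formula`, and `result.vcomp` holds one estimated variance per key in `vc_formula`.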
Now suppose we also have a previous test score called 'pretest'. If we want the relationship between pretest scores and the current test to vary by classroom, we can specify a random slope for the pretest score:
>>> vc = {'classroom': '0 + C(classroom)', 'pretest': '0 + pretest'}
>>> MixedLM.from_formula('test_score ~ age + pretest', vc_formula=vc,
...                      re_formula='1', groups='school', data=data)
The following model is almost equivalent to the previous one, but here the classroom random intercept and pretest slope may be correlated.
>>> vc = {'classroom': '0 + C(classroom)'}
>>> MixedLM.from_formula('test_score ~ age + pretest', vc_formula=vc,
...                      re_formula='1 + pretest', groups='school', data=data)
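The structural difference between the last two specifications shows up in the model dimensions, even before fitting. The sketch below builds both models on a small simulated data set (sizes, seed and the outcome equation are placeholder assumptions) and compares `k_re`, the number of random effects that share a covariance matrix, with `k_vc`, the number of independent variance components.

```python
import numpy as np
import pandas as pd
from statsmodels.regression.mixed_linear_model import MixedLM

rng = np.random.default_rng(1)

# Assumed layout: 15 schools, 3 classrooms per school, 8 students per classroom.
n_schools, n_classrooms, n_students = 15, 3, 8
school = np.repeat(np.arange(n_schools), n_classrooms * n_students)
classroom = np.tile(np.repeat(np.arange(n_classrooms), n_students), n_schools)
n = school.size

data = pd.DataFrame({
    "school": school,
    "classroom": classroom,
    "age": rng.uniform(10, 12, size=n),
    "pretest": rng.normal(size=n),
})
# Placeholder outcome; only the model structure matters in this sketch.
data["test_score"] = 50 + 3 * data["age"] + 2 * data["pretest"] + rng.normal(size=n)

fixed = "test_score ~ age + pretest"

# Pretest slope given as a variance component: independent of the intercept.
m_vc = MixedLM.from_formula(
    fixed, data=data, groups="school", re_formula="1",
    vc_formula={"classroom": "0 + C(classroom)", "pretest": "0 + pretest"})

# Pretest slope given in re_formula: shares a covariance matrix with the intercept.
m_re = MixedLM.from_formula(
    fixed, data=data, groups="school", re_formula="1 + pretest",
    vc_formula={"classroom": "0 + C(classroom)"})

for name, m in [("slope in vc_formula", m_vc), ("slope in re_formula", m_re)]:
    print(f"{name}: k_re={m.k_re}, k_vc={m.k_vc}")
```

Terms listed in `re_formula` get a full covariance matrix, so the second model estimates a covariance between the intercept and the pretest slope, while the first model estimates only separate variances.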
© 2009–2012 Statsmodels Developers
© 2006–2008 Scipy Developers
© 2006 Jonathan E. Taylor
Licensed under the 3-clause BSD License.
http://www.statsmodels.org/stable/generated/statsmodels.regression.mixed_linear_model.MixedLM.from_formula.html