This notebook illustrates how you can use R-style formulas to fit Generalized Linear Models.
To begin, we load the Star98
dataset and we construct a formula and pre-process the data:
Then, we fit the GLM model:
Out[2]:
Generalized Linear Model Regression Results Dep. Variable: | SUCCESS | No. Observations: | 303 |
Model: | GLM | Df Residuals: | 282 |
Model Family: | Binomial | Df Model: | 20 |
Link Function: | logit | Scale: | 1.0000 |
Method: | IRLS | Log-Likelihood: | -127.33 |
Date: | Mon, 14 May 2018 | Deviance: | 8.5477 |
Time: | 21:44:55 | Pearson chi2: | 8.48 |
No. Iterations: | 4 | Covariance Type: | nonrobust |
| coef | std err | z | P>|z| | [0.025 | 0.975] |
Intercept | 0.4037 | 25.036 | 0.016 | 0.987 | -48.665 | 49.472 |
LOWINC | -0.0204 | 0.010 | -1.982 | 0.048 | -0.041 | -0.000 |
PERASIAN | 0.0159 | 0.017 | 0.910 | 0.363 | -0.018 | 0.050 |
PERBLACK | -0.0198 | 0.020 | -1.004 | 0.316 | -0.058 | 0.019 |
PERHISP | -0.0096 | 0.010 | -0.951 | 0.341 | -0.029 | 0.010 |
PCTCHRT | -0.0022 | 0.022 | -0.103 | 0.918 | -0.045 | 0.040 |
PCTYRRND | -0.0022 | 0.006 | -0.348 | 0.728 | -0.014 | 0.010 |
PERMINTE | 0.1068 | 0.787 | 0.136 | 0.892 | -1.436 | 1.650 |
AVYRSEXP | -0.0411 | 1.176 | -0.035 | 0.972 | -2.346 | 2.264 |
PERMINTE:AVYRSEXP | -0.0031 | 0.054 | -0.057 | 0.954 | -0.108 | 0.102 |
AVSALK | 0.0131 | 0.295 | 0.044 | 0.965 | -0.566 | 0.592 |
PERMINTE:AVSALK | -0.0019 | 0.013 | -0.145 | 0.885 | -0.028 | 0.024 |
AVYRSEXP:AVSALK | 0.0008 | 0.020 | 0.038 | 0.970 | -0.039 | 0.041 |
PERMINTE:AVYRSEXP:AVSALK | 5.978e-05 | 0.001 | 0.068 | 0.946 | -0.002 | 0.002 |
PERSPENK | -0.3097 | 4.233 | -0.073 | 0.942 | -8.606 | 7.987 |
PTRATIO | 0.0096 | 0.919 | 0.010 | 0.992 | -1.792 | 1.811 |
PERSPENK:PTRATIO | 0.0066 | 0.206 | 0.032 | 0.974 | -0.397 | 0.410 |
PCTAF | -0.0143 | 0.474 | -0.030 | 0.976 | -0.944 | 0.916 |
PERSPENK:PCTAF | 0.0105 | 0.098 | 0.107 | 0.915 | -0.182 | 0.203 |
PTRATIO:PCTAF | -0.0001 | 0.022 | -0.005 | 0.996 | -0.044 | 0.044 |
PERSPENK:PTRATIO:PCTAF | -0.0002 | 0.005 | -0.051 | 0.959 | -0.010 | 0.009 |
Finally, we define a function to operate customized data transformation using the formula framework:
Out[3]:
Generalized Linear Model Regression Results Dep. Variable: | SUCCESS | No. Observations: | 303 |
Model: | GLM | Df Residuals: | 282 |
Model Family: | Binomial | Df Model: | 20 |
Link Function: | logit | Scale: | 1.0000 |
Method: | IRLS | Log-Likelihood: | -127.33 |
Date: | Mon, 14 May 2018 | Deviance: | 8.5477 |
Time: | 21:44:55 | Pearson chi2: | 8.48 |
No. Iterations: | 4 | Covariance Type: | nonrobust |
| coef | std err | z | P>|z| | [0.025 | 0.975] |
Intercept | 0.4037 | 25.036 | 0.016 | 0.987 | -48.665 | 49.472 |
double_it(LOWINC) | -0.0102 | 0.005 | -1.982 | 0.048 | -0.020 | -0.000 |
PERASIAN | 0.0159 | 0.017 | 0.910 | 0.363 | -0.018 | 0.050 |
PERBLACK | -0.0198 | 0.020 | -1.004 | 0.316 | -0.058 | 0.019 |
PERHISP | -0.0096 | 0.010 | -0.951 | 0.341 | -0.029 | 0.010 |
PCTCHRT | -0.0022 | 0.022 | -0.103 | 0.918 | -0.045 | 0.040 |
PCTYRRND | -0.0022 | 0.006 | -0.348 | 0.728 | -0.014 | 0.010 |
PERMINTE | 0.1068 | 0.787 | 0.136 | 0.892 | -1.436 | 1.650 |
AVYRSEXP | -0.0411 | 1.176 | -0.035 | 0.972 | -2.346 | 2.264 |
PERMINTE:AVYRSEXP | -0.0031 | 0.054 | -0.057 | 0.954 | -0.108 | 0.102 |
AVSALK | 0.0131 | 0.295 | 0.044 | 0.965 | -0.566 | 0.592 |
PERMINTE:AVSALK | -0.0019 | 0.013 | -0.145 | 0.885 | -0.028 | 0.024 |
AVYRSEXP:AVSALK | 0.0008 | 0.020 | 0.038 | 0.970 | -0.039 | 0.041 |
PERMINTE:AVYRSEXP:AVSALK | 5.978e-05 | 0.001 | 0.068 | 0.946 | -0.002 | 0.002 |
PERSPENK | -0.3097 | 4.233 | -0.073 | 0.942 | -8.606 | 7.987 |
PTRATIO | 0.0096 | 0.919 | 0.010 | 0.992 | -1.792 | 1.811 |
PERSPENK:PTRATIO | 0.0066 | 0.206 | 0.032 | 0.974 | -0.397 | 0.410 |
PCTAF | -0.0143 | 0.474 | -0.030 | 0.976 | -0.944 | 0.916 |
PERSPENK:PCTAF | 0.0105 | 0.098 | 0.107 | 0.915 | -0.182 | 0.203 |
PTRATIO:PCTAF | -0.0001 | 0.022 | -0.005 | 0.996 | -0.044 | 0.044 |
PERSPENK:PTRATIO:PCTAF | -0.0002 | 0.005 | -0.051 | 0.959 | -0.010 | 0.009 |
As expected, the coefficient for double_it(LOWINC)
in the second model is half the size of the LOWINC
coefficient from the first model:
-0.020395987154757125
-0.020395987154757402