This notebook illustrates how you can use R-style formulas to fit Generalized Linear Models.
To begin, we load the Star98 dataset, construct a formula, and pre-process the data:
Next, we fit the GLM:
         Out[2]:
   Generalized Linear Model Regression Results
  | Dep. Variable: |  SUCCESS |   No. Observations:  |   303 |  
  | Model: |  GLM |   Df Residuals:  |   282 |  
  | Model Family: |  Binomial |   Df Model:  |   20 |  
  | Link Function: |  logit |   Scale:  |   1.0000 |  
  | Method: |  IRLS |   Log-Likelihood:  |   -127.33 |  
  | Date: |  Mon, 14 May 2018 |   Deviance:  |   8.5477 |  
  | Time: |  21:44:55 |   Pearson chi2:  |   8.48 |  
  | No. Iterations: |  4 |   Covariance Type:  |  nonrobust |  
 
    |  coef |  std err |  z |  P>|z| |  [0.025 |  0.975] |  
  | Intercept |   0.4037 |   25.036 |   0.016 |   0.987 |   -48.665 |   49.472 |  
  | LOWINC |   -0.0204 |   0.010 |   -1.982 |   0.048 |   -0.041 |   -0.000 |  
  | PERASIAN |   0.0159 |   0.017 |   0.910 |   0.363 |   -0.018 |   0.050 |  
  | PERBLACK |   -0.0198 |   0.020 |   -1.004 |   0.316 |   -0.058 |   0.019 |  
  | PERHISP |   -0.0096 |   0.010 |   -0.951 |   0.341 |   -0.029 |   0.010 |  
  | PCTCHRT |   -0.0022 |   0.022 |   -0.103 |   0.918 |   -0.045 |   0.040 |  
  | PCTYRRND |   -0.0022 |   0.006 |   -0.348 |   0.728 |   -0.014 |   0.010 |  
  | PERMINTE |   0.1068 |   0.787 |   0.136 |   0.892 |   -1.436 |   1.650 |  
  | AVYRSEXP |   -0.0411 |   1.176 |   -0.035 |   0.972 |   -2.346 |   2.264 |  
  | PERMINTE:AVYRSEXP |   -0.0031 |   0.054 |   -0.057 |   0.954 |   -0.108 |   0.102 |  
  | AVSALK |   0.0131 |   0.295 |   0.044 |   0.965 |   -0.566 |   0.592 |  
  | PERMINTE:AVSALK |   -0.0019 |   0.013 |   -0.145 |   0.885 |   -0.028 |   0.024 |  
  | AVYRSEXP:AVSALK |   0.0008 |   0.020 |   0.038 |   0.970 |   -0.039 |   0.041 |  
  | PERMINTE:AVYRSEXP:AVSALK |   5.978e-05 |   0.001 |   0.068 |   0.946 |   -0.002 |   0.002 |  
  | PERSPENK |   -0.3097 |   4.233 |   -0.073 |   0.942 |   -8.606 |   7.987 |  
  | PTRATIO |   0.0096 |   0.919 |   0.010 |   0.992 |   -1.792 |   1.811 |  
  | PERSPENK:PTRATIO |   0.0066 |   0.206 |   0.032 |   0.974 |   -0.397 |   0.410 |  
  | PCTAF |   -0.0143 |   0.474 |   -0.030 |   0.976 |   -0.944 |   0.916 |  
  | PERSPENK:PCTAF |   0.0105 |   0.098 |   0.107 |   0.915 |   -0.182 |   0.203 |  
  | PTRATIO:PCTAF |   -0.0001 |   0.022 |   -0.005 |   0.996 |   -0.044 |   0.044 |  
  | PERSPENK:PTRATIO:PCTAF |   -0.0002 |   0.005 |   -0.051 |   0.959 |   -0.010 |   0.009 |  
 
Finally, we define a function that applies a custom data transformation within the formula framework:
         Out[3]:
   Generalized Linear Model Regression Results
  | Dep. Variable: |  SUCCESS |   No. Observations:  |   303 |  
  | Model: |  GLM |   Df Residuals:  |   282 |  
  | Model Family: |  Binomial |   Df Model:  |   20 |  
  | Link Function: |  logit |   Scale:  |   1.0000 |  
  | Method: |  IRLS |   Log-Likelihood:  |   -127.33 |  
  | Date: |  Mon, 14 May 2018 |   Deviance:  |   8.5477 |  
  | Time: |  21:44:55 |   Pearson chi2:  |   8.48 |  
  | No. Iterations: |  4 |   Covariance Type:  |  nonrobust |  
 
    |  coef |  std err |  z |  P>|z| |  [0.025 |  0.975] |  
  | Intercept |   0.4037 |   25.036 |   0.016 |   0.987 |   -48.665 |   49.472 |  
  | double_it(LOWINC) |   -0.0102 |   0.005 |   -1.982 |   0.048 |   -0.020 |   -0.000 |  
  | PERASIAN |   0.0159 |   0.017 |   0.910 |   0.363 |   -0.018 |   0.050 |  
  | PERBLACK |   -0.0198 |   0.020 |   -1.004 |   0.316 |   -0.058 |   0.019 |  
  | PERHISP |   -0.0096 |   0.010 |   -0.951 |   0.341 |   -0.029 |   0.010 |  
  | PCTCHRT |   -0.0022 |   0.022 |   -0.103 |   0.918 |   -0.045 |   0.040 |  
  | PCTYRRND |   -0.0022 |   0.006 |   -0.348 |   0.728 |   -0.014 |   0.010 |  
  | PERMINTE |   0.1068 |   0.787 |   0.136 |   0.892 |   -1.436 |   1.650 |  
  | AVYRSEXP |   -0.0411 |   1.176 |   -0.035 |   0.972 |   -2.346 |   2.264 |  
  | PERMINTE:AVYRSEXP |   -0.0031 |   0.054 |   -0.057 |   0.954 |   -0.108 |   0.102 |  
  | AVSALK |   0.0131 |   0.295 |   0.044 |   0.965 |   -0.566 |   0.592 |  
  | PERMINTE:AVSALK |   -0.0019 |   0.013 |   -0.145 |   0.885 |   -0.028 |   0.024 |  
  | AVYRSEXP:AVSALK |   0.0008 |   0.020 |   0.038 |   0.970 |   -0.039 |   0.041 |  
  | PERMINTE:AVYRSEXP:AVSALK |   5.978e-05 |   0.001 |   0.068 |   0.946 |   -0.002 |   0.002 |  
  | PERSPENK |   -0.3097 |   4.233 |   -0.073 |   0.942 |   -8.606 |   7.987 |  
  | PTRATIO |   0.0096 |   0.919 |   0.010 |   0.992 |   -1.792 |   1.811 |  
  | PERSPENK:PTRATIO |   0.0066 |   0.206 |   0.032 |   0.974 |   -0.397 |   0.410 |  
  | PCTAF |   -0.0143 |   0.474 |   -0.030 |   0.976 |   -0.944 |   0.916 |  
  | PERSPENK:PCTAF |   0.0105 |   0.098 |   0.107 |   0.915 |   -0.182 |   0.203 |  
  | PTRATIO:PCTAF |   -0.0001 |   0.022 |   -0.005 |   0.996 |   -0.044 |   0.044 |  
  | PERSPENK:PTRATIO:PCTAF |   -0.0002 |   0.005 |   -0.051 |   0.959 |   -0.010 |   0.009 |  
 
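As expected, the coefficient for double_it(LOWINC) in the second model is half the LOWINC coefficient from the first model: doubling a regressor leaves the linear predictor, and therefore the fit, unchanged only if its coefficient is halved. The two values below are the LOWINC coefficient from the first model and twice the double_it(LOWINC) coefficient from the second; they agree to floating-point precision: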
           -0.020395987154757125
-0.020395987154757402