Generalized linear models currently support estimation using the one-parameter exponential families.
See Module Reference for commands and arguments.
# Load modules and data
In [1]: import statsmodels.api as sm
In [2]: data = sm.datasets.scotland.load()
In [3]: data.exog = sm.add_constant(data.exog)
# Instantiate a gamma family model with the default link function.
In [4]: gamma_model = sm.GLM(data.endog, data.exog, family=sm.families.Gamma())
In [5]: gamma_results = gamma_model.fit()
In [6]: print(gamma_results.summary())
Generalized Linear Model Regression Results
==============================================================================
Dep. Variable: y No. Observations: 32
Model: GLM Df Residuals: 24
Model Family: Gamma Df Model: 7
Link Function: inverse_power Scale: 0.0035843
Method: IRLS Log-Likelihood: -83.017
Date: Mon, 14 May 2018 Deviance: 0.087389
Time: 21:48:07 Pearson chi2: 0.0860
No. Iterations: 6 Covariance Type: nonrobust
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const -0.0178 0.011 -1.548 0.122 -0.040 0.005
x1 4.962e-05 1.62e-05 3.060 0.002 1.78e-05 8.14e-05
x2 0.0020 0.001 3.824 0.000 0.001 0.003
x3 -7.181e-05 2.71e-05 -2.648 0.008 -0.000 -1.87e-05
x4 0.0001 4.06e-05 2.757 0.006 3.23e-05 0.000
x5 -1.468e-07 1.24e-07 -1.187 0.235 -3.89e-07 9.56e-08
x6 -0.0005 0.000 -2.159 0.031 -0.001 -4.78e-05
x7 -2.427e-06 7.46e-07 -3.253 0.001 -3.89e-06 -9.65e-07
==============================================================================
The statistical model for each observation \(i\) is assumed to be
\(Y_i \sim F_{EDM}(\cdot|\theta,\phi,w_i)\) and \(\mu_i = E[Y_i|x_i] = g^{-1}(x_i^\prime\beta)\),
where \(g\) is the link function and \(F_{EDM}(\cdot|\theta,\phi,w)\) is a distribution of the family of exponential dispersion models (EDM) with natural parameter \(\theta\), scale parameter \(\phi\) and weight \(w\). Its density is given by
\(f_{EDM}(y|\theta,\phi,w) = c(y,\phi,w) \exp\left(\frac{y\theta-b(\theta)}{\phi}w\right)\,.\)
It follows that \(\mu = b'(\theta)\) and \(Var[Y|x]=\frac{\phi}{w}b''(\theta)\). The inverse of the first equation gives the natural parameter as a function of the expected value, \(\theta(\mu)\), such that
\(Var[Y_i|x_i] = \frac{\phi}{w_i} v(\mu_i)\)
with \(v(\mu) = b''(\theta(\mu))\). Therefore it is said that a GLM is determined by link function \(g\) and variance function \(v(\mu)\) alone (and \(x\) of course).
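The identities \(\mu = b'(\theta)\) and \(v(\mu) = b''(\theta(\mu))\) can be checked numerically. The sketch below does so with central finite differences for the standard Poisson and Gamma forms of \(b(\theta)\); the helper name `derivatives`, the step size and the test value of \(\mu\) are illustrative choices, not part of statsmodels.

```python
import math

def derivatives(b, theta, h=1e-4):
    """Central finite-difference estimates of b'(theta) and b''(theta)."""
    first = (b(theta + h) - b(theta - h)) / (2 * h)
    second = (b(theta + h) - 2 * b(theta) + b(theta - h)) / h**2
    return first, second

mu = 2.5

# Poisson: theta(mu) = log(mu), b(theta) = exp(theta), v(mu) = mu
poisson_mean, poisson_var = derivatives(math.exp, math.log(mu))

# Gamma: theta(mu) = -1/mu, b(theta) = -log(-theta), v(mu) = mu**2
gamma_mean, gamma_var = derivatives(lambda t: -math.log(-t), -1.0 / mu)

print(poisson_mean, poisson_var)  # both should be close to mu
print(gamma_mean, gamma_var)      # should be close to mu and mu**2
```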
Note that while \(\phi\) is the same for every observation \(y_i\) and therefore does not influence the estimation of \(\beta\), the weights \(w_i\) might be different for every \(y_i\) such that the estimation of \(\beta\) depends on them.
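An intercept-only model makes this concrete. With a single common mean \(\mu\), the factors \(v(\mu)\) and the link derivative are the same in every term of the score equation, so it reduces to \(\sum_i w_i (y_i - \mu)/\phi = 0\). The sketch below solves that equation in plain Python; the function name `intercept_only_mean` and the data are hypothetical and serve only as illustration.

```python
def intercept_only_mean(y, w, phi):
    """Solve sum_i w_i * (y_i - mu) / phi = 0 for the common mean mu."""
    # phi scales every term by the same factor, so it cancels in the ratio.
    weighted_sum = sum(wi * yi / phi for wi, yi in zip(w, y))
    weight_total = sum(wi / phi for wi in w)
    return weighted_sum / weight_total

y = [1.0, 2.0, 6.0]

# Equal weights: changing phi leaves the estimate unchanged.
mu_phi1 = intercept_only_mean(y, [1.0, 1.0, 1.0], phi=1.0)
mu_phi10 = intercept_only_mean(y, [1.0, 1.0, 1.0], phi=10.0)

# Unequal weights: the estimate moves toward the heavily weighted observation.
mu_weighted = intercept_only_mean(y, [1.0, 1.0, 4.0], phi=1.0)

print(mu_phi1, mu_phi10, mu_weighted)
```

The first two estimates agree with the plain average of `y`, while the third is the weighted average, which is the sense in which the weights, but not a common \(\phi\), enter the estimation.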
| Distribution | Domain | \(\mu=E[Y|x]\) | \(v(\mu)\) | \(\theta(\mu)\) | \(b(\theta)\) | \(\phi\) |
|---|---|---|---|---|---|---|
| Binomial \(B(n,p)\) | \(0,1,\ldots,n\) | \(np\) | \(\mu-\frac{\mu^2}{n}\) | \(\log\frac{p}{1-p}\) | \(n\log(1+e^\theta)\) | 1 |
| Poisson \(P(\mu)\) | \(0,1,\ldots,\infty\) | \(\mu\) | \(\mu\) | \(\log(\mu)\) | \(e^\theta\) | 1 |
| Neg. Binom. \(NB(\mu,\alpha)\) | \(0,1,\ldots,\infty\) | \(\mu\) | \(\mu+\alpha\mu^2\) | \(\log(\frac{\alpha\mu}{1+\alpha\mu})\) | \(-\frac{1}{\alpha}\log(1-e^\theta)\) | 1 |
| Gaussian/Normal \(N(\mu,\sigma^2)\) | \((-\infty,\infty)\) | \(\mu\) | \(1\) | \(\mu\) | \(\frac{1}{2}\theta^2\) | \(\sigma^2\) |
| Gamma \(G(\mu,\nu)\) | \((0,\infty)\) | \(\mu\) | \(\mu^2\) | \(-\frac{1}{\mu}\) | \(-\log(-\theta)\) | \(\frac{1}{\nu}\) |
| Inv. Gauss. \(IG(\mu,\sigma^2)\) | \((0,\infty)\) | \(\mu\) | \(\mu^3\) | \(-\frac{1}{2\mu^2}\) | \(-\sqrt{-2\theta}\) | \(\sigma^2\) |
| Tweedie \(p\geq 1\) | depends on \(p\) | \(\mu\) | \(\mu^p\) | \(\frac{\mu^{1-p}}{1-p}\) | \(\frac{\alpha-1}{\alpha}\left(\frac{\theta}{\alpha-1}\right)^{\alpha}\) | \(\phi\) |
The Tweedie distribution has special cases for \(p=0,1,2\) not listed in the table and uses \(\alpha=\frac{p-2}{p-1}\).
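The Tweedie row can be checked against the general identities \(\mu = b'(\theta)\) and \(v(\mu) = b''(\theta(\mu))\). The sketch below evaluates the table's formulas for \(p = 1.5\) using finite differences, and confirms that \(p = 3\) reproduces the Inverse Gaussian entry for \(b(\theta)\). All function names are hypothetical helpers written for this illustration.

```python
def tweedie_alpha(p):
    """alpha = (p - 2) / (p - 1), as defined in the text."""
    return (p - 2.0) / (p - 1.0)

def tweedie_theta(mu, p):
    """Natural parameter theta(mu) = mu**(1 - p) / (1 - p)."""
    return mu ** (1.0 - p) / (1.0 - p)

def tweedie_b(theta, p):
    """Cumulant function b(theta) from the Tweedie row of the table."""
    alpha = tweedie_alpha(p)
    return (alpha - 1.0) / alpha * (theta / (alpha - 1.0)) ** alpha

def b_derivatives(theta, p, h=1e-4):
    """Central finite-difference estimates of b'(theta) and b''(theta)."""
    lo, mid, hi = (tweedie_b(t, p) for t in (theta - h, theta, theta + h))
    return (hi - lo) / (2 * h), (hi - 2 * mid + lo) / h**2

mu, p = 2.0, 1.5
mean_check, var_check = b_derivatives(tweedie_theta(mu, p), p)
print(mean_check, mu)      # b'(theta(mu)) should be close to mu
print(var_check, mu ** p)  # b''(theta(mu)) should be close to mu**p

# With p = 3 the formula reduces to the Inverse Gaussian row, b(theta) = -sqrt(-2*theta).
theta_ig = tweedie_theta(mu, 3.0)
ig_b = tweedie_b(theta_ig, 3.0)
print(ig_b, -((-2.0 * theta_ig) ** 0.5))
```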
Correspondence of mathematical variables to code:
- \(Y\) and \(y\): endog, the variable one wants to model
- \(x\): exog, the covariates alias explanatory variables
- \(\beta\): params, the parameters one wants to estimate
- \(\mu\): mu, the expectation (conditional on \(x\)) of \(Y\)
- \(g\): link argument to the class Family
- \(\phi\): scale, the dispersion parameter of the EDM
- \(w\): var_weights
- \(p\): var_power for the power of the variance function \(v(\mu)\) of the Tweedie distribution, see table
- \(\alpha\): alpha, see table
| Class | Description |
|---|---|
| GLM(endog, exog[, family, offset, exposure, …]) | Generalized Linear Models class |
| GLMResults(model, params, …[, cov_type, …]) | Class to contain GLM results. |
| PredictionResults(predicted_mean, var_pred_mean) | |
The distribution families currently implemented are
| Family | Description |
|---|---|
| Family(link, variance) | The parent class for one-parameter exponential families. |
| Binomial([link]) | Binomial exponential family distribution. |
| Gamma([link]) | Gamma exponential family distribution. |
| Gaussian([link]) | Gaussian exponential family distribution. |
| InverseGaussian([link]) | InverseGaussian exponential family. |
| NegativeBinomial([link, alpha]) | Negative Binomial exponential family. |
| Poisson([link]) | Poisson exponential family. |
| Tweedie([link, var_power]) | Tweedie family. |
The link functions currently implemented are the following. Not all link functions are available for each distribution family. The list of available link functions can be obtained by
>>> sm.families.family.<familyname>.links
| Link function | Description |
|---|---|
| Link | A generic link function for one-parameter exponential family. |
| CDFLink([dbn]) | Use the CDF of a scipy.stats distribution |
| CLogLog | The complementary log-log transform |
| Log | The log transform |
| Logit | The logit transform |
| NegativeBinomial([alpha]) | The negative binomial link function |
| Power([power]) | The power transform |
| cauchy() | The Cauchy (standard Cauchy CDF) transform |
| cloglog | The CLogLog transform link function. |
| identity() | The identity transform |
| inverse_power() | The inverse transform |
| inverse_squared() | The inverse squared transform |
| log | The log transform |
| logit | The logit transform |
| nbinom([alpha]) | The negative binomial link function. |
| probit([dbn]) | The probit (standard normal CDF) transform |
© 2009–2012 Statsmodels Developers
© 2006–2008 Scipy Developers
© 2006 Jonathan E. Taylor
Licensed under the 3-clause BSD License.
http://www.statsmodels.org/stable/glm.html