class statsmodels.genmod.generalized_linear_model.GLM(endog, exog, family=None, offset=None, exposure=None, freq_weights=None, var_weights=None, missing='none', **kwargs)
Generalized Linear Models class
GLM inherits from statsmodels.base.model.LikelihoodModel
Attributes:
df_model
float – p - 1, where p is the number of regressors including the intercept.
df_resid
float – The number of observations n minus the number of regressors p.
endog
array – See Parameters.
exog
array – See Parameters.
family
family class instance – A pointer to the distribution family of the model.
freq_weights
array – See Parameters.
var_weights
array – See Parameters.
mu
array – The estimated mean response of the transformed variable.
n_trials
array – See Parameters.
normalized_cov_params
array – p x p normalized covariance of the design / exogenous data.
scale
float – The estimate of the scale / dispersion. Available after fit is called.
scaletype
str – The scaling used for fitting the model. Available after fit is called.
weights
array – The value of the weights after the last iteration of fit.
>>> import statsmodels.api as sm
>>> data = sm.datasets.scotland.load()
>>> data.exog = sm.add_constant(data.exog)
Instantiate a gamma family model with the default link function.
>>> gamma_model = sm.GLM(data.endog, data.exog,
...                      family=sm.families.Gamma())
>>> gamma_results = gamma_model.fit()
>>> gamma_results.params
array([-0.01776527, 0.00004962, 0.00203442, -0.00007181, 0.00011185,
       -0.00000015, -0.00051868, -0.00000243])
>>> gamma_results.scale
0.0035842831734919055
>>> gamma_results.deviance
0.087388516416999198
>>> gamma_results.pearson_chi2
0.086022796163805704
>>> gamma_results.llf
-83.017202161073527
Only the following combinations make sense for family and link:
Family | ident | log | logit | probit | cloglog | pow | opow | nbinom | loglog | logc |
---|---|---|---|---|---|---|---|---|---|---|
Gaussian | x | x | x | x | x | x | x | x | x | |
inv Gaussian | x | x | x | |||||||
binomial | x | x | x | x | x | x | x | x | x | |
Poisson | x | x | x | |||||||
neg binomial | x | x | x | x | ||||||
gamma | x | x | x | |||||||
Tweedie | x | x | x |
Not all of these link functions are currently available.
Endog and exog are references so that if the data they refer to are already arrays and these arrays are changed, endog and exog will change.
Statsmodels supports two separate definitions of weights: frequency weights and variance weights.
Frequency weights produce the same results as repeating each observation according to its frequency (if the frequencies are integers). Frequency weights leave the number of observations unchanged, but the degrees of freedom change to reflect the weights.
Variance weights (referred to in other packages as analytic weights) are used when endog represents an average or mean. This relies on the assumption that the inverse variance scales proportionally to the weight: an observation that is deemed more credible should have less variance and therefore more weight. For the Poisson family, which assumes that occurrences scale proportionally with time, a natural practice would be to use the amount of time as the variance weight and set endog to be a rate (occurrences per period of time). Similarly, a compound Poisson family, namely Tweedie, makes the same assumption that the rate (or frequency) of occurrences has variance proportional to time.
Both frequency and variance weights are verified for all basic results with nonrobust or heteroscedasticity robust cov_type. Other robust covariance types have not yet been verified, and at least the small sample correction is currently not based on the correct total frequency count.
Currently, no residuals are weighted by frequency, although they may incorporate n_trials for Binomial and var_weights:
Residual Type | Applicable weights |
---|---|
Anscombe | var_weights |
Deviance | var_weights |
Pearson | var_weights and n_trials |
Response | n_trials |
Working | n_trials |
WARNING: Loglikelihood and deviance are not valid in models where scale is equal to 1 (i.e., Binomial, NegativeBinomial, and Poisson). If variance weights are specified, then results such as loglike and deviance are based on a quasi-likelihood interpretation. The loglikelihood is not correctly specified in this case, and statistics based on it, such as AIC or likelihood ratio tests, are not appropriate.
df_model
float – Model degrees of freedom is equal to p - 1, where p is the number of regressors. Note that the intercept is not reported as a degree of freedom.
df_resid
float – Residual degrees of freedom is equal to the number of observations n minus the number of regressors p.
endog
array – See above. Note that endog is a reference to the data so that if data is already an array and it is changed, then endog changes as well.
exposure
array-like – Include ln(exposure) in model with coefficient constrained to 1. Can only be used if the link is the logarithm function.
exog
array – See above. Note that exog is a reference to the data so that if data is already an array and it is changed, then exog changes as well.
freq_weights
array – See above. Note that freq_weights is a reference to the data so that if data is already an array and it is changed, then freq_weights changes as well.
var_weights
array – See above. Note that var_weights is a reference to the data so that if data is already an array and it is changed, then var_weights changes as well.
iteration
int – The number of iterations that fit has run. Initialized at 0.
family
family class instance – The distribution family of the model. Can be any family in statsmodels.families. Default is Gaussian.
mu
array – The mean response of the transformed variable. mu is the value of the inverse of the link function at lin_pred, where lin_pred is the linear predicted value of the WLS fit of the transformed variable. mu is only available after fit is called. See statsmodels.families.family.fitted of the distribution family for more information.
n_trials
array – See above. Note that n_trials is a reference to the data so that if data is already an array and it is changed, then n_trials changes as well. n_trials is the number of binomial trials and only available with that distribution. See statsmodels.families.Binomial for more information.
normalized_cov_params
array – The p x p normalized covariance of the design / exogenous data. This is approximately equal to (X.T X)^(-1).
offset
array-like – Include offset in model with coefficient constrained to 1.
scale
float – The estimate of the scale / dispersion of the model fit. Only available after fit is called. See GLM.fit and GLM.estimate_scale for more information.
scaletype
str – The scaling used for fitting the model. This is only available after fit is called. The default is None. See GLM.fit for more information.
weights
array – The value of the weights after the last iteration of fit. Only available after fit is called. See statsmodels.families.family for the specific distribution weighting functions.
Method | Description |
---|---|
estimate_scale (mu) | Estimates the dispersion/scale. |
estimate_tweedie_power (mu[, method, low, high]) | Tweedie specific function to estimate scale and the variance parameter. |
fit ([start_params, maxiter, method, tol, …]) | Fits a generalized linear model for a given family. |
fit_constrained (constraints[, start_params]) | Fit the model subject to linear equality constraints. |
fit_regularized ([method, alpha, …]) | Return a regularized fit to a linear regression model. |
from_formula (formula, data[, subset, drop_cols]) | Create a Model from a formula and dataframe. |
get_distribution (params[, scale, exog, …]) | Returns a random number generator for the predictive distribution. |
hessian (params[, scale, observed]) | Hessian, second derivative of loglikelihood function. |
hessian_factor (params[, scale, observed]) | Weights for calculating Hessian. |
information (params[, scale]) | Fisher information matrix. |
initialize () | Initialize a generalized linear model. |
loglike (params[, scale]) | Evaluate the log-likelihood for a generalized linear model. |
loglike_mu (mu[, scale]) | Evaluate the log-likelihood for a generalized linear model. |
predict (params[, exog, exposure, offset, linear]) | Return predicted values for a design matrix. |
score (params[, scale]) | Score, first derivative of the loglikelihood function. |
score_factor (params[, scale]) | Weights for score for each observation. |
score_obs (params[, scale]) | Score, first derivative of the loglikelihood for each observation. |
score_test (params_constrained[, …]) | Score test for restrictions or for omitted variables. |
Attribute | Description |
---|---|
endog_names | Names of endogenous variables |
exog_names | Names of exogenous variables |
© 2009–2012 Statsmodels Developers
© 2006–2008 Scipy Developers
© 2006 Jonathan E. Taylor
Licensed under the 3-clause BSD License.
http://www.statsmodels.org/stable/generated/statsmodels.genmod.generalized_linear_model.GLM.html