class statsmodels.multivariate.pca.PCA(data, ncomp=None, standardize=True, demean=True, normalize=True, gls=False, weights=None, method='svd', missing=None, tol=5e-08, max_iter=1000, tol_em=5e-08, max_em_iter=100)
Principal Component Analysis
Attributes:
factors
array or DataFrame – nobs by ncomp array of principal components (scores)
scores
array or DataFrame – nobs by ncomp array of principal components; identical to factors
loadings
array or DataFrame – ncomp by nvar array of principal component loadings for constructing the factors
coeff
array or DataFrame – nvar by ncomp array of principal component loadings for constructing the projections
projection
array or DataFrame – nobs by nvar array containing the projection of the data onto the ncomp estimated factors
rsquare
array or Series – ncomp array where the element in the ith position is the R-square of including the first i principal components. Note: values are calculated on the transformed data, not the original data.
ic
array or DataFrame – ncomp by 3 array containing the Bai and Ng (2002) information criteria. Each column is a different criterion, and each row represents the number of included factors.
eigenvals
array or Series – nvar array of eigenvalues
eigenvecs
array or DataFrame – nvar by nvar array of eigenvectors
weights
array – nvar array of weights used to compute the principal components, normalized to unit length
transformed_data
array – Standardized, demeaned and weighted data used to compute principal components and related quantities
cols
array – Array of indices indicating columns used in the PCA
rows
array – Array of indices indicating rows used in the PCA
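As a rough illustration of the attributes listed above, the sketch below fits a PCA on simulated data and inspects a few of them. The data, seed, and variable names are assumptions made for illustration. The number of rows in `rsquare` and `ic` may differ between statsmodels versions (some versions appear to include a leading entry for zero components), so only version-independent properties are inspected here.

```python
import numpy as np
from statsmodels.multivariate.pca import PCA

# Simulated data (an assumption for illustration): one common factor plus noise
rng = np.random.default_rng(0)
nobs, nvar, ncomp = 200, 6, 3
x = rng.standard_normal((nobs, 1)) + rng.standard_normal((nobs, nvar))

pc = PCA(x, ncomp=ncomp)

factors_shape = pc.factors.shape        # (nobs, ncomp)
projection_shape = pc.projection.shape  # (nobs, nvar)
ic_columns = pc.ic.shape[1]             # one column per information criterion
rsquare = np.asarray(pc.rsquare)        # cumulative fit as components are added

print(factors_shape, projection_shape, ic_columns)
print(rsquare)
```

Because `x` is a NumPy array, the attributes come back as arrays; passing a DataFrame would return pandas objects instead, as the attribute types above indicate.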
Basic PCA using the correlation matrix of the data
>>> import numpy as np
>>> from statsmodels.multivariate.pca import PCA
>>> x = np.random.randn(100)[:, None]
>>> x = x + np.random.randn(100, 100)
>>> pc = PCA(x)
Note that the principal components are computed using an SVD, so the correlation matrix is never constructed unless method='eig'.
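The following sketch compares the two methods on a small simulated data set. It assumes that `eigenvals` holds the eigenvalues of X'X for the transformed data, so dividing by the number of observations should recover the eigenvalues of the sample correlation matrix. That scaling, the seed, and the data shape are assumptions for illustration rather than documented guarantees.

```python
import numpy as np
from statsmodels.multivariate.pca import PCA

rng = np.random.default_rng(1)
nobs, nvar = 200, 5
x = rng.standard_normal((nobs, 1)) + rng.standard_normal((nobs, nvar))

pc_svd = PCA(x, method='svd')  # works on the data directly
pc_eig = PCA(x, method='eig')  # forms X'X and takes its eigendecomposition

# Eigenvalues of the sample correlation matrix, sorted largest first
corr_eigs = np.sort(np.linalg.eigvalsh(np.corrcoef(x, rowvar=False)))[::-1]

svd_eigs = np.asarray(pc_svd.eigenvals)
eig_eigs = np.asarray(pc_eig.eigenvals)

print(np.allclose(svd_eigs, eig_eigs))
print(np.allclose(svd_eigs / nobs, corr_eigs))
```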
PCA using the covariance matrix of the data
>>> pc = PCA(x, standardize=False)
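To illustrate why the choice between correlation and covariance matters, the sketch below rescales one column so that it has a much larger variance than the others. The data, the scaling factor, and the use of the leading eigenvalue share as a summary are assumptions for illustration.

```python
import numpy as np
from statsmodels.multivariate.pca import PCA

rng = np.random.default_rng(2)
nobs, nvar = 200, 5
x = rng.standard_normal((nobs, 1)) + rng.standard_normal((nobs, nvar))
x[:, 0] *= 100.0  # give the first series a much larger variance

pc_corr = PCA(x)                    # correlation-based: scale is removed
pc_cov = PCA(x, standardize=False)  # covariance-based: scale is kept

# Share of the total variation captured by the leading eigenvalue
corr_eigs = np.asarray(pc_corr.eigenvals)
cov_eigs = np.asarray(pc_cov.eigenvals)
corr_share = corr_eigs[0] / corr_eigs.sum()
cov_share = cov_eigs[0] / cov_eigs.sum()

print(corr_share, cov_share)
```

With the covariance matrix, the high-variance series dominates the first component; standardizing removes that effect.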
Limiting the number of factors returned to 1 computed using NIPALS
>>> pc = PCA(x, ncomp=1, method='nipals')
>>> pc.factors.shape
(100, 1)
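As a hedged check that NIPALS and the SVD agree on the leading component, the sketch below requests two factors from each method and compares the first one. The sign of a principal component is arbitrary, so the comparison uses the absolute correlation. The simulated data and the choice of two components are assumptions for illustration.

```python
import numpy as np
from statsmodels.multivariate.pca import PCA

rng = np.random.default_rng(3)
nobs, nvar = 200, 5
x = rng.standard_normal((nobs, 1)) + rng.standard_normal((nobs, nvar))

pc_nipals = PCA(x, ncomp=2, method='nipals')
pc_svd = PCA(x, ncomp=2, method='svd')

nipals_shape = pc_nipals.factors.shape  # (200, 2)

# Compare the first factor from each method, ignoring sign
f_nipals = np.asarray(pc_nipals.factors)[:, 0]
f_svd = np.asarray(pc_svd.factors)[:, 0]
agreement = abs(np.corrcoef(f_nipals, f_svd)[0, 1])

print(nipals_shape, agreement)
```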
The default options perform principal component analysis on the demeaned, unit-variance version of data. Setting standardize to False will instead only demean, and setting both standardize and demean to False will not alter the data.
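The effect of these options can be seen by inspecting `transformed_data`. The sketch below assumes unit weights (the default) and that standardization uses the population standard deviation (`ddof=0`); both points, along with the simulated data, are assumptions for illustration.

```python
import numpy as np
from statsmodels.multivariate.pca import PCA

rng = np.random.default_rng(4)
x = 3.0 + 2.0 * rng.standard_normal((150, 4))  # non-zero mean, non-unit variance

# Default: demeaned and scaled to unit variance
standardized = np.asarray(PCA(x).transformed_data)

# Demeaned only
demeaned = np.asarray(PCA(x, standardize=False).transformed_data)

# Left unaltered
raw = np.asarray(PCA(x, standardize=False, demean=False).transformed_data)

print(standardized.mean(axis=0), standardized.std(axis=0))
print(np.allclose(demeaned, x - x.mean(axis=0)))
print(np.allclose(raw, x))
```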
Once the data have been transformed, the following relationships hold when the number of components (ncomp) is the same as the minimum of the number of observations and the number of variables.

\[X^\prime X = V \Lambda V^\prime\]
\[F = X V\]
\[X = F V^\prime\]

where X is the data, F is the array of principal components (factors or scores), V is the array of eigenvectors (loadings), V' is the array of factor coefficients (coeff), and \(\Lambda\) is the diagonal matrix of eigenvalues (eigenvals).
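The relationships X'X = VΛV', F = XV, and X = FV' can be checked numerically. The sketch below sets `normalize=False`, on the assumption that the default `normalize=True` rescales the factors so that F = XV would not hold literally. It reads V from `eigenvecs` and Λ from `eigenvals`; the simulated data are an assumption for illustration.

```python
import numpy as np
from statsmodels.multivariate.pca import PCA

rng = np.random.default_rng(5)
nobs, nvar = 200, 5
x = rng.standard_normal((nobs, 1)) + rng.standard_normal((nobs, nvar))

# ncomp defaults to min(nobs, nvar), so all components are retained
pc = PCA(x, normalize=False)

X = np.asarray(pc.transformed_data)  # standardized data
F = np.asarray(pc.factors)           # principal components
V = np.asarray(pc.eigenvecs)         # eigenvectors, one per column
lam = np.asarray(pc.eigenvals)       # eigenvalues of X'X

print(np.allclose(X.T @ X, V @ np.diag(lam) @ V.T))
print(np.allclose(F, X @ V))
print(np.allclose(X, F @ V.T))
```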
When weights are provided, the principal components are computed from the modified data

\[\Omega^{-\frac{1}{2}} X\]

where \(\Omega\) is a diagonal matrix composed of the weights. For example, when using the GLS version of PCA, the elements of \(\Omega\) will be the inverse of the variances of the residuals from

\[X - F V^\prime\]

where the number of factors is less than the rank of X.
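A minimal sketch of the GLS option follows. The series are simulated with increasing noise levels, so the residual variances from a one-factor fit should rise from the first series to the last, and the estimated weights should fall accordingly. The data-generating process and the single-factor choice are assumptions for illustration.

```python
import numpy as np
from statsmodels.multivariate.pca import PCA

rng = np.random.default_rng(6)
nobs, nvar = 300, 6
noise_scale = np.linspace(0.5, 3.0, nvar)  # later series are noisier
common = rng.standard_normal((nobs, 1))
x = common + noise_scale * rng.standard_normal((nobs, nvar))

# gls=True needs fewer factors than variables so residuals have variance
pc_gls = PCA(x, ncomp=1, gls=True)
gls_weights = np.asarray(pc_gls.weights)

print(gls_weights)
```

The series with the least idiosyncratic noise receives the largest weight, which is the behaviour the inverse-variance description above would suggest.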
[*] J. Bai and S. Ng, “Determining the number of factors in approximate factor models,” Econometrica, vol. 70, no. 1, pp. 191-221, 2002.
| Method | Description |
|---|---|
| plot_rsquare([ncomp, ax]) | Box plots of the individual series R-square against the number of PCs |
| plot_scree([ncomp, log_scale, cumulative, ax]) | Plot of the ordered eigenvalues |
| project([ncomp, transform, unweight]) | Project series onto a specific number of factors |
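As a usage sketch for `project`, the example below projects the data onto all factors and onto a single factor. It relies on the default arguments, which are assumed here to return the projection in the original data space (means and scales restored). The simulated data and the error summaries are assumptions for illustration.

```python
import numpy as np
from statsmodels.multivariate.pca import PCA

rng = np.random.default_rng(7)
nobs, nvar = 200, 5
x = 10.0 + rng.standard_normal((nobs, 1)) + rng.standard_normal((nobs, nvar))

pc = PCA(x)  # retains all components

# Using every factor should reproduce the original data
full = np.asarray(pc.project())
err_full = np.abs(full - x).max()

# A single factor gives a lower-rank approximation
one = np.asarray(pc.project(ncomp=1))
err_one = np.sqrt(((one - x) ** 2).mean())

print(err_full, err_one)
```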
© 2009–2012 Statsmodels Developers
© 2006–2008 Scipy Developers
© 2006 Jonathan E. Taylor
Licensed under the 3-clause BSD License.
http://www.statsmodels.org/stable/generated/statsmodels.multivariate.pca.PCA.html