statsmodels.stats.correlation_tools.corr_nearest_factor(corr, rank, ctol=1e-06, lam_min=1e-30, lam_max=1e+30, maxiter=1000)
[source]
Find the nearest correlation matrix with factor structure to a given square matrix.
Parameters: |
|
---|---|
Returns: |
rslt – rslt.corr is a FactoredPSDMatrix defining the estimated correlation structure. Other fields of |
Return type: |
Bunch |
A correlation matrix has factor structure if it can be written in the form I + XX’ - diag(XX’), where X is n x k with linearly independent columns, and with each row having sum of squares at most equal to 1. The approximation is made in terms of the Frobenius norm.
This routine is useful when one has an approximate correlation matrix that is not positive semidefinite, and there is need to estimate the inverse, square root, or inverse square root of the population correlation matrix. The factor structure allows these tasks to be done without constructing any n x n matrices.
This is a non-convex problem with no known gauranteed globally convergent algorithm for computing the solution. Borsdof, Higham and Raydan (2010) compared several methods for this problem and found the spectral projected gradient (SPG) method (used here) to perform best.
The input matrix corr
can be a dense numpy array or any scipy sparse matrix. The latter is useful if the input matrix is obtained by thresholding a very large sample correlation matrix. If corr
is sparse, the calculations are optimized to save memory, so no working matrix with more than 10^6 elements is constructed.
R Borsdof, N Higham, M Raydan (2010). Computing a nearest correlation matrix with factor structure. SIAM J Matrix Anal Appl, 31:5, 2603-2622. http://eprints.ma.man.ac.uk/1523/01/covered/MIMS_ep2009_87.pdf
Hard thresholding a correlation matrix may result in a matrix that is not positive semidefinite. We can approximate a hard thresholded correlation matrix with a PSD matrix as follows, where corr
is the input correlation matrix.
>>> import numpy as np >>> from statsmodels.stats.correlation_tools import corr_nearest_factor >>> np.random.seed(1234) >>> b = 1.5 - np.random.rand(10, 1) >>> x = np.random.randn(100,1).dot(b.T) + np.random.randn(100,10) >>> corr = np.corrcoef(x.T) >>> corr = corr * (np.abs(corr) >= 0.3) >>> rslt = corr_nearest_factor(corr, 3)
© 2009–2012 Statsmodels Developers
© 2006–2008 Scipy Developers
© 2006 Jonathan E. Taylor
Licensed under the 3-clause BSD License.
http://www.statsmodels.org/stable/generated/statsmodels.stats.correlation_tools.corr_nearest_factor.html