sklearn.feature_selection.chi2(X, y)
[source]
Compute chi-squared stats between each non-negative feature and class.
This score can be used to select the n_features features with the highest values for the test chi-squared statistic from X, which must contain only non-negative features such as booleans or frequencies (e.g., term counts in document classification), relative to the classes.
Recall that the chi-square test measures dependence between stochastic variables, so using this function “weeds out” the features that are the most likely to be independent of class and therefore irrelevant for classification.
Read more in the User Guide.
Parameters: |
|
---|---|
Returns: |
|
See also
f_classif
f_regression
Complexity of this algorithm is O(n_classes * n_features).
sklearn.feature_selection.chi2
© 2007–2018 The scikit-learn developers
Licensed under the 3-clause BSD License.
http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.chi2.html