W3cubDocs

f_regression

sklearn.feature_selection.f_regression(X, y, *, center=True, force_finite=True)[source]

Univariate linear regression tests returning F-statistic and p-values.

Quick linear model for testing the effect of a single regressor, sequentially for many regressors.

This is done in 2 steps:

The cross correlation between each regressor and the target is computed using r_regression as:
```
E[(X[:, i] - mean(X[:, i])) * (y - mean(y))] / (std(X[:, i]) * std(y))
```
It is converted to an F score and then to a p-value.

f_regression is derived from r_regression and will rank features in the same order if all the features are positively correlated with the target.

Note however that contrary to f_regression, r_regression values lie in [-1, 1] and can thus be negative. f_regression is therefore recommended as a feature selection criterion to identify potentially predictive feature for a downstream classifier, irrespective of the sign of the association with the target variable.

Furthermore f_regression returns p-values while r_regression does not.

Examples

>>> from sklearn.datasets import make_regression
>>> from sklearn.feature_selection import f_regression
>>> X, y = make_regression(
...     n_samples=50, n_features=3, n_informative=1, noise=1e-4, random_state=42
... )
>>> f_statistic, p_values = f_regression(X, y)
>>> f_statistic
array([1.2...+00, 2.6...+13, 2.6...+00])
>>> p_values
array([2.7..., 1.5..., 1.0...])

Gallery examples

Feature agglomeration vs. univariate selection

Comparison of F-test and mutual information

© 2007–2025 The scikit-learn developers
Licensed under the 3-clause BSD License.
https://scikit-learn.org/1.6/modules/generated/sklearn.feature_selection.f_regression.html