sklearn.datasets.make_classification(n_samples=100, n_features=20, n_informative=2, n_redundant=2, n_repeated=0, n_classes=2, n_clusters_per_class=2, weights=None, flip_y=0.01, class_sep=1.0, hypercube=True, shift=0.0, scale=1.0, shuffle=True, random_state=None)
Generate a random n-class classification problem.
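The following is a minimal usage sketch. It assumes a recent scikit-learn in which `make_classification` returns the `(X, y)` tuple by default. The arguments mirror the signature's defaults except for `flip_y=0.0` and `random_state=42`. Setting `flip_y=0.0` turns off the random reassignment of labels, so the classes keep the equal sizes that the default weighting produces.

```python
import numpy as np
from sklearn.datasets import make_classification

# 100 samples and 20 features: 2 informative, 2 redundant, and (with the
# default n_repeated=0) 16 pure-noise columns, split between 2 classes.
# flip_y=0.0 disables random label flipping, so class counts stay exact.
X, y = make_classification(
    n_samples=100,
    n_features=20,
    n_informative=2,
    n_redundant=2,
    flip_y=0.0,
    random_state=42,
)

print(X.shape)         # (100, 20)
print(np.bincount(y))  # number of samples in each class
```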
This initially creates clusters of points normally distributed (std=1) about vertices of an n_informative-dimensional hypercube with sides of length 2*class_sep and assigns an equal number of clusters to each class. It introduces interdependence between these features and adds various types of further noise to the data.
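The cluster construction can be illustrated with a simplified NumPy sketch. This is not scikit-learn's implementation. The helper names `hypercube_vertices` and `sketch_clusters` are hypothetical, and the per-cluster random linear map is an assumed way of introducing interdependence between the informative features. The sketch does show why every one of the `n_classes * n_clusters_per_class` clusters needs its own vertex, which requires `n_classes * n_clusters_per_class <= 2**n_informative`. `make_classification` itself raises a `ValueError` when that condition does not hold.

```python
import itertools

import numpy as np


def hypercube_vertices(n_dims, class_sep=1.0):
    """Hypothetical helper: all 2**n_dims vertices of a hypercube with side
    2*class_sep centred on the origin (each coordinate is +/- class_sep)."""
    corners = np.array(list(itertools.product([0.0, 1.0], repeat=n_dims)))
    return 2 * class_sep * corners - class_sep


def sketch_clusters(n_informative=2, n_classes=2, n_clusters_per_class=2,
                    n_per_cluster=25, class_sep=1.0, seed=0):
    """Hypothetical, simplified sketch: one Gaussian cluster per chosen vertex."""
    rng = np.random.default_rng(seed)
    n_clusters = n_classes * n_clusters_per_class
    vertices = hypercube_vertices(n_informative, class_sep)
    # Each cluster is centred on its own vertex, so there must be enough vertices.
    if n_clusters > len(vertices):
        raise ValueError(
            "n_classes * n_clusters_per_class must be <= 2**n_informative")
    chosen = rng.choice(len(vertices), size=n_clusters, replace=False)
    centroids = vertices[chosen]
    blocks = []
    for centroid in centroids:
        # Start from standard normal (std=1) points around the origin.
        points = rng.standard_normal((n_per_cluster, n_informative))
        # Assumed mixing step: a random linear map per cluster makes the
        # informative features interdependent within that cluster.
        mixing = rng.uniform(-1.0, 1.0, size=(n_informative, n_informative))
        blocks.append(points @ mixing + centroid)
    X = np.vstack(blocks)
    # Cluster k belongs to class k % n_classes, so each class receives
    # exactly n_clusters_per_class clusters.
    y = np.repeat(np.arange(n_clusters) % n_classes, n_per_cluster)
    return X, y, centroids


X, y, centroids = sketch_clusters()
print(X.shape, np.bincount(y))
print(centroids)
```

With the defaults above, the sketch returns 100 two-dimensional points split evenly between the two classes, and every centroid coordinate is either `-class_sep` or `+class_sep`.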
Without shuffling, X horizontally stacks features in the following order: the primary n_informative features, followed by n_redundant linear combinations of the informative features, followed by n_repeated duplicates, drawn randomly with replacement from the informative and redundant features. The remaining features are filled with random noise. Thus, without shuffling, all useful features are contained in the columns X[:, :n_informative + n_redundant + n_repeated].
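The column layout can be checked directly by passing `shuffle=False`. This sketch assumes the default `shift=0.0` and `scale=1.0`, which leave the stacked columns unchanged. A least-squares fit of the redundant columns on the informative ones should leave essentially zero residual, and each repeated column should be an exact copy of some informative or redundant column. The specific counts (3 informative, 2 redundant, 2 repeated, 10 features in total) are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification

n_inf, n_red, n_rep = 3, 2, 2
X, y = make_classification(
    n_samples=200,
    n_features=10,
    n_informative=n_inf,
    n_redundant=n_red,
    n_repeated=n_rep,
    shuffle=False,  # keep the documented column order
    random_state=0,
)

# Slice X according to the documented stacking order.
informative = X[:, :n_inf]
redundant = X[:, n_inf:n_inf + n_red]
repeated = X[:, n_inf + n_red:n_inf + n_red + n_rep]
noise = X[:, n_inf + n_red + n_rep:]

# Redundant columns are linear combinations of the informative ones,
# so a least-squares fit should reproduce them almost exactly.
coef, *_ = np.linalg.lstsq(informative, redundant, rcond=None)
max_residual = np.abs(informative @ coef - redundant).max()

# Each repeated column should exactly match at least one
# informative or redundant column.
useful = X[:, :n_inf + n_red]
repeat_sources = [
    [j for j in range(useful.shape[1]) if np.array_equal(repeated[:, k], useful[:, j])]
    for k in range(n_rep)
]

print(max_residual, repeat_sources, noise.shape)
```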
Read more in the User Guide.
The algorithm is adapted from Guyon [1] and was designed to generate the “Madelon” dataset.
[1] I. Guyon, “Design of experiments for the NIPS 2003 variable selection benchmark”, 2003.
© 2007–2018 The scikit-learn developers
Licensed under the 3-clause BSD License.