sklearn.datasets.make_classification(n_samples=100, n_features=20, n_informative=2, n_redundant=2, n_repeated=0, n_classes=2, n_clusters_per_class=2, weights=None, flip_y=0.01, class_sep=1.0, hypercube=True, shift=0.0, scale=1.0, shuffle=True, random_state=None)
Generate a random n-class classification problem.
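A minimal usage sketch follows. random_state=42 is an arbitrary choice. flip_y=0.0 is set only so the class counts come out exact, because the default flip_y=0.01 assigns the class of about 1% of samples at random.

```python
import numpy as np
from sklearn.datasets import make_classification

# Default-sized problem: 100 samples, 20 features, 2 classes.
# flip_y=0.0 disables random label reassignment so class counts stay exact.
X, y = make_classification(n_samples=100, n_features=20, flip_y=0.0, random_state=42)

print(X.shape)         # (100, 20)
print(np.bincount(y))  # [50 50]
```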
This initially creates clusters of points normally distributed (std=1) about vertices of an n_informative-dimensional hypercube with sides of length 2*class_sep and assigns an equal number of clusters to each class. It introduces interdependence between these features and adds various types of further noise to the data.
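The sketch below checks the hypercube geometry under assumed settings: two informative features, one cluster per class, and no label noise. Each class mean should land near a vertex whose coordinates are ±class_sep. The interdependence step applies a random linear transformation to each cluster, so points are not exactly unit-variance around their vertex. For that reason the comparison uses a loose tolerance. The sample size and random_state are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification

class_sep = 2.0
# Only two features, both informative; one cluster per class; no label noise.
X, y = make_classification(
    n_samples=5000, n_features=2, n_informative=2, n_redundant=0,
    n_repeated=0, n_clusters_per_class=1, flip_y=0.0,
    class_sep=class_sep, random_state=0,
)

# Per-class means: each should sit near a vertex of the square
# [-class_sep, class_sep]^2, i.e. every coordinate close to +/-class_sep.
means = np.array([X[y == c].mean(axis=0) for c in (0, 1)])
print(np.round(means, 2))
```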
Without shuffling, X horizontally stacks features in the following order: the primary n_informative features, followed by n_redundant linear combinations of the informative features, followed by n_repeated duplicates, drawn randomly with replacement from the informative and redundant features. The remaining features are filled with random noise. Thus, without shuffling, all useful features are contained in the columns X[:, :n_informative + n_redundant + n_repeated].
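The column layout can be checked directly with shuffle=False. This sketch uses arbitrary sizes and random_state=0. It regresses the redundant and noise columns on the informative ones and looks for exact copies among the repeated columns. It relies on the defaults shift=0.0 and scale=1.0. Other shift or scale values alter each column separately, so the exact equalities below would no longer hold. The helper max_lstsq_residual is a hypothetical name introduced only for this illustration.

```python
import numpy as np
from sklearn.datasets import make_classification

n_inf, n_red, n_rep = 3, 2, 1
# shuffle=False keeps columns in generation order; shift/scale left at defaults.
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=n_inf,
    n_redundant=n_red, n_repeated=n_rep, shuffle=False, random_state=0,
)

informative = X[:, :n_inf]
redundant = X[:, n_inf:n_inf + n_red]
repeated = X[:, n_inf + n_red:n_inf + n_red + n_rep]
noise = X[:, n_inf + n_red + n_rep:]

def max_lstsq_residual(basis, target):
    # Hypothetical helper: largest absolute error when the target columns
    # are fitted as linear combinations of the basis columns.
    coef, *_ = np.linalg.lstsq(basis, target, rcond=None)
    return np.abs(basis @ coef - target).max()

# Each repeated column should equal one of the informative/redundant columns.
useful = X[:, :n_inf + n_red]
rep_is_copy = [
    any(np.array_equal(repeated[:, j], useful[:, k]) for k in range(useful.shape[1]))
    for j in range(n_rep)
]

print(max_lstsq_residual(informative, redundant) < 1e-8)  # True: exact linear combinations
print(all(rep_is_copy))                                   # True: duplicated columns
print(max_lstsq_residual(informative, noise) > 0.1)       # True: noise is unrelated
```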
Read more in the User Guide.
See also
make_blobs
make_multilabel_classification
The algorithm is adapted from Guyon [1] and was designed to generate the “Madelon” dataset.
[1] I. Guyon, “Design of experiments for the NIPS 2003 variable selection benchmark”, 2003.
© 2007–2018 The scikit-learn developers
Licensed under the 3-clause BSD License.
http://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html