W3cubDocs

sklearn.datasets.make_multilabel_classification

sklearn.datasets.make_multilabel_classification(n_samples=100, n_features=20, n_classes=5, n_labels=2, length=50, allow_unlabeled=True, sparse=False, return_indicator=’dense’, return_distributions=False, random_state=None) [source]

Generate a random multilabel classification problem.

For each sample, the generative process is:

pick the number of labels: n ~ Poisson(n_labels)
n times, choose a class c: c ~ Multinomial(theta)
pick the document length: k ~ Poisson(length)
k times, choose a word: w ~ Multinomial(theta_c)

In the above process, rejection sampling is used to make sure that n is never zero or more than n_classes, and that the document length is never zero. Likewise, we reject classes which have already been chosen.

Read more in the User Guide.

Parameters:

Parameters:	`n_samples : int, optional (default=100)` The number of samples. `n_features : int, optional (default=20)` The total number of features. `n_classes : int, optional (default=5)` The number of classes of the classification problem. `n_labels : int, optional (default=2)` The average number of labels per instance. More precisely, the number of labels per sample is drawn from a Poisson distribution with `n_labels` as its expected value, but samples are bounded (using rejection sampling) by `n_classes`, and must be nonzero if `allow_unlabeled` is False. `length : int, optional (default=50)` The sum of the features (number of words if documents) is drawn from a Poisson distribution with this expected value. `allow_unlabeled : bool, optional (default=True)` If `True`, some instances might not belong to any class. `sparse : bool, optional (default=False)` If `True`, return a sparse feature matrix New in version 0.17: parameter to allow sparse output. `return_indicator : ‘dense’ (default) \| ‘sparse’ \| False` If `dense` return `Y` in the dense binary indicator format. If `'sparse'` return `Y` in the sparse binary indicator format. `False` returns a list of lists of labels. `return_distributions : bool, optional (default=False)` If `True`, return the prior class probability and conditional probabilities of features given classes, from which the data was drawn. `random_state : int, RandomState instance or None (default)` Determines random number generation for dataset creation. Pass an int for reproducible output across multiple function calls. See Glossary.
Returns:	`X : array of shape [n_samples, n_features]` The generated samples. `Y : array or sparse CSR matrix of shape [n_samples, n_classes]` The label sets. `p_c : array, shape [n_classes]` The probability of each class being drawn. Only returned if `return_distributions=True`. `p_w_c : array, shape [n_features, n_classes]` The probability of each feature being drawn given each class. Only returned if `return_distributions=True`.

n_samples : int, optional (default=100): The number of samples.
n_features : int, optional (default=20): The total number of features.
n_classes : int, optional (default=5): The number of classes of the classification problem.
n_labels : int, optional (default=2): The average number of labels per instance. More precisely, the number of labels per sample is drawn from a Poisson distribution with n_labels as its expected value, but samples are bounded (using rejection sampling) by n_classes, and must be nonzero if allow_unlabeled is False.
length : int, optional (default=50): The sum of the features (number of words if documents) is drawn from a Poisson distribution with this expected value.
allow_unlabeled : bool, optional (default=True): If True, some instances might not belong to any class.
sparse : bool, optional (default=False): If True, return a sparse feature matrix

New in version 0.17: parameter to allow sparse output.
return_indicator : ‘dense’ (default) | ‘sparse’ | False: If dense return Y in the dense binary indicator format. If 'sparse' return Y in the sparse binary indicator format. False returns a list of lists of labels.
return_distributions : bool, optional (default=False): If True, return the prior class probability and conditional probabilities of features given classes, from which the data was drawn.
random_state : int, RandomState instance or None (default): Determines random number generation for dataset creation. Pass an int for reproducible output across multiple function calls. See Glossary.

Returns:

X : array of shape [n_samples, n_features]: The generated samples.
Y : array or sparse CSR matrix of shape [n_samples, n_classes]: The label sets.
p_c : array, shape [n_classes]: The probability of each class being drawn. Only returned if return_distributions=True.
p_w_c : array, shape [n_features, n_classes]: The probability of each feature being drawn given each class. Only returned if return_distributions=True.

Examples using `sklearn.datasets.make_multilabel_classification`

../../_images/sphx_glr_plot_multilabel_thumb.png

Multilabel classification

../../_images/sphx_glr_plot_random_multilabel_dataset_thumb.png

Plot randomly generated multilabel dataset

© 2007–2018 The scikit-learn developers
Licensed under the 3-clause BSD License.
http://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_multilabel_classification.html

sklearn.datasets.make_multilabel_classification

Examples using sklearn.datasets.make_multilabel_classification

Examples using `sklearn.datasets.make_multilabel_classification`