Transform between iterable of iterables and a multilabel format.
Although a list of sets or tuples is a very intuitive format for multilabel data, it is unwieldy to process. This transformer converts between this intuitive format and the supported multilabel format: a (samples x classes) binary matrix indicating the presence of a class label.
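To make the target format concrete, here is a minimal pure-NumPy sketch of the forward conversion. The helper `binarize_label_sets` is hypothetical and is not how scikit-learn implements MultiLabelBinarizer. It only mirrors the documented behaviour of sorting the labels found when no explicit class ordering is given.

```python
import numpy as np


def binarize_label_sets(label_sets):
    """Hypothetical helper: convert an iterable of label sets to a binary matrix."""
    label_sets = list(label_sets)
    # Collect every label seen and sort it, like classes_ when no classes are given.
    classes = sorted({label for labels in label_sets for label in labels})
    column = {label: j for j, label in enumerate(classes)}
    # One row per sample, one column per class; 1 marks that the class is present.
    indicator = np.zeros((len(label_sets), len(classes)), dtype=int)
    for i, labels in enumerate(label_sets):
        for label in labels:
            indicator[i, column[label]] = 1
    return classes, indicator


classes, indicator = binarize_label_sets([(1, 2), (3,)])
print(classes)             # [1, 2, 3]
print(indicator.tolist())  # [[1, 1, 0], [0, 0, 1]]
```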
classes: Indicates an ordering for the class labels (default None, in which case the classes are found when fitting). All entries should be unique (cannot contain duplicate classes).
sparse_output: Set to True if the output binary array should be in CSR sparse format (default False).
classes_: A copy of the classes parameter when provided. Otherwise, the sorted set of classes found when fitting.
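Two sketches of the constructor parameters: an explicit classes list fixes the column order (instead of the sorted order used when classes is omitted), and sparse_output=True returns a SciPy sparse matrix. The label values are illustrative only.

```python
from scipy import sparse
from sklearn.preprocessing import MultiLabelBinarizer

# An explicit classes list fixes the column order instead of sorting the labels.
mlb = MultiLabelBinarizer(classes=[3, 1, 2])
dense = mlb.fit_transform([(1, 2), (3,)])
print(mlb.classes_.tolist())  # [3, 1, 2]
print(dense.tolist())         # [[0, 1, 1], [1, 0, 0]]

# sparse_output=True keeps the result as a CSR sparse matrix.
mlb_sparse = MultiLabelBinarizer(sparse_output=True)
y_csr = mlb_sparse.fit_transform([(1, 2), (3,)])
print(sparse.issparse(y_csr), y_csr.format)  # True csr
print(y_csr.toarray().tolist())              # [[1, 1, 0], [0, 0, 1]]
```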
See also
OneHotEncoder: Encode categorical features using a one-hot aka one-of-K scheme.
>>> from sklearn.preprocessing import MultiLabelBinarizer
>>> mlb = MultiLabelBinarizer()
>>> mlb.fit_transform([(1, 2), (3,)])
array([[1, 1, 0],
       [0, 0, 1]])
>>> mlb.classes_
array([1, 2, 3])
>>> mlb.fit_transform([{'sci-fi', 'thriller'}, {'comedy'}])
array([[0, 1, 1],
       [1, 0, 0]])
>>> list(mlb.classes_)
['comedy', 'sci-fi', 'thriller']
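A common mistake is to pass in a flat list of labels instead of a list of label sets. Each string is then treated as the label set of one sample, so its individual characters become the classes: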
>>> mlb = MultiLabelBinarizer()
>>> mlb.fit(['sci-fi', 'thriller', 'comedy'])
MultiLabelBinarizer()
>>> mlb.classes_
array(['-', 'c', 'd', 'e', 'f', 'h', 'i', 'l', 'm', 'o', 'r', 's', 't',
       'y'], dtype=object)
To correct this, wrap the labels in an outer list so that they form the label set of a single sample:
>>> mlb = MultiLabelBinarizer()
>>> mlb.fit([['sci-fi', 'thriller', 'comedy']])
MultiLabelBinarizer()
>>> mlb.classes_
array(['comedy', 'sci-fi', 'thriller'], dtype=object)
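If the intent is instead that each string is the single label of its own sample, each label needs its own inner list. The comprehension below is one way to build that input (a sketch; any iterable of one-element iterables should behave the same).

```python
from sklearn.preprocessing import MultiLabelBinarizer

labels = ["sci-fi", "thriller", "comedy"]
mlb = MultiLabelBinarizer()
# Wrap each label in its own list: three samples, one label each.
y = mlb.fit_transform([[label] for label in labels])
print(list(mlb.classes_))  # ['comedy', 'sci-fi', 'thriller']
print(y.tolist())          # [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
```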
fit(y)
Fit the label sets binarizer, storing classes_.
y: A set of labels (any orderable and hashable object) for each sample. If the classes parameter is set, y will not be iterated.
Returns: self, the fitted estimator.
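A short sketch of fit on its own: it only learns classes_ and returns the estimator, so it can be chained with transform. The label values are made up for illustration.

```python
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
# fit only learns classes_; it returns the estimator itself.
fitted = mlb.fit([{"b", "a"}, {"c"}])
print(fitted is mlb)       # True
print(list(mlb.classes_))  # ['a', 'b', 'c']

# Because fit returns the estimator, fit and transform can be chained.
y = MultiLabelBinarizer().fit([{"b", "a"}, {"c"}]).transform([{"c"}, {"a"}])
print(y.tolist())  # [[0, 0, 1], [1, 0, 0]]
```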
fit_transform(y)
Fit the label sets binarizer and transform the given label sets.
y: A set of labels (any orderable and hashable object) for each sample. If the classes parameter is set, y is not iterated to determine the classes; it is still iterated to build the indicator matrix.
Returns: y_indicator, a matrix such that y_indicator[i, j] = 1 iff classes_[j] is in y[i], and 0 otherwise. If sparse_output is True, it is a sparse matrix in CSR format.
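The following check spells out the definition of y_indicator element by element and compares fit_transform with fit followed by transform. It is an illustration under the default settings (dense output, no classes given), not a test taken from scikit-learn.

```python
from sklearn.preprocessing import MultiLabelBinarizer

y = [{"sci-fi", "thriller"}, {"comedy"}]
mlb = MultiLabelBinarizer()
y_indicator = mlb.fit_transform(y)

# y_indicator[i, j] is 1 exactly when classes_[j] is in y[i].
for i, labels in enumerate(y):
    for j, cls in enumerate(mlb.classes_):
        assert y_indicator[i, j] == int(cls in labels)

# fit_transform should agree with fit followed by transform.
same = (y_indicator == MultiLabelBinarizer().fit(y).transform(y)).all()
print(same)  # True
```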
get_metadata_routing()
Get metadata routing of this object.
See the User Guide for how the routing mechanism works.
Returns: a MetadataRequest encapsulating routing information.
get_params(deep=True)
Get parameters for this estimator.
deep: If True, return the parameters for this estimator and for contained subobjects that are estimators (default True).
Returns: a dict mapping parameter names to their values.
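For this estimator get_params simply reflects the two constructor parameters. Since MultiLabelBinarizer contains no sub-estimators, we would expect deep=True and deep=False to agree, which the sketch checks.

```python
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer(sparse_output=True)
params = mlb.get_params()
print(params)  # {'classes': None, 'sparse_output': True}

# No nested estimators, so deep makes no difference here.
print(mlb.get_params(deep=False) == params)  # True
```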
inverse_transform(yt)
Transform the given indicator matrix into label sets.
yt: A matrix containing only 1s and 0s.
Returns: a list of tuples, where y[i] consists of classes_[j] for each j such that yt[i, j] == 1.
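A round trip through fit_transform and inverse_transform recovers the original label sets. The second part passes a matrix containing a 2, which we expect to be rejected with a ValueError given the 0/1 requirement on yt; the exact error message is not relied on.

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
yt = mlb.fit_transform([(1, 2), (3,)])

# Round trip: the indicator matrix maps back to one tuple of labels per sample.
recovered = mlb.inverse_transform(yt)
print(recovered == [(1, 2), (3,)])  # True

# Entries other than 0 and 1 are expected to be rejected.
rejected = False
try:
    mlb.inverse_transform(np.array([[2, 0, 0]]))
except ValueError:
    rejected = True
print(rejected)  # True
```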
set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example of how to use the API.
Configure output of transform and fit_transform.
transform: one of
- "default": default output format of a transformer
- "pandas": DataFrame output
- "polars": Polars output
- None: transform configuration is unchanged
Added in version 1.4: "polars" option was added.
Returns: the estimator instance.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
**params: Estimator parameters.
Returns: the estimator instance.
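Because set_params returns the estimator, it can be chained directly into fit_transform. This sketch switches sparse_output on after construction.

```python
from scipy import sparse
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
# set_params returns the estimator itself, so the call can be chained.
y = mlb.set_params(sparse_output=True).fit_transform([(1, 2), (3,)])
print(mlb.get_params()["sparse_output"])  # True
print(sparse.issparse(y))                 # True
```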
transform(y)
Transform the given label sets.
y: A set of labels (any orderable and hashable object) for each sample.
Returns: y_indicator, a matrix such that y_indicator[i, j] = 1 iff classes_[j] is in y[i], and 0 otherwise. If sparse_output is True, it is a sparse matrix in CSR format.
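The sketch below applies a fitted binarizer to new label sets. Labels that were not seen during fit have no column; in the scikit-learn versions we are familiar with, transform ignores them and emits a warning, but treat that warning as observed behaviour rather than part of the documented contract.

```python
import warnings
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer().fit([{"comedy"}, {"sci-fi", "thriller"}])
y_new = mlb.transform([{"thriller"}, {"comedy", "sci-fi"}])
print(y_new.tolist())  # [[0, 0, 1], [1, 1, 0]]

# "horror" was not seen during fit, so it has no column and is dropped.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    y_unknown = mlb.transform([{"horror", "comedy"}])
print(y_unknown.tolist())  # [[1, 0, 0]]
print(len(caught) > 0)     # True
```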
© 2007–2025 The scikit-learn developers
Licensed under the 3-clause BSD License.
https://scikit-learn.org/1.6/modules/generated/sklearn.preprocessing.MultiLabelBinarizer.html