Load the Labeled Faces in the Wild (LFW) pairs dataset (classification).
Download it if necessary.
Classes | 2 |
Samples total | 13233 |
Dimensionality | 5828 |
Features | real, between 0 and 255 |
In the official README.txt this task is described as the “Restricted” task. As I am not sure how to implement the “Unrestricted” variant correctly, I left it as unsupported for now.
The original images are 250 x 250 pixels, but the default slice and resize arguments reduce them to 62 x 47.
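The reduction from 250 x 250 to 62 x 47 can be checked with a little arithmetic. The sketch below assumes the default arguments are `slice_=(slice(70, 195), slice(78, 172))` and `resize=0.5` (these values are not stated in the text above, so verify them against the function signature), and that the resized dimensions are truncated to integers. `output_shape` is a hypothetical helper, not part of scikit-learn.

```python
# Assumed defaults (not stated in the text above): check the function signature.
default_slice = (slice(70, 195), slice(78, 172))
default_resize = 0.5


def output_shape(slice_, resize):
    """Return (height, width) after cropping with slice_ and scaling by resize."""
    h_slice, w_slice = slice_
    height = h_slice.stop - h_slice.start
    width = w_slice.stop - w_slice.start
    if resize is not None:
        # Assumption: scaled sizes are truncated to whole pixels.
        height = int(resize * height)
        width = int(resize * width)
    return height, width


height, width = output_shape(default_slice, default_resize)
# Each sample holds two gray-level images, hence the factor of 2.
features_per_pair = 2 * height * width
print(height, width, features_per_pair)  # 62 47 5828
```

Under these assumptions the result matches the dimensionality of 5828 listed in the summary table: two images of 62 x 47 gray-level pixels each.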
Read more in the User Guide.
Select the dataset to load: ‘train’ for the development training set, ‘test’ for the development test set, and ‘10_folds’ for the official evaluation set that is meant to be used with 10-fold cross-validation.
Specify another download and cache folder for the datasets. By default all scikit-learn data is stored in ‘~/scikit_learn_data’ subfolders.
Download and use the funneled variant of the dataset.
Ratio used to resize each face picture.
Keep the 3 RGB channels instead of averaging them to a single gray level channel. If color is True the shape of the data has one more dimension than the shape with color = False.
Provide a custom 2D slice (height, width) to extract the ‘interesting’ part of the jpeg files and avoid using statistical correlation from the background.
If False, raise an OSError if the data is not locally available instead of trying to download the data from the source site.
Number of retries when HTTP errors are encountered.
Added in version 1.5.
Number of seconds between retries.
Added in version 1.5.
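To illustrate how a retry count and a delay between retries typically interact, here is a minimal generic sketch. It is not scikit-learn's implementation: `call_with_retries` and `flaky_download` are hypothetical names, the defaults shown are illustrative, and catching `OSError` is an assumption chosen because the standard library's HTTP and URL errors derive from it.

```python
import time


def call_with_retries(action, n_retries=3, delay=1.0):
    """Call action(); on OSError, retry up to n_retries times,
    sleeping `delay` seconds before each new attempt."""
    while True:
        try:
            return action()
        except OSError:
            if n_retries == 0:
                # No retries left: propagate the last error.
                raise
            n_retries -= 1
            time.sleep(delay)


# Simulated download that fails twice before succeeding.
attempts = []


def flaky_download():
    attempts.append(1)
    if len(attempts) < 3:
        raise OSError("simulated HTTP error")
    return "ok"


result = call_with_retries(flaky_download, n_retries=3, delay=0.0)
print(result, len(attempts))  # ok 3
```

In this sketch the total number of attempts is at most `n_retries + 1`: the initial call plus the retries.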
Bunch
Dictionary-like object, with the following attributes.
data: ndarray of shape (2200, 5828). Shape depends on subset.
Each row corresponds to 2 raveled (flattened) face images, each 62 x 47 pixels with the default arguments. Changing the slice_, resize or subset parameters will change the shape of the output.
pairs: ndarray of shape (2200, 2, 62, 47). Shape depends on subset.
Each row has 2 face images corresponding to the same or different persons from the dataset containing 5749 people. Changing the slice_, resize or subset parameters will change the shape of the output.
target: ndarray of shape (2200,). Shape depends on subset.
Labels associated with each pair of images. The two label values are different persons or the same person.
Explains the target values of the target array. 0 corresponds to “Different persons”, 1 corresponds to “Same person”.
Description of the Labeled Faces in the Wild (LFW) dataset.
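The relationship between the image pairs and the flattened rows described above can be sketched without downloading anything. The example uses a toy pair of 2 x 3 images in place of real faces. `ravel` and `n_features` are hypothetical helpers, and the color case assumes the flattened rows are a plain reshape of the image pairs with 3 channels per pixel.

```python
def ravel(nested):
    """Flatten an arbitrarily nested list into a flat list, row-major."""
    if not isinstance(nested, list):
        return [nested]
    flat = []
    for item in nested:
        flat.extend(ravel(item))
    return flat


# One toy pair of 2 x 3 gray images standing in for two 62 x 47 faces.
pair = [
    [[0, 1, 2], [3, 4, 5]],
    [[10, 11, 12], [13, 14, 15]],
]
row = ravel(pair)
print(len(row))  # 12


def n_features(height, width, color=False):
    """Number of values in one flattened pair of images."""
    channels = 3 if color else 1  # assumption: 3 RGB channels when color=True
    return 2 * height * width * channels


print(n_features(62, 47))              # 5828
print(n_features(62, 47, color=True))  # 17484
```

The flattened row keeps all pixels of the first image before those of the second, which is why one row has twice as many values as a single image.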
>>> from sklearn.datasets import fetch_lfw_pairs
>>> lfw_pairs_train = fetch_lfw_pairs(subset='train')
>>> list(lfw_pairs_train.target_names)
[np.str_('Different persons'), np.str_('Same person')]
>>> lfw_pairs_train.pairs.shape
(2200, 2, 62, 47)
>>> lfw_pairs_train.data.shape
(2200, 5828)
>>> lfw_pairs_train.target.shape
(2200,)
© 2007–2025 The scikit-learn developers
Licensed under the 3-clause BSD License.
https://scikit-learn.org/1.6/modules/generated/sklearn.datasets.fetch_lfw_pairs.html