Return boolean Series denoting duplicate rows.
Considering certain columns is optional.
Only consider certain columns for identifying duplicates, by default use all of the columns.
Determines which duplicates (if any) to mark.
first : Mark duplicates as True except for the first occurrence.
last : Mark duplicates as True except for the last occurrence.
False : Mark all duplicates as True.
Boolean series for each duplicated rows.
See also
Index.duplicatedEquivalent method on index.
Series.duplicatedEquivalent method on Series.
Series.drop_duplicatesRemove duplicate values from Series.
DataFrame.drop_duplicatesRemove duplicate values from DataFrame.
Examples
Consider dataset containing ramen rating.
>>> df = pd.DataFrame({
... 'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
... 'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
... 'rating': [4, 4, 3.5, 15, 5]
... })
>>> df
brand style rating
0 Yum Yum cup 4.0
1 Yum Yum cup 4.0
2 Indomie cup 3.5
3 Indomie pack 15.0
4 Indomie pack 5.0
By default, for each set of duplicated values, the first occurrence is set on False and all others on True.
>>> df.duplicated()
0 False
1 True
2 False
3 False
4 False
dtype: bool
By using ‘last’, the last occurrence of each set of duplicated values is set on False and all others on True.
>>> df.duplicated(keep='last')
0 True
1 False
2 False
3 False
4 False
dtype: bool
By setting keep on False, all duplicates are True.
>>> df.duplicated(keep=False)
0 True
1 True
2 False
3 False
4 False
dtype: bool
To find duplicates on specific column(s), use subset.
>>> df.duplicated(subset=['brand'])
0 False
1 True
2 False
3 True
4 True
dtype: bool
© 2008–2011, AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team
© 2011–2025, Open source contributors
Licensed under the 3-clause BSD License.
https://pandas.pydata.org/pandas-docs/version/2.3.0/reference/api/pandas.DataFrame.duplicated.html