Return an array and missing value suitable for factorization.
Returns
values : ndarray
An array suitable for factorization. It should preserve the order of the original elements and have a supported dtype (Float64, Int64, UInt64, String, or Object). By default, the extension array is cast to object dtype.
na_value : object
The value in values to treat as missing. The factorization routines treat it as NA, so it is coded as -1 and excluded from uniques. By default, np.nan is used.
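The contract above (order preserved, na_value coded as -1 and left out of uniques) can be illustrated with a minimal pure-Python sketch. The helper names factorize_sketch and _is_missing are hypothetical. This is not pandas' hashtable-based implementation; it only mimics the documented treatment of the (values, na_value) pair, numbering the remaining values in order of first appearance.

```python
from __future__ import annotations

import math
from typing import Any, Hashable, Sequence


def _is_missing(value: Any, na_value: Any) -> bool:
    """Hypothetical helper: True if `value` matches the missing marker."""
    # NaN never compares equal to itself, so float NaN is checked explicitly.
    if isinstance(na_value, float) and math.isnan(na_value):
        return isinstance(value, float) and math.isnan(value)
    return value is na_value or value == na_value


def factorize_sketch(
    values: Sequence[Hashable], na_value: Any
) -> tuple[list[int], list[Hashable]]:
    """Hypothetical helper: encode `values` as integer codes.

    Entries matching `na_value` get code -1 and are not added to `uniques`;
    all other entries are numbered in order of first appearance.
    """
    codes: list[int] = []
    uniques: list[Hashable] = []
    seen: dict[Hashable, int] = {}
    for value in values:
        if _is_missing(value, na_value):
            codes.append(-1)
            continue
        if value not in seen:
            seen[value] = len(uniques)
            uniques.append(value)
        codes.append(seen[value])
    return codes, uniques


codes, uniques = factorize_sketch(["b", "a", float("nan"), "b"], float("nan"))
print(codes)    # [0, 1, -1, 0]
print(uniques)  # ['b', 'a']
```

Because na_value is returned alongside the array, a subclass can choose a sentinel other than np.nan (None, for instance) and the missing entries are still excluded in the same way.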
Notes
The values returned by this method are also used by pandas.util.hash_pandas_object(). If needed, this behavior can be overridden in the self._hash_pandas_object() method.
Examples
>>> pd.array([1, 2, 3])._values_for_factorize()
(array([1, 2, 3], dtype=object), nan)
© 2008–2011, AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team
© 2011–2025, Open source contributors
Licensed under the 3-clause BSD License.
https://pandas.pydata.org/pandas-docs/version/2.3.0/reference/api/pandas.api.extensions.ExtensionArray._values_for_factorize.html