Series.describe(self, percentiles=None, include=None, exclude=None) [source]
Generate descriptive statistics that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.
Analyzes both numeric and object series, as well as DataFrame column sets of mixed data types. The output will vary depending on what is provided. Refer to the notes below for more detail.
| Parameters: |
|
|---|---|
| Returns: |
|
See also
DataFrame.count
DataFrame.max
DataFrame.min
DataFrame.mean
DataFrame.std
DataFrame.select_dtypes
For numeric data, the result’s index will include count, mean, std, min, max as well as lower, 50 and upper percentiles. By default the lower percentile is 25 and the upper percentile is 75. The 50 percentile is the same as the median.
For object data (e.g. strings or timestamps), the result’s index will include count, unique, top, and freq. The top is the most common value. The freq is the most common value’s frequency. Timestamps also include the first and last items.
If multiple object values have the highest count, then the count and top results will be arbitrarily chosen from among those with the highest count.
For mixed data types provided via a DataFrame, the default is to return only an analysis of numeric columns. If the dataframe consists only of object and categorical data without any numeric columns, the default is to return an analysis of both the object and categorical columns. If include='all' is provided as an option, the result will include a union of attributes of each type.
The include and exclude parameters can be used to limit which columns in a DataFrame are analyzed for the output. The parameters are ignored when analyzing a Series.
Describing a numeric Series.
>>> s = pd.Series([1, 2, 3]) >>> s.describe() count 3.0 mean 2.0 std 1.0 min 1.0 25% 1.5 50% 2.0 75% 2.5 max 3.0 dtype: float64
Describing a categorical Series.
>>> s = pd.Series(['a', 'a', 'b', 'c']) >>> s.describe() count 4 unique 3 top a freq 2 dtype: object
Describing a timestamp Series.
>>> s = pd.Series([
... np.datetime64("2000-01-01"),
... np.datetime64("2010-01-01"),
... np.datetime64("2010-01-01")
... ])
>>> s.describe()
count 3
unique 2
top 2010-01-01 00:00:00
freq 2
first 2000-01-01 00:00:00
last 2010-01-01 00:00:00
dtype: object
Describing a DataFrame. By default only numeric fields are returned.
>>> df = pd.DataFrame({'categorical': pd.Categorical(['d','e','f']),
... 'numeric': [1, 2, 3],
... 'object': ['a', 'b', 'c']
... })
>>> df.describe()
numeric
count 3.0
mean 2.0
std 1.0
min 1.0
25% 1.5
50% 2.0
75% 2.5
max 3.0
Describing all columns of a DataFrame regardless of data type.
>>> df.describe(include='all')
categorical numeric object
count 3 3.0 3
unique 3 NaN 3
top f NaN c
freq 1 NaN 1
mean NaN 2.0 NaN
std NaN 1.0 NaN
min NaN 1.0 NaN
25% NaN 1.5 NaN
50% NaN 2.0 NaN
75% NaN 2.5 NaN
max NaN 3.0 NaN
Describing a column from a DataFrame by accessing it as an attribute.
>>> df.numeric.describe() count 3.0 mean 2.0 std 1.0 min 1.0 25% 1.5 50% 2.0 75% 2.5 max 3.0 Name: numeric, dtype: float64
Including only numeric columns in a DataFrame description.
>>> df.describe(include=[np.number])
numeric
count 3.0
mean 2.0
std 1.0
min 1.0
25% 1.5
50% 2.0
75% 2.5
max 3.0
Including only string columns in a DataFrame description.
>>> df.describe(include=[np.object])
object
count 3
unique 3
top c
freq 1
Including only categorical columns from a DataFrame description.
>>> df.describe(include=['category'])
categorical
count 3
unique 3
top f
freq 1
Excluding numeric columns from a DataFrame description.
>>> df.describe(exclude=[np.number])
categorical object
count 3 3
unique 3 3
top f c
freq 1 1
Excluding object columns from a DataFrame description.
>>> df.describe(exclude=[np.object])
categorical numeric
count 3 3.0
unique 3 NaN
top f NaN
freq 1 NaN
mean NaN 2.0
std NaN 1.0
min NaN 1.0
25% NaN 1.5
50% NaN 2.0
75% NaN 2.5
max NaN 3.0
© 2008–2012, AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team
Licensed under the 3-clause BSD License.
https://pandas.pydata.org/pandas-docs/version/0.25.0/reference/api/pandas.Series.describe.html