DataFrame.boxplot(self, column=None, by=None, ax=None, fontsize=None, rot=0, grid=True, figsize=None, layout=None, return_type=None, **kwds)
[source]
Make a box plot from DataFrame columns.
Make a box-and-whisker plot from DataFrame columns, optionally grouped by some other columns. A box plot is a method for graphically depicting groups of numerical data through their quartiles. The box extends from the Q1 to Q3 quartile values of the data, with a line at the median (Q2). The whiskers extend from the edges of box to show the range of the data. The position of the whiskers is set by default to 1.5 * IQR (IQR = Q3 - Q1)
from the edges of the box. Outlier points are those past the end of the whiskers.
For further details see Wikipedia’s entry for boxplot.
Parameters: |
|
---|---|
Returns: |
|
See also
Series.plot.hist
matplotlib.pyplot.boxplot
The return type depends on the return_type
parameter:
For data grouped with by
, return a Series of the above or a numpy array:
Series
array
(for return_type = None
)Use return_type='dict'
when you want to tweak the appearance of the lines after plotting. In this case a dict containing the Lines making up the boxes, caps, fliers, medians, and whiskers is returned.
Boxplots can be created for every column in the dataframe by df.boxplot()
or indicating the columns to be used:
>>> np.random.seed(1234) >>> df = pd.DataFrame(np.random.randn(10,4), ... columns=['Col1', 'Col2', 'Col3', 'Col4']) >>> boxplot = df.boxplot(column=['Col1', 'Col2', 'Col3'])
Boxplots of variables distributions grouped by the values of a third variable can be created using the option by
. For instance:
>>> df = pd.DataFrame(np.random.randn(10, 2), ... columns=['Col1', 'Col2']) >>> df['X'] = pd.Series(['A', 'A', 'A', 'A', 'A', ... 'B', 'B', 'B', 'B', 'B']) >>> boxplot = df.boxplot(by='X')
A list of strings (i.e. ['X', 'Y']
) can be passed to boxplot in order to group the data by combination of the variables in the x-axis:
>>> df = pd.DataFrame(np.random.randn(10,3), ... columns=['Col1', 'Col2', 'Col3']) >>> df['X'] = pd.Series(['A', 'A', 'A', 'A', 'A', ... 'B', 'B', 'B', 'B', 'B']) >>> df['Y'] = pd.Series(['A', 'B', 'A', 'B', 'A', ... 'B', 'A', 'B', 'A', 'B']) >>> boxplot = df.boxplot(column=['Col1', 'Col2'], by=['X', 'Y'])
The layout of boxplot can be adjusted giving a tuple to layout
:
>>> boxplot = df.boxplot(column=['Col1', 'Col2'], by='X', ... layout=(2, 1))
Additional formatting can be done to the boxplot, like suppressing the grid (grid=False
), rotating the labels in the x-axis (i.e. rot=45
) or changing the fontsize (i.e. fontsize=15
):
>>> boxplot = df.boxplot(grid=False, rot=45, fontsize=15)
The parameter return_type
can be used to select the type of element returned by boxplot
. When return_type='axes'
is selected, the matplotlib axes on which the boxplot is drawn are returned:
>>> boxplot = df.boxplot(column=['Col1','Col2'], return_type='axes') >>> type(boxplot) <class 'matplotlib.axes._subplots.AxesSubplot'>
When grouping with by
, a Series mapping columns to return_type
is returned:
>>> boxplot = df.boxplot(column=['Col1', 'Col2'], by='X', ... return_type='axes') >>> type(boxplot) <class 'pandas.core.series.Series'>
If return_type
is None
, a NumPy array of axes with the same shape as layout
is returned:
>>> boxplot = df.boxplot(column=['Col1', 'Col2'], by='X', ... return_type=None) >>> type(boxplot) <class 'numpy.ndarray'>
© 2008–2012, AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team
Licensed under the 3-clause BSD License.
https://pandas.pydata.org/pandas-docs/version/0.25.0/reference/api/pandas.DataFrame.boxplot.html