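Discretize into three equal-width bins. The lowest edge is nudged slightly below the minimum (1.994 rather than 2) so that the minimum value falls inside the first bin: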

In [19]:
import numpy as np
import pandas as pd
In [20]:
pd.cut(np.array([2, 7, 5, 4, 6, 8]), 3)
Out[20]:
[(1.994, 4.0], (6.0, 8.0], (4.0, 6.0], (1.994, 4.0], (4.0, 6.0], (6.0, 8.0]]
Categories (3, interval[float64]): [(1.994, 4.0] < (4.0, 6.0] < (6.0, 8.0]]
In [21]:
pd.cut(np.array([2, 7, 5, 4, 6, 8]), 3, retbins=True)
Out[21]:
([(1.994, 4.0], (6.0, 8.0], (4.0, 6.0], (1.994, 4.0], (4.0, 6.0], (6.0, 8.0]]
 Categories (3, interval[float64]): [(1.994, 4.0] < (4.0, 6.0] < (6.0, 8.0]],
 array([1.994, 4.   , 6.   , 8.   ]))
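The edges returned by retbins=True can be passed back to pd.cut as explicit bins, so that later data is binned with the same boundaries. The following is a minimal sketch; the variable names (train, new, edges) are illustrative and not part of the original example.

```python
import numpy as np
import pandas as pd

# Compute bins on one array and keep the edges.
train = np.array([2, 7, 5, 4, 6, 8])
binned, edges = pd.cut(train, 3, retbins=True)

# Reuse the same edges on other data (hypothetical values).
# Values outside [edges[0], edges[-1]] would come back as NaN.
new = np.array([3, 5, 7])
rebinned = pd.cut(new, edges)

print(edges)
print(rebinned)
```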

Discover the same bins, but assign them specific labels. Notice that the returned Categorical's
categories are the labels, and that it is ordered.

In [22]:
pd.cut(np.array([2, 7, 5, 4, 6, 8]),
       3, labels=["small", "big", "large"])
Out[22]:
[small, large, big, small, big, large]
Categories (3, object): [small < big < large]
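Because the result is ordered, the labels can be compared against each other. A small sketch of what that allows, reusing the labels from the example above (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

cats = pd.cut(np.array([2, 7, 5, 4, 6, 8]),
              3, labels=["small", "big", "large"])

# The Categorical carries the ordering small < big < large.
is_ordered = cats.ordered

# Element-wise comparison against a label uses that ordering.
above_small = cats > "small"

print(is_ordered)
print(above_small)
```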

labels=False returns only the integer bin index of each value instead of interval labels.

In [23]:
pd.cut([0, 1, 1, 2], bins=4, labels=False)
Out[23]:
array([0, 1, 1, 3], dtype=int64)
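The integer indices are convenient for further numeric work. As one possible use, the sketch below counts how many values land in each of the four bins with np.bincount; this follow-up step is an illustration, not part of the original example.

```python
import numpy as np
import pandas as pd

codes = pd.cut([0, 1, 1, 2], bins=4, labels=False)

# Count occurrences per bin index; minlength keeps empty bins in the result.
counts = np.bincount(codes, minlength=4)

print(codes)
print(counts)
```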

Passing a Series as an input returns a Series with categorical dtype:

In [24]:
s = pd.Series(np.array([4, 5, 6, 8, 10]),
              index=['p', 'q', 'r', 's', 't'])
In [25]:
pd.cut(s, 3)
Out[25]:
p    (3.994, 6.0]
q    (3.994, 6.0]
r    (3.994, 6.0]
s      (6.0, 8.0]
t     (8.0, 10.0]
dtype: category
Categories (3, interval[float64]): [(3.994, 6.0] < (6.0, 8.0] < (8.0, 10.0]]
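A categorical Series like this can be used directly as a grouping key. The sketch below, which assumes a sum per bin is the aggregate of interest, shows the bin index of each row and the per-bin totals:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.array([4, 5, 6, 8, 10]),
              index=['p', 'q', 'r', 's', 't'])
binned = pd.cut(s, 3)

# Integer position of each row's bin within the categories.
bin_index = binned.cat.codes

# Aggregate the original values per bin; observed=False keeps every category.
totals = s.groupby(binned, observed=False).sum()

print(bin_index)
print(totals)
```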

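Passing a Series together with labels=False returns a Series of integer bin indices, mapping each value
numerically to its interval in bins. With right=False the intervals are closed on the left, so 10, which
equals the last edge, falls outside every bin and becomes NaN (which is why the result is float64).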

In [26]:
s = pd.Series(np.array([4, 5, 6, 8, 10]),
              index=['p', 'q', 'r', 's', 't'])
In [27]:
pd.cut(s, [0, 4, 5, 6, 8, 10], labels=False, retbins=True, right=False)
Out[27]:
(p    1.0
 q    2.0
 r    3.0
 s    4.0
 t    NaN
 dtype: float64, array([ 0,  4,  5,  6,  8, 10]))
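If the largest value should not be dropped, one option is to add a further edge above it. The sketch below assumes an open-ended last bin [10, inf) is acceptable for the data; the extra edge is an illustration and not part of the original example.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.array([4, 5, 6, 8, 10]),
              index=['p', 'q', 'r', 's', 't'])

# Assumption: values at or above 10 belong in one open-ended bin.
edges = [0, 4, 5, 6, 8, 10, np.inf]
codes = pd.cut(s, edges, labels=False, right=False)

print(codes)
```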

Use duplicates='drop' when the bin edges are not unique:

In [28]:
pd.cut(s, [0, 4, 5, 6, 10, 10], labels=False, retbins=True,
       right=False, duplicates='drop')
Out[28]:
(p    1.0
 q    2.0
 r    3.0
 s    3.0
 t    NaN
 dtype: float64, array([ 0,  4,  5,  6, 10], dtype=int64))
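For comparison, leaving duplicates at its default ('raise') rejects the same edges with a ValueError. A short sketch that captures the error rather than letting it propagate:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.array([4, 5, 6, 8, 10]),
              index=['p', 'q', 'r', 's', 't'])
edges = [0, 4, 5, 6, 10, 10]  # the last edge is repeated

try:
    pd.cut(s, edges, labels=False, right=False)
    raised = False
except ValueError as err:
    # pandas refuses non-unique edges unless duplicates='drop' is given.
    raised = True
    print(err)
```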

Passing an IntervalIndex for bins results in exactly those categories. Notice that values not covered
by the IntervalIndex are set to NaN. 0 is to the left of the first bin (which is closed on the right),
and 2.5 and 4.5 each fall in a gap between two bins.

In [29]:
bins = pd.IntervalIndex.from_tuples([(0, 2), (3, 4), (5, 6)])
In [31]:
pd.cut([0, 0.5, 1.5, 2.5, 4.5], bins)
Out[31]:
[NaN, (0.0, 2.0], (0.0, 2.0], NaN, NaN]
Categories (3, interval[int64]): [(0, 2] < (3, 4] < (5, 6]]
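Which side of each interval is closed comes from the IntervalIndex itself. As a sketch, building the same intervals with closed="left" (a variation added here for illustration) makes 0 fall inside the first bin, while 2.5 and 4.5 still land in the gaps:

```python
import pandas as pd

# Same intervals as above, but closed on the left: [0, 2), [3, 4), [5, 6).
bins_left = pd.IntervalIndex.from_tuples([(0, 2), (3, 4), (5, 6)],
                                         closed="left")
result = pd.cut([0, 0.5, 1.5, 2.5, 4.5], bins_left)

# A code of -1 marks a value not covered by any interval (NaN).
codes = list(result.codes)

print(result)
print(codes)
```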