Examples
These examples all call factorize as a top-level function, pd.factorize(values). The results are identical
when calling the equivalent method on a pandas object, such as Series.factorize().
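A quick way to check the equivalence is to factorize the same Series both ways and compare the codes and the uniques. This is a small sketch, assuming a recent pandas version; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

ser = pd.Series(['q', 'q', 'p', 'r', 'q'])

# Top-level function applied to the Series.
labels_fn, uniques_fn = pd.factorize(ser)
# Method called on the Series itself.
labels_m, uniques_m = ser.factorize()

print(np.array_equal(labels_fn, labels_m))  # True
print(uniques_fn.equals(uniques_m))         # True
```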

In :
import numpy as np
import pandas as pd

In :
labels, uniques = pd.factorize(['q', 'q', 'p', 'r', 'q'])
labels

Out:
array([0, 0, 1, 2, 0], dtype=int64)
In :
uniques

Out:
array(['q', 'p', 'r'], dtype=object)
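The default labelling assigns codes in order of first appearance, so indexing uniques with labels reconstructs the input. The following is a minimal pure-Python sketch of that idea; naive_factorize is a hypothetical helper written for illustration, not pandas' implementation, which uses hash tables and also handles missing values, sorting, and pandas dtypes.

```python
import numpy as np
import pandas as pd

def naive_factorize(values):
    """Hypothetical helper: code each distinct value by order of first appearance."""
    seen = {}      # maps value -> integer code
    uniques = []   # distinct values in first-appearance order
    codes = []
    for v in values:
        if v not in seen:
            seen[v] = len(uniques)
            uniques.append(v)
        codes.append(seen[v])
    return np.array(codes, dtype=np.intp), np.array(uniques, dtype=object)

values = ['q', 'q', 'p', 'r', 'q']
labels, uniques = naive_factorize(values)
print(labels.tolist())         # [0, 0, 1, 2, 0]
print(uniques.tolist())        # ['q', 'p', 'r']

# Indexing uniques with labels recovers the original values.
print(list(uniques[labels]))   # ['q', 'q', 'p', 'r', 'q']
```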

With sort=True, uniques is sorted and labels are renumbered so that the relationship between the two
is maintained.

In :
labels, uniques = pd.factorize(['q', 'q', 'p', 'r', 'q'], sort=True)
labels

Out:
array([1, 1, 0, 2, 1], dtype=int64)
In :
uniques

Out:
array(['p', 'q', 'r'], dtype=object)
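One way to see what sort=True does is to derive the sorted result from the unsorted one: sort uniques, then map each old code to the rank of its value. This sketch assumes the input has no missing values (a -1 code would index the wrong element) and is meant as an illustration, not as how pandas computes it internally.

```python
import numpy as np
import pandas as pd

values = ['q', 'q', 'p', 'r', 'q']
labels, uniques = pd.factorize(values)  # first-appearance order

# order[i] is the position in uniques of the i-th smallest value.
order = np.argsort(uniques)
# rank[old_code] is the new code of that value after sorting.
rank = np.empty_like(order)
rank[order] = np.arange(len(order))

sorted_uniques = uniques[order]  # ['p', 'q', 'r']
sorted_labels = rank[labels]     # [1, 1, 0, 2, 1]
```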

Missing values are indicated in labels with na_sentinel (-1 by default). Note that missing values
are never included in uniques.

In :
labels, uniques = pd.factorize(['q', None, 'p', 'r', 'q'])
labels

Out:
array([ 0, -1,  1,  2,  0], dtype=int64)
In :
uniques

Out:
array(['q', 'p', 'r'], dtype=object)
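Because the sentinel -1 never refers to an entry in uniques, the codes can be fed to pd.Categorical.from_codes, which also treats -1 as missing. The sketch below relies only on the default sentinel; it does not pass na_sentinel, a keyword that recent pandas releases have replaced with use_na_sentinel.

```python
import pandas as pd

labels, uniques = pd.factorize(['q', None, 'p', 'r', 'q'])

# Positions that were missing in the input carry the -1 sentinel.
missing_mask = labels == -1

# from_codes maps -1 back to a missing value, so the round trip keeps the gap.
restored = pd.Categorical.from_codes(labels, categories=uniques)
print(restored.isna().tolist())  # [False, True, False, False, False]
```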

Thus far, we’ve only factorized lists (which are internally coerced to NumPy arrays).
When factorizing pandas objects, the type of uniques will differ. For Categoricals,
a Categorical is returned.

In :
cat = pd.Categorical(['p', 'p', 'r'], categories=['p', 'q', 'r'])
labels, uniques = pd.factorize(cat)
labels

Out:
array([0, 0, 1], dtype=int64)
In :
uniques

Out:
[p, r]
Categories (3, object): [p, q, r]

Notice that 'q' is in uniques.categories, despite not being present in cat.values.
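A related point: the labels returned by factorize index into uniques, not into the full category list, so they can differ from cat.codes. This sketch contrasts the two for the Categorical above.

```python
import pandas as pd

cat = pd.Categorical(['p', 'p', 'r'], categories=['p', 'q', 'r'])
labels, uniques = pd.factorize(cat)

# cat.codes index into all categories ['p', 'q', 'r'], so 'r' has code 2.
print(cat.codes.tolist())           # [0, 0, 2]
# factorize labels index into uniques, which holds only ['p', 'r'].
print(labels.tolist())              # [0, 0, 1]
# The unused category 'q' is still kept on uniques.
print(uniques.categories.tolist())  # ['p', 'q', 'r']
```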

For all other pandas objects, an Index of the appropriate type is returned.

In :
ser = pd.Series(['p', 'p', 'r'])
labels, uniques = pd.factorize(ser)
labels

Out:
array([0, 0, 1], dtype=int64)
In :
uniques

Out:
Index(['p', 'r'], dtype='object')
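"Of the appropriate type" means the Index subclass follows the Series dtype. As a sketch, assuming a recent pandas version, a datetime64 Series should yield a DatetimeIndex of uniques.

```python
import pandas as pd

ser = pd.Series(pd.to_datetime(['2020-01-01', '2020-01-01', '2020-01-02']))
labels, uniques = pd.factorize(ser)

print(labels.tolist())          # [0, 0, 1]
print(type(uniques).__name__)   # DatetimeIndex
```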