The following examples call factorize() as a top-level function, pd.factorize(values). The results are identical
for the method form, such as Series.factorize().

In [1]:
import numpy as np
import pandas as pd
In [2]:
labels, uniques = pd.factorize(['q', 'q', 'p', 'r', 'q'])
labels
Out[2]:
array([0, 0, 1, 2, 0], dtype=int64)
In [3]:
uniques
Out[3]:
array(['q', 'p', 'r'], dtype=object)
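
Each entry in labels is a position in uniques, so indexing uniques with labels should rebuild the original values. The following is a minimal sketch of that round trip. Wrapping the input in a NumPy object array and the name reconstructed are choices made for this illustration, not part of the original example.

```python
import numpy as np
import pandas as pd

# Assumption: the input is wrapped in an object array instead of a plain list.
values = np.array(['q', 'q', 'p', 'r', 'q'], dtype=object)
labels, uniques = pd.factorize(values)

# labels[i] is the position of values[i] within uniques, so fancy indexing
# with labels maps every code back to its original value.
reconstructed = uniques[labels]
print(list(reconstructed))
```

The same indexing works for any ndarray of uniques returned by factorize, as long as no label is negative.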

If sort=True, the uniques will be sorted, and labels will be reassigned so that the relationship between each label and its value is maintained.

In [4]:
labels, uniques = pd.factorize(['q', 'q', 'p', 'r', 'q'], sort=True)
labels
Out[4]:
array([1, 1, 0, 2, 1], dtype=int64)
In [5]:
uniques
Out[5]:
array(['p', 'q', 'r'], dtype=object)
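
One way to check that the relationship is maintained is to decode both results and compare them. This is a sketch under the assumption that the input is a NumPy object array; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

values = np.array(['q', 'q', 'p', 'r', 'q'], dtype=object)

labels_unsorted, uniques_unsorted = pd.factorize(values)
labels_sorted, uniques_sorted = pd.factorize(values, sort=True)

# The codes differ between the two calls, but each (labels, uniques) pair
# decodes to the same sequence of values.
decoded_unsorted = uniques_unsorted[labels_unsorted]
decoded_sorted = uniques_sorted[labels_sorted]
same = bool((decoded_unsorted == decoded_sorted).all())
print(same)
```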

Missing values are indicated in labels with na_sentinel (-1 by default). Note that missing values are never
included in uniques.

In [6]:
labels, uniques = pd.factorize(['q', None, 'p', 'r', 'q'])
labels
Out[6]:
array([ 0, -1,  1,  2,  0], dtype=int64)
In [7]:
uniques
Out[7]:
array(['q', 'p', 'r'], dtype=object)
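
Because the sentinel is not a valid position in uniques, decoding has to treat it separately. The sketch below relies only on the default sentinel of -1 and does not pass na_sentinel explicitly; the names missing and decoded are hypothetical helpers for this illustration.

```python
import numpy as np
import pandas as pd

values = np.array(['q', None, 'p', 'r', 'q'], dtype=object)
labels, uniques = pd.factorize(values)

# Positions whose label equals the default sentinel correspond to missing input.
missing = labels == -1

# Start from an all-None array and fill in only the non-missing positions,
# since indexing uniques with -1 would silently pick the last unique value.
decoded = np.full(len(labels), None, dtype=object)
decoded[~missing] = uniques[labels[~missing]]
print(list(decoded))
```

Masking first avoids the pitfall where uniques[-1] returns the last element rather than signalling a missing value.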

When factorizing pandas objects, the type of uniques will differ.
For Categoricals, a Categorical is returned.

In [8]:
cat = pd.Categorical(['p', 'p', 'r'], categories=['p', 'q', 'r'])
In [9]:
labels, uniques = pd.factorize(cat)
In [10]:
labels
Out[10]:
array([0, 0, 1], dtype=int64)
In [11]:
uniques
Out[11]:
[p, r]
Categories (3, object): [p, q, r]
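
The output above shows that uniques contains only the observed values while still carrying all three categories. A short sketch of inspecting that, and of decoding the labels with Categorical.take, follows; the variable names observed, categories, and roundtrip are illustrative.

```python
import pandas as pd

cat = pd.Categorical(['p', 'p', 'r'], categories=['p', 'q', 'r'])
labels, uniques = pd.factorize(cat)

# Only values that actually occur appear in uniques...
observed = list(uniques)
# ...but the unused category 'q' is still part of its category list.
categories = list(uniques.categories)

# take() selects elements of uniques by position, which decodes the labels.
roundtrip = uniques.take(labels)
print(observed, categories, list(roundtrip))
```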

For other pandas objects, an Index of the appropriate type is returned.

In [12]:
ser = pd.Series(['p', 'p', 'r'])
In [13]:
labels, uniques = pd.factorize(ser)
In [14]:
labels
Out[14]:
array([0, 0, 1], dtype=int64)
In [15]:
uniques
Out[15]:
Index(['p', 'r'], dtype='object')
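
As noted at the start, the method form is expected to give the same result as the top-level function. The following sketch compares the two for a Series; the names ser and the _func / _meth suffixes are chosen for this illustration.

```python
import numpy as np
import pandas as pd

ser = pd.Series(['p', 'p', 'r'])

labels_func, uniques_func = pd.factorize(ser)   # top-level function
labels_meth, uniques_meth = ser.factorize()     # Series method

# Both calls should return equal labels and an equal Index of uniques.
labels_match = bool(np.array_equal(labels_func, labels_meth))
uniques_match = bool(uniques_func.equals(uniques_meth))
print(labels_match, uniques_match)
```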