Pandas: Data Manipulation - factorize() function

factorize() function

The factorize() function encodes a sequence of values as an enumerated type or categorical variable. It is useful for obtaining a numeric representation of an array when all that matters is identifying distinct values.
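As a minimal sketch of the encoding described above, the following example factorizes a small object array. The sample data and the variable names (`values`, `labels`, `uniques`) are illustrative assumptions, not part of the original page.

```python
import numpy as np
import pandas as pd

# Illustrative sample data: five strings, three of them distinct.
values = np.array(["b", "b", "a", "c", "b"], dtype=object)

# Each distinct value gets an integer label in order of first appearance.
labels, uniques = pd.factorize(values)

print(labels.tolist())   # [0, 0, 1, 2, 0]
print(uniques.tolist())  # ['b', 'a', 'c']
```

Here "b" appears first and receives label 0, "a" receives 1, and "c" receives 2.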


Syntax:

pandas.factorize(values, sort=False, order=None, na_sentinel=-1, size_hint=None)


Parameters:

| Name | Description | Type | Default Value | Required / Optional |
|---|---|---|---|---|
| values | A 1-D sequence. Sequences that aren’t pandas objects are coerced to ndarrays before factorization. | sequence | | Required |
| sort | Sort uniques and shuffle labels to maintain the relationship. | bool | Default: False | Optional |
| na_sentinel | Value to mark “not found”. | int | Default: -1 | Optional |
| size_hint | Hint to the hashtable sizer. | int | Default: None | Optional |
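The effect of the sort parameter can be seen in a short sketch like the one below. The sample data and variable names are assumptions made for illustration. Only `sort` is passed, because `order` and `na_sentinel` from the signature above may not be accepted by recent pandas releases.

```python
import numpy as np
import pandas as pd

# Illustrative sample data (not from the original page).
data = np.array(["b", "b", "a", "c", "b"], dtype=object)

# Default (sort=False): uniques keep the order of first appearance.
labels_unsorted, uniques_unsorted = pd.factorize(data)
print(labels_unsorted.tolist())   # [0, 0, 1, 2, 0]
print(uniques_unsorted.tolist())  # ['b', 'a', 'c']

# sort=True: uniques are sorted and labels are remapped to match,
# so each label still points at the same original value.
labels_sorted, uniques_sorted = pd.factorize(data, sort=True)
print(labels_sorted.tolist())   # [1, 1, 0, 2, 1]
print(uniques_sorted.tolist())  # ['a', 'b', 'c']
```

In both cases the labels index into the corresponding uniques, so the original values can be recovered either way.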

Returns:
labels: ndarray - An integer ndarray that’s an indexer into uniques. uniques.take(labels) will have the same values as values.
uniques: ndarray, Index, or Categorical - The unique valid values.
When values is Categorical, uniques is a Categorical.
When values is some other pandas object, an Index is returned. Otherwise, a 1-D ndarray is returned.
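The return types listed above can be checked with a small sketch. The inputs and variable names here are illustrative assumptions, chosen so that one call receives a Categorical, one a Series, and one a plain ndarray.

```python
import numpy as np
import pandas as pd

# Categorical input: uniques comes back as a Categorical.
cat = pd.Categorical(["a", "a", "c"], categories=["a", "b", "c"])
cat_labels, cat_uniques = pd.factorize(cat)
print(type(cat_uniques).__name__)  # Categorical

# Another pandas object (here a Series): uniques comes back as an Index.
ser = pd.Series(["a", "a", "c"])
ser_labels, ser_uniques = pd.factorize(ser)
print(type(ser_uniques).__name__)  # Index

# Plain ndarray input: uniques comes back as a 1-D ndarray.
arr = np.array(["a", "a", "c"], dtype=object)
arr_labels, arr_uniques = pd.factorize(arr)
print(type(arr_uniques).__name__)  # ndarray

# With no missing values present, uniques.take(labels) rebuilds the input.
rebuilt = arr_uniques.take(arr_labels)
print(rebuilt.tolist())  # ['a', 'a', 'c']
```

Note that the unused category "b" does not appear among the unique values of the Categorical result, since it never occurs in the data.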

Note: Even if there’s a missing value in values, uniques will not contain an entry for it.
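The handling of missing values can be illustrated with a sketch such as the following. The sample data and variable names are assumptions. The example relies on the default sentinel of -1 rather than passing na_sentinel explicitly, because recent pandas releases may not accept that argument.

```python
import numpy as np
import pandas as pd

# Illustrative sample data with one missing value (None).
with_missing = np.array(["b", None, "a", "c", "b"], dtype=object)

na_labels, na_uniques = pd.factorize(with_missing)

# The missing value is labelled with the sentinel -1 ...
print(na_labels.tolist())   # [0, -1, 1, 2, 0]
# ... and has no entry in uniques.
print(na_uniques.tolist())  # ['b', 'a', 'c']

# A boolean mask locates the missing positions in the original data.
missing_mask = na_labels == -1
print(missing_mask.tolist())  # [False, True, False, False, False]
```

Because -1 is also a valid "last element" index in NumPy, it is safer to mask out sentinel labels before using them to index into uniques.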

