Official documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.factorize.html
pandas.factorize
Maps identical nominal (categorical) values in a Series to the same integer index.
pandas.factorize(values, sort=False, na_sentinel=-1, size_hint=None, dropna=True)
Encode the object as an enumerated type or categorical variable.
This method is useful for obtaining a numeric representation of an array when all that matters is identifying distinct values. factorize is available both as a top-level function, pandas.factorize(), and as the methods Series.factorize() and Index.factorize().
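As a minimal sketch of the two calling styles described above (the sample values are made up for illustration, and the input is wrapped in a NumPy object array on the assumption that newer pandas releases prefer array-like inputs over plain lists):

```python
import numpy as np
import pandas as pd

# Illustrative sample data (not from the documentation).
values = np.array(["b", "b", "a", "c", "b"], dtype=object)

# Top-level function: equal values receive the same integer code,
# and codes are assigned in order of first appearance.
codes, uniques = pd.factorize(values)
# codes   -> [0, 0, 1, 2, 0]
# uniques -> ['b', 'a', 'c']

# Method form on a Series: same codes, but uniques comes back as an Index.
s_codes, s_uniques = pd.Series(values).factorize()
```

Both forms produce the same codes; they differ only in the container type used for the unique values.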
Parameters
values : sequence
A 1-D sequence. Sequences that aren’t pandas objects are coerced to ndarrays before factorization.
sort : bool, default False
Sort uniques and shuffle codes to maintain the relationship.
na_sentinel : int, default -1
Value to mark “not found”.
size_hint : int, optional
Hint to the hashtable sizer.
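The effect of the sort parameter can be illustrated with a small sketch (sample values are hypothetical): with sort=True the uniques are returned in sorted order and the codes are renumbered so that each code still points at the right unique value.

```python
import numpy as np
import pandas as pd

# Illustrative sample data (not from the documentation).
values = np.array(["b", "b", "a", "c", "b"], dtype=object)

# Default (sort=False): uniques keep order of first appearance.
codes_unsorted, uniques_unsorted = pd.factorize(values)
# codes_unsorted   -> [0, 0, 1, 2, 0]
# uniques_unsorted -> ['b', 'a', 'c']

# sort=True: uniques are sorted and codes are remapped accordingly.
codes_sorted, uniques_sorted = pd.factorize(values, sort=True)
# codes_sorted   -> [1, 1, 0, 2, 1]
# uniques_sorted -> ['a', 'b', 'c']
```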
Returns
codes : ndarray
An integer ndarray that’s an indexer into uniques. uniques.take(codes) will have the same values as values.
uniques : ndarray, Index, or Categorical
The unique valid values. When values is Categorical, uniques is a Categorical. When values is some other pandas object, an Index is returned. Otherwise, a 1-D ndarray is returned.
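The return values described above can be checked with a short sketch (sample data is hypothetical): uniques.take(codes) rebuilds the original values, and the type of uniques follows the type of the input.

```python
import numpy as np
import pandas as pd

# Illustrative sample data (not from the documentation).
values = np.array(["b", "b", "a", "c", "b"], dtype=object)

# Plain ndarray in, 1-D ndarray of uniques out.
codes, uniques = pd.factorize(values)
restored = uniques.take(codes)  # same values as the input

# A pandas object (here a Series) yields an Index of uniques.
_, series_uniques = pd.factorize(pd.Series(values))

# A Categorical yields a Categorical of uniques; unused categories
# stay in its category list but not among the unique values.
cat = pd.Categorical(["a", "a", "c"], categories=["a", "b", "c"])
cat_codes, cat_uniques = pd.factorize(cat)
# cat_codes -> [0, 0, 1]
```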
Note
Even if there’s a missing value in values, uniques will not contain an entry for it.
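A small sketch of the missing-value behaviour in the note above, relying only on the default sentinel of -1 (the sample values are hypothetical):

```python
import numpy as np
import pandas as pd

# Illustrative sample data with one missing entry (None).
values = np.array(["b", None, "a", "c", "b"], dtype=object)

codes, uniques = pd.factorize(values)
# codes   -> [0, -1, 1, 2, 0]   (the missing value is coded as -1)
# uniques -> ['b', 'a', 'c']    (no entry for the missing value)

# Positions of missing values can be recovered from the sentinel code.
missing_mask = codes == -1
```

Because -1 is not a valid position in uniques, codes containing the sentinel should be masked before being used as an indexer.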
See also
cut : Discretize continuous-valued array.
unique : Find the unique value in an array.