1. get_dummies()
pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False)
Convert categorical variable into dummy/indicator variables.
>>> import pandas as pd
>>> s = pd.Series(list('abca'))
>>> pd.get_dummies(s)
   a  b  c
0  1  0  0
1  0  1  0
2  0  0  1
3  1  0  0
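The `columns`, `prefix` and `drop_first` parameters from the signature are easier to see on a DataFrame. The following is a minimal sketch rather than a canonical recipe: the frame `df` and its column names are made up for illustration, and `dtype=int` is passed on the assumption of a pandas version that supports it (0.23 or later), so that the indicator columns hold 0/1 instead of booleans.

```python
import pandas as pd

# Hypothetical data: one categorical column and one numeric column.
df = pd.DataFrame({"color": ["red", "green", "red"], "size": [1, 2, 3]})

# Encode only the "color" column; "size" is passed through unchanged.
# New columns are named <prefix><prefix_sep><category>.
dummies = pd.get_dummies(df, columns=["color"], prefix="c", dtype=int)
print(list(dummies.columns))   # ['size', 'c_green', 'c_red']

# drop_first=True removes the first category's column, leaving k-1
# indicators for k categories.
reduced = pd.get_dummies(
    df, columns=["color"], prefix="c", drop_first=True, dtype=int
)
print(list(reduced.columns))   # ['size', 'c_red']
```

Dropping the first level is commonly used to avoid perfectly collinear indicator columns in linear models; whether it is appropriate depends on the model being fit.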
2. pd.factorize()
pandas.factorize(values, sort=False, order=None, na_sentinel=-1, size_hint=None)
Encode input values as an enumerated type or categorical variable.
Series.factorize(sort=False, na_sentinel=-1)
Encode the object as an enumerated type or categorical variable.
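pandas provides a method called factorize() that represents a categorical variable with numbers by mapping each category to an integer ID. This mapping produces only a single feature, unlike dummy encoding, which produces one feature per category.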
Parameters:
    sort : boolean, default False
    na_sentinel : int, default -1
Returns:
    labels : the indexer to the original array
    uniques : the unique Index
labels: the array of integer codes, one per input value
uniques: the distinct categories that were encoded
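A short sketch of how `labels` and `uniques` relate to each other. The Series `s` is an illustrative input, not taken from the pandas documentation. Only `sort` is used here, because `na_sentinel` and `order` appear in the older signature quoted above and are not accepted by recent pandas releases.

```python
import pandas as pd

s = pd.Series(list("bacab"))

# By default, IDs are assigned in order of first appearance.
labels, uniques = pd.factorize(s)
print(labels.tolist())    # [0, 1, 2, 1, 0]
print(list(uniques))      # ['b', 'a', 'c']

# With sort=True, the unique values are sorted first, so IDs follow sorted order.
labels_sorted, uniques_sorted = pd.factorize(s, sort=True)
print(labels_sorted.tolist())   # [1, 0, 2, 0, 1]
print(list(uniques_sorted))     # ['a', 'b', 'c']

# labels index into uniques, so the original values can be recovered.
restored = uniques.take(labels)
print(list(restored))     # ['b', 'a', 'c', 'a', 'b']
```

Because the result is a single integer column, the IDs imply an ordering among categories that may not exist in the data; this is the trade-off against the multiple columns produced by get_dummies().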