import pandas as pd
df = pd.DataFrame({'A':['a','b','a'],'B':['c','d','e'],'C':[1,2,3]})
In [6]: df
Out[6]:
   A  B  C
0  a  c  1
1  b  d  2
2  a  e  3
In [7]: pd.get_dummies(df)
Out[7]:
   C  A_a  A_b  B_c  B_d  B_e
0  1    1    0    1    0    0
1  2    0    1    0    1    0
2  3    1    0    0    0    1
In [8]: pd.get_dummies(df,prefix=['col1','col2'])
Out[8]:
   C  col1_a  col1_b  col2_c  col2_d  col2_e
0  1       1       0       1       0       0
1  2       0       1       0       1       0
2  3       1       0       0       0       1
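As the output above shows, the numeric column of a DataFrame is not one-hot encoded. By default, get_dummies converts only columns with object or category dtype, so the integer column C is passed through unchanged.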
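If the numeric column should be encoded as well, one option is to name the columns explicitly through the `columns` parameter of `get_dummies`. The following is a minimal sketch using the same `df` as above. The variable name `encoded` is chosen here for illustration, and `dtype=int` is passed so that the result holds 0/1 values as in the session above, since recent pandas versions return booleans by default.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['c', 'd', 'e'], 'C': [1, 2, 3]})

# Listing columns explicitly overrides the dtype-based default,
# so the integer column C is one-hot encoded along with A and B.
encoded = pd.get_dummies(df, columns=['A', 'B', 'C'], dtype=int)

print(encoded.columns.tolist())
print(encoded)
```

Each distinct value of C becomes a column named after the original column and the value (`C_1`, `C_2`, `C_3`).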
df1 = pd.Series([1,2,3])
In [10]: df1
Out[10]:
0    1
1    2
2    3
dtype: int64
In [11]: pd.get_dummies(df1)
Out[11]:
   1  2  3
0  1  0  0
1  0  1  0
2  0  0  1
For a Series, however, numeric values are one-hot encoded: each distinct value becomes its own column.
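Because a Series is always encoded regardless of its dtype, a single numeric column can also be pulled out of a DataFrame, encoded on its own, and joined back. This is only a sketch of one possible approach. The names `dummies` and `result` are illustrative, and `dtype=int` is passed to keep 0/1 values, since recent pandas versions default to booleans.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['c', 'd', 'e'], 'C': [1, 2, 3]})

# df['C'] is a Series, so its integer values are one-hot encoded.
# The prefix gives string column names (C_1, C_2, C_3) instead of bare integers.
dummies = pd.get_dummies(df['C'], prefix='C', dtype=int)

# Replace the original numeric column with its encoded columns.
result = df.drop(columns='C').join(dummies)

print(result)
```

Columns A and B keep their original string values here, so this suits cases where only one numeric column should be treated as categorical.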