I have a DataFrame like this:
col1 col2 col3
1 apple a,b
2 car c
3 dog a,c
4 dog NaN
I tried to create three new columns, a, b and c, each of which should be 1 if col3 contains that letter and 0 otherwise:
df['a']= np.where(df['col3'].str.contains('a'),1,0)
df['b']= np.where(df['col3'].str.contains('b'),1,0)
df['c']= np.where(df['col3'].str.contains('c'),1,0)
But NaN values are not handled correctly. The result is:
col1 col2 col3 a b c
1 apple a,b 1 1 0
2 car c 0 0 1
3 dog a,c 1 0 1
4 dog NaN 1 1 1
The 4th row should be all 0s. How can I change my code to get the right result?
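A minimal reproduction, assuming the sample col3 values above stored with object dtype, shows where the unexpected 1s probably come from: str.contains passes the missing value through as NaN instead of returning False, and np.where treats each element by its truthiness, where bool(np.nan) is True.

```python
import numpy as np
import pandas as pd

# Sample col3 values from the question, explicitly stored as object dtype.
col3 = pd.Series(['a,b', 'c', 'a,c', np.nan], dtype=object)

# For object dtype, str.contains returns NaN (not False) for the missing entry.
mask = col3.str.contains('a')
print(mask.tolist())

# np.where evaluates each element's truthiness; NaN is truthy,
# so the missing row ends up as 1.
print(bool(np.nan))
print(np.where(mask, 1, 0))
```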
Solution
What I would do:
s = df.col3.str.get_dummies(sep=',')
This gives:
a b c
0 1 1 0
1 0 0 1
2 1 0 1
3 0 0 0
df = pd.concat([df, s], axis=1)
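Below is a self-contained sketch of the get_dummies approach, rebuilt from the sample data in the question. For comparison it also shows a smaller change to the original np.where code: passing na=False to str.contains so that missing values count as "no match". Both give the same a, b, c columns on this data.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'col1': [1, 2, 3, 4],
    'col2': ['apple', 'car', 'dog', 'dog'],
    'col3': ['a,b', 'c', 'a,c', np.nan],
})

# Split each col3 string on ',' and build one 0/1 column per token;
# a missing value produces a row of zeros.
s = df['col3'].str.get_dummies(sep=',')
out = pd.concat([df, s], axis=1)
print(out)

# Alternative that keeps the original np.where approach:
# na=False makes str.contains return False for missing values.
for letter in ['a', 'b', 'c']:
    df[letter] = np.where(df['col3'].str.contains(letter, na=False), 1, 0)
print(df)
```

One design difference: get_dummies matches whole comma-separated tokens, while str.contains matches substrings, so a value such as 'ab' would set column a under the second approach but become its own column under the first.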