I have been trying to figure this out all day. I am new to Python.
I have a table with about 50,000 records, but the small table below shows what I am trying to do.
I would like to add a third column called Category. This column will contain values based on conditions applied to the Movies column.
-----------------------------------------
N | Movies
-----------------------------------------
1 | Save the Last Dance
2 | Love and Other Drugs
3 | Dance with Me
4 | Love Actually
5 | High School Musical
-----------------------------------------
The condition is this: search the Movies column for the words Dance, Love, and Musical. If one of these words is found in the string, return that word in the Category column.
This should produce a new dataframe like this at the end:
-----------------------------------------
N | Movies               | Category
-----------------------------------------
1 | Save the Last Dance  | Dance
2 | Love and Other Drugs | Love
3 | Dance with Me        | Dance
4 | Love Actually        | Love
5 | High School Musical  | Musical
-----------------------------------------
Thanks in advance!!
Solution
A fast way is to create a boolean mask for each of your categories, assuming you have a smallish number of them:
In [22]:
dance_mask = df['Movies'].str.contains('Dance')
love_mask = df['Movies'].str.contains('Love')
musical_mask = df['Movies'].str.contains('Musical')
df[dance_mask]
Out[22]:
   N               Movies
0  1  Save the Last Dance
2  3        Dance with Me
[2 rows x 2 columns]
In [26]:
# now set category
# (.loc replaces the old .ix indexer, which was removed in pandas 1.0)
df.loc[dance_mask, 'Category'] = 'Dance'
df
Out[26]:
   N                Movies Category
0  1   Save the Last Dance    Dance
1  2  Love and Other Drugs      NaN
2  3         Dance with Me    Dance
3  4         Love Actually      NaN
4  5   High School Musical      NaN
[5 rows x 3 columns]
In [28]:
# repeat for remaining masks
df.loc[love_mask, 'Category'] = 'Love'
df.loc[musical_mask, 'Category'] = 'Musical'
df
Out[28]:
   N                Movies Category
0  1   Save the Last Dance    Dance
1  2  Love and Other Drugs     Love
2  3         Dance with Me    Dance
3  4         Love Actually     Love
4  5   High School Musical  Musical
[5 rows x 3 columns]
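If the list of words grows, writing one mask per word gets tedious. As an alternative to the mask approach above (not part of it), here is a minimal sketch that uses str.extract with a single regular expression to fill the column in one pass. The sample dataframe is rebuilt from the question, and the names keywords and pattern are only illustrative.

```python
import re

import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "N": [1, 2, 3, 4, 5],
    "Movies": [
        "Save the Last Dance",
        "Love and Other Drugs",
        "Dance with Me",
        "Love Actually",
        "High School Musical",
    ],
})

# Words to look for (illustrative name, taken from the question)
keywords = ["Dance", "Love", "Musical"]

# One capture group that matches any of the words: (Dance|Love|Musical)
pattern = "(" + "|".join(re.escape(word) for word in keywords) + ")"

# expand=False returns a Series; titles with no match get NaN
df["Category"] = df["Movies"].str.extract(pattern, expand=False)

print(df)
```

Note one difference in behaviour: when a title contains more than one of the words, str.extract keeps the leftmost match in the title, whereas with the masks the last assignment wins. Also, if the Movies column may contain missing values, passing na=False to str.contains keeps the masks strictly boolean.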