I have a DataFrame df:
Name   Description
Ram    Ram is one of the good cricketer
Sri    Sri is one of the member
Kumar  Kumar is a keeper
and a list:
my_list=["one","good","ravi","ball"]
I am trying to get the rows that contain at least one keyword from my_list.
I tried:
mask=df["Description"].str.contains("|".join(my_list),na=False)
I am getting this output_df:
Name   Description
Ram    Ram is one of ONe crickete
Sri    Sri is one of the member
Ravi   Ravi is a player, ravi is playing
Kumar  there is a BALL
I also want to add the keywords found in "Description" and their count as separate columns.
My desired output is:
Name   Description                        pre-keys      keys      count
Ram    Ram is one of ONe crickete         one,good,ONe  one,good  2
Sri    Sri is one of the member           one           one       1
Ravi   Ravi is a player, ravi is playing  Ravi,ravi     ravi      1
Kumar  there is a BALL                    ball          ball      1
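Solution:

# find every keyword occurrence in each description (one list per row)
extracted = df['Description'].str.findall('(' + '|'.join(my_list) + ')')
# comma-separated matches and the number of matches per row
df['keys'] = extracted.str.join(',')
df['count'] = extracted.str.len()
print(df)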
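A self-contained version of the code above, rebuilding the sample df from the question. Note that str.findall returns an empty list for a row with no keyword, so such a row gets an empty keys string and a count of 0. The final filtering line is an addition (not part of the original answer) that keeps only rows with at least one keyword, which is what the question asks for.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": ["Ram", "Sri", "Kumar"],
    "Description": [
        "Ram is one of the good cricketer",
        "Sri is one of the member",
        "Kumar is a keeper",
    ],
})
my_list = ["one", "good", "ravi", "ball"]

# One capturing group around the alternation; findall returns every match per row.
extracted = df["Description"].str.findall("(" + "|".join(my_list) + ")")

df["keys"] = extracted.str.join(",")  # list of matches -> "one,good"
df["count"] = extracted.str.len()     # number of matches in the row

# Added step: drop rows where no keyword was found (count == 0).
result = df[df["count"] > 0]
print(result)
```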
   Name                       Description      keys  count
0   Ram  Ram is one of the good cricketer  one,good      2
1   Sri          Sri is one of the member       one      1
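EDIT: for case-insensitive matching, pass flags=re.IGNORECASE to str.findall:
import re
my_list=["ONE","good"]
# IGNORECASE lets "ONE" in my_list match "one" in the text
extracted = df['Description'].str.findall('(' + '|'.join(my_list) + ')', flags=re.IGNORECASE)
df['keys'] = extracted.str.join(',')
df['count'] = extracted.str.len()
print(df)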
   Name                       Description      keys  count
0   Ram  Ram is one of the good cricketer  one,good      2
1   Sri          Sri is one of the member       one      1
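The question also asks for a pre-keys column (every match as written), a keys column (distinct keywords, lower-cased) and a count. Below is a minimal sketch of one way to build those, assuming that matching is case-insensitive and that count means the number of distinct keywords, which is what the desired output suggests. Two further assumptions are added: re.escape keeps keywords literal, and \b word boundaries stop "one" from matching inside a word such as "someone". The rows come from the question's output_df. The results follow the text as written, so Ram gets one,ONe / one / 1 ("good" does not appear in that description), and Kumar's pre-keys keeps the original spelling "BALL".

```python
import re

import pandas as pd

# Rows taken from the question's output_df.
df = pd.DataFrame({
    "Name": ["Ram", "Sri", "Ravi", "Kumar"],
    "Description": [
        "Ram is one of ONe crickete",
        "Sri is one of the member",
        "Ravi is a player, ravi is playing",
        "there is a BALL",
    ],
})
my_list = ["one", "good", "ravi", "ball"]

# Whole-word alternation of the escaped keywords, e.g. r"\b(one|good|ravi|ball)\b".
pattern = r"\b(" + "|".join(re.escape(w) for w in my_list) + r")\b"

# Every match in its original spelling, e.g. ["one", "ONe"].
matches = df["Description"].str.findall(pattern, flags=re.IGNORECASE)

# Distinct lower-cased keywords in order of first appearance (dict keeps insertion order).
unique_keys = matches.apply(lambda found: list(dict.fromkeys(m.lower() for m in found)))

df["pre-keys"] = matches.str.join(",")
df["keys"] = unique_keys.str.join(",")
df["count"] = unique_keys.str.len()
print(df)
```

Using dict.fromkeys rather than set keeps the keywords in the order they first appear in the description.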