I have a column in data frame which ex df:
A
0 Good to 1. Good communication EI : tathagata.kar@ae.com
1 SAP ECC Project System EI: ram.vaddadi@ae.com
2 EI : ravikumar.swarna Role:SSE Minimum Skill
I have a list of of strings
ls=['tathagata.kar@ae.com','a.kar@ae.com']
Now if i want to filter out
for i in range(len(ls)):
df1=df[df['A'].str.contains(ls[i])
if len(df1.columns!=0):
print ls[i]
I get the output
tathagata.kar@ae.com
a.kar@ae.com
But I need only tathagata.kar@ae.com
How Can It be achieved?
As you can see I've tried str.contains But I need something for extact match
解决方案
You could simply use ==
string_a == string_b
It should return True if the two strings are equal. But this does not solve your issue.
Edit 2: You should use len(df1.index) instead of len(df1.columns). Indeed, len(df1.columns) will give you the number of columns, and not the number of rows.
Edit 3: After reading your second post, I've understood your problem. The solution you propose could lead to some errors.
For instance, if you have:
ls=['tathagata.kar@ae.com','a.kar@ae.com', 'tathagata.kar@ae.co']
the first and the third element will match str.contains(r'(?:\s|^|Ei:|EI:|EI-)'+ls[i])
And this is an unwanted behaviour.
You could add a check on the end of the string: str.contains(r'(?:\s|^|Ei:|EI:|EI-)'+ls[i]+r'(?:\s|$)')
Like this:
for i in range(len(ls)):
df1 = df[df['A'].str.contains(r'(?:\s|^|Ei:|EI:|EI-)'+ls[i]+r'(?:\s|$)')]
if len(df1.index != 0):
print (ls[i])
(Remove parenthesis in the "print" if you use python 2.7)