I want to find valid email addresses in a text file, and this is my code:
email = re.findall(r'[a-zA-Z\.-]+@[\w\.-]+',line)
But my code obviously does not match email addresses that have numbers before the @ sign. It also cannot reject email addresses that do not have a valid ending. Could anyone help me with these two problems? Thank you!
An example of my problem would be:
my code can find this email: xyz@gmail.com
but it cannot find this one: xyz123@gmail.com
And it cannot filter this email out either: xyz@gmail
Solution
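From the Python re docs, \w matches word characters: letters, digits and the underscore. With the re.ASCII flag (or a bytes pattern) that is exactly the set [a-zA-Z0-9_]; for str patterns in Python 3 it also includes non-ASCII letters and digits. So [\w.-] will match numbers as well as letters, and requiring at least one dot followed by word characters after the domain name rejects addresses such as xyz@gmail.
email = re.findall(r'[\w.-]+@[\w.-]+(?:\.\w+)+', line)
Note that the trailing group is non-capturing, (?:...). With a capturing group, re.findall returns only the text of that group (for example '.com') instead of the whole address.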
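As a quick check, here is a minimal sketch that runs a pattern of this shape against the three addresses from the question. The names EMAIL_RE and find_emails are only illustrative and do not come from the original post. The last group is written as non-capturing so that re.findall returns whole matches.

```python
import re

# Hypothetical names, for illustration only.
# The final group is non-capturing, (?:...), so that findall returns
# the whole match rather than just the text of the last group.
EMAIL_RE = re.compile(r'[\w.-]+@[\w.-]+(?:\.\w+)+')

def find_emails(line):
    """Return every substring of `line` that looks like an email address."""
    return EMAIL_RE.findall(line)

for sample in ("xyz@gmail.com", "xyz123@gmail.com", "xyz@gmail"):
    print(sample, "->", find_emails(sample))
```

The first two samples should come back as matches, while xyz@gmail yields an empty list because nothing after the @ sign contains a dot followed by word characters.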
This post discusses matching email addresses much more extensively, and there are a few more pitfalls in matching email addresses that your code fails to catch. For example, email addresses cannot be made up entirely of punctuation (...@....). Additionally, there is often a maximum length on addresses, depending on the email server. Also, many email servers accept non-English characters. So depending on your needs, you may need a more comprehensive pattern.
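One way to handle some of these pitfalls without making the regular expression much more complicated is to filter the regex matches in plain Python afterwards. The sketch below is only an illustration under stated assumptions: the names CANDIDATE_RE, MAX_LENGTH, looks_plausible and find_plausible_emails are made up for this example, and the 254-character limit is an assumed value that you should adjust to whatever your mail server actually enforces.

```python
import re

# Hypothetical names and limits, for illustration only.
CANDIDATE_RE = re.compile(r'[\w.-]+@[\w.-]+(?:\.\w+)+')
MAX_LENGTH = 254  # assumption: adjust to the limit your server enforces

def looks_plausible(address):
    """Apply extra checks that are awkward to express in the regex itself."""
    if len(address) > MAX_LENGTH:
        return False
    local, _, domain = address.rpartition('@')
    # Reject a local part or domain made up entirely of punctuation.
    # str.isalnum() is Unicode-aware, so non-English letters pass.
    return (any(ch.isalnum() for ch in local)
            and any(ch.isalnum() for ch in domain))

def find_plausible_emails(text):
    """Find regex candidates in `text` and keep those passing the checks."""
    return [m for m in CANDIDATE_RE.findall(text) if looks_plausible(m)]
```

Keeping the length and punctuation checks outside the pattern keeps the regex readable, at the cost of a second pass over the candidates. This is still not a full validator; it only covers the pitfalls mentioned above.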