Python phone number recognition: is there a Python 3 library that can identify phone numbers, names, e-mail addresses, and addresses?

Article link: https://blog.csdn.net/weixin_40003283/article/details/112930446

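This post asks how to extract phone numbers, titles, names, e-mail addresses, and social media links from unstructured text in Python. It suggests that the nltk library may not fit this case because the text is accessed character by character rather than word by word, and it considers taking a screenshot of the linked page and running an image-to-text library on it to get text back for extraction.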


Suppose I have successfully retrieved this text and assigned it to textToModify: textToModify = "

abcde abcde

Title: Director, lorem company

Phone: 123.647.4555

Mobile: 123.123.1234 E-mail: try1@umich.edu Assistant: my name Assistant Phone: 667.889.9910

Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.

Linkedin: www.linkedin.com/in/lorem-ipsum/

Twitter: www.twitter.com/ipsum

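Now I want to extract the title, name, phone numbers, LinkedIn, Twitter, and other important information from this text. Is there a library for this? Or do you have any ideas? Assume the format of the text is random, but the word "Title" always appears right next to the title itself, the word "Phone" always appears right next to the phone number, and so on.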
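Since each label is said to sit right next to its value, one possible approach is a regular expression built from the labels themselves. The sketch below is only an illustration under assumptions that are not in the question: the label list `LABELS` and the function name `extract_fields` are hypothetical, labels are assumed to be followed by a colon and matched case-sensitively, and a value is assumed to end at the next label or at the end of the line.

```python
import re

# Sample modeled on the text in the question; real input may look different.
text = """abcde abcde
Title: Director, lorem company
Phone: 123.647.4555
Mobile: 123.123.1234 E-mail: try1@umich.edu Assistant: my name Assistant Phone: 667.889.9910
Linkedin: www.linkedin.com/in/lorem-ipsum/
Twitter: www.twitter.com/ipsum
"""

# Hypothetical label list. Longer labels come first so that
# "Assistant Phone" is tried before "Assistant" and "Phone".
LABELS = ["Assistant Phone", "Assistant", "Title", "Phone",
          "Mobile", "E-mail", "Linkedin", "Twitter"]

# Matches any known label followed by a colon and optional whitespace.
label_re = re.compile(r"\b(" + "|".join(re.escape(l) for l in LABELS) + r"):\s*")

def extract_fields(text):
    """Return a dict mapping each label found to the text that follows it."""
    fields = {}
    for line in text.splitlines():
        matches = list(label_re.finditer(line))
        for i, m in enumerate(matches):
            # The value runs until the next label on the same line, or to the line end.
            end = matches[i + 1].start() if i + 1 < len(matches) else len(line)
            fields[m.group(1)] = line[m.end():end].strip()
    return fields

fields = extract_fields(text)
for label, value in fields.items():
    print(f"{label}: {value}")
```

If the same label can occur more than once, or if labels can appear without a colon, the dictionary and the pattern would need to be adjusted accordingly.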

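My initial thoughts are:

The nltk library won't work, because it basically assigns tags to words, and the problem is that this text is not separated into words but into characters. For example, accessing textToModify[20] returns only a single character.

My other idea: what if I visit the link, take a screenshot, then use a picture-to-text library in Python and go from there?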
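On the character-versus-word concern: indexing a Python string always returns a single character, but the same string can still be split into words, and nltk's tokenizers also take a plain string as input. If the text is already available as a string, the screenshot step may not be needed. The sketch below shows the difference and also finds phone numbers and e-mail addresses by their shape rather than by label. The two patterns are assumptions (numbers of the form ddd.ddd.dddd and simple e-mail addresses), not a general-purpose validator.

```python
import re

sample = "Phone: 123.647.4555 Mobile: 123.123.1234 E-mail: try1@umich.edu"

# Indexing a str yields exactly one character...
first_char = sample[0]

# ...while split() breaks the same string into whitespace-separated tokens.
tokens = sample.split()

# Assumed shapes: three digits, three digits, four digits, separated by
# a dot, hyphen, or space; and a simple local@domain.tld e-mail address.
phone_re = re.compile(r"\b\d{3}[.\-\s]\d{3}[.\-\s]\d{4}\b")
email_re = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

phones = phone_re.findall(sample)
emails = email_re.findall(sample)

print(first_char)
print(tokens)
print(phones)
print(emails)
```

Shape-based matching like this does not depend on the labels at all, so it can complement a label-based pass when some values appear without a label.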

Thank you!