python最简单的爬取邮箱地址_python爬取邮箱的实例代码

最新推荐文章于 2024-05-01 09:40:13 发布

weixin_39814126

最新推荐文章于 2024-05-01 09:40:13 发布

阅读量1.3k

点赞数

文章标签： python最简单的爬取邮箱地址

版权声明：本文为博主原创文章，遵循 CC 4.0 BY-SA 版权协议，转载请附上原文出处链接和本声明。

本文链接：https://blog.csdn.net/weixin_39814126/article/details/113671474

版权

1 # -*- coding: cp936 -*-

2 import urllib2

3 import re

4 from pyquery import PyQuery as pq

5 from lxml import etree

6

7 #mailpattern = re.compile('[^\._:>\\-][\w\.-]+@(?:[A-Za-z0-9]+\.)+[A-Za-z]+')

8 mailpattern = re.compile('[A-Za-z0-9_]+@(?:[A-Za-z0-9]+\.)+[A-Za-z]+')

9

10 htmlcount = 0 #to count the urls

11 maxcount = 3000 # the max count

12 allUrls = set()

13 allMails = set()

14 UrlsQlist = []

15 UrlsQdict = {}

16 url = "http://www.163.com"

17 fmails = open("E:/py/crawler/mailresult.txt","a")

18 furls = open("E:/py/crawler/urlresult.txt","a")

19

20

21

22

23 def geturls(data):#the function to get the urls in the html

24 urls = set()

25 if data:

26 d = pq(data)

27 label_a = d.find('a')#用pyquery库去找到 a 标签.

28 if label_a:

29 label_a_href = d('a').map(lambda i,e:pq(e)('a').attr('href'))

30 for u in label_a_href:

31 if u[0:10]!="javascript" :

32 if u[0:4] == "http":

33 urls.add(u)

34 else:

35 urls.add(url + u)

36 #for u in urls:

37 #print u

38 return urls

39 else:

40 return None

41

42 def gethtml(url):

43 try:

44 fp = urllib2.urlopen(url)

45 except:

46 print "urllib2.urlopen error"

47 return None

48 else:

49 mybytes =fp.read()

50 fp.close()

51 return mybytes

52

53 def savemails(data): # the function to save the emails

54 if data:

55 mail

最低0.47元/天解锁文章

weixin_39814126

关注

0
点赞
踩
3

收藏

觉得还不错? 一键收藏
0
评论
python最简单的爬取邮箱地址_python爬取邮箱的实例代码

1 # -*- coding: cp936 -*-2 import urllib23 import re4 from pyquery import PyQuery as pq5 from lxml import etree67 #mailpattern = re.compile('[^\._:>\\-][\w\.-]+@(?:[A-Za-z0-9]+\.)+[A-Za-z]+')8...
复制链接

扫一扫

评论

被折叠的条评论为什么被折叠?

到【灌水乐园】发言

查看更多评论

添加红包

成就一亿技术人!

hope_wisdom

发出的红包

实付元

使用余额支付

点击重新获取

扫码支付

钱包余额 0

抵扣说明：

1.余额是钱包充值的虚拟货币，按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载，可以购买VIP、付费专栏及课程。