Python Crawler Practice (3): Collecting Subdomains with Baidu

weixin_30500105

Tags: python, web crawler

Original link: http://www.cnblogs.com/i-honey/p/7887416.html

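Not much to introduce here: it is the same old regular-expression matching. The script searches Baidu for inurl:<main domain>, pulls the displayed URL out of each result with a regex, and collects the distinct hosts that contain the main domain.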
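Before the full script, here is a minimal sketch of what its regular expression is expected to capture. The HTML fragment is a made-up, simplified stand-in for the display-URL links on a Baidu results page (an assumption; the real markup may differ and changes over time), and `extract_hosts` is a hypothetical helper name used only for this illustration.

```python
import re

# Same pattern as in the script.
PATTERN = r'style="text-decoration:none;">(.*?)</b>'

# ASSUMPTION: a simplified imitation of result links, not captured from a real page.
SAMPLE_HTML = (
    '<a style="text-decoration:none;">bbs.<b>jcrcw.com</b>/&nbsp;</a>'
    '<a style="text-decoration:none;">www.<b>jcrcw.com</b>/news/&nbsp;</a>'
    '<a style="text-decoration:none;">bbs.<b>jcrcw.com</b>/thread&nbsp;</a>'
)


def extract_hosts(html, key):
    """Return the distinct hosts containing `key`, in first-seen order."""
    hosts = []
    for raw in re.findall(PATTERN, html):
        # The non-greedy group stops at the first </b>, so the opening <b>
        # is still inside the captured text and has to be removed.
        host = raw.replace('<b>', '')
        if key in host and host not in hosts:
            hosts.append(host)
    return hosts


print(extract_hosts(SAMPLE_HTML, 'jcrcw.com'))
```

On this sample the duplicate `bbs.jcrcw.com` entry is collapsed, which is the same de-duplication the script performs with its list.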

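import re

import requests

# Browser-like User-Agent so the request looks like an ordinary page visit.
head = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/43.0.2357.130 Safari/537.36'}
key = 'jcrcw.com'  # put the main domain here
lst = []  # distinct subdomains, in the order they are found

# Captures the displayed URL of each result, up to the first closing </b>.
match = 'style="text-decoration:none;">(.*?)</b>'

for page in range(19):  # first 19 result pages
    # Baidu's pn parameter is a result offset (10 results per page), not a page number.
    url = "https://www.baidu.com/s?wd=inurl:{}&pn={}&oq={}&ie=utf-8".format(key, page * 10, key)
    print(url)
    # response = requests.get(url, headers=head, cookies=cook).content
    response = requests.get(url, headers=head, timeout=10).content
    subdomains = re.findall(match, response.decode('utf-8', errors='replace'))
    for j in subdomains:
        j = j.replace('<b>', '')  # drop the leftover opening <b> tag
        if key in j and j not in lst:
            lst.append(j)
print(lst)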

Output: [screenshot from the original post]
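One thing worth noting about the script: `key in j` is a plain substring test, so a string such as `notjcrcw.com` or `jcrcw.com.example.net` would also pass, and any path that comes along with the match stays attached. Below is a rough sketch of a stricter filter. The function names `is_subdomain` and `clean_candidates` and the sample candidates are hypothetical, introduced only for illustration; they are not part of the original post.

```python
def is_subdomain(host, domain):
    """True if host equals domain or ends with '.' + domain."""
    host = host.strip().lower().rstrip('.')
    domain = domain.lower()
    return host == domain or host.endswith('.' + domain)


def clean_candidates(candidates, domain):
    """Reduce raw matches to distinct hosts under `domain`, in first-seen order."""
    seen = set()
    result = []
    for candidate in candidates:
        # Drop a scheme such as "http://" if one is present.
        if '://' in candidate:
            candidate = candidate.split('://', 1)[1]
        # Keep only the part before the first slash, i.e. the host.
        host = candidate.split('/', 1)[0].strip().lower()
        if is_subdomain(host, domain) and host not in seen:
            seen.add(host)
            result.append(host)
    return result


# Hypothetical raw matches, similar in shape to what the regex might return.
raw = [
    'bbs.jcrcw.com/',
    'WWW.jcrcw.com/news',
    'notjcrcw.com',
    'jcrcw.com.example.net',
    'http://bbs.jcrcw.com/thread',
]
print(clean_candidates(raw, 'jcrcw.com'))
```

Using a set for the membership check alongside the ordered list keeps first-seen order while avoiding a linear scan of the list on every match.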

Reposted from: https://www.cnblogs.com/i-honey/p/7887416.html
