Web Scraping Study: Day 7

Latest recommended article published on 2022-01-17 22:37:24

ChinaGeographer

Category: Python web scraping study (python爬虫学习)

Article link: https://blog.csdn.net/weixin_45547832/article/details/100120351

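Qiushibaike example
Modules used:
re, requests, fake_useragent
Approach: fetch the page first, then use a regular expression to match the content we need.
The key point is locating that content in the page source. The matching code is as follows: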

infos = re.findall(r'<div class="content">\s*<span>\s*(.+)\s*</span>',info)
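To see what this pattern captures, here is a minimal sketch that runs it on a small HTML fragment. The fragment and the name sample_html are made up for illustration and only imitate the page layout; the real markup may differ.

```python
import re

# Hypothetical fragment imitating the page layout; the real markup may differ.
sample_html = """
<div class="content">
<span>

First joke text

</span>
</div>
<div class="content">
<span>Second joke text</span>
</div>
"""

# Same pattern as above: \s* skips whitespace (including newlines) around the text,
# and (.+) captures the text itself. Since "." does not match a newline by default,
# the captured text always stays on a single line.
pattern = r'<div class="content">\s*<span>\s*(.+)\s*</span>'
jokes = re.findall(pattern, sample_html)
print(jokes)
```

Because the pattern has exactly one capture group, re.findall returns a list of plain strings, one per matched block.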

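Notes:
\s matches any whitespace character (space, tab, newline), and \s* matches zero or more of them, so it also covers the line breaks around the text. The "\n\n\n" in the storage code is three newline characters, which leaves two blank lines after each joke.
I do not fully understand the storage code at the end yet, but it works as is: with open(...) opens the file for writing and closes it automatically when the block ends.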
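The storage step can be tried on its own, without any network access. The sketch below uses placeholder strings instead of real page content and writes to a temporary directory (both are assumptions for illustration), then reads the file back to show what ends up in it.

```python
import os
import tempfile

jokes = ["first joke", "second joke"]  # placeholder data, not real page content

# Write into a temporary directory so the example leaves no file behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "duanzi.txt")

    # "w" creates the file (or overwrites an existing one);
    # the with-block closes the file automatically on exit.
    with open(path, "w", encoding="utf-8") as f:
        for joke in jokes:
            f.write(joke + "\n\n\n")

    # Read the file back to inspect what was stored.
    with open(path, encoding="utf-8") as f:
        saved = f.read()

print(repr(saved))
```

Each entry is followed by three newline characters, so the entries appear in the file separated by two blank lines.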

The full code is as follows:

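import re

import requests
from fake_useragent import UserAgent

url = "https://www.qiushibaike.com/text/"

# Use a random User-Agent so the request looks like it comes from a browser
headers = {
    "User-Agent": UserAgent().random
}

# Send the request and read the page source
response = requests.get(url, headers=headers)
info = response.text

# Capture the text inside each <div class="content"><span>...</span>
infos = re.findall(r'<div class="content">\s*<span>\s*(.+)\s*</span>', info)

# Save the jokes to a file, leaving blank lines between entries
with open('duanzi.txt', 'w', encoding="utf-8") as f:
    for joke in infos:  # a separate name, so the page source in info is not overwritten
        f.write(joke + "\n\n\n")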
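The captured strings are raw HTML, so they may still contain inline tags such as <br/> or entities such as &amp;. That is an assumption about the page, not something checked here. A minimal cleanup sketch, where clean_joke is a hypothetical helper that is not part of the original script:

```python
import html
import re


def clean_joke(raw: str) -> str:
    """Turn <br> tags into newlines, drop any other tags, and decode HTML entities."""
    text = re.sub(r'<br\s*/?>', '\n', raw)   # <br>, <br/>, <br /> become line breaks
    text = re.sub(r'<[^>]+>', '', text)      # remove any remaining tags
    return html.unescape(text).strip()       # &amp; -> &, then trim outer whitespace


# Made-up input for illustration only.
raw = 'Line one<br/>Line two &amp; more'
cleaned = clean_joke(raw)
print(cleaned)
```

If this helper were used, it could be applied to each entry just before f.write.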