Using the re Library in a Python 3 Crawler

1eeMamas

Category: Python crawlers

Article link: https://blog.csdn.net/kkLeung/article/details/105437620

Example: Scraping Qiushibaike

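import re

import requests
from fake_useragent import UserAgent

# Send a Chrome User-Agent string so the request looks like a normal browser
headers = {
    'User-Agent': UserAgent().chrome
}
url = 'https://www.qiushibaike.com/text/'

# Fetch the page; fail early on a network timeout or an HTTP error status
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
html = response.text

# Capture the text inside each <div class="content"><span>...</span> block.
# The non-greedy group stops at the first closing </span>.
jokes = re.findall(r'<div class="content">\s*<span>\s*(.+?)\s*</span>', html)

# Write every joke to the file, separated by blank lines
with open('duanzi.txt', 'w', encoding='utf-8') as f:
    for joke in jokes:
        f.write(joke + '\n\n\n')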
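
To see what the regular expression captures without hitting the network, it may help to run it against a small hand-written fragment. The sketch below is only an illustration: SAMPLE_HTML is a made-up fragment that imitates the structure the pattern expects (the real page markup may differ), extract_jokes is a hypothetical helper name, and the pattern is a non-greedy variant of the one in the script above.

```python
import re

# Hypothetical HTML fragment imitating the structure the pattern expects;
# the markup of the real page may differ.
SAMPLE_HTML = """
<div class="content">
<span>
First joke, line one.<br/>Line two.
</span>
</div>
<div class="content">
<span>Second joke.</span>
</div>
"""

# Non-greedy variant of the pattern used in the script: the group stops
# at the first closing </span> and drops surrounding whitespace.
PATTERN = re.compile(r'<div class="content">\s*<span>\s*(.+?)\s*</span>')


def extract_jokes(html):
    """Return the text of every content block, with <br/> tags turned into newlines."""
    return [re.sub(r'<br\s*/?>', '\n', text) for text in PATTERN.findall(html)]


jokes = extract_jokes(SAMPLE_HTML)
for joke in jokes:
    print(joke)
    print('---')
```

Because `.` does not match a newline by default, this pattern only captures text that sits on a single line between the tags; passing `re.DOTALL` would be one way to handle text spread over several lines, at the cost of having to strip more whitespace afterwards.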