Scraping a web novel with Python and saving it as a TXT file

qq_34231078

Tags: python

Article link: https://blog.csdn.net/qq_34231078/article/details/106433388

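This article shows how to write a simple Python scraper that fetches a novel from the 笔下文学 (Bixia Wenxue) website and saves it as a TXT file, with example code that scrapes the novel 《元尊》 (Yuan Zun).

(Summary generated automatically by CSDN.)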

I like reading novels, so I wrote this simple scraper myself.

It uses Python to scrape novels from a website (笔下文学, Bixia Wenxue).
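A scraper like this typically reads the novel's table-of-contents page, collects the chapter links, and then downloads each chapter in turn. Below is a minimal sketch of the link-collection step only. It assumes, hypothetically, that the chapter links are `<a>` tags inside `<div id="list">` (the real site's markup may differ), and it uses the standard library's `html.parser` instead of BeautifulSoup purely so it runs without extra packages. `ChapterLinkParser` is an illustrative name, not part of the author's code.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ChapterLinkParser(HTMLParser):
    """Collect (title, absolute URL) pairs for each <a href> inside <div id="list">.

    The <div id="list"> container is a hypothetical assumption about the
    table-of-contents markup; adjust it to match the real page.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.depth = 0      # > 0 while we are inside <div id="list">
        self.href = None    # href of the <a> tag currently open, if any
        self.text = []      # text fragments of the current link
        self.links = []     # collected (title, absolute_url) pairs

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            # Track nested divs so we know when the list container closes
            if self.depth or attrs.get("id") == "list":
                self.depth += 1
        elif tag == "a" and self.depth and attrs.get("href"):
            self.href = attrs["href"]
            self.text = []

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1
        elif tag == "a" and self.href is not None:
            title = "".join(self.text).strip()
            # Turn relative links such as "1.html" into absolute URLs
            self.links.append((title, urljoin(self.base_url, self.href)))
            self.href = None

    def handle_data(self, data):
        if self.href is not None:
            self.text.append(data)


if __name__ == "__main__":
    parser = ChapterLinkParser("https://example.com/book/")
    parser.feed('<div id="list"><a href="1.html">第一章</a></div>')
    print(parser.links)  # [('第一章', 'https://example.com/book/1.html')]
```

With BeautifulSoup, which the original code imports, the same selection is roughly `soup.select("div#list a[href]")`; each resulting URL would then be passed to the `urlopen` helper to download that chapter.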
This is the TXT file of the novel 《元尊》 that it scraped and saved:
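Saving the result mostly comes down to writing each chapter's title and text into one file. A minimal sketch, assuming the chapters have already been fetched as `(title, text)` pairs; `save_txt` and its blank-line layout are hypothetical choices, not taken from the author's code. Passing `encoding="utf-8"` explicitly avoids depending on the platform's default encoding (for example GBK on Chinese-locale Windows).

```python
def save_txt(path, book_title, chapters):
    """Write a book title and (chapter_title, chapter_text) pairs to one UTF-8 file.

    Hypothetical helper: one chapter title per line, then its text,
    then a blank line between chapters.
    """
    with open(path, "w", encoding="utf-8") as f:
        f.write(book_title + "\n\n")
        for title, text in chapters:
            f.write(title + "\n")
            # Strip stray leading/trailing whitespace left over from the HTML
            f.write(text.strip() + "\n\n")
```

For example, `save_txt("元尊.txt", "元尊", chapters)` would produce a single TXT file similar to the one shown above.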
The code is as follows:


import urllib.request
import re
import gzip
from io import BytesIO
from bs4 import BeautifulSoup

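# Open the URL and return the page HTML as text
def urlopen(url):
    req = urllib.request.Request(url)
    req.add_header("User-Agent","Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36")
    # Ask for gzip, but the server may still send an uncompressed body
    req.add_header("Accept-Encoding","gzip")
    with urllib.request.urlopen(req) as resp:
        html = resp.read()
        # Only decompress when the response really is gzip-encoded
        if resp.headers.get("Content-Encoding") == "gzip":
            html = gzip.GzipFile(fileobj=BytesIO(html)).read()
    return html.decode('utf-8')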

# Get the novel's title
def txt_name(url):
    html = urlopen(url)
    htm = BeautifulSoup(h