Python crawlers can scrape novels: scraping a novel with a Python crawler

Latest recommended article published on 2024-06-24 18:45:00

weixin_39644139



Tags: python crawler can scrape novels

Purpose: scrape and download the non-VIP (free) chapters of a novel.

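I have been reading web novels for eight or nine years, and I often find a novel I want to read that offers no download option. Every reading session then means searching for it online again, which is very inconvenient, especially where the network is poor. So I wrote a Python script that scrapes novels.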

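As an example, we scrape the novel 求魔 (Qiu Mo) from Biquge (笔趣阁). Open VS Code (I write Python in VS Code) and import the packages. requests, parsel and lxml are third-party packages (install them with pip install requests parsel lxml); os is part of the standard library.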

import os

import requests
import parsel
from lxml import etree

Get the URLs of all chapters.

Use requests to fetch the chapter index page.

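response = requests.get('https://www.biquge.info/10_10142/', timeout=10)
response.raise_for_status()  # stop early on HTTP errors such as 404
response.encoding = response.apparent_encoding  # decode with the encoding detected from the page body, to avoid garbled (mojibake) text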
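To see why the encoding line matters, here is a small standard-library-only illustration that needs no network access. It assumes the page is GBK-encoded (common on older Chinese sites, but not confirmed for this one). requests falls back to ISO-8859-1 for text responses whose headers declare no charset, and decoding GBK bytes that way produces mojibake; decoding with the right codec, which is what apparent_encoding aims to detect, recovers the text. decode_page is a hypothetical helper name.

```python
# Illustrates mojibake: the same bytes decoded with a wrong and a right codec.
# Assumption: the page is GBK-encoded; real sites may use UTF-8 or GB18030 instead.


def decode_page(raw: bytes, encoding: str) -> str:
    """Decode raw response bytes the way requests does once .encoding is set."""
    # requests' Response.text also decodes with errors="replace"
    return raw.decode(encoding, errors="replace")


raw_bytes = "第一章 求魔".encode("gbk")  # what the server might send

wrong = decode_page(raw_bytes, "iso-8859-1")  # requests' fallback for text/* without a charset
right = decode_page(raw_bytes, "gbk")  # what a correctly detected encoding gives

print(wrong)  # garbled Latin-1 characters
print(right)  # 第一章 求魔
```

Setting response.encoding before reading response.text is what switches requests from the first result to the second.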

Use XPath to get the address of each chapter.

html = etree.HTML(response.text)
url_s = html.xpath('//*[@id="list"]/dl/dd')  # url_s holds every <dd> entry of the chapter list

Scrape the content of each chapter.

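Get the address of each chapter to scrape. Each href is resolved against the index page's own URL (response.url) with urljoin, so every chapter link points at the same site as the chapter list, whether the href is relative or absolute. download_one_chapter is defined in the next step; in the finished script it must be defined before this loop runs.

from urllib.parse import urljoin

for url in url_s:
    url_one = url.xpath('./a/@href')  # href of the chapter link inside this <dd>
    if not url_one:
        continue  # skip <dd> entries without a link
    chapter_url = urljoin(response.url, url_one[0])  # absolute URL of the chapter page
    print(chapter_url)
    download_one_chapter(chapter_url)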
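The chapter-list steps above (select every <dd> under the element with id "list", then turn each link into an absolute URL) can be tried offline. This sketch swaps lxml for the standard library's xml.etree.ElementTree, whose limited XPath subset covers this expression but requires well-formed markup. The sample HTML is an assumed, simplified Biquge-style chapter list, and extract_chapter_urls is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Assumed, simplified chapter-list markup (well-formed so ElementTree can parse it).
SAMPLE_INDEX = """
<html><body>
  <div id="list">
    <dl>
      <dt>Latest chapters</dt>
      <dd><a href="5177845.html">Chapter 1</a></dd>
      <dd><a href="/10_10142/5177846.html">Chapter 2</a></dd>
      <dd>no link here</dd>
    </dl>
  </div>
</body></html>
"""


def extract_chapter_urls(index_url: str, page: str) -> list[str]:
    """Hypothetical helper: absolute chapter URLs found on an index page."""
    root = ET.fromstring(page.strip())
    urls = []
    # ElementTree's equivalent of the lxml XPath //*[@id="list"]/dl/dd
    for dd in root.findall(".//*[@id='list']/dl/dd"):
        link = dd.find("a")
        if link is None or link.get("href") is None:
            continue  # skip entries without a chapter link
        # urljoin handles relative ("5177845.html") and root-relative ("/10_10142/...") hrefs
        urls.append(urljoin(index_url, link.get("href")))
    return urls


for u in extract_chapter_urls("https://www.biquge.info/10_10142/", SAMPLE_INDEX):
    print(u)
```

Both sample links resolve to https://www.biquge.info/10_10142/..., and the <dt> and the link-less <dd> are skipped. On real pages, which are rarely well-formed XML, lxml's etree.HTML as used in the article is the more forgiving choice.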

Scrape the content of a single chapter.

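def download_one_chapter(url):
    # Scrape one chapter
    response = requests.get(url, timeout=10)  # request the chapter page
    response.raise_for_status()  # stop on HTTP errors such as 404
    response.encoding = response.apparent_encoding  # decode with the encoding detected from the body, to avoid garbled text
    sel = parsel.Selector(response.text)  # parse the HTML string into a Selector

    ######### Chapter title #########
    title = sel.css('h1::text').get(default='').strip()  # '::text' selects the text node inside <h1>; .get() returns it as a string
    if os.path.exists(os.path.join('txt', title + '.txt')):
        return  # this chapter was already saved; skip it
    print(title)

    ######### Chapter content #########
    lines = sel.css('#content::text').getall()  # all text nodes directly inside the #content element
    text = ''
    for line in lines:
        text += line.strip() + '\n'

Save the content of each chapter.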

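Create a txt folder and save each chapter in it as a separate file. The following lines are still part of download_one_chapter: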

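    os.makedirs('txt', exist_ok=True)  # create the txt folder if it does not exist yet
    with open(os.path.join('txt', title + '.txt'), 'w', encoding='utf-8') as f:
        f.write(title + '\n')  # title on its own line, then the chapter text
        f.write(text)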
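The single-chapter step (take the <h1> text as the title and the text inside #content as the body, skip chapters already saved, then write a UTF-8 file) can also be exercised without network access. This sketch replaces parsel with the standard library's html.parser, so ChapterParser only approximates the CSS selectors (it also collects text from tags nested inside <h1> or #content, which ::text would not). Everything here is assumed for illustration: the sample page markup, the helper names ChapterParser and save_chapter, and the extra step of replacing characters Windows forbids in file names, which chapter titles sometimes contain.

```python
import os
import re
import tempfile
from html.parser import HTMLParser

# Assumed, simplified chapter page; real pages contain much more markup.
SAMPLE_CHAPTER = """
<html><body>
<div class="bookname"><h1>第一章 求魔?</h1></div>
<div id="content">&nbsp;&nbsp;第一段<br/><br/>&nbsp;&nbsp;第二段<br/></div>
</body></html>
"""


class ChapterParser(HTMLParser):
    """Hypothetical stand-in for sel.css('h1::text') and sel.css('#content::text')."""

    def __init__(self):
        super().__init__()  # default convert_charrefs=True turns &nbsp; into '\xa0'
        self.title = ""
        self.lines = []
        self._in_h1 = False
        self._in_content = False

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
        elif tag == "div" and dict(attrs).get("id") == "content":
            self._in_content = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False
        elif tag == "div":
            self._in_content = False  # assumes #content contains no nested <div>

    def handle_data(self, data):
        if self._in_h1:
            self.title += data
        elif self._in_content:
            self.lines.append(data)


def save_chapter(page: str, folder: str) -> str:
    """Parse one chapter page and save it as <folder>/<title>.txt; return the path."""
    parser = ChapterParser()
    parser.feed(page)
    parser.close()
    title = parser.title.strip()
    safe_title = re.sub(r'[\\/:*?"<>|]', "_", title)  # characters Windows forbids in file names
    path = os.path.join(folder, safe_title + ".txt")
    if os.path.exists(path):
        return path  # already downloaded: skip, like the os.path.exists check in the article
    # str.strip() also removes '\xa0', so the &nbsp; indentation disappears
    text = "".join(line.strip() + "\n" for line in parser.lines)
    os.makedirs(folder, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(title + "\n")
        f.write(text)
    return path


with tempfile.TemporaryDirectory() as tmp:
    saved = save_chapter(SAMPLE_CHAPTER, os.path.join(tmp, "txt"))
    with open(saved, encoding="utf-8") as f:
        print(f.read())
```

The title keeps its "?" inside the file, while the file name uses "_" in its place; calling save_chapter a second time on the same folder returns the existing path without rewriting it.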
