Scraping every novel on a fiction site's overall recommendation ranking with Python

Latest recommended article published on 2024-08-22 16:29:39

嗨学编程

Category: Python爬虫 (Python web scraping). Tags: python

Article link: https://blog.csdn.net/fei347795790/article/details/108939219

Contents

Preface
I. Environment setup
II. Steps
Summary

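Preface

The text and images in this article come from the internet and are provided for learning and discussion only, with no commercial purpose. If there is any problem, please contact us promptly so we can deal with it.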

Note: the main content of the article starts here; the examples below are for reference.

I. Environment setup

Python 3.6
PyCharm
requests
parsel
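
Before running the steps below, it can help to confirm that the two third-party packages are importable. The following is a minimal sketch that uses only the standard library; `missing_packages` is a hypothetical helper name and is not part of the original article.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The article relies on these two third-party packages.
    missing = missing_packages(["requests", "parsel"])
    if missing:
        print("Install first with: pip install " + " ".join(missing))
    else:
        print("requests and parsel are available")
```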

II. Steps

1. Import the libraries

The code is as follows (example):

import requests
import parsel

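2. Fetch the page

The code is as follows (example):

# Overall recommendation ranking page
url = 'https://www.tianyabook.com/top/allvote/'

# Send a browser-like User-Agent with the request
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'
}
response = requests.get(url=url, headers=headers)
# Let requests detect the page encoding so the Chinese text decodes correctly
response.encoding = response.apparent_encoding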

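
The request above has no timeout and does not check the HTTP status, so a slow or failing server could hang the script or feed an error page to the parser. One possible way to guard against that is sketched below. The function name `fetch_html`, the 10-second default, and the `get` parameter are assumptions made for illustration; `get` is a hypothetical hook that lets the function be exercised without network access.

```python
def fetch_html(url, headers, timeout=10, get=None):
    """Fetch `url` and return the decoded page text.

    `get` is a hypothetical hook (it defaults to requests.get) so the
    function can be tried out without touching the network.
    """
    if get is None:
        import requests  # imported lazily so the hook can replace it
        get = requests.get
    response = get(url, headers=headers, timeout=timeout)
    # Raise an exception on 4xx/5xx instead of parsing an error page.
    response.raise_for_status()
    # Same encoding detection as in the article's snippet.
    response.encoding = response.apparent_encoding
    return response.text
```

With this helper, the snippet above would reduce to `html = fetch_html(url, headers)`, and `html` would take the place of `response.text` in the parsing step.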

3. Parse the data

The code is as follows (example):

selector = parsel.Selector(response.text)
# The first cell of each table row holds the link to the book
urls = selector.css('table tr td:nth-child(1) a::attr(href)').getall()
titles = selector.css('table tr td:nth-child(1) a::attr(title)').getall()
for href, title in zip(urls, titles):
    # The book ID is the last path segment of the link without '.html'
    book_id = href.replace('.html', '').split('/')[-1]
    print(book_id, title)

Running this prints one line per book: the book ID followed by the title.
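
The string-based ID extraction above works as long as every link ends in a plain `.html` file name. A slightly more defensive variant, sketched here with the standard library only, ignores any query string and strips whatever extension is present. `book_id_from_href` is a hypothetical helper name, and the example link in the comment is an assumed shape, not one taken from the site.

```python
import posixpath
from urllib.parse import urlsplit


def book_id_from_href(href):
    """Return the last path component of `href` without its extension."""
    # urlsplit separates the path from any query string or fragment.
    path = urlsplit(href).path
    name = posixpath.basename(path.rstrip("/"))
    return posixpath.splitext(name)[0]


# Assumed link shape, for illustration only:
# book_id_from_href("https://www.example.com/book/12345.html")
```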

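4. Save the data

def download(title, book_id):
    # The target folder must already exist
    filename = 'D:\\python\\demo\\电子书下载\\小说\\' + title + '.txt'
    download_url = 'http://www.tianyabook.com/modules/article/txtarticle.php?id={}'.format(book_id)
    response_2 = requests.get(url=download_url, headers=headers)
    # Detect the encoding as in step 2 so the text is not garbled
    response_2.encoding = response_2.apparent_encoding
    # mode='w' so that re-running the script overwrites the file instead of appending a second copy
    with open(filename, mode='w', encoding='utf-8') as f:
        f.write(response_2.text)

To save every book, call download(title, book_id) inside the loop from step 3.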
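
Book titles can contain characters that Windows rejects in file names, and the hard-coded output folder may not exist yet. The sketch below shows one way to handle both. `safe_filename` and `target_path` are hypothetical helper names, and replacing invalid characters with an underscore is an assumption, not the article's method.

```python
import re
from pathlib import Path

# Reserved characters that Windows rejects in file names.
_INVALID_CHARS = re.compile(r'[\\/:*?"<>|]')


def safe_filename(title, suffix=".txt"):
    """Replace reserved characters in `title` with underscores."""
    cleaned = _INVALID_CHARS.sub("_", title).strip()
    # Fall back to a fixed name if nothing is left of the title.
    return (cleaned or "untitled") + suffix


def target_path(directory, title):
    """Create `directory` if needed and return the full path for `title`."""
    folder = Path(directory)
    folder.mkdir(parents=True, exist_ok=True)
    return folder / safe_filename(title)
```

Inside `download`, `filename = target_path('D:\\python\\demo\\电子书下载\\小说', title)` could then replace the string concatenation.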