Scraping the wfxnews novel site with Python to batch-download novels

Latest recommended article published 2021-03-09 23:45:14

void1024

Categories: python, crawlers. Tags: batch novel download, Python crawler

Article link: https://blog.csdn.net/diOSyu/article/details/97274285

1. The novel site

https://m.wfxnews.com/

2. Analyzing the page structure

The API for downloading a novel looks like this:

https://www.wfxnews.com/modules/article/txtarticle.php?id=112451

Book information is available at the following URL:

https://m.wfxnews.com/book/112451.shtml

112451 is the ID of this novel. The smallest ID is 1 and the largest is 199959.
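
Since both URLs differ only in the novel ID, they can be built from a single number. The following is a minimal sketch of that mapping; the prefixes are copied from the URLs above, while the function names are illustrative and not part of the site or of any library.

```python
# Prefixes copied from the URLs above; the function names are illustrative.
DOWNLOAD_PREFIX = 'https://www.wfxnews.com/modules/article/txtarticle.php?id='
INFO_PREFIX = 'https://m.wfxnews.com/book/'
INFO_SUFFIX = '.shtml'


def download_url(book_id):
    """Return the TXT download URL for a novel ID."""
    return DOWNLOAD_PREFIX + str(book_id)


def info_url(book_id):
    """Return the book information page URL for a novel ID."""
    return INFO_PREFIX + str(book_id) + INFO_SUFFIX


def book_id_from_download_url(url):
    """Recover the numeric ID from a download URL (the text after the last '=')."""
    return int(url[url.rindex('=') + 1:])


if __name__ == '__main__':
    print(download_url(112451))
    print(info_url(112451))
```

Keeping the ID as the only variable makes it easy to loop over the whole ID range and to name each saved file after its ID.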

Iterating over every ID with multiple threads downloads all of the novels.
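
As an illustration of the "iterate plus threads" idea, here is a sketch that uses the standard library's `concurrent.futures.ThreadPoolExecutor` in place of hand-managed `Thread` objects. This is an alternative technique, not the approach taken in the source code of this post. The name `download_all` and its `fetch` parameter are assumptions made for the example; `fetch` stands for any function that downloads one novel given its ID.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(ids, fetch, max_workers=16):
    """Run fetch(book_id) for every ID on a bounded pool of threads.

    Returns (succeeded, failed): a list of IDs and a dict of ID -> exception.
    """
    succeeded, failed = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its ID so failures can be reported by ID
        futures = {pool.submit(fetch, book_id): book_id for book_id in ids}
        for future in as_completed(futures):
            book_id = futures[future]
            try:
                future.result()  # re-raises any exception from the worker
                succeeded.append(book_id)
            except Exception as exc:
                failed[book_id] = exc
    return succeeded, failed


if __name__ == '__main__':
    # int() stands in for a real download function: it fails on 'x'
    ok, bad = download_all(['1', '2', 'x'], int, max_workers=4)
    print(sorted(ok), list(bad))
```

The pool keeps at most `max_workers` downloads in flight, and collecting exceptions through `future.result()` makes failed IDs visible to the caller, which a plain `try` around `Thread.start()` does not.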

3. Source code

# -*- coding:utf-8 -*-

import requests
import time
import os
from threading import Thread


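def get_one_page(_url):
    # A timeout keeps a stalled request from blocking its thread forever
    response = requests.get(_url, timeout=30)
    response.raise_for_status()
    # Decode the body as GBK
    response.encoding = 'gbk'
    return response.text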


def get_txt_save_path(_url):
    idx = _url.rindex('=')
    return 'txt/' + _url[idx + 1:] + '.txt'


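def save_one_txt(_url):
    txt_save_path = get_txt_save_path(_url)
    try:
        # Download the TXT
        one_txt = get_one_page(_url)
        # Make sure the output directory exists, then write as UTF-8
        os.makedirs('txt', exist_ok=True)
        with open(txt_save_path, 'w', encoding='utf-8') as f:
            f.write(one_txt)
    except (requests.RequestException, OSError) as e:
        # Errors raised inside a thread never reach the caller, so report them here
        print(_url, e)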


website = {
    'last_id': 199959,
    'first_id': 1,
    'download_prefix': 'https://www.wfxnews.com/modules/article/txtarticle.php?id=',
    'info_prefix': 'https://m.wfxnews.com/book/',
    'info_suffix': '.shtml'
}

first_id = website['first_id']
last_id = website['last_id']
download_prefix = website['download_prefix']

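cnt = 0
queue = []
for i in range(first_id, last_id + 1):
    # Skip novels whose TXT has already been downloaded
    if os.path.exists('txt/' + str(i) + '.txt'):
        continue

    url = download_prefix + str(i)

    try:
        th = Thread(target=save_one_txt, args=(url,))
        th.start()
        queue.append(th)

        cnt += 1
        # Wait for each batch of 16 threads before starting more
        if cnt % 16 == 0:
            for q in queue:
                q.join()
            queue = []
    except Exception:
        # Only failures to start the thread land here;
        # download errors are raised inside the worker thread
        print(url)
    print('cnt =', cnt)

# Wait for the final, partial batch
for q in queue:
    q.join()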
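
The loop above treats any existing file under txt/ as already downloaded, so a file left half-written by an interrupted run would be skipped on the next run. One possible guard, sketched here under assumptions, is to write to a temporary name and rename it once the write has finished. The helper name `write_text_atomically` and the `.part` suffix are hypothetical and do not appear in the original code.

```python
import os
import tempfile


def write_text_atomically(path, text, encoding='utf-8'):
    """Write text to path so that the file only appears once it is complete."""
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    tmp_path = path + '.part'  # hypothetical suffix for unfinished files
    with open(tmp_path, 'w', encoding=encoding) as f:
        f.write(text)
    # os.replace renames in one step, overwriting any existing file at path
    os.replace(tmp_path, path)


if __name__ == '__main__':
    # Demonstrate in a throwaway directory instead of the real txt/ folder
    demo_dir = tempfile.mkdtemp()
    demo_path = os.path.join(demo_dir, 'txt', '1.txt')
    write_text_atomically(demo_path, 'demo text')
    print(os.path.exists(demo_path), os.path.exists(demo_path + '.part'))
```

With this helper, the `os.path.exists` check only ever sees finished files, because an interrupted write leaves behind a `.part` file rather than a truncated `.txt`.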

4. Results
