2021-08-22

selnna

Tags: python

Article link: https://blog.csdn.net/selnna/article/details/119856518


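Today I scraped the content of every chapter of Journey to the West asynchronously. The logic has two steps: 1. synchronously fetch the title and ID (cid) of every chapter; 2. then, using each chapter's ID, asynchronously fetch the chapter content.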

# Catalog API: https://dushu.baidu.com/api/pc/getCatalog?data={%22book_id%22:%224306063500%22}
# Sample catalog item: {title: "第一回 灵根育孕源流出 心性修持大道生", price_status: "0", cid: "11348571"}
# Chapter content API: https://dushu.baidu.com/api/pc/getChapterContent?data={"book_id":"4306063500","cid":"4306063500|11348571","need_bookinfo":1}
import json
import aiofiles
import requests
import aiohttp
import asyncio

"""
1. Synchronous step: call getCatalog to get the cid and title of every chapter
2. Asynchronous step: call getChapterContent to download the content of every chapter
"""


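async def aiodownload(title,b_id,cid):
    data={
        "book_id": b_id,
        "cid":f"{b_id}|{cid}",
        "need_bookinfo": 1
    }
    # Serialize the dict to a JSON string so it can be placed in the query string
    data=json.dumps(data)
    url=f"https://dushu.baidu.com/api/pc/getChapterContent?data={data}"
    # Retry in a loop: the original called aiodownload() inside a bare except
    # without await, which only creates a coroutine object and never runs it
    for attempt in range(3):
        try:
            async with aiohttp.ClientSession() as session:
                async with session.get(url) as resp:
                    dic=await resp.json()
            # Write the chapter text to a file named after the chapter title
            async with aiofiles.open(title,mode="a",encoding="utf-8") as f:
                await f.write(dic['data']['novel']['content'])
            return
        except (aiohttp.ClientError, OSError, asyncio.TimeoutError, KeyError) as e:
            print(f"{title}: attempt {attempt+1} failed: {e!r}")
            await asyncio.sleep(1)  # short pause before the next attempt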




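async def getCatalog(url):
    # Step 1 (synchronous): a single blocking request for the catalog is fine here,
    # because no other coroutine is running yet
    resp=requests.get(url)
    dic=resp.json()
    tasks=[]
    for item in dic['data']['novel']['items']:
        title=item['title']
        cid=item['cid']
        # Prepare one async download per chapter
        # (b_id is the module-level variable set in the __main__ block)
        tasks.append(aiodownload(title,b_id,cid))

    # Step 2 (asynchronous): run all downloads concurrently.
    # asyncio.wait() rejects bare coroutines since Python 3.11, so use gather()
    await asyncio.gather(*tasks)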



if __name__ == '__main__':
    b_id="4306063500"
    url='https://dushu.baidu.com/api/pc/getCatalog?data={"book_id":"'+b_id+'"}'
    loop=asyncio.get_event_loop()
    loop.run_until_complete(getCatalog(url))
    #asyncio.run(getCatalog(url))
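
To check what the `data` query parameter looks like before it is sent, the following sketch builds it offline with the standard library only. The helper name `build_chapter_url` is made up for illustration; the endpoint and field names are the ones used in the code above. Note that it percent-encodes the JSON explicitly with `urlencode`, which the original code leaves to the HTTP library.

```python
import json
from urllib.parse import urlencode, parse_qs, urlsplit


def build_chapter_url(b_id, cid):
    """Hypothetical helper: build the getChapterContent URL for one chapter."""
    payload = {"book_id": b_id, "cid": f"{b_id}|{cid}", "need_bookinfo": 1}
    # json.dumps turns the dict into a JSON string; urlencode percent-encodes it
    query = urlencode({"data": json.dumps(payload)})
    return f"https://dushu.baidu.com/api/pc/getChapterContent?{query}"


url = build_chapter_url("4306063500", "11348571")
# Round trip: decode the query string and parse the JSON back into a dict
decoded = json.loads(parse_qs(urlsplit(url).query)["data"][0])
print(url)
print(decoded["cid"])
```

The round trip at the end is only a sanity check that nothing is lost when the JSON string is encoded into the URL.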

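What I learned:
1. Writing content to a file asynchronously with aiofiles (import aiofiles):

async with aiofiles.open(title,mode="a",encoding="utf-8") as f:
    await f.write(dic['data']['novel']['content'])  # write out the chapter text
2. Converting a dict to a JSON string: data=json.dumps(data)
3. I ran into this error: ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host. From what I found, it is caused by sending requests too frequently. (The references below show a fix, but I can't apply it yet: their code is synchronous, and I don't know how to adapt it to async. I may come back and update this once I figure it out.)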
https://blog.csdn.net/illegalname/article/details/77164521
https://blog.csdn.net/qq_40910788/article/details/84844464
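
One possible way to carry the "slow down the requests" idea over to async code is to cap the number of downloads in flight with `asyncio.Semaphore` and retry after a short pause. This is a sketch under assumptions, not a verified fix for the error above: `fake_fetch`, `limited_fetch`, the limit of 5, and the retry count are all made up for illustration, and `fake_fetch` stands in for the real aiohttp request so the example runs without network access.

```python
import asyncio

MAX_CONCURRENCY = 5   # assumed limit; tune it for the real site
active = 0            # downloads currently in flight
peak = 0              # highest value of `active` seen so far
failed_once = set()   # chapters whose first attempt was made to fail


async def fake_fetch(cid):
    """Stand-in for the aiohttp request; every third chapter fails once."""
    global active, peak
    active += 1
    peak = max(peak, active)
    try:
        await asyncio.sleep(0.01)  # simulate network latency
        if cid % 3 == 0 and cid not in failed_once:
            failed_once.add(cid)
            raise ConnectionResetError("simulated reset")
        return f"content of chapter {cid}"
    finally:
        active -= 1


async def limited_fetch(sem, cid, retries=3):
    """Hypothetical wrapper: at most MAX_CONCURRENCY fetches at once, with retries."""
    for attempt in range(retries):
        try:
            async with sem:  # wait here while too many requests are running
                return await fake_fetch(cid)
        except ConnectionResetError:
            # The semaphore is already released here; back off, then retry
            await asyncio.sleep(0.01 * (attempt + 1))
    return None  # give up after `retries` attempts


async def main():
    sem = asyncio.Semaphore(MAX_CONCURRENCY)  # create it inside the running loop
    return await asyncio.gather(*(limited_fetch(sem, cid) for cid in range(20)))


results = asyncio.run(main())
print(len(results), "chapters fetched, peak concurrency:", peak)
```

In the scraper above, the body of `fake_fetch` would be replaced by the `session.get(...)` call and the file write from `aiodownload`, with the semaphore created once in `getCatalog` and passed to every task.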
