首先打开微信读书网页版,进入书架中的一个分组
然后打开浏览器开发者工具(F12),在控制台(Console)中输入如下代码
// Book-detail links for every book in the current shelf group.
// Fixes: the original loop counter `i` leaked as an implicit global,
// and a manual push-loop is replaced by an Array.from mapping.
const shelf_arr = Array.from(
  document.getElementsByClassName("shelfBook"), // book anchor elements
  (book) => book.href.replace("reader", "bookDetail"),
);
console.log(shelf_arr);
// Book-detail links for books on a ranking-list page.
// Fixes: the original loop counter `i` leaked as an implicit global;
// it also redeclared `let shelf_arr`/`let shelf_book`, which throws a
// SyntaxError if the shelf snippet was already run in the same console
// session — refresh the page (or reopen the console) before running this.
const shelf_arr = Array.from(
  document.getElementsByClassName("wr_bookList_item_link"), // book links
  (book) => book.href,
);
console.log(shelf_arr);
将控制台输出的链接复制到下方 Python 脚本的 shelf_arr 数组中(需要先安装依赖:pip install requests beautifulsoup4)
# -*-coding:utf-8 -*-
"""Scrape book title, author and word-count info from WeRead book-detail
pages listed in ``shelf_arr`` and append them to a local text file.

Fixes over the original: requests now carry a timeout and status check
(a dead link no longer hangs or crashes the whole run), the output file
is managed with ``with`` so it is always closed, and the unused local
``html`` was removed.
"""
import sys
import os
from bs4 import BeautifulSoup
import requests
import io

# Re-wrap stdout as UTF-8 so printed Chinese text does not garble on
# consoles whose default encoding is not UTF-8 (e.g. GBK on Windows).
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf8')

# Book-detail URLs collected with the browser-console snippet above.
shelf_arr = [
    "https://weread.qq.com/web/bookDetail/6613268071ae9e6f661ddee",
    "https://weread.qq.com/web/bookDetail/43f327705a48fc43feb9160",
    "https://weread.qq.com/web/bookDetail/e77320c071593303e779e6c",
]  # all books in the group

# Output file (change the path as needed); created automatically if it
# does not exist.  ``with`` guarantees the file is closed on any exit.
with open("./文本.txt", 'w', encoding='utf-8') as fw:
    for link in shelf_arr:
        try:
            # Timeout prevents an unresponsive server from hanging forever;
            # raise_for_status turns HTTP errors into exceptions we can skip.
            req = requests.get(url=link, timeout=10)
            req.raise_for_status()
        except requests.RequestException as exc:
            print(f"跳过 {link}: {exc}")
            continue
        req.encoding = "utf-8"
        soup = BeautifulSoup(req.text, features="html.parser")

        # Book title(s)
        for book_title in soup.find_all(
                "h2", class_="bookInfo_right_header_title_text"):
            fw.write("书名: " + book_title.text.strip())
            fw.write("\n")
        # Author(s)
        for book_author in soup.find_all("a", class_="bookInfo_author"):
            fw.write("作者: " + book_author.text.strip())
            fw.write("\n")
        # Word-count / publication lines (each value sits in a <span>)
        for book_num in soup.find_all("div", "introDialog_content_pub_line"):
            for span in book_num.find_all("span"):
                fw.write(span.text.strip())
                fw.write("\n")
        fw.write("\n")
参考链接:https://blog.csdn.net/qq_41971835/article/details/115324144
参考链接:https://zhuanlan.zhihu.com/p/90157066