Paginated Querying of CSDN Quality Scores: Procedure and Code

Quality score query page

Quality score request API

POST request URL: https://bizapi.csdn.net/trends/api/v1/get-article-score

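Reference: "[python] I wrote a small Python project for batch-querying article quality scores (pure Python, Flask + HTML, packaged as an exe file)"
Reference: "How to batch-query the quality scores of your own CSDN blog posts"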

Request header information:

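Taking a query for https://blog.csdn.net/Medlar_CN/article/details/132229859 as an example:
Press F12 in the browser to open the developer tools, select the Network tab, and inspect the request headers of the call to the API. The main fields are: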
:authority: bizapi.csdn.net
:method: POST
:path: /trends/api/v1/get-article-score
:scheme: https
Accept: application/json, text/plain, */*
Accept-Encoding: gzip, deflate, br
Accept-Language: zh-CN,zh;q=0.9
Content-Length: 191
Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryCfQFnl0pJrpx5ZKk
Cookie: (omitted)
Origin: https://www.csdn.net
Referer: https://www.csdn.net/qc
Sec-Ch-Ua: "Not/A)Brand";v="99", "Google Chrome";v="115", "Chromium";v="115"
Sec-Ch-Ua-Mobile: ?0
Sec-Ch-Ua-Platform: "Windows"
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-site
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36
X-Ca-Key: (omitted)
X-Ca-Nonce: (omitted)
X-Ca-Signature: (omitted)
X-Ca-Signature-Headers: x-ca-key,x-ca-nonce
X-Ca-Signed-Content-Type: multipart/form-data
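
The captured request declares Content-Type: multipart/form-data with a boundary. As a minimal sketch of what such a body looks like, the snippet below builds one with the standard library only. The helper name encode_multipart is hypothetical, and the assumption that the form carries a single url field is inferred from the capture rather than documented. Whether the server insists on this encoding has not been verified here.

```python
import uuid


def encode_multipart(fields, boundary=None):
    """Encode a dict of text fields as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    # Hypothetical helper: a random boundary is used unless one is supplied
    boundary = boundary or uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")  # blank line separates the part headers from the value
        lines.append(str(value))
    lines.append(f"--{boundary}--")  # closing delimiter
    lines.append("")  # the body ends with a final CRLF
    body = "\r\n".join(lines).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


if __name__ == "__main__":
    body, content_type = encode_multipart(
        {"url": "https://blog.csdn.net/Medlar_CN/article/details/132229859"},
        boundary="----WebKitFormBoundaryCfQFnl0pJrpx5ZKk",
    )
    print(content_type)
    print(len(body))
```

With the boundary and the example article URL from the capture, the body comes to 191 bytes, the same as the captured Content-Length, which is consistent with a form that carries only the url field. To send it with urllib, pass the bytes as data and set the returned value as the Content-Type header.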

Code for batch-fetching quality scores (adapted from the reference code)

import urllib.parse
import urllib.request
import json
import pandas as pd
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows
import math

# Fetch article metadata in bulk and save it to Excel
class CSDNArticleExporter:
    def __init__(self, username, page, size, filename):
        self.username = username
        self.size = size
        self.filename = filename
        self.page = page

    def get_articles(self):
        url = f"https://blog.csdn.net/community/home-api/v1/get-business-list?page={self.page}&size={self.size}&businessType=blog&orderby=&noMore=false&year=&month=&username={self.username}"
        with urllib.request.urlopen(url) as response:
            data = json.loads(response.read().decode())
        return data['data']['list']

    def export_to_excel(self):
        df = pd.DataFrame(self.get_articles())
        df = df[['title', 'url', 'postTime', 'viewCount', 'collectCount', 'diggCount', 'commentCount']]
        df.columns = ['文章标题', 'URL', '发布时间', '阅读量', '收藏量', '点赞量', '评论量']
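        # df.to_excel(self.filename)
        # The single line above is enough if you only need to save the data.
        # The code below additionally sizes each column to fit its contents,
        # which makes the sheet easier to read.
        # Create a new workbook and select the active sheet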
        wb = Workbook()
        sheet = wb.active
        # Write DataFrame to sheet
        for r in dataframe_to_rows(df, index=False, header=True):
            sheet.append(r)
        # Set each column's width from the longest value in that column
        for column in sheet.columns:
            cells = list(column)
            max_length = 0
            for cell in cells:
                # Measure the string form so numeric cells are counted too
                max_length = max(max_length, len(str(cell.value)))
            adjusted_width = max_length + 5
            sheet.column_dimensions[cells[0].column_letter].width = adjusted_width
        # Save the workbook
        wb.save(self.filename)


# Batch-query quality scores
class ArticleScores:
    def __init__(self, filepath):
        self.filepath = filepath

    @staticmethod
    def get_article_score(article_url):
        url = "https://bizapi.csdn.net/trends/api/v1/get-article-score"
        headers = {
            "Accept": "application/json, text/plain, */*",
            # Copy the three values below from your own browser capture
            "X-Ca-Key": "fill in your own",
            "X-Ca-Nonce": "fill in your own",
            "X-Ca-Signature": "fill in your own",
            "X-Ca-Signature-Headers": "x-ca-key,x-ca-nonce",
            "X-Ca-Signed-Content-Type": "multipart/form-data",
        }
        data = urllib.parse.urlencode({"url": article_url}).encode()
        req = urllib.request.Request(url, data=data, headers=headers)
        with urllib.request.urlopen(req) as response:
            return json.loads(response.read().decode())['data']['score']

    def get_scores_from_excel(self):
        # Read the Excel file
        df = pd.read_excel(self.filepath)
        # Get the 'URL' column
        urls = df['URL']
        # Get the score for each URL
        scores = [self.get_article_score(url) for url in urls]
        return scores

    def write_scores_to_excel(self):
        df = pd.read_excel(self.filepath)
        df['质量分'] = self.get_scores_from_excel()
        df.to_excel(self.filepath,index=False)


if __name__ == '__main__':
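    total = 212  # total number of published articles
    t_index = math.ceil(total / 100) + 1  # round up; range() excludes its upper bound, hence the +1
    # Fetch article metadata.
    # CSDNArticleExporter(username, page, size, filename):
    #   username: the user to query
    #   page: page number, from 1 up to ceil(total / 100); total counts only publicly visible articles
    #   size: articles per request, at most 100
    #   filename: output file, e.g. 'score1.xlsx'; must match the file passed to ArticleScores below
    for index in range(1, t_index):  # one iteration per page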
        filename = "score"+str(index)+".xlsx"
        exporter = CSDNArticleExporter("Medlar_CN", index, 100, filename)  # Replace with your username
        exporter.export_to_excel()
        # Batch-query the quality scores for this page
        score = ArticleScores(filename)
        score.write_scores_to_excel()
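
get_scores_from_excel sends one request per URL, so a single failed request aborts the whole page. A minimal sketch of a more forgiving loop follows, assuming it is acceptable to record None for a failed article and to pause briefly between requests. The name fetch_scores_safely and its delay parameter are hypothetical; the fetch argument stands in for ArticleScores.get_article_score.

```python
import time


def fetch_scores_safely(urls, fetch, delay=0.5):
    """Call fetch(url) for each URL; store None where the call fails."""
    scores = []
    for i, url in enumerate(urls):
        if i and delay:
            time.sleep(delay)  # pause between requests (hypothetical throttle)
        try:
            scores.append(fetch(url))
        except Exception as exc:  # network errors, bad JSON, missing keys
            print(f"failed to score {url}: {exc}")
            scores.append(None)
    return scores


if __name__ == "__main__":
    def fake_fetch(url):
        # Stand-in for ArticleScores.get_article_score; no network access
        if url.endswith("/bad"):
            raise ValueError("simulated failure")
        return 90

    urls = ["https://example.com/a", "https://example.com/bad"]
    print(fetch_scores_safely(urls, fake_fetch, delay=0))
```

Because the returned list keeps one entry per URL, it can still be assigned directly to the quality score column, with empty cells marking the articles that need another attempt.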

Parameter notes:

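total = 212: the total number of published articles. Only articles set to publicly visible are counted.
t_index = math.ceil(total/100)+1: the page count rounded up, plus 1 because range() excludes its upper bound.
CSDNArticleExporter(username, page, size, filename) fetches one page of article metadata:
username: the user to query.
page: the page number (2 in the example means the second page); the number of pages is the total article count divided by 100, rounded up.
size: the number of articles per request, at most 100.
filename: the file to save to (for example 'score1.xlsx'); it must match the file passed to ArticleScores.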
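
The page arithmetic can be checked in isolation. A small sketch, where page_numbers is a hypothetical helper and the page size of 100 is the per-request limit stated above:

```python
import math


def page_numbers(total, page_size=100):
    """Return the 1-based page numbers needed to cover `total` articles."""
    last_page = math.ceil(total / page_size)
    # range() excludes its upper bound, so add 1 to include the last page
    return list(range(1, last_page + 1))


if __name__ == "__main__":
    print(page_numbers(212))  # [1, 2, 3]: two full pages plus 12 articles
```

For 212 articles this gives pages 1 to 3, so the main loop writes score1.xlsx, score2.xlsx and score3.xlsx.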

Sample output file

[Figure: screenshot of the output Excel file]
