Generating a List and Table of Contents of Your CSDN Blog History with Python

Article link: https://blog.csdn.net/qq_40985985/article/details/128675971

This post shows how to use Python to build a list of your historical CSDN blog articles and generate a table of contents from it.

2020

2020-04

2020-05

2020-06

2020-07

2020-08

2020-09

2020-10

2020-11

2020-12

2021

2021-01

2021-02

2021-03

2021-04

2021-05

2021-06

2021-07

2021-08

2021-09

2021-10

2021-12

Looking Back on 2021, Looking Ahead to 2022

2022

2022-03

Generating ArUco Markers with Python and OpenCV

2022-04

2022-05

2022-06

2022-07

2022-08

2022-09

2022-10

2022-11

2022-12

2023

2023-01

2023-02

2023-03

2023-04

2023-05

2023-06

2023-07

2023-08

2023-09

2024

2024-01

2024-02

2024-03
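The outline above has the year and month structure that the script in the source code section prints, with article links listed under each month. As a minimal offline sketch of that grouping step, the hypothetical helper `build_toc` below takes already-fetched data keyed by `YYYY-MM` (an assumed key format) with the `title` and `url` fields the script reads, and returns the Markdown text. The sample titles and URLs are made up for illustration.

```python
def build_toc(articles_by_month):
    """Render {'YYYY-MM': [{'title': ..., 'url': ...}, ...]} as a Markdown outline."""
    lines = []
    current_year = None
    for month_key in sorted(articles_by_month):
        articles = articles_by_month[month_key]
        if not articles:
            continue  # months without posts are left out of the outline
        year = month_key[:4]
        if year != current_year:
            # Emit the year heading once, before its first non-empty month
            current_year = year
            lines.append('# ' + year)
        lines.append('## ' + month_key)
        for article in articles:
            # Square brackets in a title would break the Markdown link syntax
            title = article['title'].replace('[', '').replace(']', '')
            lines.append('- [{}]({})'.format(title, article['url']))
    return '\n'.join(lines)


# Made-up sample data, for illustration only
sample = {
    '2022-03': [{'title': 'Post [A]', 'url': 'https://example.com/a'}],
    '2022-04': [],
    '2023-01': [{'title': 'Post B', 'url': 'https://example.com/b'}],
}
print(build_toc(sample))
```

Returning a string instead of printing directly makes the result easy to write to a file or paste into a blog editor.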

2. Source code

pip install pyfreeproxy

Update 2023/4/4: the previous proxy stopped working and could no longer be reached, so I switched to freeproxy.
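For readers who want to route the requests through a proxy, here is a hedged sketch using the `proxies` argument of `requests.get`. It does not show how pyfreeproxy supplies a proxy. The helper names `build_proxies` and `fetch_with_proxy` are hypothetical, and the address `127.0.0.1:8080` is only a placeholder.

```python
def build_proxies(host, port):
    """Return a proxies mapping in the form the requests library expects."""
    address = 'http://{}:{}'.format(host, port)
    return {'http': address, 'https': address}


def fetch_with_proxy(url, headers, proxies, timeout=10):
    """Send a GET request through the given proxy."""
    # Imported lazily so build_proxies can be used without requests installed
    import requests  # third-party, already used by the script in this post
    return requests.get(url, headers=headers, proxies=proxies, timeout=timeout)


# 127.0.0.1:8080 is a placeholder, not a working proxy
proxies = build_proxies('127.0.0.1', 8080)
print(proxies)
```

Passing a `timeout` is worthwhile here, since free proxies often hang rather than fail quickly.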

# Update 2023/4/4: the previous proxy stopped working, so I switched to freeproxy
# Crawl the list of historical CSDN blog articles with Python and generate a table of contents
# python pa_article.py

# Layout of the generated output:
# 2022
## 202201
# - aaa
# - bbb
## 202202
# - ccc
# - ddd
# 2023
## 202301
# - eee
# - fff
import datetime

import requests


def getCSDNTitleUrl(year, month, results_by_month):
    """Fetch one month's article list and store the parsed JSON under the key 'YYYY-MM'."""
    # Skip future months; zero-padded 'YYYYMM' strings compare in chronological order
    now_time = datetime.datetime.now().strftime("%Y%m")
    if year + month > now_time:
        return
    url = 'https://blog.csdn.net/community/home-api/v1/get-business-list?page=1&size=50&businessType=blog&orderby=&noMore=false&year=' + year + '&month=' + month + '&username=qq_40985985'
    headers = {
        'User-Agent':
            'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0'
    }
    response = requests.get(url, headers=headers)
    # print(response.text)

    results_by_month[year + '-' + month] = response.json()


# Named results_by_month rather than dict, to avoid shadowing the built-in
results_by_month = {}
for i in range(2023, 2024):  # covers 2023 only; widen the range to include other years
    for j in range(1, 13):
        # '{:02d}' zero-pads the month, e.g. 1 -> '01'
        getCSDNTitleUrl(str(i), '{:02d}'.format(j), results_by_month)

printed_years = set()
for key, value in results_by_month.items():
    # print('%s:%s' % (key, value))

    data = value['data']['list']
    if len(data) == 0:
        continue
    # Print the year heading once, before the first non-empty month of that year
    # (testing for '01' in the key would drop the heading when January has no posts)
    year = key[0:4]
    if year not in printed_years:
        printed_years.add(year)
        print('\n# {}\n'.format(year))
    print('\n## {}\n'.format(key))
    for obj in data:
        # Strip square brackets from titles so they do not break the Markdown link
        print('- [{}]({})'.format(obj['title'].replace('[', '').replace(']', ''), obj['url']))
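The script requests only `page=1` with `size=50`, so a month with more than 50 posts would presumably be cut off, and the year range is hard-coded. A hedged sketch of how both could be generalized follows. `iter_months` and `fetch_all_pages` are hypothetical helpers, and `fetch_page` stands in for the HTTP call. The stopping rule (a page shorter than `size` is the last one) is an assumption about the API, not documented behavior.

```python
import datetime


def iter_months(start_year, start_month, today=None):
    """Yield (year, month) strings such as ('2023', '04') from the start month up to today's month."""
    today = today or datetime.date.today()
    year, month = start_year, start_month
    while (year, month) <= (today.year, today.month):
        yield str(year), '{:02d}'.format(month)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def fetch_all_pages(fetch_page, size=50):
    """Call fetch_page(page, size) for page 1, 2, ... until a page comes back short."""
    items = []
    page = 1
    while True:
        batch = fetch_page(page, size)
        items.extend(batch)
        if len(batch) < size:  # assumed end-of-data signal
            return items
        page += 1


# Stand-in for the real request: serves 120 fake articles, 50 per page
fake_articles = ['article-{}'.format(n) for n in range(120)]


def fake_fetch(page, size):
    start = (page - 1) * size
    return fake_articles[start:start + size]


print(len(fetch_all_pages(fake_fetch)))
print(list(iter_months(2023, 11, datetime.date(2024, 2, 15))))
```

Keeping the HTTP call behind a `fetch_page` callable lets the paging logic be checked offline before it is pointed at the real endpoint.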