Scraping WeChat Official Account Articles with Python

是叶子耶

Published 2024-08-24 20:42:08

Article link: https://blog.csdn.net/agvx58074/article/details/141504674

import requests
from bs4 import BeautifulSoup

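def fetch_article(url):
    """Download the article page and return its HTML, or None on failure."""
    headers = {
        # Present a desktop browser User-Agent and use the article URL as the Referer
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
        'Referer': url
    }
    try:
        # Without a timeout, requests can wait indefinitely on an unresponsive server
        response = requests.get(url, headers=headers, timeout=10)
    except requests.RequestException:
        # Connection errors, timeouts, and malformed URLs all end up here
        return None
    if response.status_code == 200:
        return response.text
    return None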

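def parse_article(html):
    """Return the article body text, or None if the body container is missing."""
    soup = BeautifulSoup(html, 'html.parser')
    # The article body is expected in a div with class "rich_media_content"
    body = soup.find('div', class_='rich_media_content')
    if body is None:
        # find() returns None when nothing matches, so guard before calling get_text()
        return None
    # View counts and comments are not extracted here; doing so would require
    # inspecting the page structure further
    return body.get_text(separator='\n', strip=True)

def main():
    url = 'https://mp.weixin.qq.com/s/I3hVi4znBgunzqDGLGLg_w'
    html = fetch_article(url)
    if not html:
        print("Failed to retrieve article.")
        return
    content = parse_article(html)
    if content:
        print(content)
    else:
        print("Article body not found in the page.")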

if __name__ == '__main__':
    main()
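
Since fetch_article gives up after a single attempt, a transient network error makes the whole script fail. The sketch below shows one possible way to retry with a growing delay. It is not part of the original script: the name fetch_with_retry and its parameters (attempts, delay, sleep) are hypothetical, and it assumes only that the fetch function returns None on failure, as fetch_article does.

```python
import time


def fetch_with_retry(fetch, url, attempts=3, delay=1.0, sleep=time.sleep):
    """Call fetch(url) up to `attempts` times until it returns a non-None result.

    `fetch` is any callable that returns the page HTML, or None on failure.
    All names and defaults here are illustrative assumptions.
    """
    for attempt in range(1, attempts + 1):
        html = fetch(url)
        if html is not None:
            return html
        if attempt < attempts:
            # Wait longer after each failure: delay, 2 * delay, 4 * delay, ...
            sleep(delay * 2 ** (attempt - 1))
    # Every attempt failed
    return None
```

In main, the call would then read html = fetch_with_retry(fetch_article, url). The sleep parameter exists only so the waiting can be replaced in tests; in normal use the default time.sleep applies.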