Douban Scraper for 《消失的她》 (Lost in the Stars)

Article link: https://blog.csdn.net/2302_77642040/article/details/137645644

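This post shows how to use Python's requests and BeautifulSoup libraries to scrape Douban comments on the film 《消失的她》, including each comment's username, rating, timestamp, location, and "useful" vote count.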


# coding: utf-8
import requests
from bs4 import BeautifulSoup
import time

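# To crawl more pages, raise 20 to a larger even number.
# i is the offset in tens: i=2 requests start=20, so the first page (start=0) is skipped.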
for i in range(2,20,2):
    url = 'https://movie.douban.com/subject/35660795/comments?start={}0&limit=20&status=P&sort=new_score'.format(i)
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.5 Safari/605.1.15'
    }

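    # Send the request (the timeout keeps a stalled connection from hanging forever)
    r = requests.get(url, headers=headers, timeout=10)

    # Print the HTTP status code
    print(r.status_code)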

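    # Get the page title
    html = BeautifulSoup(r.text, "html.parser")
    title = html.find("h1").text

    # Get every comment block (username, comment text, rating)
    divs = html.find_all("div", class_="comment")

    # Douban's rating labels mapped to stars; the keys must stay in Chinese
    # because they are matched against the title attribute on the page
    # (力荐 = highly recommended ... 很差 = very poor)
    s = {"力荐":"*****","推荐":"****","还行":"***","较差":"**","很差":"*"}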

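    # Open in append mode: "w+" would truncate the file on every page,
    # leaving only the last page's comments. Delete the file before a rerun.
    with open("{}.txt".format(title), "a", encoding="utf-8") as f:
        # Write the header row only while the file is still empty
        if f.tell() == 0:
            f.write(str(["user", "score", "data_time", "location", "user_num", "content"]))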

        for div in divs:
            print("---------------------------------")
            name = div.find("a", class_="").text
            print("Username:", name)

            content = div.find("span", class_="short").text
            print("Comment:", content)

            data_time = div.find("span", class_="comment-time")["title"]
            print("Comment time:", data_time)

            # Some comments may carry no location, so guard against a missing tag
            location_tag = div.find("span", class_="comment-location")
            location = location_tag.text if location_tag else ""
            print("Comment location:", location)

            user_num = div.find("span", class_="votes vote-count").text
            print("Useful votes:", user_num)
            
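            score = None
            # The rating span has a class such as "allstar40 rating"; try 1 to 5 stars.
            # The loop variable is named star so it does not shadow the page counter i.
            for star in range(1, 6):
                rating = div.find("span", class_="allstar{}0 rating".format(star))
                if rating is not None:
                    score = s.get(rating["title"])
                    break

            if score is None:
                score = "not rated"

            print("Rating:", score)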
            print("[+]... comment by {} scraped".format(name))
            f.write("\n")
            f.write(str([name,score,data_time,location,user_num,content]))

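    # The with block closes the file, so no explicit f.close() is needed.
    # Pause briefly between pages to avoid hammering the server.
    time.sleep(1)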

That is the complete code for the 《消失的她》 scraper.
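The script builds each page offset by appending a literal 0 to the loop variable and stores every record as the `str()` of a Python list, which is awkward to load back later. The following is a minimal sketch of one alternative, not part of the original post: it computes the `start` offset explicitly and writes records with the standard `csv` module. The names `BASE_URL`, `FIELDS`, `page_urls`, and `rows_to_csv` are hypothetical helpers introduced here for illustration, and the sample record in the usage note is made up.

```python
import csv
import io
from urllib.parse import urlencode

# Same comment-list endpoint the script above requests
BASE_URL = "https://movie.douban.com/subject/35660795/comments"

# Column names matching the header row written by the script above
FIELDS = ["user", "score", "data_time", "location", "user_num", "content"]


def page_urls(first_page=0, last_page=1, page_size=20):
    """Yield one URL per page; page n starts at offset n * page_size."""
    for page in range(first_page, last_page + 1):
        query = urlencode({
            "start": page * page_size,
            "limit": page_size,
            "status": "P",
            "sort": "new_score",
        })
        yield f"{BASE_URL}?{query}"


def rows_to_csv(rows):
    """Serialize a list of dicts keyed by FIELDS into CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    # The csv module quotes fields that contain commas or newlines
    writer.writerows(rows)
    return buffer.getvalue()
```

Under these assumptions, `page_urls(1, 9)` yields the same nine offsets (start=20 through start=180) as `range(2, 20, 2)` in the script, and `page_urls(0, 9)` would also include the first page. A record such as `{"user": "alice", "score": "****", "data_time": "2023-06-22 10:00:00", "location": "北京", "user_num": "12", "content": "Good, twisty"}` can be passed to `rows_to_csv` in a list, and the comma inside the comment text stays within a single quoted field.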