Python Web Scraping in Practice: Fetching the Douban Movie Top250 Data

Latest recommended article published on 2025-04-06 20:45:04

JH_vision

Category: Python  Tags: web scraping

Article link: https://blog.csdn.net/JH_joker/article/details/142948503

1.1 Analyzing the Page Structure

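First, open the Douban Movie Top250 page and inspect its structure with the browser's developer tools (F12). Each film is wrapped in a <div class="item"> element; inside it, the title sits in a <span class="title"> and the rating in a <span class="rating_num">. These tags and class names are what the scraper uses to extract the data.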
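To make the structure concrete, here is a minimal sketch that runs the same lookups on a small, hand-written HTML fragment. The fragment is a simplified approximation of one list entry, not a copy of the live page, and the names SAMPLE_HTML, title, and rating are chosen only for illustration. It uses Python's built-in "html.parser" backend so that it runs without the lxml package.

```python
from bs4 import BeautifulSoup

# Hand-written, simplified fragment modelled on one Top250 entry.
# This is an assumption for illustration; the live page has more markup.
SAMPLE_HTML = """
<div class="item">
  <div class="hd">
    <a href="#">
      <span class="title">肖申克的救赎</span>
      <span class="title">&nbsp;/&nbsp;The Shawshank Redemption</span>
    </a>
  </div>
  <div class="star">
    <span class="rating_num">9.7</span>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
item = soup.find("div", class_="item")

# find() returns only the first match, so of the two title spans
# this picks the primary title and ignores the alternate one.
title = item.find("span", class_="title").text
rating = item.find("span", class_="rating_num").text

print(title, rating)
```

Note that the rating comes back as a string; convert it with float(rating) if you need to sort or compare values.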

1.2 Writing the Scraper

import requests
from bs4 import BeautifulSoup

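def get_movies(url):
    """Fetch one Top250 page and return a list of {'title', 'rating'} dicts."""
    # Send a browser-like User-Agent; the default one used by requests is typically rejected.
    headers = {'User-Agent': 'Mozilla/5.0'}
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    soup = BeautifulSoup(response.text, 'lxml')  # the 'lxml' parser needs the lxml package installed

    movies = []

    # Each film on the page is wrapped in <div class="item">.
    for item in soup.find_all('div', class_='item'):
        # find() returns the first match, which is the film's primary title.
        title = item.find('span', class_='title').text
        rating = item.find('span', class_='rating_num').text
        movies.append({'title': title, 'rating': rating})

    return movies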

url = 'https://movie.douban.com/top250'
movies = get_movies(url)

# Print the scraped movie information
for movie in movies:
    print(movie)
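
The script above requests a single URL, which returns only the first 25 films. The list is split across pages selected by a start query parameter that advances in steps of 25, so covering all 250 entries means requesting ten URLs. The sketch below only builds those URLs and makes no network calls; page_urls, BASE_URL, PAGE_SIZE, and TOTAL are hypothetical names introduced here, not part of the original script.

```python
from urllib.parse import urlencode

BASE_URL = 'https://movie.douban.com/top250'
PAGE_SIZE = 25   # films per page
TOTAL = 250      # films in the whole list

def page_urls(base_url=BASE_URL, total=TOTAL, page_size=PAGE_SIZE):
    """Return the URL of every page of the list, in rank order."""
    return [
        f"{base_url}?{urlencode({'start': start})}"
        for start in range(0, total, page_size)
    ]

urls = page_urls()
for url in urls:
    print(url)
```

Each URL can then be passed to get_movies and the results combined with list.extend. Pausing briefly between requests, for example with time.sleep(1), keeps the load on the site low.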