Scraping Marvel Movie Review Data and Saving It Locally

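This time we scrape a list of Marvel movies and the user reviews for each one.
The list page is
https://www.imdb.com/list/ls071217506/
The full code is available in my GitHub repository:
https://github.com/liuzuoping/PythonSpyder_100_examples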

Requesting the page data

import requests
from bs4 import BeautifulSoup

# Fetch the list page and parse it
res = requests.get('https://www.imdb.com/list/ls071217506/')
soup = BeautifulSoup(res.text, 'lxml')

# Each header link looks like /title/tt0371746/...; the third path segment is the title ID
movies = []
for movie in soup.select('.lister-item-header a'):
    movies.append(movie.get('href').split('/')[2])

Inspecting the first few results

movies[0:3]

['tt0371746', 'tt0800080', 'tt1228705']
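
The ID extraction relies on every link having the form /title/tt0371746/. Here is a minimal sketch of that step in isolation, using only the standard library. The helper name title_id_from_href and the "tt followed by digits" pattern are assumptions made for illustration, not part of the original code. It returns None for links that do not match, where split('/')[2] would pick up the wrong segment or raise an IndexError.

```python
import re

# Assumed shape of an IMDb title ID: "tt" followed by digits.
TITLE_ID = re.compile(r"^tt\d+$")

def title_id_from_href(href):
    """Return the title ID from an href like '/title/tt0371746/', else None."""
    parts = href.split("/")
    # '/title/tt0371746/'.split('/') gives ['', 'title', 'tt0371746', '']
    if len(parts) > 2 and parts[1] == "title" and TITLE_ID.match(parts[2]):
        return parts[2]
    return None

# Invented sample hrefs: two title links and one link that is not a title.
hrefs = ["/title/tt0371746/", "/title/tt0800080/?ref_=ttls_li_tt", "/list/ls071217506/"]
ids = [tid for tid in map(title_id_from_href, hrefs) if tid]
print(ids)
```

Filtering out the None values keeps unrelated links from ending up in the movie list.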

Scraping the IMDb review data

def getMovieReviews(movieid):
    reviews = []
    # Build the URL from movieid so that each movie's own review page is fetched
    res = requests.get('https://www.imdb.com/title/{}/reviews?spoiler=hide&sort=helpfulnessScore&dir=desc&ratingFilter=10'.format(movieid))
    soup = BeautifulSoup(res.text, 'lxml')
    for review in soup.select('.imdb-user-review'):
        # Look up the rating inside this review, not in the whole page
        star = review.select_one('.rating-other-user-rating span')
        if star and star.text == '10':
            title  = review.select_one('.title').text
            author = review.select_one('.display-name-link').text
            dt     = review.select_one('.review-date').text
            content= review.select_one('.content .text').text
            reviews.append({'title':title, 'author':author, 'dt':dt, 'content':content})
    return reviews

# Collect the 10-star reviews of every movie in the list
totalreviews = []
for movieid in movies:
    print(movieid)
    reviews = getMovieReviews(movieid)
    totalreviews.extend(reviews)
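
The reviews URL differs between movies only in the title ID, so it can be assembled from the ID and a dictionary of query parameters. This is a minimal sketch, assuming the same query parameters as the request above. The function name build_reviews_url and its rating argument are hypothetical additions for illustration.

```python
from urllib.parse import urlencode

def build_reviews_url(movie_id, rating=10):
    """Build the reviews URL for one title ID (hypothetical helper)."""
    # Same parameters as in the scraping request: hide spoilers,
    # sort by helpfulness (descending), keep only one star rating.
    query = urlencode({
        "spoiler": "hide",
        "sort": "helpfulnessScore",
        "dir": "desc",
        "ratingFilter": rating,
    })
    return f"https://www.imdb.com/title/{movie_id}/reviews?{query}"

print(build_reviews_url("tt0800080"))
```

Keeping the parameters in a dictionary makes it easy to change the rating filter without editing a long URL string by hand.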

Organizing the review data

import pandas
df = pandas.DataFrame(totalreviews)
df.count()
df.head()
df['content'][0]


‘Rest assured, Iron Man is an absolutely amazing movie. I won’t dare spoil any of this remarkable movie for you but I do recommend it as highly as I possibly can. Marvel needed to get in to the solo movie making business long ago. Instead of leasing out their characters to other studios, they’re making movies themselves. Most everyone knows Iron Man is their first effort and what a great lead off film! This movie helps take the comic book genre to the highest level. Just like they did in the books, they reinvent standard epic adventure by “Marvelizing” characters and making them more believable. The Spider-Man and the X-Men movies did this to a degree but only as far as their respective studios wished to stay true to the source material. Anything added or amended was for the benefit of the live action adaptation. Director Sam Raimi pulled this off by talking to the summer crowd, not down to them with the Spider-Man series. Jon Favreau has done the same thing here but I think he’s done it even better. Raimi intentionally threw in a little cheese. Favreau adds nice bits of humor but not too much. He also grounds the action and the suit of armor in firm reality. I’ve said it before but it’s brave to reach for the highest common denominator with a big budget film and Favreau delivers a movie with as much feeling as it has action and intensity. Needless to say, Robert Downey Jr. and company deliver the goods. It’s a movie that has a wonderful balance that delivers intelligence with its fun.The amazing yet realistic action is paced by the plot and characters that keep you interested from start to finish. What absolutely blew me away were the phenomenal special effects. I know they built a practical, working armor. What I loved is the use of CGI was used to augment the real life armor and not create something from scratch. Most all CGI constructs feel fake somehow but the stuff in Iron Man didn’t seem fake even for an instant. 
As great as everything looked, what really drives the movie is the emotional resonance and down to earth nature of the plot. Sure the concept is wild but it’s all presented so that you really believe it could happen. I doubt anyone will find fault with this movie unless they went in trying to dislike it.This is, without a doubt going to be one of my top 10 movies of 2008, quite possibly the number one film.’
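
The sample review above contains a line break and stray trailing spaces, which can be awkward once the text sits in a single table cell. One possible cleanup, sketched here under the assumption that collapsing all whitespace is acceptable for your analysis, is to normalize each string field before building the DataFrame. The helper names and the sample record are invented for illustration.

```python
import re

def normalize_text(text):
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return re.sub(r"\s+", " ", text).strip()

def clean_review(review):
    """Return a copy of a review dict with every string field normalized."""
    return {key: normalize_text(value) if isinstance(value, str) else value
            for key, value in review.items()}

# Invented sample record with the same keys as the scraped reviews.
sample = {"title": " Great start\n", "author": "someone",
          "dt": "1 May 2008", "content": "Line one. \nLine two."}
print(clean_review(sample))
```

Applied to the scraped data, this would be a list comprehension over totalreviews before the call to pandas.DataFrame.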

Saving the review data

df.to_csv('movie_review.csv', encoding='utf-8-sig')
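
The utf-8-sig encoding writes a byte order mark (BOM) at the start of the file, which helps Excel recognize the file as UTF-8. The sketch below shows the effect with the standard csv module instead of pandas. The rows and the file name movie_review_demo.csv are invented for illustration.

```python
import csv
import os
import tempfile

# Invented sample rows with the same columns as the scraped reviews.
rows = [{"title": "Café scene", "author": "someone", "dt": "1 May 2008", "content": "text"}]

# Write to a temporary directory so the demo does not touch real data.
path = os.path.join(tempfile.mkdtemp(), "movie_review_demo.csv")
with open(path, "w", newline="", encoding="utf-8-sig") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "author", "dt", "content"])
    writer.writeheader()
    writer.writerows(rows)

# The first three bytes of the file are the UTF-8 BOM.
with open(path, "rb") as f:
    starts_with_bom = f.read(3) == b"\xef\xbb\xbf"

# Reading with utf-8-sig strips the BOM again.
with open(path, newline="", encoding="utf-8-sig") as f:
    loaded = list(csv.DictReader(f))

print(starts_with_bom, loaded[0]["title"])
```

Reading the file back with plain utf-8 would leave the BOM attached to the first column name, so it is worth using utf-8-sig on both sides.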
