This time we will scrape the Marvel movie list from IMDb.
The list lives at:
https://www.imdb.com/list/ls071217506/
The full code is available in my GitHub repository:
https://github.com/liuzuoping/PythonSpyder_100_examples
Requesting the page data
import requests
from bs4 import BeautifulSoup
res = requests.get('https://www.imdb.com/list/ls071217506/')
soup = BeautifulSoup(res.text, 'lxml')
movies = []
for movie in soup.select('.lister-item-header a'):
    # The href looks like /title/tt0371746/..., so the ID is the third piece
    movies.append(movie.get('href').split('/')[2])
A look at the first few results
movies[0:3]
['tt0371746', 'tt0800080', 'tt1228705']
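To make the `split('/')[2]` step above less magical, here is a small sketch that runs without touching the network. The sample href is my assumption about what a list-page link looks like, and `extract_imdb_id` is a hypothetical helper (not part of the original code) that uses a regular expression so it returns `None` instead of a wrong piece when a link has a different shape.

```python
import re

def extract_imdb_id(href):
    """Return the IMDb title ID (e.g. 'tt0371746') from a link path, or None."""
    match = re.search(r'/title/(tt\d+)', href)
    return match.group(1) if match else None

# Assumed shape of a link on the list page
sample_href = '/title/tt0371746/?ref_=ttls_li_tt'

# Splitting on '/' gives an empty first piece, then 'title', then the ID
pieces = sample_href.split('/')
print(pieces)
print(pieces[2])

# The regex version gives the same ID and tolerates non-title links
print(extract_imdb_id(sample_href))
print(extract_imdb_id('/name/nm0000375/'))
```

The plain split is fine as long as every link under the selector is a title link; the regex variant is only a safety net.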
Scraping the IMDb review data
def getMovieReviews(movieid):
    # Fetch the 10-star reviews for one title, most helpful first
    reviews = []
    url = 'https://www.imdb.com/title/{}/reviews?spoiler=hide&sort=helpfulnessScore&dir=desc&ratingFilter=10'.format(movieid)
    res = requests.get(url)
    soup = BeautifulSoup(res.text, 'lxml')
    for review in soup.select('.imdb-user-review'):
        # Read the rating from this review, not from the whole page
        star = review.select_one('.rating-other-user-rating span')
        if star and star.text == '10':
            title = review.select_one('.title').text
            author = review.select_one('.display-name-link').text
            dt = review.select_one('.review-date').text
            content = review.select_one('.content .text').text
            reviews.append({'title': title, 'author': author, 'dt': dt, 'content': content})
    return reviews
totalreviews = []
for movieid in movies:
    print(movieid)
    reviews = getMovieReviews(movieid)
    totalreviews.extend(reviews)
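If you want to try the parsing logic without hitting IMDb, the sketch below runs it on a tiny hand-written HTML fragment. The markup is synthetic: I only reused the class names from the selectors above, so the real page may be structured differently. `parse_reviews` is a hypothetical helper that takes an HTML string instead of a movie ID, and it uses the built-in `html.parser` so `lxml` is not required.

```python
from bs4 import BeautifulSoup

# Synthetic fragment; only the class names come from the selectors used above
SAMPLE_HTML = """
<div class="imdb-user-review">
  <span class="rating-other-user-rating"><span>10</span><span>/10</span></span>
  <a class="title">Great start</a>
  <span class="display-name-link"><a>alice</a></span>
  <span class="review-date">2 May 2008</span>
  <div class="content"><div class="text">Loved it.</div></div>
</div>
<div class="imdb-user-review">
  <span class="rating-other-user-rating"><span>7</span><span>/10</span></span>
  <a class="title">Decent</a>
  <span class="display-name-link"><a>bob</a></span>
  <span class="review-date">3 May 2008</span>
  <div class="content"><div class="text">It was fine.</div></div>
</div>
"""

def parse_reviews(html, wanted_rating='10'):
    """Collect the reviews in `html` whose rating equals `wanted_rating`."""
    soup = BeautifulSoup(html, 'html.parser')
    reviews = []
    for review in soup.select('.imdb-user-review'):
        # Every lookup is scoped to the current review element
        star = review.select_one('.rating-other-user-rating span')
        if star is None or star.text.strip() != wanted_rating:
            continue
        reviews.append({
            'title': review.select_one('.title').text.strip(),
            'author': review.select_one('.display-name-link').text.strip(),
            'dt': review.select_one('.review-date').text.strip(),
            'content': review.select_one('.content .text').text.strip(),
        })
    return reviews

parsed = parse_reviews(SAMPLE_HTML)
print(len(parsed), parsed[0]['author'])
```

Scoping each `select_one` call to the review element matters: a lookup on the whole page would always return the first rating it finds, so every review would be judged by the first review's score.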
Organizing the review data
import pandas
df = pandas.DataFrame(totalreviews)
df.count()
df.head()
df['content'][0]
‘Rest assured, Iron Man is an absolutely amazing movie. I won’t dare spoil any of this remarkable movie for you but I do recommend it as highly as I possibly can. Marvel needed to get in to the solo movie making business long ago. Instead of leasing out their characters to other studios, they’re making movies themselves. Most everyone knows Iron Man is their first effort and what a great lead off film! This movie helps take the comic book genre to the highest level. Just like they did in the books, they reinvent standard epic adventure by “Marvelizing” characters and making them more believable. The Spider-Man and the X-Men movies did this to a degree but only as far as their respective studios wished to stay true to the source material. Anything added or amended was for the benefit of the live action adaptation. Director Sam Raimi pulled this off by talking to the summer crowd, not down to them with the Spider-Man series. Jon Favreau has done the same thing here but I think he’s done it even better. Raimi intentionally threw in a little cheese. Favreau adds nice bits of humor but not too much. He also grounds the action and the suit of armor in firm reality. I’ve said it before but it’s brave to reach for the highest common denominator with a big budget film and Favreau delivers a movie with as much feeling as it has action and intensity. Needless to say, Robert Downey Jr. and company deliver the goods. It’s a movie that has a wonderful balance that delivers intelligence with its fun.The amazing yet realistic action is paced by the plot and characters that keep you interested from start to finish. What absolutely blew me away were the phenomenal special effects. I know they built a practical, working armor. What I loved is the use of CGI was used to augment the real life armor and not create something from scratch. Most all CGI constructs feel fake somehow but the stuff in Iron Man didn’t seem fake even for an instant. 
As great as everything looked, what really drives the movie is the emotional resonance and down to earth nature of the plot. Sure the concept is wild but it’s all presented so that you really believe it could happen. I doubt anyone will find fault with this movie unless they went in trying to dislike it.This is, without a doubt going to be one of my top 10 movies of 2008, quite possibly the number one film.’
Saving the review data
df.to_csv('movie_review.csv', encoding='utf-8-sig')
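A quick way to check the saved file is to read it back. The sketch below does a round trip on two made-up rows (not real scraped data) in a temporary directory. The `utf-8-sig` encoding writes a byte-order mark at the start of the file, which helps tools such as Excel detect UTF-8. Passing `index=False` is my own addition here; it keeps the row index out of the file so the columns match after reloading.

```python
import os
import tempfile

import pandas

# Made-up rows with the same keys the scraper produces
sample_reviews = [
    {'title': 'Great start', 'author': 'alice', 'dt': '2 May 2008', 'content': 'Loved it.'},
    {'title': 'Superb', 'author': 'bob', 'dt': '3 May 2008', 'content': 'Will watch again.'},
]
df_sample = pandas.DataFrame(sample_reviews)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'movie_review_sample.csv')
    df_sample.to_csv(path, encoding='utf-8-sig', index=False)

    # The file should begin with the three UTF-8 BOM bytes
    with open(path, 'rb') as f:
        has_bom = f.read(3) == b'\xef\xbb\xbf'

    # Reading with the same encoding strips the BOM again
    df_loaded = pandas.read_csv(path, encoding='utf-8-sig')

print(has_bom)
print(list(df_loaded.columns))
print(df_loaded['author'].tolist())
```

Without `index=False`, the reloaded frame would gain an extra unnamed column holding the old row numbers.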