Maoyan Movie Reviews
Ryan Bellgardt’s 2018 movie, The Jurassic Games, tells the story of ten death row inmates who must compete for survival in a virtual reality game, where they fight not only each other but also dinosaurs, and where dying in the game means dying for real. Starring mostly B-list Hollywood actors such as Perrey Reeves and Ryan Merriman, the movie clearly sets off all the alarms of a low-budget flick. Nevertheless, most critics thought it was a very good effort: Rotten Tomatoes considered it “fresh”, giving it a rare rating of 83%. Writing on the same website, Sam Kurd of Cultured Vultures felt that the movie, “while not original or ground-breaking, [was] a lot of fun and worth-watching”. Other critics on the same platform rated it well, so the movie ended up with an average rating of 7.2 out of 10. However, if Hollywood, or the critics for that matter, expected regular movie-goers to love the movie, they were quite wrong.
On IMDb, it ended up with an overall rating of 3.8 out of 10 from more than 2,000 votes. The chatter is that, though the movie tells a pretty fun story, its special effects are horrendous. For a B-movie that possibly could not afford top-rated Hollywood CGI, it would seem understandable to give the director a pass. Unfortunately, the IMDb crowd was not so forgiving. “Low-budget movie”, “sloppy characters”, “low grade CGI” are some of the words thrown about in the reviews on that website. It seems that what the critics could see past, the audience could not.
While critics and crowd may have disagreed over The Jurassic Games, they do agree on a handful of other movies. Aaron Schneider’s Greyhound and Mark Lamprell’s Never Too Late, for example, both scored high ratings on IMDb and Rotten Tomatoes. These contrasting situations put a question before us. When critics and crowd score or review a movie, should we expect them to agree or to disagree?
This question surrounding the truth value of crowds is not a recent one. On a spring morning in 1906, Francis Galton, an English statistician and polymath, attended a weight-judging competition at the annual West of England Fat Stock and Poultry Exhibition at Plymouth. This was a farmers’ fair where all sorts of crop and animal products were displayed and sold. A fat ox had been selected for slaughter, and participants were given a card on which to write their names, addresses and estimates of what the ox would weigh after it was slaughtered and “dressed”. Those with successful guesses would receive a prize. While most may have considered their participation trivial and of no consequence, Galton thought the combined results would make for a good experiment. He collated the results and ran a statistical analysis on them. He found that the “middlemost” estimate was very close to the actual weight of the slaughtered ox: it was correct to within 1% of the actual value. While the estimate was 1,207 lb, the actual weight of the dressed ox was 1,198 lb. In effect, while most of the participants in the guessing competition may have guessed wrongly, their combined effort produced a result close enough to the actual value.
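As a quick check on the arithmetic, the short sketch below computes the relative error from the two weights quoted above, and then shows how a “middlemost” (median) estimate behaves on a small list of guesses. The guesses are hypothetical and are not Galton's data; the function name is ours.

```python
from statistics import median


def relative_error(estimate, actual):
    """Absolute error expressed as a fraction of the actual value."""
    return abs(estimate - actual) / actual


# Figures reported in the text: middlemost estimate vs. dressed weight (lb).
galton_error = relative_error(1207, 1198)

# Hypothetical guesses (NOT Galton's data), used only to illustrate that
# the median is barely affected by one wildly wrong guess.
hypothetical_guesses = [1050, 1150, 1190, 1210, 1230, 1600]
crowd_estimate = median(hypothetical_guesses)

if __name__ == "__main__":
    print(f"relative error: {galton_error:.2%}")
    print(f"median of hypothetical guesses: {crowd_estimate}")
```

With the reported figures the error is 9 lb out of 1,198 lb, or roughly 0.75%, which is consistent with the "within 1%" claim.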
Though crowd behavior can sometimes be fickle or irrational, in certain cases, such as with Galton’s experiment, it provides interesting global estimates. In some situations, a diverse and independently sampled opinion of a select crowd could in fact reflect the “truth”. This logic has been successfully exploited in election polls, internet search engines, stock market predictions, and online knowledge repositories such as Wikipedia. Recently, we concluded a project where we examined this theory in relation to movie ratings.
IMDb and Rotten Tomatoes are two of the biggest movie aggregators online. Both collect ratings and other details on movies and TV shows, making these accessible to their global audience. While the former collects its movie ratings mainly from the crowd, the latter uses a score based strictly on the opinion of critics in the movie industry. These two contrasting techniques of judging a movie pit the crowd against critics and make for an interesting comparison of the two opinions. Would the “wisdom of crowds” produce a rating for a movie just as good as that from seasoned experts? We examined the data to see what insights are present.
We collected 44,000 movies from IMDb and 9,638 movies from Rotten Tomatoes, and identified 3,100 unique movies that appear in both sets. Using this data, we made a few revealing findings. There is a strong positive correlation between movie ratings on Rotten Tomatoes and on IMDb. Perhaps this is unsurprising. Most movies with high ratings on Rotten Tomatoes should also have high ratings on IMDb, even if the ratings are not the same overall. Good movies are good movies, in any case. However, we found that, on average, critics and crowd do not agree all the time.
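The matching and correlation step can be sketched roughly as follows. This is a minimal illustration only: the text does not say how movies were matched or which correlation coefficient was used, so matching on a normalized title and using Pearson's r are both assumptions, and the function names are hypothetical.

```python
import math


def normalize(title):
    """Lower-case a title and collapse whitespace so near-identical titles match."""
    return " ".join(title.lower().split())


def match_ratings(critics, crowd):
    """Return (critic_rating, crowd_rating) pairs for titles present in both dicts."""
    crowd_by_title = {normalize(t): r for t, r in crowd.items()}
    pairs = []
    for title, critic_rating in critics.items():
        key = normalize(title)
        if key in crowd_by_title:
            pairs.append((critic_rating, crowd_by_title[key]))
    return pairs


def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)
```

Matching on title alone would confuse remakes that share a name; a real pipeline would probably also compare release year, which this sketch leaves out.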
We scaled the movie ratings on both sites to a common range, then divided their difference into three bins. This is a little like the approach Jules Wanderer used in his paper, “In Defense of Popular Taste: Film Ratings among Professionals and Lay Audiences”. We defined a spread value, which is the tolerance we allow in the difference between movie ratings. For example, if critics rate a movie 0.8 and the crowd rates the same movie 0.75, we say that the critics and the crowd agree to within 0.05 on that movie’s rating. We find, as expected, that the agreement between the two depends substantially on the spread value. When we allow no more than 0.1 in spread, both sides agree on only 28% of the movies in the data set. This is quite low. In addition, when we are strict with our tolerance or spread, it appears there has never really been consensus between critics and crowd over the years. We found that the movie ratings provided by critics and crowd are more likely than not to differ by more than 0.1. If anything, it is more probable that the Tomatometer Score of a movie is lower than its IMDb score. In effect, while critics appear to be penalizing certain movies by giving them lower scores, the crowd seems to give these same movies a higher rating.
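The binning described above might look something like the sketch below. It assumes that scaling means dividing the Tomatometer percentage by 100 and the IMDb score by 10, which the text does not spell out; the bin labels and function names are ours. A small epsilon guards against floating-point noise, since 0.8 - 0.75 is not exactly 0.05 in binary arithmetic.

```python
from collections import Counter

EPS = 1e-9  # tolerance for floating-point rounding at the bin boundary


def scale(tomatometer_pct, imdb_score):
    """Map both ratings onto [0, 1] (assumed scaling: /100 and /10)."""
    return tomatometer_pct / 100.0, imdb_score / 10.0


def bin_difference(critic, crowd, spread):
    """Place a pair of scaled ratings into one of three bins."""
    diff = critic - crowd
    if abs(diff) <= spread + EPS:
        return "agree"
    return "critics_higher" if diff > 0 else "critics_lower"


def agreement_share(raw_pairs, spread):
    """Fraction of movies in each bin, given (tomatometer_pct, imdb_score) pairs."""
    counts = Counter(bin_difference(*scale(t, i), spread) for t, i in raw_pairs)
    total = sum(counts.values())
    return {k: counts[k] / total
            for k in ("critics_lower", "agree", "critics_higher")}
```

For instance, `bin_difference(0.8, 0.75, 0.05)` falls in the "agree" bin, matching the example in the text, while The Jurassic Games (83% against 3.8 out of 10) scales to 0.83 against 0.38 and lands in "critics_higher" at a spread of 0.1.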
Unlike us, Wanderer, whose paper examined the degree to which professional critics agree with lay movie-goers, found a much higher rate of agreement: both sides agreed in 53% of 5,644 instances. We put this difference down to the lay audiences examined in the two cases. Wanderer examined an audience of Consumers Union members, who were more likely to belong to the upper-middle class in America. These were members of a social circle with a median income of around $12,800, compared with the average US family income of about $7,400 at that time. Our audience, who are IMDb users, is more likely to belong to the larger group with the lower median income. Therefore, while Wanderer puts his audience in the same social class as the critics, we think our audience may be in a lower social class.
This could explain the additional differences we observed in subsequent analyses of the data. For example, we found that while the crowd is more likely to rate a movie higher when it features a top actor, critics seem unbothered. Conversely, when a movie is directed by a top director, critics appear to favor it more than the crowd does. The boxplots below show these details.
As an added step, we examined whether we could predict the movie ratings offered by the crowd given what we know about the movie and its ratings from critics. Here, we go from the critics’ mind to the crowd’s. While this is not entirely related to our subject of discussion, it makes for an interesting experiment. Using an array of machine learning tools, we obtained a decent mean squared error value of 0.37. The resulting model allows us to formulate a mathematical relationship between a movie’s attributes on Rotten Tomatoes and its IMDb score. The movie’s Tomatometer rating and runtime are some of the most significant predictors.
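To make the error metric concrete, here is a minimal sketch of a one-predictor least-squares fit and the mean squared error it is scored with. It is not the model described above, whose tools and features are not specified in the text: using only the Tomatometer rating as the predictor is a simplifying assumption, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(actual, predicted):
    """Average of squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def predict(xs, slope, intercept):
    """Apply the fitted line to a list of predictor values."""
    return [slope * x + intercept for x in xs]
```

In practice the error should be measured on movies held out from the fit; otherwise it understates how far off predictions for unseen movies would be.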
In conclusion, though we could establish a strong correlation between ratings on Rotten Tomatoes and IMDb, we found no substantial agreement between the ratings offered by critics and crowd. In this case, we are unable to learn from the “wisdom of the crowd”. Perhaps if this crowd were a more select group of ardent movie-goers, a property we can’t claim for IMDb users, we might have seen a different outcome in our analysis and more similarity between the two sets of ratings. We suspect that differing biases on both sides, and an audience more diverse than focused, may have skewed the ratings such that there is no evident consensus. Galton’s audience, after all, may all have been farmers or at least understood farming; why else would they be lurking at a farmers’ fair?