Join me in an experiment. We’re going to search for various times of day using Google’s Image Search. We’ll use a fresh Google Chrome Incognito window to ensure our results aren’t skewed. This is scientific, after all, and we want the most accurate results possible.

First, let’s try “sunrise.”


Screenshots by the author

Well done, Google! I’m proud of you. Those are definitely photos of the sunrise. Now let’s try “midday.”

A bit more variety in these results but still totally fine. Google, you’re doing quite well. I’m impressed. How about “dusk?”

Perfect. Maybe Google’s all-knowing algorithms really do surface exactly what you’re looking for every single time. It’s an incredible engine, and it’s responsible for catapulting a late 1990s startup to a $1 trillion market value in 2020. Search is still vital to Google, so it’s important that it gets things right.

Just for kicks, and because I know this is going to return the exact result I want, let’s try one more: “twilight.”


Oh… never mind.


The (surprisingly human) algorithm

To understand how pasty vampires and buff werewolves invaded Google Images, you have to first understand how Google’s algorithms work. That’s a challenge, as it’s a notoriously opaque machine and Google itself is often vague in its descriptions of how and why its engine returns the results it does. Offering a detailed description of exactly how your world-ruling search engine functions isn’t good for business, so it’s no surprise Google doesn’t reveal much.

Via Google, emphasis mine:

To give you the most useful information, Search algorithms look at many factors, including the words of your query, relevance and usability of pages, expertise of sources, and your location and settings. The weight applied to each factor varies depending on the nature of your query — for example, the freshness of the content plays a bigger role in answering queries about current news topics than it does about dictionary definitions.
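
To picture what query-dependent weighting could look like, here is a minimal sketch. Everything in it is an assumption made up for illustration: the factor names, the weights, and the scores are not Google's actual signals or values, only a toy stand-in for the idea that freshness counts for more on some queries than on others.

```python
# Toy illustration only: the factor names, weights, and scores below are
# invented for this sketch and are not Google's actual signals or values.

# Hypothetical per-query-type weights. Freshness counts for more on news-like
# queries than on definition-like queries, as the quoted passage describes.
WEIGHTS = {
    "news": {"relevance": 0.5, "freshness": 0.5},
    "definition": {"relevance": 0.9, "freshness": 0.1},
}


def score(result, query_type):
    """Weighted sum of a result's factor scores for the given query type."""
    weights = WEIGHTS[query_type]
    return sum(weights[factor] * result[factor] for factor in weights)


# Two made-up candidates, each factor scored from 0 to 1.
old_but_on_topic = {"relevance": 0.9, "freshness": 0.1}
new_but_looser = {"relevance": 0.6, "freshness": 0.9}
candidates = [
    ("old_but_on_topic", old_but_on_topic),
    ("new_but_looser", new_but_looser),
]

for query_type in WEIGHTS:
    ranked = sorted(
        candidates,
        key=lambda item: score(item[1], query_type),
        reverse=True,
    )
    print(query_type, "->", [name for name, _ in ranked])
```

With these invented numbers, the newer result ranks first when the query is treated as news, and the older, more on-topic result ranks first when it is treated as a definition. The same two candidates swap places purely because the weights changed.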

The machine doesn’t care that “twilight” is a time of day. Instead, it delivers what it thinks is most relevant. Fair enough, but the algorithm is just one piece of the puzzle, and there’s a human element here that can’t be overlooked. Google hires external contractors to verify that the results of various searches meet the criteria it sets forth, including the aforementioned “freshness” metric.

To help ensure Search algorithms meet high standards of relevance and quality, we have a rigorous process that involves both live tests and thousands of trained external Search Quality Raters from around the world. These Quality Raters follow strict guidelines that define our goals for Search algorithms and are publicly available for anyone to see.

Want to get a better idea of how these human Google search raters decide what results are good, okay, or poor? Grab a cup of coffee. The company’s “General Guidelines,” which these unseen search soothsayers are supposed to follow, are available for everyone to read. The document is 168 pages long and incredibly complex, but ultimately there’s no escaping the fact that it all comes down to a judgment call.

What’s fresh?

As for Google’s “freshness” concept, the company’s most recent explanation of what freshness means is from 2011. Not exactly what one would consider fresh, but it does offer us some insight into why Robert Pattinson shows up when you search for an atmospheric phenomenon.

Google’s blog post explaining freshness highlights “recent events or hot topics” as being one measure of freshness. Twilight (the book) came out in 2005, followed by the first Twilight film in 2008. They’re old, but the book series and film series were so incredibly popular at the time that they’ve seemingly permanently skewed how Google’s algorithm, and perhaps even its human testers, decide what someone searching for “twilight” is actually looking for.

Historical search patterns for ‘twilight.’ Source: Google Trends

Searches for “twilight” before the launch of the books and movies were quite low, but even years after these vampire stories fell off in popularity and started to collect digital dust, “twilight” is still a slightly more popular search term than it was in the years prior. But Google also allows us to separate regular search from image search, and that’s where things get really interesting.

Historical image search patterns for ‘twilight.’ Source: Google Trends

Unfortunately, the records only go back to 2008 for images instead of 2004 as with regular search, but we can still see the big Twilight spike and the taper. The difference here is that the taper is more dramatic, and the popularity has reached near zero in recent years.

Very few people are searching for the term “twilight” in images these days, and because the term was never strongly linked to images of actual twilight, Google believes that Stephenie Meyer’s fantasy world is the most accurate representation of what twilight is. This is further supported by the image search trends for “twilight movie,” which should give us a hint as to how many people are really hoping to see vampires when they type “twilight” into the search bar.

Historical image search patterns for “twilight movie.” Source: Google Trends

Virtually nobody is actually trying to find “Twilight movie” images anymore. In fact, it might not be a stretch to assume that when any given person types “twilight” into the image search field, what they’re actually looking for comes down to a coin flip. We can’t know for certain, but the fact remains that the popularity of the movie appears to have permanently altered what the search engine believes twilight is.

A total eclipse of the search

We can use a different movie from the same fantasy series to further examine how Google prioritizes image search results. This time, we’ll look at the 2010 movie Eclipse, the third film in the franchise. It was even more popular than the first film, nearly doubling Twilight’s box office gross, but if you search for “eclipse” in Google Images, you’d never know it existed.

Where are all the vampires? Their fate can be explained by examining the Google image search trends.

Historical search patterns for “eclipse.” Source: Google Trends

The spike you see about one-quarter of the way into the graph is the 2010 release of the Eclipse movie. That massive dagger of a spike toward the end is the total solar eclipse of August 21, 2017. A huge swath of the United States was treated to a once-in-a-lifetime skywatching event on that date, and even if you weren’t in the path of the eclipse, you definitely heard and probably read about it.

The celestial event generated a lot more interest than the movie that shared its name. In fact, every time an eclipse happens in the United States or abroad it tends to generate news stories and interest on Google. In this case, the “real” eclipse wins the search war. Images of actual eclipses were clearly deemed more fresh and of higher quality, and Google now assumes that someone searching for “eclipse” wants to see the astronomical phenomenon rather than movie promo photos and posters.

The same is true for the second Twilight film, New Moon. In this case, despite the movie seemingly winning the search war with a huge spike of interest when it debuted, followed by a dramatic taper, people still regularly search for “new moon” for a variety of reasons not related to the decade-old movie. Keeping track of moon cycles is important to a lot of people, and the vampires were no match for that regular flow of fresh search traffic and news stories over years and years.

“Twilight,” on the other hand, doesn’t benefit from the same interest. The movie was more popular than searches for actual twilight will likely ever be, and there’s no slow burn of interest in the scientific phenomenon (as is the case with “new moon”) to save it from its unfortunate fate. On top of that, people searching for information on any of the Twilight sequels will likely include the term “Twilight” in their query, adding strength to the connection between the word itself and the franchise. You could argue that actual twilight is fresher than the movies, since twilight happens twice a day, every day, as opposed to the film and book releases which are now practically antiques, but that doesn’t matter to Google’s electronic mind.

Nevertheless, Google doesn’t know any better, so it surfaces what it thinks is best. In this case, it happens to be a movie that hasn’t been fresh in years, rather than the “dictionary definition” that Google admits its algorithm sometimes ignores.
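
The contrast between “eclipse,” “new moon,” and “twilight” can be pictured with a toy model. This is only a sketch of the argument, not a description of how Google works: it assumes that interest in an event fades with a fixed half-life, and the half-life, dates, and sizes are all made-up numbers rather than Google Trends measurements.

```python
# Toy model only: the half-life, event dates, and sizes are made-up numbers
# chosen to illustrate the argument, not measurements from Google Trends.

HALF_LIFE_YEARS = 2.0  # assumed: an event's pull halves every two years


def current_interest(events, now):
    """Sum each (year, size) event, discounted by how long ago it happened."""
    return sum(
        size * 0.5 ** ((now - year) / HALF_LIFE_YEARS)
        for year, size in events
        if year <= now
    )


NOW = 2020

# One large burst of interest when a film comes out, then nothing new.
one_off_release = [(2010, 100.0)]

# A much smaller real-world event that recurs every year.
recurring_event = [(year, 10.0) for year in range(2008, NOW + 1)]

print("one-off release:", round(current_interest(one_off_release, NOW), 2))
print("recurring event:", round(current_interest(recurring_event, NOW), 2))
```

Under these assumptions, the steady trickle ends up ahead of the single large spike a decade later, which is the pattern the article describes for “eclipse” and “new moon.” Actual twilight has no comparable stream of newsworthy events, so in this picture nothing ever accumulates to displace the franchise.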

A pale future

Is twilight doomed? It may be too early to say, but things don’t look promising. It’s been 15 years since the book came out and 12 years since the movie debuted, and it’s still the default image search result.

One major reason for this may be that nobody searching for “twilight” has an obviously related search to lean on if they’re looking for photos of scenery instead of blindingly white vampires. If you wanted pictures from the movie but saw only sunsets when searching for “twilight,” you’d simply try “twilight movie” instead.

On the other hand, if you wanted photos of actual twilight but saw only brooding stares and sharp teeth in the search results, you might just throw your hands up and try a completely different search term, like “dawn,” for example. Google’s typically witty algorithm may have trouble connecting these dots, and its human testers — if they’ve ever been tasked with grading the results of twilight in image search — clearly aren’t helping matters.

Google’s engine does get things right most of the time. But, like any machine, it’s not perfect. Sometimes the input doesn’t result in the expected output, and instead of a serene, relaxing photo of sunlight beaming over the horizon, you get Kristen Stewart with blood-red eyes.

Translated from: https://onezero.medium.com/how-fictional-vampires-have-ruined-google-image-search-for-over-a-decade-81bd9ca14248






