Recommendation Engines Need Fairness, Too!

As we turn to digital sources for news and entertainment, recommendation engines are increasingly influencing the daily experience of life, especially in a world where folks are encouraged to stay indoors. These systems are not just responsible for suggesting what we read or watch for fun, but also for doling out news and political content, and for surfacing potential connections with other people online. When we talk about bias in AI systems, we often read about unintentional discrimination in ways that apply only to simple binary classifiers (e.g. in the question “Should we let this prisoner out on parole?”, there are only two potential predictions: yes, or no). Thinking about mitigating bias in recommendation engines is much more complex. In this post, we’ll briefly describe how these systems work, then surface some examples of how they can go wrong, before offering suggestions on how to detect bias and improve your users’ experience online, in a fair and thoughtful way.

Part 1: The Anatomy of a Recommender System

If you’re coming to this article as someone who regularly builds or works on recommender systems, feel free to skip this part. For those of you needing a refresher or primer on the topic, read on!

[GIF: a user-item table being filled in with estimates of whether each user will like a certain kind of content.]

Recommender engines help companies predict what they think you’ll like to see. For Netflix, YouTube and other content providers, this might happen in the form of choosing which video queues up next in auto-play. For a retailer like Amazon, it could be picking which items to suggest in a promotional email. At their core, recommender systems take as input two “sides” of a problem: users and items. In the case of Netflix, each user is an account, and each item is a movie. For Amazon, users are shoppers, and items are things you can buy. For YouTube, users are viewers, items are videos, and there is a third component: the users who create the content. You can imagine analogues with newspapers and other media sources such as the New York Times and the Wall Street Journal, music streaming services such as Spotify and Pandora, as well as social networking services such as Twitter and Facebook.

Users rate some items, but not all of them. For example, even if you binge-watch shows on Netflix, it’s unlikely that you have rated even a small fraction of Netflix’s vast content catalogue, much less so when it comes to YouTube’s library, where over 300 hours of content are uploaded every minute. A recommender system’s goal is, given a user, to find the item or items that will be of greatest interest to that user, under the assumption that most items have not been rated by most users. How is this done? By learning from other, similar items, similar users, and combinations of the two.

Recommender systems recommend content based on inductive biases. One common inductive bias is that users who seemed similar in the past will continue to seem similar in the present and future. In the context of recommender systems, this means that users who have, for example, rated videos similarly on YouTube in the past will probably rate videos similarly moving forward. Recommendations based on this intuition might try to find users similar to a particular user and content similar to a particular piece of content, and then combine learnings from those two neighborhoods into an individual score for that particular pairing of a user and an item. By doing this for every user-content pair, the recommender system can “fill in all the blanks”, that is, predict a rating for each combination of user and piece of content. After that, it is simply a matter of picking the most highly-rated pieces of content for that customer, and serving those up as you might see in a sidebar on YouTube or a “view next” carousel on Amazon Shopping.

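To make the “fill in the blanks” idea concrete, here is a minimal sketch of user-based collaborative filtering on a toy ratings matrix. Everything in it (the matrix, the zero-means-unrated convention, the cosine similarity, and the top-k weighting) is an illustrative assumption, not any particular platform’s implementation.

```python
import numpy as np

# Toy user-item rating matrix: rows are users, columns are items.
# 0 means "not yet rated" (an assumption for this sketch).
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def cosine_sim(a, b):
    """Cosine similarity between two users, computed on co-rated items only."""
    mask = (a > 0) & (b > 0)
    if not mask.any():
        return 0.0
    a, b = a[mask], b[mask]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def predict(ratings, user, item, k=2):
    """Predict a missing rating as a similarity-weighted average of the
    k most similar users who did rate this item."""
    sims = np.array([cosine_sim(ratings[user], ratings[u])
                     for u in range(ratings.shape[0])])
    sims[user] = 0.0                          # ignore the user themselves
    rated = ratings[:, item] > 0
    candidates = np.argsort(-sims)            # most similar first
    top = [u for u in candidates if rated[u]][:k]
    if not top:
        return ratings[ratings > 0].mean()    # fall back to the global mean
    weights = sims[top]
    if weights.sum() == 0:
        return ratings[top, item].mean()
    return float(weights @ ratings[top, item] / weights.sum())

# "Fill in all the blanks": estimate every missing user-item rating.
filled = ratings.copy()
for u, i in zip(*np.where(ratings == 0)):
    filled[u, i] = predict(ratings, u, i)
print(np.round(filled, 2))
```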

Part 2: What Could Go Wrong?

As we’ve discussed above, recommender engines attempt to “fill in the blanks” for a particular user by guessing at their level of interest in other topics when we only know how they feel about things they’ve already seen or read. Most recommender engines are a blend of “nearest neighbor” calculations and active rating elicitation, using a combination of supervised and unsupervised learning alongside deterministic rules that modify the selection process among the content that you could potentially recommend. To discuss some of the issues that often arise in recommender engine bias, we’ll look at a couple of examples from industry that illustrate the nuance and complexity involved.

Popularity and the Gangnam Style Problem

One of the more common issues we see in industry can be illustrated by YouTube’s spectacularly named “Gangnam Style Problem”. The problem is this: no matter what content you recommend to your user, when one looks at the potential pathways they could take from one recommendation to the next, they all lead back to whatever happens to be the most popular video that day. While this may be good news for PSY and K-pop stans worldwide, gaining traction within a recommender engine can make or break the experience for someone creating content on these platforms, where they need their content to be seen in order to survive.

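To get a feel for why every pathway tends to lead back to the same hit, you can model auto-play recommendations as a toy Markov chain. The popularity counts, the neighbour structure, and the 70/30 mixing weights below are all assumptions made purely for illustration, not how YouTube actually works:

```python
import numpy as np

n = 5
popularity = np.array([10, 25, 40, 900, 15], dtype=float)  # video 3 is the runaway hit

# Assume each video mostly recommends its catalogue "neighbours", but a
# popularity-weighted term pulls every slate toward the most popular video.
related = np.eye(n, k=1) + np.eye(n, k=-1)
related = related / related.sum(axis=1, keepdims=True)
popular = np.tile(popularity / popularity.sum(), (n, 1))
transition = 0.7 * related + 0.3 * popular      # each row sums to 1

state = np.eye(n)            # one recommendation chain per starting video
for _ in range(50):          # follow the auto-play chain 50 clicks deep
    state = state @ transition

print(np.round(state, 3))
# Every row converges to the same distribution, with the hit receiving the
# largest share: no matter where a viewer starts, the chain drifts toward it.
```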

More and more every day, we hear complaints from within the YouTube creator community that channels suffer from this disparity, and that YouTube is biased against emerging artists. Thinking this through from a business perspective, it’s easy to see why this might be the case: YouTube wants to keep users on the page, and they’re more likely to do that if they can show you content that they know you’ll actually like. In fact, the less YouTube knows about how users will interact with your particular brand of content, the riskier it becomes to promote it.

One alternative that you’ll often see in industry is to combat this imbalance using explicit rules that promote newer, emerging content while also decreasing the likelihood that the “most popular” video will get recommended next. This gives a pre-set bonus to content producers to help them grow their audience, giving YouTube some time to learn more about both the quality of content and the nature of the users who interact with it. However, in doing so there is evidence that YouTube may be giving an unfair advantage to content that is more extreme or radical. This is unlikely to be intentional; AI does a very bad job of predicting things when it doesn’t have much data to go on. One thing is clear: if you aren’t watching for this problem by actively considering the newness of a creator when recommending content, you won’t know how severely your users and creators are affected.

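A minimal sketch of that kind of rule-based re-ranking might look like the following. The boost for creators newer than a cutoff, the penalty on the single most popular candidate, and the field names are all illustrative assumptions rather than any platform’s actual policy:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    title: str
    predicted_rating: float   # what the model thinks the user will rate it
    views: int                # proxy for popularity
    creator_age_days: int     # how long the creator has been on the platform

def rerank(candidates, new_creator_boost=0.5, popular_penalty=0.3,
           new_creator_cutoff_days=90):
    """Re-rank model scores with two hand-written rules:
    1. boost content from creators newer than a cutoff, and
    2. penalise the single most-viewed candidate so it doesn't dominate."""
    most_viewed = max(candidates, key=lambda c: c.views)
    def score(c):
        s = c.predicted_rating
        if c.creator_age_days < new_creator_cutoff_days:
            s += new_creator_boost        # give emerging creators a head start
        if c is most_viewed:
            s -= popular_penalty          # dampen the runaway hit
        return s
    return sorted(candidates, key=score, reverse=True)

slate = rerank([
    Candidate("Gangnam Style", 4.6, views=4_000_000_000, creator_age_days=4000),
    Candidate("New indie artist", 4.4, views=1_200, creator_age_days=30),
    Candidate("Mid-sized channel", 4.5, views=250_000, creator_age_days=900),
])
print([c.title for c in slate])   # the emerging creator now ranks first
```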

Political Bias, Ad Targeting, and Facebook’s Civil Rights Audit

Another kind of bias that we often see within industry is a perceived bias (interestingly, from within both parties) for or against a certain type of political content. While this section will focus on ad targeting in particular, the same issue applies to organic posts among users and in recommending connections or friends on the social networks hosting the debate.

As the ACLU pointed out in its historic litigation against Facebook, which resulted in a 2019 settlement, the ads that people see on Facebook can have an impact on their lives in ways that are significant, even when difficult to quantify. Ads can surface politically charged information, but they can also highlight opportunities for career advancement, education, financial literacy, housing, and capital/credit. If one segment of the population receives these potentially life-improving ads more than another, this will exacerbate existing inequalities, leaving our society less fair than it was before. On the opposite side, ads can be predatory, offering, for example, misinformation or outrageous interest rates that trap people in a cycle of poverty, making it very hard to advance.

Political ads present perhaps the simplest way to think about auditing recommenders for bias: it’s easy to track whether you’re presenting users with an even amount of information from the Democratic and Republican parties. (This of course ignores the fact that there are multitudes of political stances, which exist on a spectrum and are not easily or cleanly defined. In the example of ad targeting, at least, we can call this “simple” mainly because it’s clear who is buying the ad to promote on Facebook, while performing the same kind of audit on organic content will be much more ambiguous and challenging.)

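As a sketch of what that kind of exposure audit could look like, the snippet below tallies how many Democratic versus Republican ads each user was shown and reports an overall balance. The impression log, the party labels, and the per-user counting are hypothetical, purely for illustration:

```python
from collections import Counter, defaultdict

# Hypothetical impression log: (user_id, ad_id, party_of_ad_buyer)
impressions = [
    ("u1", "ad1", "D"), ("u1", "ad2", "R"), ("u1", "ad3", "D"),
    ("u2", "ad2", "R"), ("u2", "ad4", "R"), ("u2", "ad5", "R"),
    ("u3", "ad1", "D"), ("u3", "ad6", "D"),
]

# Per-user exposure: how balanced is each person's political ad diet?
per_user = defaultdict(Counter)
for user, _, party in impressions:
    per_user[user][party] += 1

for user, counts in per_user.items():
    total = sum(counts.values())
    share_d = counts["D"] / total
    print(f"{user}: {share_d:.0%} Democratic, {1 - share_d:.0%} Republican")

# Platform-wide exposure share, as a single summary number.
overall = Counter(party for _, _, party in impressions)
print("Overall Democratic share:", overall["D"] / sum(overall.values()))
```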

But what about the opportunities? The most challenging part of assessing bias in cases of “positive” versus “negative” impact on quality of life may very well be the definition of what constitutes “positive” and “negative”. It’s not enough to simply label a particular ad as “financial”, for instance, because a financial ad can be beneficial when it recommends refinancing a student loan or mortgage from a reputable lender, or harmful in the case of payday loans and other predatory financial instruments. In order to truly track whether your recommender is breaking discrimination laws by behaving in ways that impact protected classes differently, a qualitative assessment of each ad is needed, making this difficult to achieve at scale.

This need for qualitative assessment and clearly defined categorization is most evident when we think about how Facebook enables the spread of misinformation. While it seems as though defining “truth” should be easy, take it from a philosopher that this is often an impossible task. This is precisely why Facebook, when faced with its own self-imposed civil rights audit, has been asked to step up its efforts to identify and remove this misleading content, leaving it very open to partisan attacks from both sides.

Part 3: What Can We Do?

There’s no magic bullet to mitigate bias in recommender systems, but the first step to solving any problem is to quantify it. In industry, it’s been shocking at times to see the degree to which some enterprises want to keep their heads in the proverbial sand on issues of algorithmic discrimination. The old days of “fairness through unawareness” as a tactic to address bias are clearly coming to an end. In two cases specifically, with more on the horizon, federal prosecutors have opened inquiries into companies guilty of unintentional race and gender discrimination.

Counterintuitive as it may seem, a necessary first step towards addressing algorithmic discrimination must be to collect protected class information like race, gender, and disability status from users. For years, the adage has been that it’s impossible to discriminate if you’re unaware of the subject’s protected class status. Unintentional algorithmic discrimination proves that this is no longer a viable strategy, and the automated systems that govern our daily experiences are exacerbating existing inequalities through the exploitation of features that serve as proxies for protected categories like race. In recommender systems, the very content that you have liked and enjoyed in the past can, in many cases, most certainly be a proxy for your race.

A second, important task will be to create categories for content and ads based on their usefulness or harmfulness. While this is a challenge, and one that will be honed over hours of careful discussion, it is not impossible to categorize certain things into buckets that can then be tracked. For instance, ads offering higher education from accredited universities can be differentiated from ads promoting for-profit certifications. While there may not be clear consensus on every item (as we see from attempts to define deepfakes or other forms of misinformation), these are debates that must be had early, often, and with transparency to the public, lest these issues be swept under the rug for being “too hard to scale”. Once you know users’ protected classes and the rates at which they are recommended “positive” versus “negative” content, you can calculate a metric that captures the disparity in how your platform influences their lives, as sketched below.

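Here is one way such a disparity metric could be sketched: compare, across groups, the rate at which recommended items carried a “beneficial” label, then take the ratio between the worst- and best-treated groups. The group names, the beneficial/harmful tags, and the 0.8 threshold (echoing the four-fifths rule) are illustrative assumptions:

```python
from collections import defaultdict

# Hypothetical recommendation log: (user_group, content_label), where the
# label is a human-assigned "beneficial" / "harmful" tag.
recs = [
    ("group_a", "beneficial"), ("group_a", "beneficial"), ("group_a", "harmful"),
    ("group_a", "beneficial"), ("group_b", "harmful"), ("group_b", "harmful"),
    ("group_b", "beneficial"), ("group_b", "harmful"),
]

def beneficial_rates(recs):
    """Share of recommendations labeled 'beneficial', per group."""
    totals, good = defaultdict(int), defaultdict(int)
    for group, label in recs:
        totals[group] += 1
        good[group] += (label == "beneficial")
    return {g: good[g] / totals[g] for g in totals}

rates = beneficial_rates(recs)
print(rates)                           # {'group_a': 0.75, 'group_b': 0.25}

# Disparity: worst-treated group's rate divided by best-treated group's rate.
disparity = min(rates.values()) / max(rates.values())
print(f"disparity ratio = {disparity:.2f}")
if disparity < 0.8:                    # illustrative four-fifths-style threshold
    print("Warning: beneficial content is reaching groups at very unequal rates.")
```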

To make one final point, mitigating bias must be a never-ending pursuit. Algorithms can be tested ad nauseam in the lab only to become biased as cultural shifts occur. These changes can also happen when platforms take on new segments of the population, with user personas that they may not understand as well as those that were previously defined. New categories of ads and content will appear and fade away as swiftly as our lives are influenced by the events of the day. We must ensure that, as practitioners, we bring an ethos of continual improvement to issues of algorithmic discrimination, never calling what’s been done already “good enough”.

Source: https://medium.com/arthur-ai/recommendation-engines-need-fairness-too-7421411c0483
