Bias creeps into technology

The majority of folks who build technology don’t intend to be biased. Yet we all have our own unique perspective on the world, and we can’t help but bring that into our work. We make decisions based on our views and our experiences. Those decisions may each seem small in isolation, but they accumulate. And, as a result, technology often reflects the views of those who build it.

Here are a few of the places I’ve seen where bias creeps into technology.

The datasets we construct

With the recent success of machine learning (ML) and AI algorithms, data is becoming increasingly important. ML algorithms learn their behaviour from a dataset. What’s contained in those datasets becomes important as it directly impacts the performance of a product.

Take the field of Natural Language Understanding (NLU) where large pre-trained models have recently become popular. The pre-trained models are expensive to build, but once built they can be reused among different tasks by different people. BERT is one of the most widely used pre-trained models, and it was built from Wikipedia text. Wikipedia has its own problems as a data source. Of biographies on the site, only 18% are of women and the vast majority of content is written by editors in Europe and North America. The resulting biases in Wikipedia are learnt by the BERT model and propagated.

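As a rough illustration rather than a rigorous measurement, the short sketch below uses the Hugging Face transformers library (assuming it is installed and can download the model) to ask bert-base-uncased to fill in a pronoun for two occupation sentences. The exact probabilities depend on the model version, but any skew in them is learnt directly from the training text.

```python
# Probe a pre-trained BERT model for gendered associations.
# Assumes the Hugging Face `transformers` library is available; the
# sentences are illustrative only.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

for sentence in [
    "The doctor said that [MASK] would be late for the appointment.",
    "The nurse said that [MASK] would be late for the appointment.",
]:
    print(sentence)
    for prediction in unmasker(sentence, top_k=5):
        # Each prediction carries the filled-in token and its probability.
        print(f"  {prediction['token_str']:>8}  {prediction['score']:.3f}")
```

Comparing which pronouns the model prefers for "doctor" versus "nurse" gives a quick, informal sense of the associations it has picked up from its training data.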

In another field, Computer Vision, datasets are equally problematic in their composition. One class of datasets is of faces, from which facial recognition systems are trained. They are often overwhelmingly white: two popular datasets contain 79.6% and 86.2% lighter-skinned faces respectively. Datasets like this lead to ML models which perform poorly for people with darker skin.

The problems we decide to tackle, or not

At CogX, I hosted a session where Dr Heidi Christensen talked about her work researching voice technology for those with disordered speech & clinical voice impairments. Voice technology potentially has a large impact on the lives of those with voice impairments because the same conditions that affect their voice also affect other movements, making it hard to carry out many simple tasks. Gaining independence is associated with better outcomes. Yet, the mainstream of voice technology focuses on healthy speakers and not on those with non-standard speech patterns.

Other times, the framing of a task is problematic and risks perpetuating stereotypes. Take the task of gender classification. I’m confident that nothing good is going to come out of a system identifying me as a woman. I might, for example, be shown adverts for lower-paying jobs or be pointed towards more expensive products.

Decisions about which tasks to work on are usually guided by financial concerns — who will fund a product and who will pay to use it — but also by the personal experiences of those doing the building and the issues that resonate with them.

The priorities we assign to tasks

I wear different hats, but one of my many jobs has been to prioritise technical tasks that teams work on. In an ideal world, we’d have enough time and money to work on everything. But in reality, we have a limited amount of time. We have to pick and choose what to work on. Each prioritisation decision might seem small and inconsequential, but together they add up to have a big impact on the direction of a product.

Here’s a hypothetical example inspired by real events — when building a speech recognition system, we might put effort into building a balanced training set, yet still find on evaluation that the system performs poorly for a particular demographic. For example, our speech recognition system might perform worse for UK speakers because some of our pronunciations and word choices are very different from those the system is expecting. Now there’s a choice between spending the team’s effort investigating and making the system more reliable for UK users, and experimenting with a new model architecture that looks promising for making the entire system perform better for everyone. This choice isn’t always an easy one to make. The second option will probably end up improving performance for UK users too, but it wouldn’t address the imbalance.

The opinions we listen to

The people we listen to influence both our thinking and our views of who should hold those influential positions.

The demographics of tech companies — the visionaries and the decision makers — are notoriously skewed. At the top, the ranks are dominated by white and Asian men. Outside of tech, the composition of the top levels isn’t a whole lot better. Among FTSE 100 CEOs, there are more men named Steve than people from ethnic minorities, and only six women. By not promoting a wider range of people into these ranks, we are not listening to their views and perspectives.

A survey in 2016 showed that the British media is 94% white with women paid significantly less than men. While traditional media is skewed in its composition, social media has given a platform to many underrepresented voices. Yet, even social media has a gender problem. A recent study looked at the field of academic medicine, and found that “female academics also have disproportionately fewer Twitter followers, likes and retweets than their male counterparts on the platform, regardless of their Twitter activity levels or professional rank”.

The metrics we evaluate ourselves by

We often evaluate ML systems by their average accuracy. It’s easy to compute this and compare it across different systems. Perhaps 100 people use our system, and each sees an accuracy of 95%. For most systems, that’s perfectly usable. Suppose instead that 90 of those users see an accuracy of 98%, and 10 see an accuracy of 68%. Now, there’s a huge discrepancy between these two groups, but the average accuracy is still 95%. The group of users seeing 68% accuracy might find the system unusable, but that doesn’t show up in the average metric. Without measuring performance for different demographic groups, we can’t uncover biases in the models we build.

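To make the arithmetic concrete, here is a small sketch using the made-up numbers above, showing how the aggregate figure hides the gap between the two groups and how reporting per-group accuracy exposes it.

```python
# Toy illustration of how an aggregate accuracy hides per-group disparities.
# The numbers mirror the hypothetical example above: 90 users at 98% accuracy,
# 10 users at 68% accuracy.
user_accuracies = [0.98] * 90 + [0.68] * 10
groups = ["majority"] * 90 + ["minority"] * 10

overall = sum(user_accuracies) / len(user_accuracies)
print(f"overall accuracy: {overall:.2%}")  # 95.00%, which looks fine in aggregate

# Break the same numbers down by group and the disparity is obvious.
for group in ("majority", "minority"):
    scores = [a for a, g in zip(user_accuracies, groups) if g == group]
    print(f"{group}: {sum(scores) / len(scores):.2%}")
```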

In other products, we measure and optimise engagement — number of clicks & likes on a post, or time spent on a website. But engagement may not be the best measure of the wellbeing of users. Engagement can be driven not only by liking someone’s page or post, but also by frustration with the content. It’s been shown that higher levels of sentiment in a headline make readers more likely to click on it, but they also polarise views and lead to echo chambers which reinforce extreme views over time rather than challenging them.

The view we take of our customers

When I was pregnant, my Nintendo Wii berated me for putting on weight. There was no way to tell it that my weight gain was only temporary, and ultimately I consigned it to the bin. We design systems imagining how our customers will use them. But, customers are different from us in ways which we cannot always anticipate. It seemed that the Nintendo Wii designers hadn’t anticipated pregnancy as something their users might experience. The Nintendo Wii is something I could just stop using, but pregnancy discrimination is a very real issue.

At another time, I was on the receiving end of a presentation about a home security system, hooked up to the cloud. Not only could you check the footage online, while you were out of the house, but you could also check who was in the house at any time and set alerts when particular events happened. The team designing this imagined their users to be like them — proud fathers and loving husbands, who simply wanted to do a good job of keeping their homes secure. But, technology also enables abuse in new ways. The designers didn’t imagine that some of their customers might take advantage of such a home security system with different intentions, and so safety was an afterthought.

The view we take of our customers can limit what we build for them, often in areas where we have blindspots, and reinforce the biases we hold.

The business models we choose

In the technology industry, many engineering jobs are well paid and secure. This contrasts with low-paid, insecure jobs such as data annotator, driver and content moderator, without which many tech companies could not operate. A reliance on these roles is crucial to the business model of many tech companies, and the demographics of these workers are very different from those of the engineering staff.

A recent study of algorithmic pricing of ride hailing companies found that factors like ethnicity, education and age all affected ride prices, despite not being an explicit part of the model. This is because they are correlated with factors that the models do take into account, like location.

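The mechanism is easy to reproduce in a toy simulation with invented numbers (nothing here comes from the study itself): a pricing rule that only ever sees the pick-up neighbourhood still ends up charging two groups differently once group membership and neighbourhood are correlated.

```python
# Toy simulation: a pricing rule that uses only location can still produce
# different average fares across demographic groups, because group membership
# and location are correlated. All numbers here are invented for illustration.
import random

random.seed(0)

def pickup_neighbourhood(group):
    # Assumed correlation: group A rides mostly start in neighbourhood 1,
    # group B rides mostly start in neighbourhood 2.
    if group == "A":
        return 1 if random.random() < 0.8 else 2
    return 2 if random.random() < 0.8 else 1

def price(neighbourhood):
    # The pricing model never sees the rider's group, only the neighbourhood,
    # which carries a surge multiplier.
    base = 10.0
    surge = {1: 1.0, 2: 1.4}[neighbourhood]
    return base * surge

rides = [(g, price(pickup_neighbourhood(g))) for g in ["A"] * 5000 + ["B"] * 5000]

for group in ("A", "B"):
    fares = [p for g, p in rides if g == group]
    print(f"group {group}: average fare {sum(fares) / len(fares):.2f}")
```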

Another common business model relies on offering services for free and earning revenue from advertising. This is a double-edged sword. On the one hand, making products available for free widens their reach and allows people to use them who might not otherwise have been able to afford them. On the other hand, the targeted advertising that comes with this business model is another way that bias is reinforced, for example by allowing adverts to be targeted by race and gender.

Technology has made a huge impact in the world. But the world is biased and those of us who build technology have blindspots that we don’t even know about. Even with the best intentions, it’s difficult to keep bias out of the products we build.

Translated from: https://towardsdatascience.com/bias-creeps-into-technology-e221dd7eae76
