Bias in Artificial Intelligence

Artificial Intelligence and Machine Learning are awesome. They allow our mobile assistants to understand our voices and book us an Uber. AI and Machine Learning systems recommend books to us on Amazon that are similar to the ones we’ve liked in the past. They might even get us an amazing match in a dating application and help us meet the love of our life.


All of these are cool but relatively harmless applications of AI: if your voice assistant doesn’t understand you, you can just open the Uber application and order a car yourself. If Amazon recommends a book that you might not like, a little research lets you discard it. If an app sends you on a blind date with someone who is not a good match for you, you might even end up having a good time meeting somebody whose personality bewilders you.


Things get rough, however, when AI is used for more serious tasks like filtering job candidates, handing out loans, accepting or rejecting insurance requests, or even making medical diagnoses. All of these decisions, whether partially assisted or completely taken care of by AI systems, can have a tremendous impact on somebody's life.


For these kinds of tasks, the data that is fed into the Machine Learning systems sitting at the core of these AI applications has to be conscientiously studied, trying to avoid the use of information proxies: pieces of data that substitute for another piece that would be more legitimate and precise for a certain task, but that is not available.


Let's take the example of car insurance requests that are automated by a machine learning system: an excellent driver who lives in a poor and badly regarded area could get a car insurance request rejected if ZIP code is used as a variable in the model instead of pure driving and payment metrics.
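To make the idea of a proxy more concrete, here is a minimal sketch in Python (using entirely synthetic data and made-up feature names such as zip_code and driving_score, not data from any real insurer) of one simple check a modeller could run: see how well the suspect feature on its own predicts a protected attribute. If it does so far better than chance, keeping it in the model quietly reintroduces the very information we wanted to leave out.

```python
# A minimal sketch (synthetic data, hypothetical column names) of one way to
# spot a proxy variable: check how well the candidate feature alone predicts
# a protected attribute.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 5_000

# Synthetic population: ZIP code is strongly correlated with a protected
# attribute (e.g. a demographic group), as it often is in segregated cities.
group = rng.integers(0, 2, size=n)                   # protected attribute
zip_code = np.where(group == 1,
                    rng.integers(0, 20, size=n),     # mostly "poor" ZIPs
                    rng.integers(15, 40, size=n))    # mostly "rich" ZIPs
driving_score = rng.normal(70, 10, size=n)           # legitimate feature

X_proxy = pd.get_dummies(pd.Series(zip_code, name="zip"), prefix="zip")

# How well does ZIP code alone recover the protected attribute?
proxy_auc = cross_val_score(LogisticRegression(max_iter=1000),
                            X_proxy, group, cv=5, scoring="roc_auc").mean()
print(f"ZIP code predicts the protected attribute with AUC of {proxy_auc:.2f}")
# Anything far above 0.5 is a warning sign: prefer pure driving and payment
# metrics, and drop (or at least audit) the ZIP code feature.
```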


Aside from these proxies, AI systems also depend on the data they were trained with in another way: training on non-representative samples of a population, or on data that has been labelled with some sort of bias, produces the same bias in the resulting system.


Let's look at some examples of bias in AI systems.


Tay: The offensive Twitter Bot


Tay (Thinking about you) was a Twitter Artificial Intelligence chatbot designed to mimic the language patterns of a 19-year-old American girl. It was developed by Microsoft in 2016 under the user name TayandYou, and was put on the platform with the intention of engaging in conversations with other users, and even uploading images and memes from the internet.


After 16 hours and 96,000 tweets it had to be shut down, as it began to post inflammatory and offensive tweets, despite having been hard-coded with a list of certain topics to avoid. Because the bot learned from the conversations it had, when the users interacting with it started tweeting politically incorrect phrases, the bot picked up these patterns and started posting offensive messages about those topics.


Machine learning systems learn from what they see, and in this case the parrot-like behaviour adopted by Tay caused great public embarrassment for Microsoft, which ended with this letter, as their 19-year-old girl turned into a neo-Nazi millennial chatbot.


In the following link you can find some examples of Tay’s Tweets.


Now, imagine if, instead of being intended for use on a social network, a chatbot like this one had been used as a virtual psychologist or something similar. Or imagine that the bot started targeting specific people on social media and attacking them. The people speaking to it could have been seriously hurt.


Google's Racist Image application


Another big tech company, Google this time, has also had issues regarding bias and racism. In 2015, some users of the image recognition feature in Google Photos received results in which the application identified black people as gorillas. Google apologised for this and came out saying that image recognition technologies were still at an early stage, but that they would solve the problem. You can read all about it in the following link.


If a company as powerful and technologically advanced as Google can have this sort of issue, imagine the hundreds of thousands of other businesses that create AI-powered software and applications without such expertise. It’s a good reminder of how difficult it can be to train AI software to be consistent and robust.


This is not, however, the only issue Google has had with images and Artificial Intelligence. Handheld thermometer guns have become widely used throughout the COVID pandemic, and Google's Cloud Vision software (a service for detecting and classifying objects in images) has had to quickly learn to identify these kinds of devices in order to classify them correctly using data sets containing very few images, as these devices, despite not being new, have only recently become known to the general public.


[Image: Cloud Vision labelling a handheld thermometer held by a person with dark skin as a "gun" (61% confidence), and the same device held by a person with salmon-colored skin as a "monocular"]

The previous image shows how one of these thermometer guns gets classified as a gun when it is held by a person with dark skin, and as a monocular when it is held by a person with salmon-colored skin. Tracy Frey, director of Product Strategy and Operations at Google, wrote after this viral case:


“this result [was] unacceptable. The connection with this outcome and racism is important to recognise, and we are deeply sorry for any harm this may have caused.”


The way Google has fixed this is by changing the confidence probabilities (the 61% that appears in the image above) needed for Cloud Vision to return a gun or firearm label. However, this is just a change in how the results of the Artificial Intelligence model are displayed, not a change to the model itself, highlighting once again how difficult it is to get these systems to behave properly in many cases, especially when there is little data.
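To see why this kind of fix lives in the presentation layer rather than in the model, here is a toy sketch (hypothetical labels and scores, not Google's actual code or API) of how a display threshold filters what the user sees while leaving the classifier's output untouched.

```python
# A toy illustration (not Google's actual fix) of why raising the confidence
# needed to *display* a label changes only the presentation layer: the
# underlying model still produces exactly the same scores.
from typing import Dict, List, Tuple

def displayed_labels(scores: Dict[str, float], threshold: float) -> List[Tuple[str, float]]:
    """Return only the (label, confidence) pairs at or above the threshold."""
    return [(label, conf) for label, conf in scores.items() if conf >= threshold]

# Hypothetical raw output of an image classifier for the thermometer photo.
raw_scores = {"gun": 0.61, "monocular": 0.22, "thermometer": 0.08}

print(displayed_labels(raw_scores, threshold=0.60))  # [('gun', 0.61)]
print(displayed_labels(raw_scores, threshold=0.70))  # [] -- 'gun' is no longer shown,
# but raw_scores (the model's belief) is identical in both cases.
```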


What if a system like this one had been used for locating potentially harmful or suspicious individuals using surveillance cameras in the street? Innocent people could have been targeted as dangerous just because of their skin color.


Latest Biased AI news

Recently, there’s been a lot of discussion around the topic of bias in Artificial Intelligence among some of the top AI researchers in the world, spawned by the publication of the paper “PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models”. This model transforms low-resolution images into higher-resolution ones using AI, as shown in the following tweet.


This tweet came with a link to a Google Colab notebook (a programming environment) where anyone could run the code and try the model on different images. This soon led people to find that PULSE appears to be biased in favor of outputting images of white people, with one user responding to the original tweet with a pixelated image of Barack Obama that the model reconstructed into an image of a white man.


The authors of the paper responded to this, adding a bias section to the paper and including a Model Card: a document that clarifies the details of the model, its purpose, the metrics used to evaluate it, the data it was trained with, and a breakdown of results across different races along with some ethical considerations. I think creating this kind of document when a Machine Learning model is constructed is a great practice that should be done more frequently.
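As a rough illustration of what such a document might contain, here is a minimal Model Card sketched as a plain Python dictionary; the field names and placeholder values are illustrative only, not the official Model Cards schema or the card published by the PULSE authors.

```python
# A minimal sketch of the kind of information a Model Card collects, written
# as a plain Python dictionary. Field names are illustrative placeholders.
model_card = {
    "model_details": {
        "name": "face-upsampling-demo",   # hypothetical model name
        "version": "0.1",
        "intended_use": "Research demo for photo upsampling; not for "
                        "identification or surveillance.",
    },
    "training_data": "Description of the dataset(s), how they were collected "
                     "and which populations are over- or under-represented.",
    "evaluation": {
        "metrics": ["reconstruction quality", "perceptual similarity"],
        # Performance broken down by race, gender, skin tone, etc.
        "results_by_group": {"group_A": None, "group_B": None},
    },
    "ethical_considerations": "Known failure modes, potential harms and "
                              "recommendations for downstream users.",
}
```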


You can find the discussion and further information on this topic in the following link.


Other examples of Bias in Artificial Intelligence

Aside from these previous cases, all of which had some resonance in the media, there are many other, lesser-known cases of models that have a similar discriminatory flavour to the previous ones. A section could be written for each, but here we will briefly mention them, allowing the reader to investigate further if desired.


  • Women are less likely than men to be shown ads for highly paid jobs on Google. The models built for displaying these ads use information such as personal details, browsing history and internet activity. Link.


  • An algorithmic jury: using Artificial Intelligence to predict recidivism rates. A predictive model used to estimate whether an individual will commit crimes again after being set free (and therefore used to extend or reduce that individual's time in jail) shows racial bias, being a lot tougher on black individuals than on white ones. Link.


  • Uber’s Greyball: escaping worldwide authorities. Data collected from the Uber app is used to evade local authorities who try to clamp down on its service in countries where it is not permitted by law. This is not an example of bias per se, but it puts the focus on what AI can do to discriminate against certain users (in this case police officers), and how it can be used to serve selfish interests. Link.


  • Lastly, not everything is bad news for AI. The following link shows how AI-powered systems can reduce bias in university recruiting applications: Link.


What can we do about all this?

We’ve seen what can happen if AI systems start showing racial, gender or any other kind of bias. But what can we do about it?


To regulate these mathematical models, the first step has to start with the modellers themselves. When creating these models, designers should try to avoid using overly complex mathematical tools that obscure the simplicity and explainability of the models. They should study very carefully the data that is being used to build these models, and try to avoid the use of dangerous proxies.


Also, they should always keep in mind the final goal of the models: making people's lives easier, providing value to the community, and improving our overall quality of life, whether through business or academia, instead of focusing only on Machine Learning metrics like accuracy or mean squared error. Likewise, if the models are built for a specific business, another usual success metric probably has to be put in second place: economic profit. Aside from this profit, the results of the models should be examined in terms of the decisions they are making: following our insurance example, the creators of the model should look at who is getting rejected and try to understand why.
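As a minimal sketch of what that examination could look like in practice (with a tiny hypothetical decisions table, not real insurance data), one could simply break the approval rate down by group and look at the gap:

```python
# A minimal sketch (hypothetical data) of the audit suggested above: after the
# insurance model makes its decisions, break the approval rate down by group
# instead of looking only at accuracy or profit.
import pandas as pd

decisions = pd.DataFrame({
    "group":    ["A", "A", "A", "B", "B", "B", "B", "A"],
    "approved": [1,   1,   0,   0,   0,   1,   0,   1],
})

approval_rate = decisions.groupby("group")["approved"].mean()
print(approval_rate)
print("approval-rate gap:", approval_rate.max() - approval_rate.min())
# A large gap between groups does not prove the model is unfair on its own,
# but it tells the modellers exactly where to start asking "why?".
```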


As we progress into a more data-driven world, governments might have to step in to provide fair and transparent regulation for the use of Artificial Intelligence models in certain areas like finance, insurance, medicine and education. All of these are fundamental pieces of any individual's life, and should be treated very carefully.


As AI practitioners, the people creating the systems have a responsibility to re-examine the ways they collect and use data. Recent proposals set standards for documenting models and datasets to weed out harmful biases before they take root, using the Model Cards mentioned before and a similar system for datasets: Datasheets.


Aside from this, we should try to build non black-box, explainable models, audit these models, and track their results carefully, taking the time to manually analyse some of the outcomes.
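For instance, a simple, interpretable model such as a logistic regression exposes its coefficients directly, which makes this kind of manual audit possible. The sketch below uses synthetic data and made-up feature names purely for illustration.

```python
# A minimal sketch of one "non black-box" option: a logistic regression whose
# coefficients can be read directly, so a reviewer can see which features
# drive the decisions. Data and feature names are synthetic/hypothetical.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = pd.DataFrame({
    "driving_score":   rng.normal(70, 10, 1000),
    "late_payments":   rng.poisson(1.0, 1000),
    "zip_code_income": rng.normal(50, 15, 1000),   # potential proxy feature
})
y = (X["driving_score"] + rng.normal(0, 5, 1000) > 70).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)
for name, coef in zip(X.columns, model.coef_[0]):
    print(f"{name:>16}: {coef:+.3f}")
# An unexpectedly large weight on a proxy-like feature is a prompt to
# re-examine the data before the model is deployed.
```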


Lastly, we can educate the wider community and general public on how data is used, what can be done with it, how it can affect them, and also let them know transparently when they are being evaluated by an AI model.


Conclusion and additional Resources

That is it! As always, I hope you enjoyed the post, and that I managed to help you understand a little bit more about bias in AI, its causes, effects, and how we can fight against it.


Here you can find some additional resources in case you want to learn more about the topic:


If you liked this post then feel free to follow me on Twitter at @jaimezorno. Also, you can take a look at my other posts on Data Science and Machine Learning here. Have a good read!


If you want to learn more about Machine Learning and Artificial Intelligence follow me on Medium, and stay tuned for my next posts! Also, you can check out this repository for more resources on Machine Learning and AI!


Translated from: https://towardsdatascience.com/bias-in-artificial-intelligence-a3239ce316c9
