Examples of Observer Bias in Everyday Life: Gender Bias and Representation in Data and AI

This article looks at how observer bias shows up in everyday life and extends into the fields of data and artificial intelligence, revealing how gender bias can lurk within these technologies.

In light of the #MeToo movement and the growing push for transparency on equal pay and equal opportunity, the world of tech has had to come to terms with its concerning lack of female representation. It is no secret that women make up a woeful proportion of the tech workforce: Statista estimates that fewer than 1 in 4 tech roles are held by women. The figures are equally bad, if not worse, for data science. According to a report by Harnham, in America only 18% of data scientists are women and 11% of data teams have no women at all. However, the lack of gender representation (along with other forms of representation, such as racial representation) specifically in Data and AI has ramifications outside of the workplace: it can inhibit the pursuit of gender equality in society. Data and AI are doubly impacted by a lack of female representation because of the gender data gap that exists in much of our data, as highlighted by Caroline Criado-Perez. Representation of women in the data, and in the design of data solutions, should be essential in any business process.

Bias in Data

All Data Science and AI projects start with data, and the fairness of an AI model can be limited by the fairness of the data you feed it. Unfortunately, too often the data reflects the bias that exists in our society. This bias can appear in multiple forms. One form is the underrepresentation of women in the data. Another is data that has not been appropriately sex-disaggregated; that is, women are assumed to follow the same distributions and patterns as men.

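To make the idea of sex-disaggregation concrete, here is a minimal sketch using pandas with a purely hypothetical dataset (the column names and values are made up), showing how a pooled summary can hide very different patterns for each sex:

```python
# A minimal sketch of sex-disaggregation with pandas: a pooled average can
# hide the fact that the two sexes follow different distributions.
# The dataset below is purely hypothetical.
import pandas as pd

df = pd.DataFrame({
    "sex":           ["F", "F", "F", "M", "M", "M"],
    "symptom_score": [3.1, 2.8, 3.4, 6.2, 5.9, 6.5],  # hypothetical measurements
})

# Pooled statistics treat everyone as drawn from one distribution...
print("pooled mean:", df["symptom_score"].mean().round(2))

# ...while disaggregating by sex reveals two very different patterns,
# which a threshold calibrated on the pooled (or male) data would miss.
print(df.groupby("sex")["symptom_score"].agg(["mean", "std"]).round(2))
```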

One area of research where this issue is particularly troublesome is medical research. Prior to 1993, when the FDA and NIH mandated the inclusion of women in clinical trials, many medical trials featured no women at all, citing their childbearing potential and the difficulty of collecting consistent data in the face of monthly hormonal fluctuations. A 2010 study found that single-sex studies of male mammals in the field of neuroscience outnumbered those of females 5.5 to 1. As a result, in order to account for women in medical trials, results from male-dominated trials are often extrapolated and women are simply treated as scaled-down men, as explained by Caroline Criado-Perez, author of Invisible Women. This evidently has profound impacts on women's health: women are more than 50% more likely to receive an incorrect diagnosis when they are having a heart attack.

How bias can be amplified when implementing a Machine Learning or AI model

Bias in data alone is bad enough, as it can portray incorrect distributions of observed behaviours. However, if you train an AI model on this data without correcting for the bias, the model can learn these biased observations and further exacerbate them.

A study assessing digital biomarkers (physiological, psychological and behavioural indicators) for Parkinson's disease featured only 18.6% women. Even if you correctly account for gender, the larger male sample means you are likely to produce more accurate diagnoses for men than for women. In the worst case, if you don't account for gender at all, you could be mis-diagnosing women entirely if they exhibit different symptoms to men. Davide Cirillo et al. published an article examining the prevalence of gender bias in AI for healthcare, in which it is also suggested that precision medicine (as opposed to a one-size-fits-all approach) should be applied.

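As a rough illustration of both failure modes, the sketch below (synthetic data only, not real Parkinson's biomarkers) trains a classifier on a cohort that mirrors the 18.6% imbalance and ignores sex entirely; the made-up "symptom" deliberately points in opposite directions for men and women:

```python
# A toy illustration of the two failure modes described above: a training
# cohort that is ~81% male (mirroring the 18.6% figure) and a model that
# ignores sex. The features are synthetic, not real Parkinson's biomarkers.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

def make_cohort(n_male, n_female):
    """Synthetic cohort where the 'symptom' shifts in opposite directions by sex."""
    sex = np.array([0] * n_male + [1] * n_female)           # 0 = male, 1 = female
    y = (rng.random(n_male + n_female) < 0.5).astype(int)   # disease label
    x = rng.normal(0.0, 1.0, len(sex)) + y * np.where(sex == 0, 1.5, -1.5)
    return x.reshape(-1, 1), y, sex

# Training data mirrors the study's imbalance; sex is not used as a feature.
X_train, y_train, _ = make_cohort(n_male=814, n_female=186)
X_test, y_test, sex_test = make_cohort(n_male=1_000, n_female=1_000)

model = LogisticRegression().fit(X_train, y_train)
pred = model.predict(X_test)

# The model learns the dominant male symptom pattern, so women -- whose
# symptoms point the other way in this toy setup -- are largely mis-diagnosed.
for label, s in [("men", 0), ("women", 1)]:
    acc = (pred[sex_test == s] == y_test[sex_test == s]).mean()
    print(f"accuracy for {label}: {acc:.2f}")
```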

Another AI model that amplifies gender biases exists in the field of natural language processing (NLP). Let's say you blindly train a language translation model and ask it to translate the word 'doctor' from English into French. Due to the historical biases that exist, the model will translate the word into the masculine form rather than the feminine one. This is precisely what happened with Google Translate.

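One way to probe this kind of behaviour yourself is to feed occupation words to an off-the-shelf translation model and inspect the grammatical gender of the output. The sketch below assumes the Hugging Face transformers library and the publicly available Helsinki-NLP/opus-mt-en-fr model; whether it reproduces the bias described above depends entirely on that model's training data.

```python
# A minimal sketch: probe an off-the-shelf English->French translation model
# for the default grammatical gender it assigns to occupation words.
# Assumes the `transformers` library and the public Helsinki-NLP/opus-mt-en-fr
# model; the output may or may not exhibit the bias described above.
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")

for sentence in ["The doctor is here.", "The nurse is here."]:
    french = translator(sentence)[0]["translation_text"]
    print(f"{sentence!r} -> {french!r}")
# Inspect whether the occupations come back in masculine ("le médecin")
# or feminine ("l'infirmière") form by default.
```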

The NLP technique of word embeddings is also not immune from gender bias. A word embedding represents each word in a piece of text as a vector, and a word-embedding model is trained on co-occurrences of words. For example, two words with similar meanings will most likely be located close to each other in the vector space. Furthermore, the distance between such vectors can represent the relationship between words. One illustrative example, given in the paper "Man is to Computer Programmer as Woman is to Homemaker?" by Bolukbasi et al., is:

man − woman ≈ king − queen.

This is innocent enough. The difference between man and woman is similar to the difference between king and queen. However, it is also the case that

man − woman ≈ computer programmer − homemaker.

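The analogy arithmetic behind these examples can be reproduced in a few lines of code. The sketch below uses tiny made-up vectors purely to show the mechanics of comparing difference vectors; real studies use pretrained embeddings such as word2vec or GloVe with hundreds of dimensions.

```python
# A minimal sketch of the analogy arithmetic, using tiny made-up vectors
# purely to illustrate the mechanics; real studies use pretrained embeddings.
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical 3-dimensional embeddings (not real learned values).
vectors = {
    "man":        np.array([ 1.0, 0.2, 0.1]),
    "woman":      np.array([-1.0, 0.2, 0.1]),
    "king":       np.array([ 1.0, 0.9, 0.3]),
    "queen":      np.array([-1.0, 0.9, 0.3]),
    "programmer": np.array([ 0.8, 0.1, 0.9]),
    "homemaker":  np.array([-0.8, 0.1, 0.9]),
}

gender_direction = vectors["man"] - vectors["woman"]

# If two difference vectors point the same way, the analogy "holds" in the
# embedding space -- which is exactly how occupational stereotypes show up
# as alignment with the gender direction.
for a, b in [("king", "queen"), ("programmer", "homemaker")]:
    sim = cosine(gender_direction, vectors[a] - vectors[b])
    print(f"man - woman  vs  {a} - {b}:  cosine = {sim:.2f}")
```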

Gender bias is also very present in Google's search and advertising systems. Women are significantly less likely to be shown online ads for highly paid jobs. In one study, 1,000 users were simulated, half male and half female: the male users were shown adverts for jobs paying over $200,000 a total of 1,800 times, whereas the female users were shown those adverts only 300 times.

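As a back-of-envelope check (not the original study's methodology), one can ask how unlikely a 1,800 versus 300 split of impressions would be if the ad system were gender-blind and the two simulated groups were of equal size:

```python
# A back-of-envelope significance check using only the counts quoted above;
# this is NOT the original study's methodology, just a sketch.
from scipy.stats import binomtest

male_impressions, female_impressions = 1_800, 300
total = male_impressions + female_impressions

# Null hypothesis: with equal-sized groups, a gender-blind ad system would
# split high-paying-job impressions roughly 50/50 between the two groups.
result = binomtest(male_impressions, n=total, p=0.5)
print(f"male share = {male_impressions / total:.1%}, p-value = {result.pvalue:.2e}")
```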

Algorithm bias can also directly exacerbate the lack of gender representation in tech. An Amazon algorithm that was used as a hiring tool penalized women. The algorithm had been trained on historical résumés, which were of course mostly from men, so it learned that male candidates were preferable. Even when gender was explicitly removed from the résumé, there were still features that indicated gender, such as membership of a 'women's chess team', attendance at an all-female college, or even the language used.

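The proxy-feature problem is easy to reproduce on synthetic data: even with the gender column dropped, a feature that correlates with gender lets a model recover the biased historical signal. The sketch below is a toy illustration with made-up features, not a reconstruction of Amazon's tool.

```python
# A toy illustration of proxy leakage: even after the gender column is
# dropped, a feature correlated with gender (here, a made-up
# "attended an all-women's college" flag) lets the model reproduce a biased
# historical hiring signal. Synthetic data only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5_000

is_female = rng.random(n) < 0.5
womens_college = is_female & (rng.random(n) < 0.3)       # proxy for gender
years_experience = rng.normal(5, 2, n)                    # gender-neutral feature

# Biased historical label: past hiring favoured men regardless of experience.
hired = (0.5 * years_experience - 3.0 * is_female + rng.normal(0, 1, n)) > 1.0

# Train WITHOUT the gender column.
X = np.column_stack([years_experience, womens_college])
model = LogisticRegression().fit(X, hired)

print("coefficient on years_experience:    ", round(model.coef_[0][0], 2))
print("coefficient on women's-college flag:", round(model.coef_[0][1], 2))
# The proxy feature receives a large negative weight, so the model still
# penalises many female candidates even though gender was never a feature.
```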

Examples like these can further cement the societal gender bias and stereotype that women are not meant to work in tech or other STEM occupations, making it harder to overcome the current lack of female representation.

The insight women can bring to a team

Many of the issues encountered in the sections above could have been avoided if more women were present in research and data roles. Kate Crawford, a principal researcher at Microsoft, said that

“Like all technologies before it, artificial intelligence will reflect the values of its creators. So inclusivity matters — from who designs it to who sits on the company boards and which ethical perspectives are included. Otherwise, we risk constructing machine intelligence that mirrors a narrow and privileged vision of society, with its old, familiar biases and stereotypes.”

Women are aware that we are not just a scaled-down version of men. A team that is entirely male would be less likely to notice the omission of women in their data and would also be less likely to be concerned with the issues that face the other sex. Historically, men have been viewed as the default gender. Introducing more women into data science teams can help combat this.

A particular example of the consequences of this disparity in the world of tech is how Apple iPhones have been designed to fit the size of a male hand rather than a smaller female hand. With more women on the team, this would have been much more apparent. A more severe consequence of the lack of consideration of women is in the design of airbags. Women are 47% more likely to be seriously injured in a crash. Women are again treated as down-scaled men: car crash test dummies are built to represent the average male. This is not much of a surprise when you consider the all-male airbag design team. However, even physically speaking, women are not a down-scaled version of men. When thinking of an airbag expanding into my chest, it does not take me long to realise that the existence of breasts may affect their efficacy!

Issues that affect women more than men are also overlooked. For example, in the design of a virtual reality game by an all-male team, the issue of sexual harassment, something that most women have to deal with on a weekly basis, was not considered. When this game was sent out to a female gamer for review, another (male) gamer proceeded to sexually harass her in the virtual world. Credit should be given to the team, who immediately responded and resolved the issue. However, if there had been a woman on the team, it is far more likely that sexual harassment would have been brought up in the design process, given how regular an occurrence it is in our lives.

The benefit of having a more gender diverse team

This point is not unique to the fields of tech and data. Roughly half of the global population is female; if women are not considered in the design of any product or solution, then you are missing out on half of your potential audience. If the needs and preferences of women are not acknowledged, women may be less likely to buy your product or use your solution, such as a virtual reality game or a smartphone, which would clearly have a negative impact on sales. But, more critically, the lack of consideration or optimization for gender could further inhibit the lives of women relative to men, as demonstrated in AI for healthcare, the design of airbags, or the implementation of hiring algorithms.

Hiring more females into data and AI roles is not just the right thing to do, but it can also benefit your business and improve the fairness of your algorithms. According to Gartner Inc.,

“By 2022, 85% of AI projects will deliver erroneous outcomes due to bias in data, algorithms or the teams responsible for managing them. This is not just a problem for gender inequality — it also undermines the usefulness of AI”.

I would also like to add that gender representation is not the only important form of representation. Many of the examples I gave in this post about a lack of female representation can be repeated for a lack of racial or socio-economic representation.

Hiring a diverse team is not just about filling quotas; it also introduces more perspectives and improves the decision-making process.

In order to correct for gender bias in Data and AI, the approach should be multi-pronged.

  • Care should be taken to de-bias data.
  • Algorithms should be transparent and tested for bias (see the sketch after this list for a minimal example).
  • Companies with Data and AI teams should make more of an effort to ensure that their teams are sufficiently diverse.
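
As a minimal example of the second point, bias testing can start with simple group-wise metrics such as selection-rate and true-positive-rate gaps. The arrays below are hypothetical stand-ins for a real model's labels and predictions; production audits would typically use dedicated toolkits such as Fairlearn or AIF360.

```python
# A minimal sketch of testing a trained classifier for bias with simple
# group-wise metrics. `y_true`, `y_pred`, and `group` are hypothetical
# arrays standing in for a real model's outputs.
import numpy as np

def group_rates(y_true, y_pred, group, which):
    """Selection rate and true-positive rate for one demographic group."""
    mask = group == which
    selection_rate = y_pred[mask].mean()
    tpr = y_pred[mask & (y_true == 1)].mean()
    return selection_rate, tpr

# Hypothetical labels/predictions for 8 candidates (1 = hired / positive).
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 1])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])
group  = np.array(["F", "F", "F", "F", "M", "M", "M", "M"])

sel_f, tpr_f = group_rates(y_true, y_pred, group, "F")
sel_m, tpr_m = group_rates(y_true, y_pred, group, "M")

# Demographic parity gap: difference in how often each group is selected.
print("demographic parity gap:", abs(sel_f - sel_m))
# Equal opportunity gap: difference in true-positive rates between groups.
print("equal opportunity gap: ", abs(tpr_f - tpr_m))
```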

If you are interested in learning more about the issues of societal bias in data and algorithms, while supporting the voices of women in the world of Maths, Data and Tech, I would recommend reading:

  • Invisible Women: Exposing Data Bias in a World Designed for Men by Caroline Criado-Perez
  • Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy by Cathy O'Neil
  • Hello World: Being Human in the Age of the Machine by Hannah Fry

Translated from: https://medium.com/women-in-data-ai-uk/gender-bias-and-representation-in-data-and-ai-177b9f0da1e3
