Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings
Introduction
Bias, especially gender stereotypes, shows up in word embeddings:
e.g. man − woman ≈ computer programmer − homemaker
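To make the analogy arithmetic concrete, here is a minimal sketch with invented toy vectors (the paper uses real 300-dimensional word2vec vectors; the coordinates below are made up for illustration). The completion of man : programmer :: woman : ? is the nearest vector to programmer − man + woman:

```python
import numpy as np

# Hypothetical 3-d toy vectors: dim 0 ~ gender, dim 2 ~ occupation.
vocab = {
    "man":        np.array([ 1.0, 1.0, 0.0]),
    "woman":      np.array([-1.0, 1.0, 0.0]),
    "programmer": np.array([ 1.0, 0.0, 1.0]),
    "homemaker":  np.array([-1.0, 0.0, 1.0]),
    "nurse":      np.array([-1.0, 0.0, 0.9]),
    "doctor":     np.array([ 0.5, 0.0, 1.0]),
}

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def analogy(a, b, c):
    """Solve a : b :: c : ? by nearest neighbor to b - a + c."""
    query = vocab[b] - vocab[a] + vocab[c]
    candidates = [w for w in vocab if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vocab[w], query))

print(analogy("man", "programmer", "woman"))  # -> homemaker
```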
Pretrained embeddings: word2vec, 300 dimensions, trained on Google News.
Quantify bias:
Compare a word vector to the vectors of a pair of gender-specific words. For example, nurse being close to woman is not bias by itself, since nurse is close to person words in general; but nurse being closer to woman than to man suggests bias.
Distinguish gender-specific words, which are associated with a gender by definition (e.g. brother / sister) and may legitimately sit close to one gender, from the remaining gender-neutral words (e.g. programmer / nurse).
The gender-specific words are used to learn a gender subspace in the embedding (surprisingly, there exists a low-dimensional subspace that captures much of the gender bias). The debiasing step then removes the bias only from the gender-neutral words while respecting the gender-specific words.
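The subspace-and-removal step can be sketched as follows. This is a minimal reimplementation of the idea with invented toy vectors, not the authors' code: center each female–male definitional pair, take the top principal component of the pair deviations as a gender direction g, then "neutralize" a gender-neutral word by subtracting its projection onto g and renormalizing.

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

# Hypothetical embeddings: dim 0 carries gender, the rest is semantics.
pairs = [
    (unit(np.array([ 1.0, 0.2, 0.5])), unit(np.array([-1.0, 0.2, 0.5]))),  # he / she
    (unit(np.array([ 0.9, 0.7, 0.1])), unit(np.array([-0.9, 0.7, 0.1]))),  # man / woman
]

# Gender subspace: PCA on the deviations of each pair from its mean.
deviations = []
for f, m in pairs:
    mu = (f + m) / 2
    deviations += [f - mu, m - mu]
D = np.array(deviations)
# Top principal component = leading right singular vector of D.
_, _, vt = np.linalg.svd(D, full_matrices=False)
g = vt[0]                      # 1-d gender direction

def neutralize(w, g):
    """Remove the gender component from a gender-neutral word."""
    w_perp = w - (w @ g) * g
    return unit(w_perp)

programmer = unit(np.array([0.3, 0.1, 0.9]))   # slightly "male"-leaning toy vector
debiased = neutralize(programmer, g)
print(abs(debiased @ g))       # ~0: no gender component remains
```

Gender-specific words (brother, sister, …) are simply left out of the neutralize step, which is how the method "respects" them.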
Gender biases in English
Implicit Association Tests have uncovered gender–word biases that people do not self-report and may not even be aware of. Bias also shows up in the language itself: in morphology, and in the lexicon, where, although there are more words referring to males, many more words sexualize females than males.
Biases in algorithms
A number of online systems have been shown to exhibit various biases. Schmidt identified the bias present in word embeddings and proposed debiasing by removing the gender dimensions entirely, i.e. stripping all gender information from the embedding. At the same time, the difficulty of evaluating embedding quality (as compared to supervised learning) parallels the difficulty of defining bias in an embedding.
Word embeddings
Notation:
Each word has an embedding w ∈ R^d with ||w|| = 1 (unit-normalized). Assume a set of female–male word pairs P ⊆ W × W and a set of gender-neutral words N ⊆ W. Similarity is cosine similarity:
cos(u, v) = (u · v) / (||u|| ||v||)
Since all embeddings are normalized, the similarity between two embeddings reduces to their inner product u · v.
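A one-line check of the claim that cosine similarity reduces to the inner product for unit vectors (toy numpy sketch):

```python
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

u = np.array([3.0, 4.0]); u /= np.linalg.norm(u)   # unit-normalize
v = np.array([1.0, 2.0]); v /= np.linalg.norm(v)

# For unit vectors the denominator is 1, so cosine == dot product.
assert np.isclose(cosine(u, v), u @ v)
```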
Crowd experiments
Geometry of Gender and Bias in Word Embeddings
Goal: understand the biases present in embeddings (i.e. which words lie closer to he vs. she, etc.) and to what extent these biases agree with human notions of stereotypes.
Occupational stereotypes
Crowdworkers were asked to evaluate whether an occupation is considered female-stereotypic, male-stereotypic, or neutral. Spearman r = 0.51 (strongly correlated):
the geometric bias of the embedding vectors is aligned with crowd judgment.
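A hedged sketch of how such a comparison could be set up: project occupation vectors onto the he − she direction and compare the resulting ranking to crowd ratings via Spearman rank correlation (all vectors and ratings below are invented for illustration; the paper uses real word2vec vectors and real crowd data, so its r = 0.51 is not reproduced here).

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

he, she = unit(np.array([1.0, 0.3])), unit(np.array([-1.0, 0.3]))
g = unit(he - she)                       # crude 1-d gender direction

# Hypothetical occupation vectors and crowd stereotype ratings
# (+1 = male-stereotypic, -1 = female-stereotypic).
occupations = {
    "programmer": (unit(np.array([ 0.6, 0.8])),  0.7),
    "nurse":      (unit(np.array([-0.7, 0.7])), -0.8),
    "teacher":    (unit(np.array([-0.2, 1.0])), -0.3),
    "engineer":   (unit(np.array([ 0.5, 0.9])),  0.6),
}

bias  = np.array([vec @ g for vec, _ in occupations.values()])
crowd = np.array([r for _, r in occupations.values()])

def spearman(x, y):
    """Spearman rho = Pearson correlation of the ranks (no ties here)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return np.corrcoef(rx, ry)[0, 1]

print(round(spearman(bias, crowd), 2))   # perfect rank agreement in this toy setup
```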
Analogies exhibiting stereotypes
(To Be Continued…)