As a young data journalist, I was advised to attend NICAR, an annual data journalism conference organized by Investigative Reporters and Editors (IRE) and its suborganization, the National Institute for Computer-Assisted Reporting. In researching the conference, I stumbled upon recordings of the 2019 NICAR Lightning Talks, five-minute presentations related to data journalism chosen by popular vote. Last year, Alex Garcia gave a talk called 5 ways to write racist code (with examples). I was able to chat with him last week about his talk, the response he received, and how he’s feeling about it a year later.
Emilia Ruzicka: Thank you so much for agreeing to meet with me! Can we start with an introduction?
Alex Garcia: Sure! My name is Alex. I recently graduated from University of California, San Diego (UCSD) with a major in computer engineering. I’m from Los Angeles, went to school down in San Diego. I’ve always been interested in computers and when I started at UCSD I decided, “Oh, computer engineering might be something kind of cool.” The first time I ever programmed or did anything in this field was when I started out in college.
I didn’t know about the data journalism field until about a year and a half or two years ago. I found out through Reddit’s Data Is Beautiful subreddit, where I found all these New York Times articles and whatever else, so that’s how I got into it. I didn’t know too much about the actual field and NICAR until I saw someone randomly tweet about it. I saw it was going to be in Newport Beach and I was like, “Oh, that’s really cool!” In terms of my actual experience in journalism, I honestly have none. There are student newspapers on campus and all that, but I never really got into that, never knew it was available. I did do a little bit of data stuff, but I just really didn’t know much about it.
So during NICAR I met a lot of really cool people, saw what the field was like, got really interested in it. I met someone who goes to UCSD and is interested in journalism. We were actually roommates for this past quarter, which was really cool. Right now, I just graduated in December. I have a couple of months off where I’m not doing too much. I’m going to start a new job at the end of March doing general software engineering stuff. In the future, I hope to get into some sort of newsroom, some kind of data journalism, later down the road.
ER: That’s a really interesting journey, where you started not knowing, entered computer science, and then by association and serendipity found data journalism. Speaking of, last year, you gave a lightning talk at NICAR. Could you talk about your topic?
AG: Yeah, so a little bit of background about that. It was specifically about racial bias in algorithms and racial bias in code. This is a field that at the time I was somewhat interested in because I’d see a tweet or an article here and there that someone wrote. I had friends from different fields who were taking classes and they’d say, “Hey, this is a cool article, why don’t you read it?” and it would be about courtroom justice and how these algorithms would determine whatever. So I was always tangentially interested in it. I always had the idea in the back of my mind that I should just aggregate all these links or stories that I find and have it in one list that people can go to and find. But I never did that because I just never got around to it.
So when I signed up for the conference and saw they had these lightning talks where you can do a few-minute speech about whatever you want, having that idea in my mind, I thought I could either aggregate this list or do this talk. I was specifically excited to do a talk for journalists, too, because I don’t know how many reporters really know about this field. They may know tangentially, kind of like my knowledge of college sports and how students can get paid for playing; I know something about that field, but I don’t know much. So I thought it was the same in this case, where people may have heard stories about courtroom injustice or some Microsoft Twitter bot that went crazy because people took it over, but they may not know the differences between what leads to those things. I thought if I aggregated all these things and showed how diverse this field is, how these different problems arise, and what fields they appear in, it might be something nice to share.
I had a bunch of bookmarks to all these different stories I had, cobbled them together, threw a pitch in, and it was a lot of fun aggregating! I’m not the best public speaker and I’m not the best organizer for all these thoughts, so the night before I was frantically working on the slides. I had a lot of ideas about what I wanted to put in the talk, but since it’s only five minutes, I had to cut things out, cut things short, and move things around. But it was fun! It was definitely nerve-wracking, especially because I knew no one in the audience besides two or three people I had met during the days leading up to it.
ER: You touched on this a little bit, but what inspired your talk? Was there any particular article that you encountered that made you think you needed to do your talk on racial bias in code or was it more of the conglomerate idea that sparked it?
AG: That’s a great question. I think for general inspiration of the talk, it was just a bunch of different links that I saw and stories that I would find. Also, the general — not ignorance, per se — but how people don’t know that this is a problem or that it could exist. One of the things that I don’t think I mentioned in the talk specifically, but one of the links that I had was a Reddit thread about gerrymandering. There was some news article talking about gerrymandering and one of the top comments was, “Oh, this research team or this company is working on an algorithm that could do it automatically. They give it whatever and then the computer will do it, so there will be no bias at all.” A couple of comments after that they were saying, “Why are humans doing this? Computers could do it and it would have no bias.” And somewhere hidden in there, there was one comment saying, “Hey, that’s not really how that works. A computer could do it and it could still be biased and there’s many different ways that could come across.” So I think that thread, in particular, stuck out to me. I’ve seen similar threads since then, whether it’s just random regulatory items or other random stuff where people will say that if a computer could do something, it would be a lot easier or more fair.
There would also be other general conversations I would have with friends, not necessarily talking about whether it would be fair for computers to do something, but more about the actual impacts that these issues might have on people. I think there was also a tweet from Alexandria Ocasio-Cortez. She said something about how algorithms have bias and algorithms could be racist. And then there was a reporter from The Daily Wire saying that code can’t be racist. So it’s just a lot of nit-picky things where I don’t know if people really understand this, how it works, and how it manifests.
Also, about a year before the talk, I took a small seminar in computer science education and the professor at UCSD was really interested in K-12 computer science education. Part of the stuff she would talk about and that I learned more about in future classes was the importance of knowing the fundamentals of computer science or programming. Not necessarily knowing how to program or whatever, but knowing how it works, the way it works, and what it can or can’t do. Think about the general US population and how many actually know, not how the computers work, but what their limits are. That’s a field that drives this conversation. You know, if people are ignorant or they don’t know that these computers are not unbiased, that can be a problem.
ER: You mentioned briefly how important you felt it was to present this to an audience of journalists. Could you talk more about that and any sort of considerations you made when you were giving your talk, knowing that your audience was journalists and the ethics that are inherently assumed when journalists present information?
AG: I remember one thing I was thinking about while I was making the presentation and noticed specifically at NICAR was that most of the journalists there are journalists first. They learned how to code while working on stories or doing their job. There are some people who are half-journalist and half-engineer and they know more about coding, but most of the audience seemed to be the kind of people who would take a Python or R workshop to learn about them for the first time. So I didn’t want to have anything that was too technical or show too much code. One thing I did to counteract that was to use a lot of headlines or stories by reporters who were in the field who know more about it and would be familiar to the audience. And while I did show some code, I made sure it wouldn’t be too complicated and would be easy to explain.
One of the points was about doing sentiment analysis and how, if you use the wrong model and pass in a string like “I like Italian food,” you would get a higher sentiment score than if you used “I like Mexican food.” So if I did show code, it was very simplified and probably something that people were somewhat used to.
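To give a concrete sense of the kind of check Garcia describes, here is a minimal sketch of such a probe in Python. Garcia did not name a specific library in the talk; TextBlob’s default pretrained analyzer is used here purely as a stand-in, and the exact scores will vary from model to model.

```python
# A minimal probe of the kind described above: score sentences that differ only
# in the cuisine mentioned and compare the results. TextBlob's default
# pretrained analyzer is a stand-in here; scores depend entirely on the model.
from textblob import TextBlob

sentences = [
    "I like Italian food",
    "I like Mexican food",
    "I like Chinese food",
]

for sentence in sentences:
    polarity = TextBlob(sentence).sentiment.polarity  # roughly -1 (negative) to 1 (positive)
    print(f"{sentence!r}: polarity = {polarity:.3f}")
```

If a model assigns systematically different scores to sentences that are identical except for a nationality or cuisine word, that gap is the bias Garcia is pointing at.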
For the ethical implications, I’m not sure. I did my best to have sources or links that people could go to and follow, but what I didn’t talk about was how you can report on this or how you can find different agencies that may be meddling in this, mostly because I don’t know how to do that. I don’t have a journalism background, so I don’t know how you find sources or what’s the best, most ethical way to go about doing that. I kind of avoided doing that and said: “Here are some stories and headlines that all have something to do with each other and some reasons behind how one event led to another.”
ER: After you gave this talk, what was the response? Were people really interested and want to learn more? If so, has that response continued or have you seen a continued trend in the media in reports on stories like the ones you used in your talk?
AG: Right after the talk, I would get random Twitter DMs here and there from journalists saying, “this was really cool, I really liked it!” or “I had a small question about a source that you used.” One person wanted to talk about the realm in general — what companies are maybe more susceptible to this or that danger. Personally, it was a great way to meet people and see who is working in this field and who is interested in it.
In terms of the long term, in what I’ve seen in the media since my talk, I think the field has gotten a little bit worse. The Washington Post did an article about a company where you send in videos of job interviews and the company uses AI to decide whether someone is a good candidate by analyzing speech and body patterns. And it’s so problematic because there are just so many things that can go wrong, but seeing the amount of money and velocity and power that they have is pretty scary. That’s probably the biggest thing I’ve seen since the talk. I’ve probably seen a couple of other headlines because there’s more and more of a focus on this, especially an academic focus, but I can’t think of any off the top of my head.
ER: You mentioned earlier that you had a lot of things you wanted to put in your talk, but because of time constraints you couldn’t. If you had the opportunity to give the talk again without a time limit, what are some things you would have mentioned, both from when you were preparing the talk and from current issues of racial bias in code and data?
AG: I think for each of the five sections I had, there were one or two more articles, so I would have included those to make my points stronger. Also, when I wrote the slides I used a JavaScript tool to turn them into a website, and I had a reach goal of running a machine learning algorithm live during the presentation to show that you don’t need a big fancy server or computer to have the resources to make biased code. At the end, I would have been able to show that it had been running on some NYPD stop-and-frisk data that I had, and how biased the outcome could be with some pretty readily available tools and data. It’s not hard at all for this to happen.
I was trying to make it work, but the logistics weren’t working out and I didn’t want to cause too many difficulties, so I just went with regular slides instead, but I think having an example like that would drive the point home even further. Even the presentation you make for a talk has the power to make automatic, biased decisions for no good reason. I also would have liked to do demos of where things could go wrong, such as the sentiment analysis example I used, so that people could see exactly what was happening instead of just getting the theory. I think recreating that would reinforce my ideas.
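For readers curious what such a demo might look like, here is a hypothetical sketch in Python of the idea Garcia describes: fit a small model to historical stop data and compare its predictions across groups. The file name and column names (race, age, frisked) are illustrative assumptions, not the real NYPD schema, and this is not Garcia’s actual code.

```python
# Hypothetical sketch of the in-slide demo described above: train a small model
# on stop-and-frisk records and compare its predictions by race. The CSV and
# its columns (race, age, frisked) are assumptions, not the real NYPD schema.
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("stop_and_frisk.csv")      # assumed local extract of the data
X = pd.get_dummies(df[["race", "age"]])     # naive one-hot feature encoding
y = df["frisked"]                           # 1 if the stop led to a frisk

model = LogisticRegression(max_iter=1000).fit(X, y)
df["predicted_frisk"] = model.predict_proba(X)[:, 1]

# Because the historical data reflects biased policing, the "automatic"
# predictions inherit that bias; no special hardware or expertise is required.
print(df.groupby("race")["predicted_frisk"].mean())
```

The point of the demo, as Garcia frames it, is that a few lines of code on a laptop are enough to reproduce the bias baked into the data.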
ER: Cool! Is there anything else you want to say about the importance of being aware of racial bias in code and data or how people can become more conscious and evaluative of what they’re consuming?
AG: I think one heuristic that can be helpful in noticing when these things happen is watching for when someone says, “Oh yeah, a computer did that” or “a computer made the decision” or even “oh, that can’t be biased because of X, Y, or Z.” That’s something I feel happens a lot from day to day where something happened “automatically,” but for me, that’s a red flag. Those are things to look into a little more and check out how the decision was actually built.
With data visualization specifically, when you’re making these visualizations, it’s only as sound as the data you’re building on top of. If the data has underlying problems, then no matter what you put on top of it, you’re just going to make it worse. For instance, electoral maps. If you look at election results by county for the entire United States, you’re in some ways supporting an older, racist, white supremacist system. The goal might not be to create a racist visualization, but in some ways, you’re biasing the view and integrity of that data.
There are many other examples with data visualization and data analysis, but just knowing that whatever data you’re using, you’re sitting on top of a historical view of how it came to that point. I think that’s definitely something to consider as you work.
You can listen to Alex Garcia’s full lightning talk via IRE Radio here.