The Problem with Statistics

Statistics can be one of the most divisive and harmful tools of misinformation, and I have seen them misused all over Facebook. I have attempted to make sense of the apparent conflict of reality that statistics represent. I knew nothing about data science when I began to write this, and after researching, I realized that I am woefully unprepared for this attempt. That said, I decided to give it a try. (Note: I know that I am biased toward believing in the existence and ubiquity of systemic racism. This article is a general critique, but on second reading, my examples betray that bias.)

The Impetus

Candace Owens claimed that 75% of black homes are without a father. She said that if you believe that there are strong male role models in Black America, you are entering “the land of the delusional.”

That’s a damning statistic. Larry Elder recently tweeted: “Assume there’s a vaccine against white racism. Would 70% of black kids STILL be raised in fatherless homes?”

The problem is that this statistic is based on census data. It is technically correct: 75% of black homes have unmarried parents. But that’s not the same as a fatherless home. You see, 45% of children in black homes have unmarried parents who live in the same household. That statistic is a little less damning and on par with white figures. How about shared custody? That accounts for 25% of children. In reality, the percentage of fatherless homes is higher in black America, but not to the extent presented to us. The statistics, while technically correct, are used to deceive us.

Whether the statistic concerns police brutality, the number of out-of-state protesters, or job numbers, it can easily be used to support whichever side you are on. How? Because statistics are, by definition, nuanced and subject to interpretation. They NEVER speak for themselves. This is a feature, not a bug.

The Nerdy Data

The first concept is Statistical Significance. A result is statistically significant when it would be unlikely to show up by chance alone. The researcher still has to decide what the result means and report it. A study may show a thing, but something being statistically significant doesn’t necessarily mean it is meaningful or of any practical importance.

A good example is the previously mentioned fatherless homes statistic. Both people quoted above attached significance to the part of the statistic that backed up their narrative. In reality, that statistic had no Statistical Significance at all.
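
The gap between statistical significance and practical importance can be sketched with a few lines of Python. This is only an illustration under stated assumptions: the means, the standard deviation, and the group size are made-up numbers, and the helper `two_sample_z` is a hypothetical name for a plain two-sided z-test with equal group sizes and a known, shared standard deviation.

```python
import math

def two_sample_z(mean_a, mean_b, sd, n_per_group):
    """Two-sided z-test for a difference in means.

    Assumes both groups have the same size and the same known standard deviation.
    """
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = (mean_a - mean_b) / standard_error
    # Probability of seeing a gap at least this large if the true means were equal.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Hypothetical numbers: a tiny gap between two very large groups.
mean_a, mean_b, sd, n = 100.1, 100.0, 15.0, 1_000_000

z, p_value = two_sample_z(mean_a, mean_b, sd, n)
effect_size = (mean_a - mean_b) / sd  # gap measured in standard deviations

print(f"z = {z:.2f}, p = {p_value:.2g}, effect size = {effect_size:.4f}")
```

With groups this large, the p-value falls far below the usual 0.05 cutoff even though the gap is under one hundredth of a standard deviation. The result is "significant" and still tells us almost nothing of practical use.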

Secondly, there are Irrelevant Plots. This often happens when the researcher enters into a study with a bias in need of confirmation. An irrelevant plot is similar to a large banana photo, which becomes a normal-sized banana when a quarter is introduced into the image for scale. We need a baseline to determine the relevance of a statistic. Unfortunately, benchmarks can be impossible to agree upon.

Let’s take the use of non-lethal force against minorities. Is it on the rise or not? Well, that depends. Are we looking at the black population alone? Then no, it’s steady. Do we include “unknown race” and “other”? Then yes, it’s definitely on the rise. Many of those listed as unknown race are black. Many are not. Jamaicans are often listed as “other” when they could just as easily be put in the “black” category. The baseline is so squishy that I could right now quote you statistics that show rates of non-lethal force improving, or statistics that make them appear to be rising egregiously. If I’m Larry Elder or Candace Owens, then it’s going to be improving. If I’m a Black Lives Matter representative, then it’s a call to arms. See how easily one statistic is manipulated for my cause?
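
To make the squishy baseline concrete, here is a small Python sketch. The yearly counts are invented purely for illustration and do not come from any real dataset; the category names mirror the ones discussed above, and `percent_change` is a hypothetical helper.

```python
# Hypothetical yearly incident counts by recorded category (not real data).
counts = {
    "black":   [100, 100, 100],
    "unknown": [10, 30, 50],
    "other":   [5, 10, 20],
}

def percent_change(categories):
    """Percent change from the first year to the last, summing the chosen categories."""
    totals = [sum(year) for year in zip(*(counts[c] for c in categories))]
    return 100.0 * (totals[-1] - totals[0]) / totals[0]

narrow = percent_change(["black"])                     # one category only
broad = percent_change(["black", "unknown", "other"])  # ambiguous categories included

print(f"narrow baseline: {narrow:+.1f}%")
print(f"broad baseline:  {broad:+.1f}%")
```

The same made-up records produce a flat trend under the narrow baseline and a rise of nearly half under the broad one. Neither number is wrong; they answer different questions.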

The third is Correlation Does Not Equal Causation. If you see a stat that says crime increases in areas where black people live, it’s easy to assume that the cause of the crime is the black population. Unfortunately, establishing causation is the most challenging task a data scientist faces. Data alone can rarely do it. Determining causation often requires an in-depth investigation, with boots on the ground. Interviews, historical research, control groups, and experiments are not the purview of most data scientists. Data is.

What if crime is higher in a predominantly black neighborhood because that neighborhood has low-income housing? And what if there was a time when the neighborhood wasn’t predominantly black, and crime was still higher in that area? What if the area became mostly black because a more impoverished minority community relocated there for the affordable housing? If all of these are considered, the conclusion could just as easily be reported as: “Black populations are not solving a historic crime problem they inherited in certain neighborhoods.” Much less damning, and much more ridiculous.
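
The idea that a hidden third factor can create a correlation between two things that have no effect on each other can be simulated. The sketch below is a toy model with synthetic data, not an analysis of any real neighborhood: `z` stands in for a hidden factor, `x` and `y` are two measurements that each depend on `z` but not on one another, and `pearson` is a hand-written correlation helper.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(42)  # fixed seed so the run is repeatable
rows = []
for z in (0, 1):             # hidden factor: absent or present
    for _ in range(2000):
        x = 2.0 * z + rng.gauss(0, 1)  # x depends only on z plus noise
        y = 2.0 * z + rng.gauss(0, 1)  # y depends only on z plus noise
        rows.append((z, x, y))

# Correlation when the hidden factor is ignored.
overall = pearson([r[1] for r in rows], [r[2] for r in rows])

# Correlation within each level of the hidden factor.
within = {
    z: pearson([r[1] for r in rows if r[0] == z],
               [r[2] for r in rows if r[0] == z])
    for z in (0, 1)
}

print(f"overall correlation: {overall:.2f}")
for z, value in within.items():
    print(f"correlation when z = {z}: {value:.2f}")
```

Ignoring `z`, the two measurements look clearly related. Holding `z` fixed, the relationship all but disappears, because it was never there in the first place.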

The fourth factor is the Yule-Simpson Effect, also known as Simpson’s paradox. It occurs when several groups of data each suggest one thing, but the conclusion is reversed when the groups are combined. “In 1973. Admission rates were investigated at the University of Berkeley’s graduate schools. Women sued the university for the gender gap in admissions. With each school examined separately (law, medicine, engineering, etc.), women were admitted higher than men! However, the average suggested that men were actually admitted at a much higher rate than women.” (https://www.statisticshowto.com/what-is-simpsons-paradox/)
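
The reversal is easier to believe once you see the arithmetic. The sketch below uses invented numbers for two hypothetical departments, not the actual 1973 figures, and the names `admissions`, `per_dept`, and `combined_rate` are made up for this example.

```python
# Hypothetical (applicants, admitted) counts; not the real 1973 data.
admissions = {
    "Dept A": {"men": (800, 480), "women": (100, 70)},
    "Dept B": {"men": (200, 20), "women": (900, 135)},
}

# Admission rate for each group within each department.
per_dept = {
    dept: {group: admitted / applicants
           for group, (applicants, admitted) in groups.items()}
    for dept, groups in admissions.items()
}

def combined_rate(group):
    """Admission rate for one group with all departments pooled together."""
    applicants = sum(admissions[d][group][0] for d in admissions)
    admitted = sum(admissions[d][group][1] for d in admissions)
    return admitted / applicants

overall = {group: combined_rate(group) for group in ("men", "women")}

for dept, rates in per_dept.items():
    print(f"{dept}: men {rates['men']:.1%}, women {rates['women']:.1%}")
print(f"Combined: men {overall['men']:.1%}, women {overall['women']:.1%}")
```

In this toy data, women have the higher admission rate in both departments, yet men have the higher rate once the departments are pooled. The reversal comes from where people applied: most of the women applied to the department that rejects almost everyone.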

It’s confusing as hell. But it exists. When applied to the issue at hand, I see pundits currently drawing a correlation that black Governors see the highest rates of police shootings of minorities. Really? Please take a second and think about it in terms of Correlation Does Not Equal Causation. Think about Irrelevant Plots and Statistical Significance. Now let’s add in the Yule-Simpson Effect and ask: did you look closely at the data? When you combine the data points, it turns out that the inverse is true of police shootings as a whole in a few of those cities.

Finally, there’s Sampling. Sampling relates to the integrity of the one compiling the data. I can sample only the vegan community and come back with a statistic that says nobody eats meat. How do I protect that finding? I hide the specifics of the sampled population. This one speaks for itself, and can be the most damning proof of confirmation bias.
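
The vegan example can be written out as a quick Python sketch. Everything here is hypothetical: the population of 1,000 people, the 50 vegans among them, and the simplifying assumption that everyone who is not vegan eats meat.

```python
import random

# Hypothetical population: 1,000 people, 50 of whom are vegan.
# Simplifying assumption: anyone who is not vegan eats meat.
population = [{"id": i, "vegan": i < 50} for i in range(1000)]

def share_eating_meat(sample):
    """Fraction of the sample that eats meat."""
    return sum(not person["vegan"] for person in sample) / len(sample)

# Biased sample: survey only the vegan community.
biased_sample = [person for person in population if person["vegan"]]

# Random sample: 200 people drawn from the whole population (fixed seed).
random_sample = random.Random(0).sample(population, 200)

biased_share = share_eating_meat(biased_sample)
random_share = share_eating_meat(random_sample)

print(f"vegan-only sample: {biased_share:.0%} eat meat")
print(f"random sample:     {random_share:.0%} eat meat")
```

Both numbers are honest summaries of the people who were asked. Only the description of who was sampled tells you which one says anything about the population as a whole.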

Conclusion

Our reaction to a statistic shouldn’t be outrage, action, or a Facebook post. It should be to ask questions about the statistic. If the statistic proves reliable, such as prison inmate population statistics, then go ahead and share it. But if you share a statistic that backs up your view without asking the questions first, that very act is a harmful promulgation of a potentially false narrative.

Don’t be that guy.

Translated from: https://medium.com/@jonathantaylor_93097/the-problem-with-statistics-f82e956b14af
