Understanding the Central Limit Theorem

Before We Code, A Quick Review

Today I want to break down the central limit theorem and how it relates to so much of the work that a data scientist performs.

Histogram Refresher

First things first: a core tool for any data scientist is a very simple chart type called a histogram. While you're sure to have seen many a histogram, we often look past its significance. The core purpose of a histogram is to show the distribution of a given dataset.

As a refresher, a histogram puts the values of a variable, grouped into bins, on the x-axis and the number of occurrences in each bin on the y-axis.

Here is an example: we want to understand the distribution of miles per gallon across the population of cars in our dataset. We're using the mtcars dataset, and we can see a bit of a tail on the right side of the chart; this histogram is what is known as right-skewed. The idea is that there are cars at the high extreme of gas mileage, but they are very few.

[Figure: histogram of miles per gallon (mpg) in the mtcars dataset]
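
A histogram like the one described above can be reproduced with a few lines of base R. This is a minimal sketch, not the author's original plotting code; the variable names (mpg, h, mean_mpg, median_mpg) and the plot labels are assumptions chosen for illustration.

```r
# mtcars ships with base R (datasets package), so no extra packages are needed
mpg <- mtcars$mpg

# Draw the histogram; the returned object holds the bin edges and counts
h <- hist(mpg, main = "Distribution of MPG", xlab = "Miles per gallon")

# Bin edges (x-axis) and number of cars per bin (y-axis)
print(h$breaks)
print(h$counts)

# In a right-skewed distribution the long tail typically pulls the mean
# above the median
mean_mpg <- mean(mpg)
median_mpg <- median(mpg)
cat("mean:", mean_mpg, " median:", median_mpg, "\n")
```

Comparing the mean with the median is a quick numeric check on the skew that the chart shows visually.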

Standard Normal Distributions

Similar to what you just saw, the classic distribution that you're likely to have seen is the normal distribution, also known as a bell curve. (The standard normal distribution is the special case with a mean of 0 and a standard deviation of 1.) The core idea is that the distribution of occurrences is symmetrical.

Take a look at the plot below. We see a histogram similar to the previous one, but here it is far more symmetrical.

[Figure: histogram of a symmetric, approximately normal distribution]
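
To see that symmetry for yourself, you can simulate normally distributed data. The sketch below is an illustration only: the mean of 20 and standard deviation of 5 are arbitrary assumptions, and the data behind the original plot is not known.

```r
set.seed(1)  # fixed seed so the run is reproducible

# 10,000 draws from a normal distribution (mean and sd are arbitrary choices)
x <- rnorm(10000, mean = 20, sd = 5)

hist(x, breaks = 40, main = "Simulated normal data", xlab = "Value")

# For symmetric data the mean and median nearly coincide
cat("mean:", mean(x), " median:", median(x), "\n")

# Share of values within one standard deviation of the mean
# (roughly 68% for normally distributed data)
within_one_sd <- mean(abs(x - mean(x)) <= sd(x))
cat("within one sd:", within_one_sd, "\n")
```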

What Exactly is the Central Limit Theorem?

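The central limit theorem states that, when samples are large enough and drawn independently from a population with finite variance, the distribution of the sample means is approximately normal, whatever the shape of the population's own distribution.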
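
Stated a little more formally, here is a sketch under the usual textbook assumptions. The symbols are not from the original text: X_1, ..., X_n are independent, identically distributed observations with mean \mu and finite variance \sigma^2, and \bar{X}_n is their sample mean.

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{\;d\;} \mathcal{N}(0, 1)
\quad \text{as } n \to \infty .
```

Put differently, for large n the sample mean is approximately normal with mean \mu and standard deviation \sigma/\sqrt{n}, so the spread of the sample means shrinks as the sample size grows.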

Let's See the Theorem in Practice

Consider the following example. Let's say you work at a university and you want to understand the distribution of alumni earnings in the first year out of school.

The fact is you won't be able to collect that data point for every single alumnus. Instead, you sample the population many times and compute a sample mean for each sample.

We now plot the sample means via a histogram and can see the emergence of a normal distribution.

The key takeaway here is that even if the underlying variable is not normally distributed, the sampling distribution of the mean will approximate a normal distribution.
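
The alumni example can be sketched with simulated data. Everything below is hypothetical: the "earnings" are drawn from a log-normal distribution with made-up parameters purely to get a right-skewed population, and the sample counts and the small skewness helper are assumptions for illustration, not part of the original article.

```r
set.seed(42)

# Hypothetical right-skewed "earnings" population (simulated, not real data)
population <- rlnorm(100000, meanlog = 10.8, sdlog = 0.5)

n_samples <- 1000   # number of repeated samples
sample_size <- 50   # alumni surveyed per sample

# Draw many samples and keep the mean of each one
sample_means <- replicate(n_samples, mean(sample(population, sample_size)))

# Simple moment-based skewness (0 for a perfectly symmetric distribution)
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

hist(population, breaks = 50, main = "Simulated earnings", xlab = "Earnings")
hist(sample_means, breaks = 30, main = "Sample means", xlab = "Mean earnings")

cat("skewness of population:  ", skewness(population), "\n")
cat("skewness of sample means:", skewness(sample_means), "\n")
```

The first histogram should show a long right tail, while the histogram of sample means should look much closer to a bell curve, with skewness nearer to zero.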

Let's Code!

As a final demonstration of this idea, recall that we initially plotted the distribution of MPG from the mtcars dataset. Here we create a vector to hold the sample means, then loop through 50 samples, in each taking the mean of ten records drawn at random (with replacement) from the dataset. We once again plot these as a histogram and can see a roughly normal distribution emerge.

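# Collect the means of 50 random samples of mpg
mpg_samples <- numeric(50)
for (i in 1:50) {
  # Each sample: ten records drawn at random, with replacement
  mpg_samples[i] <- mean(sample(mtcars$mpg, 10, replace = TRUE))
}
hist(mpg_samples, col = 'purple', xlab = "MPG")

[Figure: histogram of the 50 sample means of MPG]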
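
With only 50 sample means the histogram can look a little ragged. As a hedged extension (not part of the original article), the sketch below repeats the experiment 5,000 times and compares the spread of the sample means with the \sigma/\sqrt{n} value the theorem suggests. The names n_samples, sample_size, mpg_means, pop_sd, and theoretical_se are assumptions introduced for illustration.

```r
set.seed(123)

n_samples <- 5000   # many more repetitions than the 50 used above
sample_size <- 10

mpg_means <- replicate(
  n_samples,
  mean(sample(mtcars$mpg, sample_size, replace = TRUE))
)

hist(mpg_means, breaks = 30, col = "purple", xlab = "Mean MPG",
     main = "Means of 5,000 samples of size 10")

# Sampling with replacement treats the 32 cars as the whole population,
# so use the population form of the standard deviation (divide by N)
pop_sd <- sqrt(mean((mtcars$mpg - mean(mtcars$mpg))^2))
theoretical_se <- pop_sd / sqrt(sample_size)

cat("sd of sample means:", sd(mpg_means), "\n")
cat("sigma / sqrt(n):   ", theoretical_se, "\n")
```

The two printed numbers should be close to each other, which is the practical payoff of the theorem: the variability of a sample mean can be estimated from the population spread and the sample size.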

This should serve as a foundational concept in your data science training; it underpins hypothesis testing and experimentation, among other data science methods and techniques.

I hope you found this helpful!

Happy data science-ing!

Translated from: https://towardsdatascience.com/understanding-the-central-limit-theorem-e3f7061a8d92
