Sampling Distribution of the Sample Mean

张_伟_杰

Original article: https://towardsdatascience.com/sampling-distribution-sample-mean-fcf69484535e

One of the most important concepts discussed in the context of inferential data analysis is the idea of sampling distributions. Understanding sampling distributions helps us better comprehend and interpret results from our descriptive as well as predictive data analysis investigations. Sampling distributions are also frequently used in decision making under uncertainty and hypothesis testing.

What are sampling distributions?

You may already be familiar with the idea of probability distributions. A probability distribution describes the probability associated with each value (or range of values) that a random variable may assume. A random variable is a quantity whose value (outcome) is determined randomly. Examples of random variables include the monthly revenue of a retail store, the number of customers arriving at a car wash location on any given day, the number of accidents on a certain highway on any given day, and the weekly sales volume at a retail store. Although the outcome of a random variable is random, the probability distribution allows us to gain an understanding of how likely different values are to occur. Sampling distributions are probability distributions that we attach to the statistics computed from a sample.

Sample mean as a sample statistic

A sample statistic (also known simply as a statistic) is a value computed from a sample. Here is an example. Suppose you collect the results of a survey filled out by 250 randomly selected individuals who live in a certain neighborhood. Based on the survey results, you find that the average annual income of the individuals in this sample is $82,512. This is a sample statistic and is denoted by x̅ = $82,512. The sample mean is also a random variable (denoted by X̅) with a probability distribution. The probability distribution for X̅ is called the sampling distribution of the sample mean. Sampling distributions can also be defined for other types of sample statistics, including the sample proportion, sample regression coefficients, the sample correlation coefficient, etc.
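
The distinction between a single observed sample mean and the sample mean as a random variable can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the population of 10,000 incomes, its log-normal shape and parameters, and the helper name `sample_mean` are all hypothetical and do not come from the survey described above. It uses only the Python standard library.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so repeated runs give the same numbers

# Hypothetical population of 10,000 annual incomes.
# The log-normal shape and its parameters are assumptions for illustration.
population = [rng.lognormvariate(11.2, 0.5) for _ in range(10_000)]


def sample_mean(population, n, rng):
    """Draw a simple random sample of size n (without replacement)
    and return its mean."""
    sample = rng.sample(population, n)
    return statistics.fmean(sample)


# One survey of 250 people gives one number: an observed sample mean.
x_bar = sample_mean(population, 250, rng)

# Repeating the survey gives a different number each time. These values
# are realizations of the sample mean viewed as a random variable.
sample_means = [sample_mean(population, 250, rng) for _ in range(1000)]

print(f"population mean:           {statistics.fmean(population):,.0f}")
print(f"one observed sample mean:  {x_bar:,.0f}")
print(f"mean of 1000 sample means: {statistics.fmean(sample_means):,.0f}")
print(f"std of 1000 sample means:  {statistics.stdev(sample_means):,.0f}")
```

A histogram of `sample_means` would approximate the sampling distribution of the sample mean for samples of size 250 drawn from this hypothetical population.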

You might be wondering why X̅ is a random variable when the sample mean is just a single number. The key to understanding this lies in the idea of sample-to-sample variability: samples drawn from the same population are not identical. Suppose that, instead of conducting only one survey of 250 individuals living in a particular neighborhood as in the example above, we drew 35 samples of the same size in that neighborhood. If we calculated the sample mean x̅ for each of the 35 samples, we would get 35 different values. Now suppose, hypothetically, that we conducted a very large number of surveys of the same size in that neighborhood. We would get a very large number of (different) values for the sample mean. The distribution of those sample means is what we call the sampling distribution of the sample mean. Thinking about the sample mean from this perspective, we can see how X̅ (note the big letter) is the random variable representing sample means and x̅ (note t