ML: Student's, Two-Sample, and Paired-Sample t-tests

In the Z-test, we assume we know the standard deviation of the population. What if we don't? In that case, we estimate it with the standard deviation of the sample and carry on with the Z-test. What about the mean of the population? In a hypothesis test we don't need to know it: we assume a value for it under the null hypothesis and check whether the data are consistent with that value. So when do we use a t-test? We use it when the sample size is small. How small is small? The Z-test relies on the CLT (Central Limit Theorem), which says the sampling distribution of the mean is approximately Gaussian, and it only works well when the sample size is large enough. When the sample is too small, this approximation starts to break down, and the sample standard deviation is itself a noisy estimate of the population standard deviation. The standardized mean then no longer follows the Gaussian distribution; it follows a heavier-tailed distribution, the t-distribution.
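
The heavy-tail effect can be checked with a small simulation. The sketch below is illustrative and assumes normally distributed data with true mean 0 and standard deviation 1, so the null hypothesis is true and a well-calibrated 5% test should reject about 5% of the time. It compares the standardized mean computed with the known σ against the same statistic computed with the sample standard deviation s, both judged against the Z cutoff 1.96. The names rejection_rate and sample_sd, the trial count, and the seed are arbitrary choices for this example.

```python
import math
import random


def sample_sd(xs):
    """Sample standard deviation with the n - 1 denominator."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))


def rejection_rate(n, use_sample_sd, trials=10_000, seed=0):
    """Fraction of simulated samples whose |statistic| exceeds the Z cutoff 1.96.

    Data come from N(0, 1), so the null hypothesis (mean = 0) is true and
    every rejection is a false positive.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(xs) / n
        sd = sample_sd(xs) if use_sample_sd else 1.0  # 1.0 is the true sigma
        if abs(mean / (sd / math.sqrt(n))) > 1.96:
            rejections += 1
    return rejections / trials


for n in (3, 10, 50):
    print(f"n={n:>2}  known sigma: {rejection_rate(n, False):.3f}  "
          f"sample sd: {rejection_rate(n, True):.3f}")
```

With the known σ, the rate stays near 0.05 for every n. With the sample standard deviation and n = 3, theory puts the rate near 0.19 (for 2 degrees of freedom, P(|T| > 1.96) = 1 − 1.96/√(2 + 1.96²) ≈ 0.189), and the excess shrinks as n grows. The larger critical values of the t-distribution are what correct for this.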

Note: We assume that the standard deviation is the same for the samples and the population, and we only try to find out whether the means differ. ANOVA, which I will cover in a later post, compares means across more than two groups; note that classic one-way ANOVA also assumes equal variances across the groups.

t-test

Question: I know the population mean and have n samples, where n is small. Can I reject the null hypothesis?

The inventor of the t-distribution, William Sealy Gosset, worked at the Guinness brewery in Dublin, where he was in charge of the quality of the beer, a really important job for all of us. He developed many statistical techniques to solve brewing problems such as alcohol content and taste. The brewery did not allow its staff to publish under their own names, so he published his results in academic journals under the pseudonym "Student". That is why the distribution is called Student's t-distribution. If you like stories like this, I recommend the book "The Lady Tasting Tea".

Formula for the t-test: t = (x̅ − μ) / (s / √n), with n − 1 degrees of freedom

It is very similar to the Z-test. The differences are that the t-test uses the t-distribution, the t-table, and the sample standard deviation s instead of the population standard deviation σ. n − 1 is the number of degrees of freedom: the number of samples you have determines which t-distribution you use.
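
As a concrete sketch of the one-sample formula t = (x̅ − μ) / (s / √n) with n − 1 degrees of freedom, the function below uses Python's standard statistics module, whose statistics.stdev already divides by n − 1. The measurements and the hypothesized mean 5.0 are made-up values for illustration only, and one_sample_t is a hypothetical helper name.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t, df) for H0: population mean == mu0.

    The sample standard deviation s (n - 1 denominator) stands in for the
    unknown population standard deviation sigma.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample SD, divides by n - 1
    t = (mean - mu0) / (s / math.sqrt(n))
    return t, n - 1


# Illustrative, made-up measurements and a hypothesized mean of 5.0
t, df = one_sample_t([5.1, 4.9, 5.3, 5.5, 4.8], mu0=5.0)
print(f"t = {t:.3f}, df = {df}")  # t = 0.937, df = 4
```

For a two-tailed test at α = 0.05 with 4 degrees of freedom, the t-table cutoff is 2.776, so |t| ≈ 0.94 would give no evidence against H0 for these made-up data.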

t-table

If you have many samples, the t-distribution approaches the Gaussian distribution and the t-test behaves like the Z-test. If you have only a few samples, the tails become heavier and heavier to compensate for the shortage of samples. Again, we use this test to compare means. Furthermore, the t-test assumes the underlying distribution is normal, but in practice it is quite robust when the underlying distribution is not normal, as long as it is not strongly skewed or heavy-tailed and the sample is not tiny.
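
The convergence toward the Gaussian is visible directly in a t-table. The sketch below hard-codes a few two-tailed critical values for α = 0.05 from a standard t-table (only a subset of rows, which is a simplification of this example) and looks up the row for a given number of degrees of freedom. As the degrees of freedom grow, the critical value shrinks toward the Z value 1.960. The names T_CRIT_05, critical_value, and reject_h0 are hypothetical.

```python
import math

# Two-tailed critical values for alpha = 0.05 from a standard t-table.
# Only some rows are listed; math.inf is the Z (Gaussian) limit.
T_CRIT_05 = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
    6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
    15: 2.131, 20: 2.086, 30: 2.042, 60: 2.000, 120: 1.980,
    math.inf: 1.960,
}


def critical_value(df):
    """Critical value for the largest tabulated df that does not exceed df.

    Rounding df down is conservative: critical values shrink as df grows,
    so this never rejects H0 more readily than the exact value would.
    """
    if df < 1:
        raise ValueError("df must be at least 1")
    row = max(k for k in T_CRIT_05 if k <= df)
    return T_CRIT_05[row]


def reject_h0(t, df):
    """Two-tailed decision at alpha = 0.05."""
    return abs(t) > critical_value(df)


for df in (2, 10, 30, 1000):
    print(df, critical_value(df))
```

Rounding down to the nearest listed row mirrors how a printed t-table is usually read when the exact degrees of freedom are missing.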

Two-Sample t-test

Question: I have two different samples, and I want to know whether they come from the same distribution. For example, I measured the heights of people in South Korea and in Germany. Are the underlying height distributions of the two countries statistically the same or not?

H0 (null hypothesis): the means of the two underlying distributions are the same. Sample 1 and sample 2 are each drawn from their own underlying distribution.

H1 (alternative hypothesis): the means are different. This makes it a two-tailed test, which considers both tails of the t-distribution.

We work this out with the samples, and intuitively what we care about is the difference between the sample means, x̅1 − x̅2. If it is close to 0, the data are consistent with the underlying distributions having the same mean. However, if it is so far from 0 that the p-value falls below the significance level α we set up in advance, we reject the null hypothesis.
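
To make the difference between the p-value and the significance level α concrete, the sketch below computes a two-tailed p-value for a t statistic and compares it with an α fixed in advance. In practice you would call a library routine for the t-distribution's tail probability (for example scipy.stats.t.sf in SciPy); to stay dependency-free, this example instead integrates the t density numerically with Simpson's rule, which is an implementation choice of this sketch rather than part of the original article. The input t = 2.5 with 10 degrees of freedom is arbitrary, and t_pdf and two_tailed_p_value are hypothetical names.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p_value(t, df):
    """P(|T| >= |t|) for T ~ t(df), via Simpson's rule on [0, |t|].

    The density is symmetric around 0, so the two-tailed p-value is
    1 - 2 * (area under the density between 0 and |t|).
    """
    a = abs(t)
    if a == 0:
        return 1.0
    steps = 2 * max(1000, math.ceil(a * 200))  # even count, step width at most 0.0025
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)


alpha = 0.05  # significance level, chosen before looking at the data
p = two_tailed_p_value(2.5, df=10)
print(f"p = {p:.4f}:", "reject H0" if p < alpha else "fail to reject H0")
```

The decision rule is the same as comparing |t| with the t-table cutoff: p < α exactly when |t| exceeds the critical value for that α and number of degrees of freedom.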

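Two-sample t-test formula: t = ((x̅1 − x̅2) − (μ1 − μ2)) / (s_p · √(1/n1 + 1/n2)), where s_p² = ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2) is the pooled variance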

We already know that if the underlying distributions are the same, the difference of their means is zero, so the μ1 − μ2 term in the numerator is zero. For the denominator we need the standard deviation of x̅1 − x̅2. It is not the sum of the two samples' standard deviations: for independent samples, the variance of the difference is the sum of the individual variances, so Var(x̅1 − x̅2) = σ²/n1 + σ²/n2. Because we assume both distributions share the same variance σ², we estimate it with the pooled estimate s_p², which combines both samples. The number of degrees of freedom is n1 + n2 − 2. Now you can look up the critical value in the t-table and get the result.
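
The pooled two-sample statistic described above can be sketched as follows. It assumes independent samples and equal population variances, as the pooled formula does; pooled_two_sample_t is a hypothetical helper name, and the height values are invented for illustration, not real measurements from South Korea or Germany.

```python
import math
import statistics


def pooled_two_sample_t(x1, x2):
    """Return (t, df) for H0: mu1 == mu2, assuming equal population variances.

    Under H0, mu1 - mu2 = 0, so the numerator is just the difference of the
    sample means.
    """
    n1, n2 = len(x1), len(x2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two observations")
    v1, v2 = statistics.variance(x1), statistics.variance(x2)  # n - 1 denominators
    # Pooled variance: sample variances weighted by their degrees of freedom
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    # Variances (not SDs) add: Var(xbar1 - xbar2) = sp2/n1 + sp2/n2
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (statistics.fmean(x1) - statistics.fmean(x2)) / se
    return t, n1 + n2 - 2


# Invented heights in cm; the group sizes deliberately differ
group_a = [170.0, 172.0, 168.0, 171.0, 169.0]
group_b = [175.0, 178.0, 174.0, 177.0]
t, df = pooled_two_sample_t(group_a, group_b)
print(f"t = {t:.3f}, df = {df}")  # t = -5.292, df = 7
```

Here |t| ≈ 5.29 is well beyond 2.365, the two-tailed 5% cutoff for 7 degrees of freedom, so these invented samples would lead to rejecting H0.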

Tip: n1 and n2 don't have to be the same. If you have two algorithms, one very expensive to run and the other relatively cheap, you don't have to run the same number of experiments for both. Be careful, though, about the standard deviation of the expensive one, which is estimated from only a few runs.
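
To see why the expensive algorithm deserves extra care, note that under the pooled model the variance of x̅1 − x̅2 is s_p²(1/n1 + 1/n2), so the sample with fewer runs contributes most of it. The sketch below uses hypothetical run counts to compute that share and the standard error in units of s_p; share_from_first and relative_se are names made up for this example.

```python
import math


def share_from_first(n1, n2):
    """Fraction of Var(xbar1 - xbar2) contributed by sample 1 under a common variance.

    Var(xbar1 - xbar2) = sp2 / n1 + sp2 / n2, so sp2 cancels out of the ratio.
    """
    return (1 / n1) / (1 / n1 + 1 / n2)


def relative_se(n1, n2):
    """Standard error of xbar1 - xbar2 measured in units of the pooled SD sp."""
    return math.sqrt(1 / n1 + 1 / n2)


# Hypothetical budget: 5 runs of the expensive algorithm, 50 of the cheap one
print(f"{share_from_first(5, 50):.0%} of the variance comes from the 5 expensive runs")
for n_expensive, n_cheap in [(5, 50), (5, 1000), (10, 50)]:
    print(n_expensive, n_cheap, round(relative_se(n_expensive, n_cheap), 3))
```

Adding 950 more cheap runs barely moves the standard error (about 0.469 to 0.448 in units of s_p), while doubling the expensive runs brings it down to about 0.346.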

Paired-Sample t-test

The tip above hints at why people use the paired-sample t-test. In the two-sample t-test, the two samples do not have to be the same size. This can cause problems with the standard deviation estimate and make the t-score fluctuate a lot depending on how you set up the experiment. The paired-sample t-test therefore requires the same number of samples in both groups, with both experiments run on the same samples. For example, you have algorithm 1 and algorithm 2 and want to know which one is better. You prepare the samples, run both algorithms on the same samples, and get a pair of results for each sample. Because both results in a pair come from the same input, the variation between samples cancels out in their difference.

Tip: If you have full control over how the experiment is run, you should use a paired-sample t-test. However, if someone just gives you the result data and you don't know how the experiments were run, you should use a two-sample t-test.

Because we experiment on the same dataset, we can compute the difference for each pair of results directly, rather than only the difference of the means.

  • The number of degrees of freedom is n − 1, where n is the number of pairs.
  • H0: the mean of the underlying distribution of the differences is zero.
  • H1: the mean is not zero, i.e., the two sets of results differ. You can also use a one-sided alternative (greater than zero, or less than zero).

Paired-sample t-test formula: t = (d̄ − μ) / (s_d / √n)

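In this formula, μ is the mean of the underlying distribution of the differences, which we assume to be zero under H0, d̄ is the mean of the observed differences, and s_d is their sample standard deviation. To sum up, the paired t-test treats the per-pair differences as the sample itself, whereas the two-sample t-test works with the two sets of results directly.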
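
The contrast between the two tests shows up clearly when both are applied to the same paired results. In the sketch below, the scores of two hypothetical algorithms on five datasets are invented for illustration: the datasets differ a lot in difficulty, but algorithm 2 is consistently about one point better. paired_t runs a one-sample t-test on the per-pair differences with μ = 0, and pooled_t is the two-sample version for comparison; both names are hypothetical.

```python
import math
import statistics


def paired_t(results1, results2):
    """Paired t statistic: a one-sample t-test on per-pair differences, H0: mu = 0."""
    if len(results1) != len(results2):
        raise ValueError("paired test needs one result per sample from each algorithm")
    diffs = [b - a for a, b in zip(results1, results2)]
    n = len(diffs)
    d_bar = statistics.fmean(diffs)  # mean of the differences
    s_d = statistics.stdev(diffs)    # sample SD of the differences
    return d_bar / (s_d / math.sqrt(n)), n - 1


def pooled_t(x1, x2):
    """Two-sample pooled t statistic, ignoring the pairing."""
    n1, n2 = len(x1), len(x2)
    sp2 = ((n1 - 1) * statistics.variance(x1)
           + (n2 - 1) * statistics.variance(x2)) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (statistics.fmean(x2) - statistics.fmean(x1)) / se, n1 + n2 - 2


# Invented scores of two algorithms on the same five datasets
alg1 = [10.0, 20.0, 30.0, 40.0, 50.0]
alg2 = [11.0, 21.5, 30.5, 41.0, 51.0]

print("paired:     t = %.3f, df = %d" % paired_t(alg1, alg2))
print("two-sample: t = %.3f, df = %d" % pooled_t(alg1, alg2))
```

With these numbers the paired statistic is large (√40 ≈ 6.32 on 4 degrees of freedom, beyond the 2.776 cutoff), while the two-sample statistic is only about 0.1 on 8 degrees of freedom: the dataset-to-dataset spread swamps the one-point difference unless the pairing is used.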

Please don't use these tests blindly; think about what they mean.

This post was published on 9/11/2020.

Translated from: https://medium.com/swlh/ml-students-two-sample-paired-sample-t-tests-don-t-f0a5f267756e
