The difference between MATLAB ttest and ttest2

selous

Article link: https://blog.csdn.net/selous/article/details/79791979


Author: ZHANG TAO
Date: 2018.4.3

Student's t-test hypothesis testing

$\chi^2$-distribution

Let $X_1,X_2,...,X_n$ be a sample from the population $N(0,1)$. Then the statistic

$\chi^2 = X_1^2+X_2^2+...+X_n^2$

follows the chi-square distribution with $n$ degrees of freedom, denoted

$\chi^2(n)$

$t$-distribution

Let $X \sim N(0,1)$ and $Y \sim \chi^2(n)$, with $X$ and $Y$ independent. Then the random variable

$t = \frac{X}{\sqrt{Y/n}}$

follows the t-distribution with $n$ degrees of freedom, denoted

$t \sim t(n)$

[Figure: image not included]

t-test

One-sample

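The population variance $\sigma^2$ is unknown, and we test a hypothesis about the mean $\mu$, namely $H_0:\mu=\mu_0$ against $H_1:\mu \ne \mu_0$ (the t-test).

Construct

$t=\frac{\overline X-\mu_0}{S/\sqrt{n}} \sim t(n-1)$

as the test statistic, where $S$ is the sample standard deviation. If the absolute value of its observed value is too large, reject $H_0$. The threshold $k$ is fixed by the significance level $\alpha$:

$P\{\text{reject } H_0 \text{ when } H_0 \text{ is true}\} = P_{\mu_0}\{ |\frac{\overline X-\mu_0}{S/\sqrt{n}}|\ge k\} = \alpha$

so $k = t_{\alpha/2}(n-1)$, the upper $\alpha/2$ quantile of $t(n-1)$.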

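As Figure 1 shows, when $H_0$ is true, the probability that the test statistic falls in the shaded region is $\alpha$ (the significance level, a very small value). So if this event does occur (that is, the observed test statistic lies in the shaded region), we can regard the null hypothesis as wrong and reject $H_0$.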
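The rejection rule above can be sketched in a few lines of MATLAB. This is only an illustration under stated assumptions: the sample `x`, the hypothesized mean `mu0` and the level `alpha` are made-up values, and `tinv` and `ttest` require the Statistics and Machine Learning Toolbox.

```matlab
% One-sample t-test by hand, compared with ttest.
% x, mu0 and alpha are illustrative values, not data from the article.
x     = [5.1 4.9 5.6 5.8 6.0 5.3 5.7 5.2];
mu0   = 5.0;      % mean under H0
alpha = 0.05;     % significance level

n = numel(x);
t = (mean(x) - mu0) / (std(x) / sqrt(n));   % std uses the n-1 normalization, i.e. S
k = tinv(1 - alpha/2, n - 1);               % two-sided critical value
rejectByHand = abs(t) >= k;                 % true if t lies in the rejection region

% Same test through the built-in function.
[h, ~, ~, stats] = ttest(x, mu0, 'Alpha', alpha);

fprintf('by hand: t = %.4f, k = %.4f, reject = %d\n', t, k, rejectByHand);
fprintf('ttest:   tstat = %.4f, df = %d, h = %d\n', stats.tstat, stats.df, h);
```

The hand-computed statistic should agree with `stats.tstat`, and `h` should equal the hand-made decision.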

[Figure: image not included]

Question 1: What does the p-value returned by ttest in MATLAB mean?

returns the p-value, i.e., the probability of observing the given result, or one more extreme, 
by chance if the null hypothesis is true.  Small values of P cast doubt on the validity of the 
null hypothesis.

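Figure 2 illustrates this: once the observed value of the test statistic has been computed, the area of the red region is the p-value, and the null hypothesis is rejected only when $P_{value}<\alpha$.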
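As a rough check of this reading of the p-value, the two-sided tail area can be computed directly from the t-distribution CDF and compared with what `ttest` returns. The data and `mu0` below are made-up values; `tcdf` and `ttest` require the Statistics and Machine Learning Toolbox.

```matlab
% Two-sided p-value as a tail area, compared with the p returned by ttest.
% x and mu0 are illustrative values, not data from the article.
x   = [5.1 4.9 5.6 5.8 6.0 5.3 5.7 5.2];
mu0 = 5.0;

n = numel(x);
t = (mean(x) - mu0) / (std(x) / sqrt(n));   % observed test statistic

% Probability, under H0, of a statistic at least as extreme as t
% in either tail of t(n-1).
pByHand = 2 * tcdf(-abs(t), n - 1);

[h, p] = ttest(x, mu0);                     % default Alpha is 0.05

fprintf('p by hand = %.6f, p from ttest = %.6f, h = %d\n', pByHand, p, h);
```

With the default two-sided test, `h` is 1 exactly when `p` is below the significance level.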

The hard part of hypothesis testing is constructing, from the known quantities, a test statistic whose distribution is known.

Paired sample

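In general, suppose there are $n$ mutually independent pairs of observations $(X_1,Y_1),...,(X_n,Y_n)$. Let $D_1=X_1-Y_1,D_2=X_2-Y_2,...,D_n=X_n-Y_n$; then $D_1,D_2,...,D_n$ are mutually independent. Because $D_1,D_2,...,D_n$ arise from the same single factor, they can be regarded as following the same distribution, here assumed normal with mean $\mu_D$. Hence we test:

$H_0:\mu_D=0,H_1:\mu_D \ne 0$

The test statistic is:

$t = \frac{\overline d}{s_D/\sqrt{n}} \sim t(n-1)$

where $\overline d$ and $s_D$ are the sample mean and sample standard deviation of the differences. $H_0$ is rejected when $|t| \ge t_{\alpha/2}(n-1)$.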

In MATLAB this test is done with ttest:

ttest  One-sample and paired-sample t-test.
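A small sketch of the paired case follows. It shows that calling `ttest` with two equally sized samples gives the same result as a one-sample test on their differences. The vectors `before` and `after` are made-up measurements, and `ttest` requires the Statistics and Machine Learning Toolbox.

```matlab
% Paired-sample t-test: ttest(x, y) versus a one-sample test on x - y.
% 'before' and 'after' are illustrative values, not data from the article.
before = [72 75 68 80 77 74];
after  = [70 74 65 79 73 72];

d = before - after;                  % paired differences D_i
n = numel(d);
t = mean(d) / (std(d) / sqrt(n));    % statistic computed by hand

[hPaired, pPaired, ~, sPaired] = ttest(before, after);  % paired form
[hDiff,   pDiff,   ~, sDiff]   = ttest(d);              % one-sample form, H0: mean 0

fprintf('by hand: t = %.4f\n', t);
fprintf('paired:  t = %.4f, p = %.6f, h = %d\n', sPaired.tstat, pPaired, hPaired);
fprintf('on d:    t = %.4f, p = %.6f, h = %d\n', sDiff.tstat, pDiff, hDiff);
```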

Unpaired(Independent) sample

This tests a hypothesis about the difference between the means of two normal populations that share the same variance. Suppose $X_1,X_2,...,X_{n_1}$ is a sample from $N(\mu_1,\sigma^2)$, $Y_1,Y_2,...,Y_{n_2}$ is a sample from the normal population $N(\mu_2,\sigma^2)$, and the two samples are independent. With $\delta$ a known constant, the hypotheses are

$H_0 : \mu_1-\mu_2=\delta,H_1:\mu_1-\mu_2 \ne \delta$

Hence the test statistic is

$t = \frac{(\overline X-\overline Y)-\delta}{S_w\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}$

where

$S_w^2 = \frac{(n_1-1)S_1^2+(n_2-1)S_2^2}{n_1+n_2-2}$

Under $H_0$:

$t \sim t(n_1+n_2-2)$
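The pooled statistic can be checked numerically against `ttest2`. In this sketch the hypothesized difference of means is set to 0, since `ttest2` tests equality of the two means. The samples `x` and `y` and the name `d0` are made up for illustration, and `ttest2` requires the Statistics and Machine Learning Toolbox.

```matlab
% Two-sample (pooled-variance) t-test by hand, compared with ttest2.
% x and y are illustrative values, not data from the article.
x  = [20.1 19.8 21.3 20.7 22.0 21.1];
y  = [18.9 19.5 20.2 19.1 18.7];
d0 = 0;                               % hypothesized mu1 - mu2 (ttest2 uses 0)

n1 = numel(x);
n2 = numel(y);

% Pooled standard deviation S_w and the resulting statistic.
Sw = sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2));
t  = ((mean(x) - mean(y)) - d0) / (Sw * sqrt(1/n1 + 1/n2));
df = n1 + n2 - 2;

[h, p, ~, stats] = ttest2(x, y);      % default Vartype is 'equal'

fprintf('by hand: t = %.4f, df = %d, Sw = %.4f\n', t, df, Sw);
fprintf('ttest2:  t = %.4f, df = %d, sd = %.4f, p = %.6f, h = %d\n', ...
        stats.tstat, stats.df, stats.sd, p, h);
```

In the equal-variance case `stats.sd` holds the pooled estimate, so it should match `Sw`.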

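In MATLAB this test is done with ttest2.

By default ttest2 assumes that X and Y have equal variances; passing 'Vartype','unequal' switches to the unequal-variance test.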

'equal' Conduct test using the assumption that x and y are from normal distributions with unknown but equal variances.
'unequal'   Conduct test using the assumption that x and y are from normal distributions with unknown and unequal variances. This is called the Behrens-Fisher problem. ttest2 uses Satterthwaite's approximation for the effective degrees of freedom.
Vartype must be a single variance type, even when x is a matrix or a multidimensional array.

Example: 'Vartype','unequal'
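To see what the 'Vartype' option changes, the sketch below runs both variants on the same made-up samples and recomputes the unequal-variance statistic and the Satterthwaite degrees of freedom by hand. The formula for `dfSat` is the standard Satterthwaite approximation; treating it as exactly what `ttest2` evaluates internally is an assumption of this example. `ttest2` requires the Statistics and Machine Learning Toolbox.

```matlab
% Equal versus unequal variance assumption in ttest2.
% x and y are illustrative values, not data from the article.
x = [20.1 19.8 21.3 20.7 22.0 21.1];
y = [18.9 19.5 20.2 19.1 18.7];

n1 = numel(x);
n2 = numel(y);
v1 = var(x) / n1;                     % estimated variance of mean(x)
v2 = var(y) / n2;                     % estimated variance of mean(y)

% Unpooled statistic and Satterthwaite's effective degrees of freedom.
t     = (mean(x) - mean(y)) / sqrt(v1 + v2);
dfSat = (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1));

[~, pEq,   ~, sEq]   = ttest2(x, y);                        % 'equal' (default)
[~, pUneq, ~, sUneq] = ttest2(x, y, 'Vartype', 'unequal');

fprintf('equal:   t = %.4f, df = %.4f, p = %.6f\n', sEq.tstat, sEq.df, pEq);
fprintf('unequal: t = %.4f, df = %.4f, p = %.6f\n', sUneq.tstat, sUneq.df, pUneq);
fprintf('by hand: t = %.4f, df = %.4f\n', t, dfSat);
```

Note that the effective degrees of freedom in the unequal case are generally not an integer.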