T-test

Uses

Among the most frequently used t-tests are:

  • A one-sample location test of whether the mean of a population has a value specified in a null hypothesis.
  • A two-sample location test of the null hypothesis that the means of two populations are equal. All such tests are usually called Student's t-tests, though strictly speaking that name should only be used if the variances of the two populations are also assumed to be equal; the form of the test used when this assumption is dropped is sometimes called Welch's t-test. These tests are often referred to as "unpaired" or "independent samples" t-tests, as they are typically applied when the statistical units underlying the two samples being compared are non-overlapping.
  • A test of the null hypothesis that the difference between two responses measured on the same statistical unit has a mean value of zero. For example, suppose we measure the size of a cancer patient's tumor before and after a treatment. If the treatment is effective, we expect the tumor size for many of the patients to be smaller following the treatment. This is often referred to as the "paired" or "repeated measures" t-test: see paired difference test.
  • A test of whether the slope of a regression line differs significantly from 0.
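
As an illustration of the paired case, here is a minimal Python sketch; the before/after tumor measurements are hypothetical, and the paired test reduces to a one-sample test on the per-patient differences:

```python
import math
import statistics

# Hypothetical before/after tumor sizes (cm) for six patients -- illustrative only
before = [3.1, 2.8, 3.5, 4.0, 2.9, 3.3]
after = [2.4, 2.6, 2.9, 3.6, 2.8, 2.7]

# A paired t-test is a one-sample t-test on the per-patient differences
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
# Compare t against a Student's t distribution with n - 1 degrees of freedom
print(t)
```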

One-sample t-test

In testing the null hypothesis that the population mean is equal to a specified value μ0, one uses the statistic

t = \frac{\overline{x} - \mu_0}{s/\sqrt{n}}

where \overline{x} is the sample mean, s is the sample standard deviation, and n is the sample size. The degrees of freedom used in this test are n − 1. Although the parent population does not need to be normally distributed, the distribution of the population of sample means \overline{x} is assumed to be normal. By the central limit theorem, if the observations are independent and the second moment of the parent population exists, then the sample means will be approximately normally distributed. (The degree of approximation depends on how close the parent population is to a normal distribution and on the sample size n.)
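
The statistic above can be computed directly with the standard library; a minimal sketch (the function name and data are ours, for illustration):

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t statistic for H0: the population mean equals mu0 (df = len(sample) - 1)."""
    n = len(sample)
    xbar = statistics.mean(sample)   # sample mean
    s = statistics.stdev(sample)     # sample standard deviation (n - 1 denominator)
    return (xbar - mu0) / (s / math.sqrt(n))

# Hypothetical measurements; test whether the mean differs from 5.0
print(one_sample_t([5.1, 4.9, 5.3, 5.2, 4.8, 5.0], 5.0))
```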

Slope of a regression line

Suppose one is fitting the model

Y=\alpha +\beta x+\varepsilon ,

where x is known, α and β are unknown, ε is a normally distributed random variable with mean 0 and unknown variance σ², and Y is the outcome of interest. We want to test the null hypothesis that the slope β is equal to some specified value β0 (often taken to be 0, in which case the null hypothesis is that x and Y are independent).

Let

\begin{align} \widehat\alpha, \widehat\beta & = \text{least-squares estimators}, \\ SE_{\widehat\alpha}, SE_{\widehat\beta} & = \text{the standard errors of least-squares estimators}. \end{align}

Then

t_\text{score} = \frac{\widehat\beta - \beta_0}{ SE_{\widehat\beta} }\sim\mathcal{T}_{n-2}

has a t-distribution with n − 2 degrees of freedom if the null hypothesis is true. The standard error of the slope coefficient:

SE_{\widehat\beta} = \frac{\sqrt{\frac{1}{n-2}\sum_{i=1}^n (y_i - \widehat{y}_i)^2}}{\sqrt{\sum_{i=1}^n (x_i - \overline{x})^2}}

can be written in terms of the residuals. Let

\begin{aligned} \widehat\varepsilon_i &= y_i - \widehat{y}_i = y_i - (\widehat\alpha + \widehat\beta x_i) = \text{residuals} = \text{estimated errors}, \\ \text{SSR} &= \sum_{i=1}^n \widehat\varepsilon_i^{\,2} = \text{sum of squares of residuals}. \end{aligned}

Then the t-score is given by

t_\text{score} = \frac{(\widehat\beta - \beta_0)\sqrt{n-2}}{ \sqrt{\text{SSR}/\sum_{i=1}^n \left(x_i - \overline{x}\right)^2} }.
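
Putting the pieces together, a minimal Python sketch of the slope test (least-squares fit, then the t-score from SSR; the helper name `slope_t` is ours):

```python
import math

def slope_t(x, y, beta0=0.0):
    """t statistic for H0: slope = beta0 in simple linear regression (df = n - 2)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    # Least-squares estimators
    beta_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    alpha_hat = ybar - beta_hat * xbar
    # Sum of squares of residuals
    ssr = sum((yi - (alpha_hat + beta_hat * xi)) ** 2 for xi, yi in zip(x, y))
    se_beta = math.sqrt(ssr / (n - 2)) / math.sqrt(sxx)
    return (beta_hat - beta0) / se_beta

print(slope_t([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```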

Independent two-sample t-test

Equal sample sizes, equal variance

Given two groups (1, 2), this test is only applicable when:

  • the two sample sizes (that is, the number n of participants in each group) are equal;
  • it can be assumed that the two distributions have the same variance.

Violations of these assumptions are discussed below.

The t statistic to test whether the means are different can be calculated as follows:

t = \frac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{2/n}}

where

s_p = \sqrt{\frac{s_{X_1}^2 + s_{X_2}^2}{2}}

Here s_{p} is the pooled standard deviation for n=n1=n2 and s_{X_1}^2 and  s_{X_2}^2 are the unbiased estimators of the variances of the two samples. The denominator of t is the standard error of the difference between two means.

For significance testing, the degrees of freedom for this test is 2n − 2 where n is the number of participants in each group.
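
A minimal sketch of this equal-n, equal-variance form (the function name and data are ours, for illustration):

```python
import math
import statistics

def t_equal_n(x1, x2):
    """Two-sample t statistic for equal group sizes, assuming equal variances (df = 2n - 2)."""
    n = len(x1)
    # Pooled standard deviation for n1 = n2 = n
    sp = math.sqrt((statistics.variance(x1) + statistics.variance(x2)) / 2)
    return (statistics.mean(x1) - statistics.mean(x2)) / (sp * math.sqrt(2 / n))

print(t_equal_n([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))
```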

Equal or unequal sample sizes, equal variance

This test is used only when it can be assumed that the two distributions have the same variance. (When this assumption is violated, see below.) Note that the previous formulae are a special case valid when both samples have equal sizes: n = n1 = n2. The t statistic to test whether the means are different can be calculated as follows:

t = \frac{\bar{X}_1 - \bar{X}_2}{s_p \cdot \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}

where

s_p = \sqrt{\frac{(n_1 - 1)s_{X_1}^2 + (n_2 - 1)s_{X_2}^2}{n_1 + n_2 - 2}}

is an estimator of the pooled standard deviation of the two samples: it is defined in this way so that its square is an unbiased estimator of the common variance whether or not the population means are the same. In these formulae, ni − 1 is the number of degrees of freedom for each group, and the total sample size minus two (that is, n1 + n2 − 2) is the total number of degrees of freedom, which is used in significance testing.
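
The pooled form can be sketched as follows (function name and data are ours, for illustration):

```python
import math
import statistics

def t_pooled(x1, x2):
    """Two-sample t statistic with pooled variance (df = n1 + n2 - 2); equal variances assumed."""
    n1, n2 = len(x1), len(x2)
    # Pooled standard deviation weighted by each group's degrees of freedom
    sp = math.sqrt(((n1 - 1) * statistics.variance(x1) + (n2 - 1) * statistics.variance(x2))
                   / (n1 + n2 - 2))
    return (statistics.mean(x1) - statistics.mean(x2)) / (sp * math.sqrt(1 / n1 + 1 / n2))

print(t_pooled([1, 2, 3, 4, 5], [2, 4, 6]))
```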

Equal or unequal sample sizes, unequal variances
Main article: Welch's t-test

This test, also known as Welch's t-test, is used only when the two population variances are not assumed to be equal (the two sample sizes may or may not be equal) and hence must be estimated separately. The t statistic to test whether the population means are different is calculated as:

t = \frac{\overline{X}_1 - \overline{X}_2}{s_{\overline{\Delta}}}

where

s_{\overline{\Delta}} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}

Here s_i^2 is the unbiased estimator of the variance of each of the two samples, with n_i = number of participants in group i (i = 1, 2). Note that in this case s_{\overline{\Delta}}^2 is not a pooled variance. For use in significance testing, the distribution of the test statistic is approximated as an ordinary Student's t distribution with the degrees of freedom calculated using

\mathrm{d.f.} = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{(s_1^2/n_1)^2/(n_1-1) + (s_2^2/n_2)^2/(n_2-1)}.

This is known as the Welch–Satterthwaite equation. The true distribution of the test statistic actually depends (slightly) on the two unknown population variances (see Behrens–Fisher problem).
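
Both the Welch statistic and the Welch–Satterthwaite degrees of freedom can be sketched in a few lines (function name and data are ours, for illustration):

```python
import math
import statistics

def welch_t(x1, x2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(x1), len(x2)
    v1, v2 = statistics.variance(x1), statistics.variance(x2)  # unbiased variance estimates
    se2 = v1 / n1 + v2 / n2          # squared standard error of the mean difference
    t = (statistics.mean(x1) - statistics.mean(x2)) / math.sqrt(se2)
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df

print(welch_t([1, 2, 3, 4, 5], [2, 4, 6]))
```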


F-test

The formula for the one-way ANOVA F-test statistic is

F = \frac{\text{explained variance}}{\text{unexplained variance}},

or

F = \frac{\text{between-group variability}}{\text{within-group variability}}.

The "explained variance", or "between-group variability" is

{\displaystyle \sum _{i=1}^{K}n_{i}({\bar {Y}}_{i\cdot }-{\bar {Y}})^{2}/(K-1)}

where \bar{Y}_{i\cdot} denotes the sample mean in the ith group, n_i is the number of observations in the ith group, \bar{Y} denotes the overall mean of the data, and K denotes the number of groups.

The "unexplained variance", or "within-group variability" is

{\displaystyle \sum _{i=1}^{K}\sum _{j=1}^{n_{i}}(Y_{ij}-{\bar {Y}}_{i\cdot })^{2}/(N-K),}

where Y_{ij} is the jth observation in the ith out of K groups and N is the overall sample size. This F-statistic follows the F-distribution with K − 1 and N − K degrees of freedom under the null hypothesis. The statistic will be large if the between-group variability is large relative to the within-group variability, which is unlikely to happen if the population means of the groups all have the same value.

Note that when there are only two groups for the one-way ANOVA F-test, F = t², where t is the Student's t statistic.
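
The two formulas above translate directly into code; a minimal sketch (function name and data are ours, for illustration):

```python
import statistics

def f_oneway(groups):
    """One-way ANOVA F statistic: between-group over within-group variability."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    # Explained (between-group) variability, divided by K - 1
    between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups) / (k - 1)
    # Unexplained (within-group) variability, divided by N - K
    within = sum(sum((y - statistics.mean(g)) ** 2 for y in g) for g in groups) / (n_total - k)
    return between / within

# With two groups, F equals the square of the pooled two-sample t statistic
print(f_oneway([[1, 2, 3, 4, 5], [2, 4, 6]]))
```

For these two groups the pooled t statistic is about −0.7906, and (−0.7906)² ≈ 0.625 matches the F value, illustrating the identity.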

Software implementations

Many spreadsheet programs and statistics packages, such as QtiPlot, LibreOffice Calc, Microsoft Excel, SAS, SPSS, Stata, DAP, gretl, R, Python, PSPP, Matlab and Minitab, include implementations of Student's t-test.

Language/Program              Function                                              Notes
Microsoft Excel pre-2010      TTEST(array1, array2, tails, type)                    See [1]
Microsoft Excel 2010+         T.TEST(array1, array2, tails, type)                   See [2]
LibreOffice                   TTEST(Data1; Data2; Mode; Type)                       See [3]
Google Sheets                 TTEST(range1, range2, tails, type)                    See [4]
Python                        scipy.stats.ttest_ind(a, b, axis=0, equal_var=True)   See [5]
Matlab                        ttest(data1, data2)                                   See [6]
Mathematica                   TTest[{data1, data2}]                                 See [7]
R                             t.test(data1, data2, var.equal=TRUE)                  See [8]
SAS                           PROC TTEST                                            See [9]
Java                          tTest(sample1, sample2)                               See [10]
Julia                         EqualVarianceTTest(sample1, sample2)                  See [11]
Stata                         ttest data1 == data2                                  See [12]

 

Reposted from: https://www.cnblogs.com/JoAnnal/p/6734488.html
