T-test

Uses

Among the most frequently used t-tests are:

  • A one-sample location test of whether the mean of a population has a value specified in a null hypothesis.
  • A two-sample location test of the null hypothesis that the means of two populations are equal. All such tests are usually called Student's t-tests, though strictly speaking that name should only be used if the variances of the two populations are also assumed to be equal; the form of the test used when this assumption is dropped is sometimes called Welch's t-test. These tests are often referred to as "unpaired" or "independent samples" t-tests, as they are typically applied when the statistical units underlying the two samples being compared are non-overlapping.
  • A test of the null hypothesis that the difference between two responses measured on the same statistical unit has a mean value of zero. For example, suppose we measure the size of a cancer patient's tumor before and after a treatment. If the treatment is effective, we expect the tumor size for many of the patients to be smaller following the treatment. This is often referred to as the "paired" or "repeated measures" t-test: see paired difference test.
  • A test of whether the slope of a regression line differs significantly from 0.
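
As an illustration of the paired case, here is a minimal Python sketch; the before/after tumor measurements are hypothetical, and the paired test reduces to a one-sample test on the per-patient differences:

```python
import math
import statistics

# Hypothetical before/after tumor sizes (cm) for six patients -- illustrative only
before = [3.1, 2.8, 3.5, 4.0, 2.9, 3.3]
after = [2.4, 2.6, 2.9, 3.6, 2.8, 2.7]

# A paired t-test is a one-sample t-test on the per-patient differences
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
# Compare t against a Student's t distribution with n - 1 degrees of freedom
print(t)
```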

One-sample t-test

In testing the null hypothesis that the population mean is equal to a specified value μ0, one uses the statistic

t = \frac{\overline{x} - \mu_0}{s/\sqrt{n}}

where \overline{x} is the sample mean, s is the sample standard deviation, and n is the sample size. The degrees of freedom used in this test are n − 1. Although the parent population does not need to be normally distributed, the distribution of the population of sample means \overline{x} is assumed to be normal. By the central limit theorem, if the observations are independent and the second moment of the parent population exists, then the sample means will be approximately normally distributed. (The degree of approximation depends on how close the parent population is to a normal distribution and on the sample size n.)
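
The statistic above can be computed directly with the standard library; a minimal sketch (the function name and data are ours, for illustration):

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t statistic for H0: the population mean equals mu0 (df = len(sample) - 1)."""
    n = len(sample)
    xbar = statistics.mean(sample)   # sample mean
    s = statistics.stdev(sample)     # sample standard deviation (n - 1 denominator)
    return (xbar - mu0) / (s / math.sqrt(n))

# Hypothetical measurements; test whether the mean differs from 5.0
print(one_sample_t([5.1, 4.9, 5.3, 5.2, 4.8, 5.0], 5.0))
```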

Slope of a regression line

Suppose one is fitting the model

Y=\alpha +\beta x+\varepsilon ,

where x is known, α and β are unknown, ε is a normally distributed random variable with mean 0 and unknown variance σ², and Y is the outcome of interest. We want to test the null hypothesis that the slope β is equal to some specified value β0 (often taken to be 0, in which case the null hypothesis is that x and Y are independent).

Let

\begin{align} \widehat\alpha, \widehat\beta & = \text{least-squares estimators}, \\ SE_{\widehat\alpha}, SE_{\widehat\beta} & = \text{the standard errors of least-squares estimators}. \end{align}

Then

t_\text{score} = \frac{\widehat\beta - \beta_0}{ SE_{\widehat\beta} }\sim\mathcal{T}_{n-2}

has a t-distribution with n − 2 degrees of freedom if the null hypothesis is true. The standard error of the slope coefficient:

SE_{\widehat\beta} = \frac{\sqrt{\frac{1}{n-2}\sum_{i=1}^n (y_i - \widehat{y}_i)^2}}{\sqrt{\sum_{i=1}^n (x_i - \overline{x})^2}}

can be written in terms of the residuals. Let

\begin{aligned} \widehat\varepsilon_i &= y_i - \widehat{y}_i = y_i - (\widehat\alpha + \widehat\beta x_i) = \text{residuals} = \text{estimated errors}, \\ \text{SSR} &= \sum_{i=1}^n \widehat\varepsilon_i^{\,2} = \text{sum of squares of residuals}. \end{aligned}

Then the t-score is given by

t_\text{score} = \frac{(\widehat\beta - \beta_0)\sqrt{n-2}}{ \sqrt{\text{SSR}/\sum_{i=1}^n \left(x_i - \overline{x}\right)^2} }.
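
Putting the pieces together, a minimal Python sketch of the slope test (least-squares fit, then the t-score from SSR; the helper name `slope_t` is ours):

```python
import math

def slope_t(x, y, beta0=0.0):
    """t statistic for H0: slope = beta0 in simple linear regression (df = n - 2)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    # Least-squares estimators
    beta_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    alpha_hat = ybar - beta_hat * xbar
    # Sum of squares of residuals
    ssr = sum((yi - (alpha_hat + beta_hat * xi)) ** 2 for xi, yi in zip(x, y))
    se_beta = math.sqrt(ssr / (n - 2)) / math.sqrt(sxx)
    return (beta_hat - beta0) / se_beta

print(slope_t([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```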

Independent two-sample t-test

Equal sample sizes, equal variance

Given two groups (1, 2), this test is only applicable when:

  • the two sample sizes (that is, the number n of participants in each group) are equal;
  • it can be assumed that the two distributions have the same variance.

Violations of these assumptions are discussed below.

The t statistic to test whether the means are different can be calculated as follows:

t = \frac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{2/n}}

where

s_p = \sqrt{\frac{s_{X_1}^2 + s_{X_2}^2}{2}}

Here s_{p} is the pooled standard deviation for n=n1=n2 and s_{X_1}^2 and  s_{X_2}^2 are the unbiased estimators of the variances of the two samples. The denominator of t is the standard error of the difference between two means.

For significance testing, the degrees of freedom for this test is 2n − 2 where n is the number of participants in each group.
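
A minimal sketch of this equal-n, equal-variance form (the function name and data are ours, for illustration):

```python
import math
import statistics

def t_equal_n(x1, x2):
    """Two-sample t statistic for equal group sizes, assuming equal variances (df = 2n - 2)."""
    n = len(x1)
    # Pooled standard deviation for n1 = n2 = n
    sp = math.sqrt((statistics.variance(x1) + statistics.variance(x2)) / 2)
    return (statistics.mean(x1) - statistics.mean(x2)) / (sp * math.sqrt(2 / n))

print(t_equal_n([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))
```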

Equal or unequal sample sizes, equal variance

This test is used only when it can be assumed that the two distributions have the same variance. (When this assumption is violated, see below.) Note that the previous formulae are a special case valid when both samples have equal sizes: n = n1 = n2. The t statistic to test whether the means are different can be calculated as follows:

t = \frac{\bar{X}_1 - \bar{X}_2}{s_p \cdot \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}

where

s_p = \sqrt{\frac{(n_1 - 1)s_{X_1}^2 + (n_2 - 1)s_{X_2}^2}{n_1 + n_2 - 2}}

is an estimator of the pooled standard deviation of the two samples: it is defined in this way so that its square is an unbiased estimator of the common variance whether or not the population means are the same. In these formulae, ni − 1 is the number of degrees of freedom for each group, and the total sample size minus two (that is, n1 + n2 − 2) is the total number of degrees of freedom, which is used in significance testing.
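
The pooled form can be sketched as follows (function name and data are ours, for illustration):

```python
import math
import statistics

def t_pooled(x1, x2):
    """Two-sample t statistic with pooled variance (df = n1 + n2 - 2); equal variances assumed."""
    n1, n2 = len(x1), len(x2)
    # Pooled standard deviation weighted by each group's degrees of freedom
    sp = math.sqrt(((n1 - 1) * statistics.variance(x1) + (n2 - 1) * statistics.variance(x2))
                   / (n1 + n2 - 2))
    return (statistics.mean(x1) - statistics.mean(x2)) / (sp * math.sqrt(1 / n1 + 1 / n2))

print(t_pooled([1, 2, 3, 4, 5], [2, 4, 6]))
```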

Equal or unequal sample sizes, unequal variances
Main article: Welch's t-test

This test, also known as Welch's t-test, is used only when the two population variances are not assumed to be equal (the two sample sizes may or may not be equal) and hence must be estimated separately. The t statistic to test whether the population means are different is calculated as:

t = \frac{\overline{X}_1 - \overline{X}_2}{s_{\overline{\Delta}}}

where

s_{\overline{\Delta}} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}

Here s_i^2 is the unbiased estimator of the variance of each of the two samples, with n_i = number of participants in group i (i = 1, 2). Note that in this case s_{\overline{\Delta}}^2 is not a pooled variance. For use in significance testing, the distribution of the test statistic is approximated as an ordinary Student's t distribution with the degrees of freedom calculated using

\mathrm{d.f.} = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{(s_1^2/n_1)^2/(n_1-1) + (s_2^2/n_2)^2/(n_2-1)}.

This is known as the Welch–Satterthwaite equation. The true distribution of the test statistic actually depends (slightly) on the two unknown population variances (see Behrens–Fisher problem).
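
Both the Welch statistic and the Welch–Satterthwaite degrees of freedom can be sketched in a few lines (function name and data are ours, for illustration):

```python
import math
import statistics

def welch_t(x1, x2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(x1), len(x2)
    v1, v2 = statistics.variance(x1), statistics.variance(x2)  # unbiased variance estimates
    se2 = v1 / n1 + v2 / n2          # squared standard error of the mean difference
    t = (statistics.mean(x1) - statistics.mean(x2)) / math.sqrt(se2)
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df

print(welch_t([1, 2, 3, 4, 5], [2, 4, 6]))
```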


F-test

The formula for the one-way ANOVA F-test statistic is

F = \frac{\text{explained variance}}{\text{unexplained variance}},

or

F = \frac{\text{between-group variability}}{\text{within-group variability}}.

The "explained variance", or "between-group variability" is

{\displaystyle \sum _{i=1}^{K}n_{i}({\bar {Y}}_{i\cdot }-{\bar {Y}})^{2}/(K-1)}

where \bar{Y}_{i\cdot} denotes the sample mean in the ith group, n_i is the number of observations in the ith group, \bar{Y} denotes the overall mean of the data, and K denotes the number of groups.

The "unexplained variance", or "within-group variability" is

{\displaystyle \sum _{i=1}^{K}\sum _{j=1}^{n_{i}}(Y_{ij}-{\bar {Y}}_{i\cdot })^{2}/(N-K),}

where Y_{ij} is the jth observation in the ith out of K groups and N is the overall sample size. This F-statistic follows the F-distribution with K − 1 and N − K degrees of freedom under the null hypothesis. The statistic will be large if the between-group variability is large relative to the within-group variability, which is unlikely to happen if the population means of the groups all have the same value.

Note that when there are only two groups for the one-way ANOVA F-test, F = t², where t is the Student's t statistic.
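
The two formulas above translate directly into code; a minimal sketch (function name and data are ours, for illustration):

```python
import statistics

def f_oneway(groups):
    """One-way ANOVA F statistic: between-group over within-group variability."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    # Explained (between-group) variability, divided by K - 1
    between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups) / (k - 1)
    # Unexplained (within-group) variability, divided by N - K
    within = sum(sum((y - statistics.mean(g)) ** 2 for y in g) for g in groups) / (n_total - k)
    return between / within

# With two groups, F equals the square of the pooled two-sample t statistic
print(f_oneway([[1, 2, 3, 4, 5], [2, 4, 6]]))
```

For these two groups the pooled t statistic is about −0.7906, and (−0.7906)² ≈ 0.625 matches the F value, illustrating the identity.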

Software implementations

Many spreadsheet programs and statistics packages, such as QtiPlot, LibreOffice Calc, Microsoft Excel, SAS, SPSS, Stata, DAP, gretl, R, Python, PSPP, Matlab and Minitab, include implementations of Student's t-test.

Language/Program              Function                                              Notes
Microsoft Excel pre-2010      TTEST(array1, array2, tails, type)                    See [1]
Microsoft Excel 2010+         T.TEST(array1, array2, tails, type)                   See [2]
LibreOffice                   TTEST(Data1; Data2; Mode; Type)                       See [3]
Google Sheets                 TTEST(range1, range2, tails, type)                    See [4]
Python                        scipy.stats.ttest_ind(a, b, axis=0, equal_var=True)   See [5]
Matlab                        ttest(data1, data2)                                   See [6]
Mathematica                   TTest[{data1, data2}]                                 See [7]
R                             t.test(data1, data2, var.equal=TRUE)                  See [8]
SAS                           PROC TTEST                                            See [9]
Java                          tTest(sample1, sample2)                               See [10]
Julia                         EqualVarianceTTest(sample1, sample2)                  See [11]
Stata                         ttest data1 == data2                                  See [12]

 

Reposted from: https://www.cnblogs.com/JoAnnal/p/6734488.html
