Kolmogorov–Smirnov test (KS test)

This entry was translated by Jie.

[Figure: KS Example — illustration of the Kolmogorov–Smirnov statistic. The red line is the cumulative distribution function, the blue line is the empirical distribution function, and the black arrow is the K–S statistic.]

In statistics, the Kolmogorov–Smirnov test (K–S test or KS test) is a nonparametric test of the equality of continuous (or discontinuous, see Section 2.2), one-dimensional probability distributions that can be used to compare a sample with a reference probability distribution (one-sample K–S test), or to compare two samples (two-sample K–S test). It is named after Andrey Kolmogorov and Nikolai Smirnov.
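A one-sample test of the kind described above can be sketched with SciPy's implementation (this assumes SciPy is available; it is not mentioned in the original text):

```python
# Minimal sketch of a one-sample K-S test: compare a sample against the
# standard normal reference distribution using scipy.stats.kstest.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.normal(loc=0.0, scale=1.0, size=500)

# Null hypothesis: the sample is drawn from the standard normal N(0, 1).
result = stats.kstest(sample, "norm")
print(f"D = {result.statistic:.4f}, p = {result.pvalue:.4f}")
```

A large p-value here means the test finds no evidence against the null hypothesis that the sample came from the reference distribution.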

The Kolmogorov–Smirnov statistic quantifies a distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution, or between the empirical distribution functions of two samples. The null distribution of this statistic is calculated under the null hypothesis that the sample is drawn from the reference distribution (in the one-sample case) or that the samples are drawn from the same distribution (in the two-sample case). In the one-sample case, the distribution considered under the null hypothesis may be continuous (see Section 2), purely discrete or mixed (see Section 2.2). In the two-sample case (see Section 3), the distribution considered under the null hypothesis is a continuous distribution but is otherwise unrestricted.

The two-sample K–S test is one of the most useful and general nonparametric methods for comparing two samples, as it is sensitive to differences in both location and shape of the empirical cumulative distribution functions of the two samples.
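The sensitivity to both location and shape can be illustrated with SciPy's two-sample implementation (`scipy.stats.ks_2samp`; assuming SciPy is available, which the original text does not state):

```python
# The two-sample K-S test detects both a location shift and a shape
# (here: variance) difference between two samples.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 1000)
b_shift = rng.normal(0.5, 1.0, 1000)   # same shape, shifted location
b_shape = rng.normal(0.0, 2.0, 1000)   # same location, wider shape

p_shift = stats.ks_2samp(a, b_shift).pvalue
p_shape = stats.ks_2samp(a, b_shape).pvalue
print(f"location shift: p = {p_shift:.2e}")
print(f"shape change:   p = {p_shape:.2e}")
```

In both cases the p-value is small, because the empirical CDFs differ either by a horizontal offset (location) or by how steeply they rise (shape).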

The Kolmogorov–Smirnov test can be modified to serve as a goodness of fit test. In the special case of testing for normality of the distribution, samples are standardized and compared with a standard normal distribution. This is equivalent to setting the mean and variance of the reference distribution equal to the sample estimates, and it is known that using these to define the specific reference distribution changes the null distribution of the test statistic (see Test with estimated parameters). Various studies have found that, even in this corrected form, the test is less powerful for testing normality than the Shapiro–Wilk test or Anderson–Darling test. However, these other tests have their own disadvantages. For instance the Shapiro–Wilk test is known not to work well in samples with many identical values.
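The standardization step can be sketched as follows (assuming SciPy is available; note that the p-value printed here is the uncorrected one, which is exactly the caveat described above):

```python
# Using the K-S test as a normality check: standardize the sample with
# its own mean and standard deviation, then compare to N(0, 1).
# Because the parameters are estimated from the same data, the naive
# p-value is not valid -- this is the setting a corrected test (e.g. the
# Lilliefors variant) addresses.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(10.0, 3.0, 400)
z = (x - x.mean()) / x.std(ddof=1)  # sample estimates, not true parameters

res = stats.kstest(z, "norm")
print(f"D = {res.statistic:.4f} (p-value needs correction)")
```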


Kolmogorov–Smirnov statistic

The empirical distribution function F_n for n independent and identically distributed ordered observations X_i is defined as

[math]\displaystyle{ F_n(x)={1 \over n}\sum_{i=1}^n I_{[-\infty,x]}(X_i) }[/math]

where [math]\displaystyle{ I_{[-\infty,x]}(X_i) }[/math] is the indicator function, equal to 1 if [math]\displaystyle{ X_i \le x }[/math] and equal to 0 otherwise.

The Kolmogorov–Smirnov statistic for a given cumulative distribution function F(x) is

[math]\displaystyle{ D_n= \sup_x |F_n(x)-F(x)| }[/math]
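These two definitions translate directly into code. A sketch in plain Python (the sample values and reference CDF below are illustrative, not from the original text):

```python
# Direct computation of D_n = sup_x |F_n(x) - F(x)| for a continuous
# reference CDF F. Since F_n is a step function, the supremum is
# attained at a sample point, so it suffices to compare F against F_n
# just before and just after each jump.
import math

def standard_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ks_statistic(sample, cdf):
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # F_n jumps from i/n to (i+1)/n at x.
        d = max(d, abs((i + 1) / n - f), abs(i / n - f))
    return d

sample = [-1.2, -0.4, 0.1, 0.8, 1.5]
print(ks_statistic(sample, standard_normal_cdf))
```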

where sup_x is the supremum of the set of distances. By the Glivenko–Cantelli theorem, if the sample comes from distribution F(x), then D_n converges to 0 almost surely in the limit when n goes to infinity. Kolmogorov strengthened this result, by effectively providing the rate of this convergence (see Kolmogorov distribution). Donsker's theorem provides a yet stronger result.
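The almost-sure convergence of D_n to 0 can be observed numerically (a sketch assuming SciPy is available; the sample sizes are arbitrary choices):

```python
# Glivenko-Cantelli in action: when the sample truly comes from F,
# D_n shrinks toward 0 as n grows, at a rate of roughly 1/sqrt(n)
# (Kolmogorov's refinement of the convergence).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
ds = {}
for n in (100, 10_000):
    sample = rng.uniform(size=n)
    ds[n] = stats.kstest(sample, "uniform").statistic
    print(f"n = {n:6d}  D_n = {ds[n]:.4f}")
```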

In practice, the statistic requires a relatively large number of data points (in comparison to other goodness-of-fit criteria such as the Anderson–Darling test statistic) to properly reject the null hypothesis.
