I'm trying to evaluate/test how well my data fits a particular distribution.
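There are several questions about this, and I was told to use either scipy.stats.kstest or scipy.stats.ks_2samp. It seems straightforward: give it (1) the data, (2) the distribution, and (3) the fit parameters. The only problem is that my results don't make any sense. I want to test the goodness of fit of my data against different distributions, but from the output of kstest I can't tell whether I can do this.

Related answers say:

"[SciPy] contains K-S"

"first value is the test statistics, and second value is the p-value. if the p-value is less than 95 (for a level of significance of 5%), this means that you cannot reject the Null-Hypothese that the two sample distributions are identical."

import numpy as np
from scipy import stats

np.random.seed(2)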
# Sample from a normal distribution w/ mu: -50 and sigma=1
x = np.random.normal(loc=-50, scale=1, size=100)
x
#array([-50.41675785, -50.05626683, -52.1361961 , -48.35972919,
# -51.79343559, -50.84174737, -49.49711858, -51.24528809,
# -51.05795222, -50.90900761, -49.44854596, -47.70779199,
# ...
# -50.46200535, -49.64911151, -49.61813377, -49.43372456,
# -49.79579202, -48.59330376, -51.7379595 , -48.95917605,
# -49.61952803, -50.21713527, -48.8264685 , -52.34360319])
# Try against a Gamma Distribution
distribution = "gamma"
distr = getattr(stats, distribution)
params = distr.fit(x)
stats.kstest(x,distribution,args=params)
KstestResult(statistic=0.078494356486987549, pvalue=0.55408436218441004)
So does a p-value of 0.554 mean that the normal and gamma samples come from the same distribution?
Now compare against a normal distribution:

# Try against a Normal Distribution
distribution = "norm"
distr = getattr(stats, distribution)
params = distr.fit(x)
stats.kstest(x,distribution,args=params)
KstestResult(statistic=0.070447707170256002, pvalue=0.70801104133244541)
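According to this, if I took the lowest p-value, I would conclude that my data came from a gamma distribution, even though the values are all negative?

np.random.seed(0)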
distr = getattr(stats, "norm")
x = distr.rvs(loc=0, scale=1, size=50)
params = distr.fit(x)
stats.kstest(x,"norm",args=params, N=1000)
KstestResult(statistic=0.058435890774587329, pvalue=0.99558592119926814)
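This means that at a 5% level of significance I can reject the null hypothesis that the distributions are identical. So I conclude they are different, but they clearly aren't? Am I interpreting this incorrectly? If I make it one-tailed, would that make it so that the larger the value, the more likely they are from the same distribution?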
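For comparison, here is a minimal sketch of the conventional decision rule as I understand it: the null hypothesis (the sample follows the given distribution) is rejected only when the p-value falls below the significance level, so a large p-value means there is no evidence against the fit. The helper name `ks_decision` and the `alpha` parameter are hypothetical, introduced only for this illustration. Because the parameters are fitted from the same sample, the p-value is likely optimistic, so treat it as a rough guide rather than an exact test.

```python
import numpy as np
from scipy import stats


def ks_decision(sample, dist_name, alpha=0.05):
    """Fit dist_name to sample, run a one-sample KS test, and apply
    the usual rule: reject the null hypothesis only if p < alpha.

    ks_decision and alpha are hypothetical names for this sketch.
    """
    dist = getattr(stats, dist_name)
    params = dist.fit(sample)  # parameters estimated from the same data
    result = stats.kstest(sample, dist_name, args=params)
    reject = bool(result.pvalue < alpha)
    return result.statistic, result.pvalue, reject


# Same setup as the last example: 50 draws from a standard normal
rng = np.random.RandomState(0)
x = rng.normal(loc=0, scale=1, size=50)

stat, p, reject = ks_decision(x, "norm")
print("statistic=%.4f, pvalue=%.4f, reject H0: %s" % (stat, p, reject))
```

Under this rule, a p-value such as 0.99 would mean "cannot reject", not "reject".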