WeChat official account: pythonEducation
Modeling and statistics projects, QQ: 231469242
Contents:
1. Shapiro-Wilk test
Sample size less than 50.
2. normaltest
Sample size less than 50. normaltest uses the D'Agostino-Pearson omnibus test and needs more than 20 observations per group.
3. Lilliefors test
- For intermediate sample sizes, the Lilliefors test is a good choice, since the original Kolmogorov-Smirnov test is unreliable when the mean and standard deviation of the distribution are not known.
4. Kolmogorov-Smirnov (K-S) test
- The Kolmogorov-Smirnov test should only be used for large sample sizes (>300).
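The caveat in item 3 can be checked by simulation. The sketch below is an illustration only, not part of the original code: it draws truly normal samples, standardizes each one with its own sample mean and standard deviation, and applies the plain Kolmogorov-Smirnov test. The function name `ks_rejection_rate`, the sample size, the number of repetitions and the seed are all assumptions chosen for the example. A well-calibrated test would reject about 5% of these samples; with estimated parameters the plain K-S test rejects far fewer, which is the bias the Lilliefors correction is meant to address.

```python
import numpy as np
from scipy import stats


def ks_rejection_rate(n=100, repeats=500, alpha=0.05, seed=0):
    """Fraction of truly normal samples rejected by a plain K-S test
    when mean and std are estimated from the same sample."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(repeats):
        sample = rng.normal(loc=20, scale=5, size=n)
        # Standardize with the sample's own estimates (the problematic step).
        z = (sample - sample.mean()) / sample.std(ddof=1)
        p_value = stats.kstest(z, 'norm')[1]
        if p_value < alpha:
            rejections += 1
    return rejections / repeats


rate = ks_rejection_rate()
print("rejection rate at nominal 5%% level: %.3f" % rate)
```

A rate well below 0.05 means the test is too conservative, so it will also miss real departures from normality more often than its nominal level suggests.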
Latest version of the code
# -*- coding: utf-8 -*-
'''
Author: Toby
QQ: 231469242, all rights reserved, no commercial use
WeChat official account: pythonEducation
'''
import scipy
from scipy.stats import f
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
# additional packages
from statsmodels.stats.diagnostic import lilliefors

group1 = [2, 3, 7, 2, 6]
group2 = [10, 8, 7, 5, 10]
group3 = [10, 13, 14, 13, 15]
list_groups = [group1, group2, group3]
list_total = group1 + group2 + group3


# Normality test: the test is chosen according to the sample size
def check_normality(testData):
    n = len(testData)
    if 20 < n < 50:
        # 20 < n < 50: D'Agostino-Pearson omnibus test
        test_name = "normaltest"
        p_value = stats.normaltest(testData)[1]
    elif n <= 20:
        # remaining samples below 50 (n <= 20): Shapiro-Wilk test
        test_name = "shapiro"
        p_value = stats.shapiro(testData)[1]
    elif n <= 300:
        # 50 <= n <= 300: Lilliefors test
        test_name = "lilliefors"
        p_value = lilliefors(testData)[1]
    else:
        # n > 300: Kolmogorov-Smirnov test. 'norm' means the standard
        # normal N(0, 1), so testData is assumed to be standardized.
        test_name = "kstest"
        p_value = stats.kstest(testData, 'norm')[1]

    print("use %s:" % test_name)
    if p_value < 0.05:
        print("data are not normally distributed")
        return False
    print("data are normally distributed")
    return True


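# Run the normality check on every sample group
def NormalTest(list_groups):
    for group in list_groups:
        # Normality check for this group
        status = check_normality(group)
        if not status:
            return False
    return True


# Run the normality check on all sample groups
NormalTest(list_groups)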
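One detail of the `stats.kstest(testData,'norm')` call is worth illustrating: with 'norm' and no further arguments, SciPy compares the data against the standard normal distribution (mean 0, standard deviation 1). The sketch below uses idealized data built from normal quantiles; the mean of 20, the standard deviation of 5 and the sample size of 400 are assumptions chosen purely for illustration.

```python
import numpy as np
from scipy import stats

# Idealized "perfectly normal" data: quantiles of N(20, 5^2) at evenly
# spaced probabilities (assumed parameters, for illustration only).
n = 400
probs = (np.arange(1, n + 1) - 0.5) / n
data = 20 + 5 * stats.norm.ppf(probs)

# Compared against the standard normal N(0, 1): strongly rejected,
# even though the data are normal, because location and scale differ.
p_standard = stats.kstest(data, 'norm')[1]

# Compared against a normal with the matching, known parameters.
p_matched = stats.kstest(data, 'norm', args=(20, 5))[1]

print("p-value vs N(0, 1):  %.3g" % p_standard)
print("p-value vs N(20, 5): %.3g" % p_matched)
```

Passing `args=(mean, std)` is only appropriate when those parameters are known in advance; when they are estimated from the same data, the Lilliefors test is the variant designed for that situation.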
P-P plots and Q-Q plots lead to very similar conclusions: if the data follow a normal distribution, the points lie close to the line y = x.
In all three cases (Q-Q plots, P-P plots and probability plots) the results are similar: if the two distributions being compared
are similar, the points will approximately lie on the line y = x. If the distributions
are linearly related, the points will approximately lie on a line, but not necessarily
on the line y = x (Fig. 7.1).
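The statement about linearly related distributions can be checked numerically. The following sketch compares theoretical quantiles directly, without plotting; the choice of N(0, 1) and N(3, 2^2) and the probability grid are assumptions made for the example.

```python
import numpy as np
from scipy import stats

# Probabilities at which the quantiles are compared.
probs = np.linspace(0.01, 0.99, 99)

q_ref = stats.norm.ppf(probs)                      # quantiles of N(0, 1)
q_same = stats.norm.ppf(probs)                     # same distribution
q_linear = stats.norm.ppf(probs, loc=3, scale=2)   # quantiles of N(3, 2^2)

# Fit a straight line through each set of Q-Q points.
slope_same, intercept_same = np.polyfit(q_ref, q_same, 1)
slope_lin, intercept_lin = np.polyfit(q_ref, q_linear, 1)

print("same distribution:  slope=%.2f intercept=%.2f" % (slope_same, intercept_same))
print("linearly related:   slope=%.2f intercept=%.2f" % (slope_lin, intercept_lin))
```

Identical distributions give the line y = x (slope 1, intercept 0), while the shifted and scaled distribution still gives a straight line, with the slope equal to the scale and the intercept equal to the shift.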
In Python, a probability plot can be generated with the command
stats.probplot(data, plot=plt)
https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.probplot.html
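stats.probplot can also be used without plotting: by default it returns the plotted points together with a least-squares fit through them. The sketch below uses idealized data built from normal quantiles, with an assumed mean of 20 and standard deviation of 5, to show what the returned values look like for normal data.

```python
import numpy as np
from scipy import stats

# Idealized normal data (assumed parameters, for illustration only).
n = 100
probs = (np.arange(1, n + 1) - 0.5) / n
data = 20 + 5 * stats.norm.ppf(probs)

# osm: theoretical quantiles, osr: ordered sample values,
# (slope, intercept, r): least-squares line through the points.
(osm, osr), (slope, intercept, r) = stats.probplot(data, dist="norm")

print("slope=%.2f intercept=%.2f r=%.4f" % (slope, intercept, r))
```

For normal data the correlation coefficient r is close to 1, the slope approximates the standard deviation and the intercept approximates the mean, so the points follow the fitted line rather than y = x unless the data are standardized.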
a) Probability-Plots
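Probability plots are used to assess distributions visually: quantiles are plotted against each other to compare probability distributions.
The sample quantiles are the original values of your sample.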
sample distribution
In statistics different tools are available for the visual assessments of distributions.
A number of graphical methods exist for comparing two probability distributions by plotting their quantiles, or closely related parameters, against each other:
# -*- coding: utf-8 -*-
import numpy as np
import pylab
import scipy.stats as stats
measurements = np.random.normal(loc = 20, scale = 5, size=100)
stats.probplot(measurements, dist="norm", plot=pylab)
pylab.show()
Fig. 7.1 Probability-plot, to check for normality of a sample distribution
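Here 100 points are drawn at random from a normal distribution and then checked for normality. The probability plot shows the 100 points lying close to the fitted straight line, so the data are consistent with a normal distribution.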
QQPlot (quantile-quantile plot)
http://baike.baidu.com/link?url=o9Z7vr6VdvGAtTRO3RYxQbVu56U_XDaSdibPeVcidMJQ7B6LcAUBHcIro4tLf5BSI5Pu-59W4SPNZ-zRFJ8_FgL3dxJLaUdY0JiB2xUmqie
A QQPlot is used to check visually whether a set of data comes from a given distribution, or to check whether two sets of data