Python: using a chi-squared goodness-of-fit test to find the best-fitting distribution

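Given a set of data values, I am trying to find the theoretical distribution that best describes the data. After a few days of research, I came up with the following Python code.

import numpy as np
import csv
import pandas as pd
import scipy.stats as st
import math
import sys
import matplotlib
matplotlib.use('Agg')  # non-interactive backend
import matplotlib.pyplot as plt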

def fit_to_all_distributions(data):
    # Candidate scipy.stats distributions to fit
    dist_names = ['fatiguelife', 'invgauss', 'johnsonsu', 'johnsonsb', 'lognorm', 'norminvgauss', 'powerlognorm', 'exponweib', 'genextreme', 'pareto']

    params = {}
    for dist_name in dist_names:
        try:
            dist = getattr(st, dist_name)
            # fit() returns a tuple (shape parameters..., loc, scale)
            param = dist.fit(data)
            params[dist_name] = param
        except Exception:
            print("Error occurred in fitting")
            params[dist_name] = "Error"

    return params

def get_best_distribution_using_chisquared_test(data, params):
    # Observed counts per bin (raw counts, not a density)
    histo, bin_edges = np.histogram(data, bins='auto')
    number_of_bins = len(bin_edges) - 1
    observed_values = histo

    dist_names = ['fatiguelife', 'invgauss', 'johnsonsu', 'johnsonsb', 'lognorm', 'norminvgauss', 'powerlognorm', 'exponweib', 'genextreme', 'pareto']

    dist_results = []
    for dist_name in dist_names:
        param = params[dist_name]
        if param != "Error":
            # Apply the chi-squared test
            arg = param[:-2]
            loc = param[-2]
            scale = param[-1]
            # Expected counts per bin from the fitted CDF at the bin edges
            cdf = getattr(st, dist_name).cdf(bin_edges, *arg, loc=loc, scale=scale)
            expected_values = len(data) * np.diff(cdf)
            # ddof is passed as (number of bins - number of fitted parameters)
            c, p = st.chisquare(observed_values, expected_values, ddof=number_of_bins - len(param))
            dist_results.append([dist_name, c, p])

    # Select the best-fitting distribution (smallest chi-squared statistic)
    best_dist, best_c, best_p = None, sys.maxsize, 0
    for item in dist_results:
        name = item[0]
        c = item[1]
        p = item[2]
        if not math.isnan(c):
            if c < best_c:
                best_c = c
                best_dist = name
                best_p = p

    # Print the name of the best fit and its p-value
    print("Best fitting distribution: " + str(best_dist))
    print("Best c value: " + str(best_c))
    print("Best p value: " + str(best_p))
    print("Parameters for the best fit: " + str(params[best_dist]))

    return best_dist, best_c, params[best_dist], dist_results

I then tested this code:

^{pr2}$
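The test script itself is not reproduced here. As a rough, hypothetical sketch of how Pareto-distributed sample data could be generated with `scipy.stats.pareto.rvs`, something like the following might be used. The shape parameter `b=2.5`, the sample size, and the seed are illustrative assumptions, not values from the original test.

```python
import numpy as np
import scipy.stats as st

# Hypothetical inputs: the shape parameter, sample size and seed are
# illustrative choices, not values taken from the original test.
data = st.pareto.rvs(b=2.5, size=1000, random_state=42)

# With the default loc=0 and scale=1, Pareto samples lie at or above 1.
print("min:", data.min(), "mean:", data.mean())
```

An array like `data` could then be passed to `fit_to_all_distributions` and `get_best_distribution_using_chisquared_test`.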

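Since the data points were generated from a Pareto distribution, the code should return Pareto as the best-fitting distribution, with a sufficiently large p-value (p > 0.05).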

But this is the output I got:

Best fitting distribution: genextreme
Best c value: 106.46087793622216
Best p value: 7.626303538461713e-24
Parameters for the best fit: (-0.7664124294696955, 2.3217378846757164, 0.3711562696710188)

Is there something wrong with my implementation of the chi-squared goodness-of-fit test?
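Two points in the code above may be worth checking. In `scipy.stats.chisquare`, the p-value uses `k - 1 - ddof` degrees of freedom for `k` bins, so `ddof` is normally the number of fitted parameters rather than `number_of_bins - len(param)`. Also, expected counts built from the CDF between the first and last histogram edges leave out the tail probability outside that range, so they do not sum to `len(data)`; recent SciPy versions raise an error when the observed and expected totals disagree. The sketch below is one possible way to address both, under stated assumptions and not a definitive fix: the helper name `chisquare_gof`, the equal-probability (quantile) binning, and the default of 20 bins are all choices introduced here for illustration.

```python
import numpy as np
import scipy.stats as st


def chisquare_gof(data, dist_name, n_bins=20):
    """Chi-squared goodness of fit for one scipy.stats distribution.

    Hypothetical helper: the name, the quantile-based binning and the
    default number of bins are assumptions made for this sketch.
    """
    data = np.asarray(data, dtype=float)
    dist = getattr(st, dist_name)

    # Maximum-likelihood fit: (shape parameters..., loc, scale)
    params = dist.fit(data)

    # Bin edges at data quantiles, so each bin holds roughly equal counts
    edges = np.unique(np.quantile(data, np.linspace(0.0, 1.0, n_bins + 1)))
    observed, _ = np.histogram(data, bins=edges)

    # Fitted CDF at the edges; treat the outer bins as open-ended so the
    # bin probabilities sum to 1 and the expected counts sum to len(data)
    cdf = dist.cdf(edges, *params)
    cdf[0], cdf[-1] = 0.0, 1.0
    expected = len(data) * np.diff(cdf)

    # ddof = number of fitted parameters, giving k - 1 - ddof degrees of freedom
    stat, p_value = st.chisquare(observed, expected, ddof=len(params))
    return stat, p_value, params


if __name__ == "__main__":
    # Illustrative data only; the shape parameter and size are assumptions
    sample = st.pareto.rvs(b=2.5, size=1000, random_state=0)
    stat, p_value, params = chisquare_gof(sample, "pareto")
    print("statistic:", stat, "p-value:", p_value, "params:", params)
```

Because the parameters are fitted by maximum likelihood on the raw data rather than on the binned counts, the resulting p-value should be treated as approximate.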
