Chi-squared test in Python: Python, Pandas & the chi-squared test of independence

I am quite new to Python as well as Statistics. I'm trying to apply the Chi Squared Test to determine whether previous success affects the level of change of a person (percentage wise, this does seem to be the case, but I wanted to see whether my results were statistically significant).

My question is: Did I do this correctly? My results say the p-value is 0.0, which means that there is a significant relationship between my variables (which is what I want of course...but 0 seems a little bit too perfect for a p-value, so I'm wondering whether I did it incorrectly coding wise).

Here's what I did:

    import numpy as np
    import pandas as pd
    import scipy.stats as stats

    d = {'Previously Successful' : pd.Series([129.3, 182.7, 312], index=['Yes - changed strategy', 'No', 'col_totals']),
         'Previously Unsuccessful' : pd.Series([260.17, 711.83, 972], index=['Yes - changed strategy', 'No', 'col_totals']),
         'row_totals' : pd.Series([(129.3+260.17), (182.7+711.83), (312+972)], index=['Yes - changed strategy', 'No', 'col_totals'])}

    total_summarized = pd.DataFrame(d)

    # .iloc/.loc replace the .ix indexer, which was removed in pandas 1.0
    observed = total_summarized.iloc[0:2, 0:2]

    expected = np.outer(total_summarized["row_totals"][0:2],
                        total_summarized.loc["col_totals"][0:2])/1000

    expected = pd.DataFrame(expected)
    expected.columns = ["Previously Successful","Previously Unsuccessful"]
    expected.index = ["Yes - changed strategy","No"]

    chi_squared_stat = (((observed-expected)**2)/expected).sum().sum()
    print(chi_squared_stat)

    crit = stats.chi2.ppf(q = 0.95,  # Find the critical value for 95% confidence
                          df = 8)
    print("Critical value")
    print(crit)

    p_value = 1 - stats.chi2.cdf(x=chi_squared_stat,  # Find the p-value
                                 df=8)
    print("P value")
    print(p_value)

    stats.chi2_contingency(observed= observed)

Solution

A few corrections:

Your expected array is not correct. You must divide by observed.sum().sum(), which is 1284, not 1000.

For a 2x2 contingency table such as this, the number of degrees of freedom is (rows - 1) * (columns - 1) = 1, not 8.

Your calculation of chi_squared_stat does not include a continuity correction. (Omitting it isn't necessarily wrong; that's a judgment call for the statistician.)
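As a rough sketch of the first two corrections applied by hand, the calculation might look like the following. The variable names are illustrative, plain NumPy arrays stand in for the DataFrame, and `stats.chi2.sf` (the survival function) is swapped in for `1 - stats.chi2.cdf(...)` because it tends to keep more precision for very small p-values. No continuity correction is applied here.

```python
import numpy as np
from scipy import stats

# Observed 2x2 table from the question (rows: changed strategy yes/no).
observed = np.array([[129.3, 260.17],
                     [182.7, 711.83]])

row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)
grand_total = observed.sum()  # about 1284, not 1000

# Expected counts under independence: (row total * column total) / grand total.
expected = np.outer(row_totals, col_totals) / grand_total

# Degrees of freedom for an r x c table: (r - 1) * (c - 1).
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Pearson chi-squared statistic, without a continuity correction.
chi_squared_stat = ((observed - expected) ** 2 / expected).sum()

# Upper-tail probability of the chi-squared distribution.
p_value = stats.chi2.sf(chi_squared_stat, dof)

print(chi_squared_stat, dof, p_value)
```

With these inputs the statistic and p-value should agree with what `chi2_contingency(observed, correction=False)` reports.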

All the calculations that you perform (expected matrix, statistics, degrees of freedom, p-value) are computed by chi2_contingency:

    In [65]: observed
    Out[65]:
                            Previously Successful  Previously Unsuccessful
    Yes - changed strategy                  129.3                   260.17
    No                                      182.7                   711.83

    In [66]: from scipy.stats import chi2_contingency

    In [67]: chi2, p, dof, expected = chi2_contingency(observed)

    In [68]: chi2
    Out[68]: 23.383138325890453

    In [69]: p
    Out[69]: 1.3273696199438626e-06

    In [70]: dof
    Out[70]: 1

    In [71]: expected
    Out[71]:
    array([[  94.63757009,  294.83242991],
           [ 217.36242991,  677.16757009]])

By default, chi2_contingency uses a continuity correction when the contingency table is 2x2. If you prefer to not use the correction, you can disable it with the argument correction=False:

    In [73]: chi2, p, dof, expected = chi2_contingency(observed, correction=False)

    In [74]: chi2
    Out[74]: 24.072616672232893

    In [75]: p
    Out[75]: 9.2770200776879643e-07
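To make the difference between the two results concrete, here is a minimal sketch of the Yates continuity correction computed by hand. It assumes the usual formulation, in which each |observed - expected| difference is reduced by 0.5 (but never by more than the difference itself) before squaring; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

observed = np.array([[129.3, 260.17],
                     [182.7, 711.83]])

# Expected counts under independence.
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

# Yates correction: shrink each absolute deviation by 0.5, capped at the deviation.
abs_diff = np.abs(observed - expected)
adjusted = abs_diff - np.minimum(0.5, abs_diff)

chi2_yates = (adjusted ** 2 / expected).sum()
p_yates = stats.chi2.sf(chi2_yates, 1)  # 1 degree of freedom for a 2x2 table

print(chi2_yates, p_yates)
```

For this table the corrected statistic should come out close to the default `chi2_contingency(observed)` result, slightly smaller than the uncorrected one.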
