Chi-square test example (Python)

Do Black and white job applicants face racial discrimination when looking for work?

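import pandas as pd
import numpy as np
from scipy import stats

# Load the Stata dataset (pd.read_stata is the public entry point for the reader)
data = pd.read_stata('us_job_market_discrimination.dta')
data.head()
# Split the rows by race: 'b' = Black, 'w' = white
blacks = data[data.race == 'b']
whites = data[data.race == 'w']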

Summary of the call column for Black applicants:

blacks.call.describe()
count    2435.000000
mean        0.064476
std         0.245649
min         0.000000
25%         0.000000
50%         0.000000
75%         0.000000
max         1.000000
Name: call, dtype: float64

Summary of the call column for white applicants:

whites.call.describe()
count    2435.000000
mean        0.096509
std         0.295346
min         0.000000
25%         0.000000
50%         0.000000
75%         0.000000
max         1.000000
Name: call, dtype: float64

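Chi-square test

The test compares the counts in four cells:

  • White applicants who were called back
  • White applicants who were not called back
  • Black applicants who were called back
  • Black applicants who were not called back

Hypotheses

  • H0: race has no effect on whether an applicant is called back
  • H1: race has an effect on whether an applicant is called back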
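# Count the four cells; call is 1 for a callback and 0 otherwise
blacks_called = len(blacks[blacks['call'] == True])       # Black applicants called back
blacks_not_called = len(blacks[blacks['call'] == False])  # Black applicants not called back
whites_called = len(whites[whites['call'] == True])       # white applicants called back
whites_not_called = len(whites[whites['call'] == False])  # white applicants not called back
# Contingency table: rows are outcomes, columns are groups
observed = pd.DataFrame({'blacks': {'called': blacks_called, 'not_called': blacks_not_called},
                         'whites': {'called' : whites_called, 'not_called' : whites_not_called}})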
observed

            blacks  whites
called         157     235
not_called    2278    2200

num_called_back = blacks_called + whites_called         # total number of callbacks
num_not_called = blacks_not_called + whites_not_called  # total number without a callback

print(num_called_back)
print(num_not_called)
392
4478
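
As a cross-check on the totals above, the cell counts and the marginal totals can be produced in one step with pd.crosstab. The sketch below does not read the .dta file; it rebuilds a stand-in table from the numbers shown in this post (2435 applicants per group, with 157 and 235 callbacks implied by the two means), and the name toy and the string labels in its call column are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset, rebuilt from the counts in this post.
# The real call column holds 0/1 values; string labels are used here for readability.
toy = pd.DataFrame({
    "race": ["b"] * 2435 + ["w"] * 2435,
    "call": ["called"] * 157 + ["not_called"] * 2278
            + ["called"] * 235 + ["not_called"] * 2200,
})

# Rows: callback outcome; columns: race; margins=True appends row and column totals.
table = pd.crosstab(toy["call"], toy["race"], margins=True)
print(table)
```

With the real data loaded, the same call would presumably be pd.crosstab(data.call, data.race, margins=True).
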
rate_of_callbacks = num_called_back / (num_not_called + num_called_back)
rate_of_callbacks
0.08049281314168377
expected_called = len(data)  * rate_of_callbacks
expected_not_called = len(data)  * (1 - rate_of_callbacks)
print(expected_called)
print(expected_not_called)
391.99999999999994
4478.0
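
Splitting these two totals evenly between the groups works here only because both groups have the same size (2435 each). A more general sketch, assuming the observed counts implied by the summaries in this post, computes each expected cell as row total × column total / grand total; the variable names are illustrative and not part of the original code.

```python
import numpy as np

# Observed counts reconstructed from this post:
# rows = (called, not called), columns = (Black, white).
observed_counts = np.array([[157, 235],
                            [2278, 2200]])

row_totals = observed_counts.sum(axis=1)  # [392, 4478]
col_totals = observed_counts.sum(axis=0)  # [2435, 2435]
n = observed_counts.sum()                 # 4870

# Expected count under independence: row total * column total / grand total.
expected_counts = np.outer(row_totals, col_totals) / n
print(expected_counts)
```

For equal group sizes this reduces to half of each total per group, which is what the next step uses.
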
# Observed counts
observed_frequencies = [blacks_not_called, whites_not_called, whites_called, blacks_called]
# Expected counts: the two groups are the same size, so under H0 each gets half of each total
expected_frequencies = [expected_not_called/2, expected_not_called/2, expected_called/2, expected_called/2]

# Chi-square test
stats.chisquare(f_obs = observed_frequencies,
                f_exp = expected_frequencies)
Power_divergenceResult(statistic=16.879050414270221, pvalue=0.00074839594410972638)
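
The statistic can be reproduced by hand as the sum of (observed - expected)² / expected over the four cells. This is a minimal sketch using the counts implied by this post's summaries, in the same cell order as observed_frequencies; the names obs and exp are illustrative.

```python
import numpy as np

# Cell order: blacks_not_called, whites_not_called, whites_called, blacks_called
obs = np.array([2278, 2200, 235, 157], dtype=float)
exp = np.array([2239, 2239, 196, 196], dtype=float)

# Pearson chi-square statistic: sum of squared deviations scaled by the expected counts.
chi2_stat = np.sum((obs - exp) ** 2 / exp)
print(chi2_stat)
```
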

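The p-value is below 0.05, so we reject H0 and conclude that callback rates differ by race. Note that stats.chisquare assumes k - 1 = 3 degrees of freedom for four cells, whereas a 2×2 contingency table has only 1; with 1 degree of freedom the p-value for the same statistic is smaller still, so the conclusion does not change.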
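
Because the four counts form a 2×2 contingency table, the same test can also be run with scipy.stats.chi2_contingency, which derives the expected counts itself and uses 1 degree of freedom. The sketch below is an alternative to the call used in this post, not the author's method; it assumes the counts implied by the summaries above and disables Yates' continuity correction so that the statistic matches the uncorrected value of about 16.879.

```python
import numpy as np
from scipy import stats

# Rows = (called, not called), columns = (Black, white); counts reconstructed from this post.
table = np.array([[157, 235],
                  [2278, 2200]])

# correction=False turns off Yates' continuity correction for 2x2 tables.
chi2_stat, p_value, dof, expected = stats.chi2_contingency(table, correction=False)
print(chi2_stat, p_value, dof)

# Equivalent with stats.chisquare: ddof=2 lowers the default k - 1 = 3 degrees of freedom to 1.
flat = stats.chisquare(f_obs=table.ravel(), f_exp=expected.ravel(), ddof=2)
print(flat.statistic, flat.pvalue)
```

Both calls should report the same statistic and p-value; leaving correction at its default of True would give a slightly smaller statistic.
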

 
