Do white and black applicants face racial discrimination in the job market?
import pandas as pd
import numpy as np
from scipy import stats
data = pd.read_stata('us_job_market_discrimination.dta')
data.head()
blacks = data[data.race == 'b']
whites = data[data.race == 'w']
Summary statistics for black applicants:
blacks.call.describe()
count 2435.000000
mean 0.064476
std 0.245649
min 0.000000
25% 0.000000
50% 0.000000
75% 0.000000
max 1.000000
Name: call, dtype: float64
Summary statistics for white applicants:
whites.call.describe()
count 2435.000000
mean 0.096509
std 0.295346
min 0.000000
25% 0.000000
50% 0.000000
75% 0.000000
max 1.000000
Name: call, dtype: float64
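Because `call` is a 0/1 indicator, its mean is the callback rate, and multiplying the mean by the row count recovers the number of callbacks in each group. A minimal sketch, using only the figures printed in the two `describe()` outputs (the variable names are illustrative, not from the notebook):

```python
# Figures copied from the describe() outputs above.
n_per_group = 2435
black_rate = 0.064476
white_rate = 0.096509

# The mean of a 0/1 column is the share of ones, so rate * n is the callback count.
black_callbacks = round(black_rate * n_per_group)
white_callbacks = round(white_rate * n_per_group)

print(black_callbacks, white_callbacks)  # 157 235
print(white_rate / black_rate)           # ratio of white to black callback rates
```

The printed rates are rounded to six decimals, hence the `round` before treating the products as counts.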
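Chi-square test
- White applicants who received a callback
- White applicants who did not receive a callback
- Black applicants who received a callback
- Black applicants who did not receive a callback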
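Hypothesis test
- H0: race has no significant effect on whether an applicant receives a callback
- H1: race has an effect on whether an applicant receives a callback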
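blacks_called = len(blacks[blacks['call'] == True])  # black applicants who received a callback
blacks_not_called = len(blacks[blacks['call'] == False])  # black applicants who did not
whites_called = len(whites[whites['call'] == True])  # white applicants who received a callback
whites_not_called = len(whites[whites['call'] == False])  # white applicants who did not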
observed = pd.DataFrame({
    'blacks': {'called': blacks_called, 'not_called': blacks_not_called},
    'whites': {'called': whites_called, 'not_called': whites_not_called},
})
observed
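The same kind of table can also be produced straight from the raw rows with `pd.crosstab`. The sketch below does not assume the original `.dta` file is available: it rebuilds a stand-in frame (the hypothetical name `toy`) from the counts implied by the summary statistics, namely callback rate times 2435 rows per group, which gives 157 callbacks for `b` and 235 for `w`.

```python
import pandas as pd

# Stand-in for `data`: 2435 rows per race, with 157 (b) and 235 (w) callbacks.
toy = pd.DataFrame({
    'race': ['b'] * 2435 + ['w'] * 2435,
    'call': [1.0] * 157 + [0.0] * 2278 + [1.0] * 235 + [0.0] * 2200,
})

# Rows are call outcomes (0.0 / 1.0), columns are races.
table = pd.crosstab(toy['call'], toy['race'])
print(table)
```

With the real data, `pd.crosstab(data['call'], data['race'])` would follow the same pattern.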
num_called_back = blacks_called + whites_called  # total callbacks
num_not_called = blacks_not_called + whites_not_called  # total without a callback
print(num_called_back)
print(num_not_called)
392
4478
rate_of_callbacks = num_called_back / (num_not_called + num_called_back)
rate_of_callbacks
0.08049281314168377
expected_called = len(data) * rate_of_callbacks
expected_not_called = len(data) * (1 - rate_of_callbacks)
print(expected_called)
print(expected_not_called)
391.99999999999994
4478.0
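The chi-square statistic compares each observed cell count O with its expected count E under H0 and sums (O - E)^2 / E over the four cells. Both groups have 2435 rows, so each group's expected count is half of the corresponding total. A minimal sketch in plain Python, with the observed counts (157, 2278, 235, 2200) derived from the callback rates and group sizes reported earlier; the names `counts` and `chi2` are illustrative:

```python
# Observed counts per group as (called, not_called), from the describe() outputs.
counts = {'blacks': (157, 2278), 'whites': (235, 2200)}

total_called = sum(called for called, _ in counts.values())      # 392
total_not_called = sum(not_c for _, not_c in counts.values())    # 4478

# Equal group sizes, so under H0 each group expects half of each total.
expected = (total_called / 2, total_not_called / 2)

# Sum (O - E)^2 / E over all four cells.
chi2 = sum(
    (obs - exp) ** 2 / exp
    for cells in counts.values()
    for obs, exp in zip(cells, expected)
)
print(round(chi2, 3))  # 16.879
```

Halving the totals is only valid because the two groups are the same size; with unequal groups, each cell's expected count would be row total times column total divided by the grand total.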
import scipy.stats as stats
# Observed frequencies
observed_frequencies = [blacks_not_called, whites_not_called, whites_called, blacks_called]
# Expected frequencies (the two groups are the same size, so each gets half)
expected_frequencies = [expected_not_called/2, expected_not_called/2, expected_called/2, expected_called/2]
# Chi-square test
stats.chisquare(f_obs = observed_frequencies, f_exp = expected_frequencies)
Power_divergenceResult(statistic=16.879050414270221, pvalue=0.00074839594410972638)
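The p-value is below 0.05, so we reject H0 (race has no significant effect on whether an applicant receives a callback). One caveat: `stats.chisquare` with four categories uses 3 degrees of freedom by default, whereas a 2x2 contingency table has only 1. With 1 degree of freedom the same statistic gives an even smaller p-value, so the conclusion is unchanged.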
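As a cross-check, `scipy.stats.chi2_contingency` takes the 2x2 table directly and derives the degrees of freedom from its shape. The sketch below assumes the cell counts implied by the output above (157 and 235 callbacks out of 2435 per group) and passes `correction=False` so that the statistic is comparable to the uncorrected value; by default SciPy applies Yates' continuity correction to 2x2 tables.

```python
import numpy as np
from scipy import stats

# Rows: blacks, whites. Columns: called, not_called.
table = np.array([[157, 2278],
                  [235, 2200]])

# Returns the statistic, the p-value, the degrees of freedom,
# and the expected counts under independence.
chi2, p, dof, expected = stats.chi2_contingency(table, correction=False)
print(chi2, p, dof)
print(expected)
```

The statistic should agree with the 16.879 reported above, while `dof` is 1 and the p-value is on the order of 4e-5, still well under 0.05.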