Calculating odds ratio in Python

Checking online (here and here), I see there are two ways to estimate the odds ratio in Python, but they give different results.

First way:

    import scipy.stats as stats
    import pandas as pd

    df = pd.DataFrame({'c': ['m', 'm', 'm', 'm', 'f', 'f', 'f', 'f'],
                       'l': [1, 1, 1, 0, 0, 0, 0, 1]})
    ct = pd.crosstab(df.c, df.l)
    oddsratio, pvalue = stats.fisher_exact(ct)

Second way:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    df = pd.get_dummies(df, drop_first=True)  # keeps c_m: 1 for 'm', 0 for 'f'
    clf = LogisticRegression()
    clf.fit(df[['c_m']], df['l'])
    odds_ratio = np.exp(clf.coef_)

The first approach returns an odds ratio of 9 and the second returns an odds ratio of 1.9.

I am relatively new to the concept of the odds ratio, and I am not sure how Fisher's exact test and logistic regression could be used to obtain the same value. What is the difference, and which method is the correct approach to get the odds ratio in this case?

I would appreciate any hint. Thanks.

Solution

Short answer:

In both cases, you should get the same odds ratio of 9.

By default, the scikit-learn logistic regression model uses penalty='l2', and this regularization shrinks the coefficient. If you use penalty='none' (spelled penalty=None from scikit-learn 1.2 onward), you will get the matching odds ratio.

So change the model to

    clf = LogisticRegression(penalty=None)  # penalty='none' before scikit-learn 1.2

and calculate odds_ratio as before.

Long Answer:

In the first case, the odds ratio is the prior odds ratio: it is computed directly from the contingency (cross-tabulation) table, as shown below.

The contingency table for df is:

    l  0  1
    c
    f  3  1
    m  1  3

odds ratio = (odds of l=0 for f) / (odds of l=0 for m)

odds of l=0 for f = P(l=0 | c=f) / P(l=1 | c=f) = (3/4) / (1/4) = 3

odds of l=0 for m = P(l=0 | c=m) / P(l=1 | c=m) = (1/4) / (3/4) = 1/3

odds ratio = ((3/4)/(1/4)) / ((1/4)/(3/4)) = 3 / (1/3) = 9
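The same arithmetic can be checked from the raw counts. This is only a small sketch in plain Python; the dictionary layout and the variable names are chosen here for illustration and are not part of the original code.

```python
# Counts from the contingency table: rows are c ('f', 'm'), columns are l (0, 1).
table = {"f": {0: 3, 1: 1}, "m": {0: 1, 1: 3}}

# Odds of l == 0 within each row: count(l=0) / count(l=1).
odds_f = table["f"][0] / table["f"][1]   # 3 / 1
odds_m = table["m"][0] / table["m"][1]   # 1 / 3

# Ratio of the two odds.
odds_ratio = odds_f / odds_m

# Equivalent cross-product form: (3 * 3) / (1 * 1).
cross_product = (table["f"][0] * table["m"][1]) / (table["f"][1] * table["m"][0])

print(odds_ratio, cross_product)
```

Both expressions describe the same quantity; the cross-product form simply avoids the intermediate division.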

In the second case, you are getting an estimate of the odds ratio by fitting a logistic regression model. You will get odds ratio = 9 if you use penalty='none' (penalty=None from scikit-learn 1.2 onward). By default, the penalty in the LogisticRegression estimator is 'l2'.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    df = pd.get_dummies(df, drop_first=True)  # keeps c_m: 1 for 'm', 0 for 'f'
    clf = LogisticRegression(penalty=None)  # penalty='none' before scikit-learn 1.2
    clf.fit(df[['c_m']], df['l'])
    odds_ratio = np.exp(clf.coef_)
    print(odds_ratio)  # [[9.0004094]]
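To see how an L2 penalty pulls the estimate away from 9, here is a minimal sketch that fits the same two-parameter model with Newton's method in plain Python. It is not scikit-learn's solver: the function name fit_logistic and its l2 argument are made up for this illustration, and l2=1.0 is assumed to play the role of scikit-learn's default C=1 (penalty on the coefficient only, intercept unpenalised).

```python
import math

# Data from the question: x = 1 for 'm', 0 for 'f'; y is the column l.
x = [1, 1, 1, 1, 0, 0, 0, 0]
y = [1, 1, 1, 0, 0, 0, 0, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, l2=0.0, iters=50):
    """Newton's method for sum(log-loss) + 0.5 * l2 * w**2 (hypothetical helper).

    w is the coefficient, b the unpenalised intercept.
    l2=0.0 gives the plain maximum-likelihood fit.
    """
    w, b = 0.0, 0.0
    for _ in range(iters):
        p = [sigmoid(w * xi + b) for xi in x]
        # Gradient of the objective with respect to w and b.
        gw = sum((pi - yi) * xi for pi, yi, xi in zip(p, y, x)) + l2 * w
        gb = sum(pi - yi for pi, yi in zip(p, y))
        # Entries of the 2x2 Hessian.
        s = [pi * (1.0 - pi) for pi in p]
        hww = sum(si * xi * xi for si, xi in zip(s, x)) + l2
        hwb = sum(si * xi for si, xi in zip(s, x))
        hbb = sum(s)
        det = hww * hbb - hwb * hwb
        # Newton step: solve H * d = g for d, then move against it.
        w -= (hbb * gw - hwb * gb) / det
        b -= (hww * gb - hwb * gw) / det
    return w, b

w_mle, b_mle = fit_logistic(x, y)          # no penalty
w_l2, b_l2 = fit_logistic(x, y, l2=1.0)    # assumed analogue of C=1

print(math.exp(w_mle))  # close to 9
print(math.exp(w_l2))   # roughly 1.95
```

Under these assumptions the unpenalised fit reproduces the table's odds ratio, while the penalised fit lands near the 1.9 reported in the question.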

You can also get the odds ratio by another method, which gives the same result:

    # Method 2: ratio of the fitted odds for the two groups
    odds_of_y_is_1_for_male = np.exp(clf.intercept_ + clf.coef_ * 1)    # c_m = 1: male
    odds_of_y_is_1_for_female = np.exp(clf.intercept_ + clf.coef_ * 0)  # c_m = 0: female
    odds_ratio_2 = odds_of_y_is_1_for_male / odds_of_y_is_1_for_female
    print(odds_ratio_2)  # [[9.0004094]]

To understand why both methods are the same, see here.
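In brief, and assuming the usual logit parameterisation, the ratio of fitted odds collapses to the exponential of the coefficient. The symbols b (intercept) and w (coefficient of c_m) are introduced here only for illustration:

```latex
\begin{aligned}
\operatorname{odds}(l = 1 \mid c_m = x) &= e^{\,b + w x} \\
\text{odds ratio} &= \frac{e^{\,b + w \cdot 1}}{e^{\,b + w \cdot 0}} = e^{\,w}
\end{aligned}
```

So dividing the two fitted odds and taking np.exp(clf.coef_) are the same computation.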
