matlab odds, calculating odds ratio in python

白白前

Tags: matlab odds

Checking online (here and here), I see there are two ways to estimate an odds ratio in Python, but they give different results.

First way:

import scipy.stats as stats
import pandas as pd

df = pd.DataFrame({'c': ['m', 'm', 'm', 'm', 'f', 'f', 'f', 'f'],
                   'l': [1, 1, 1, 0, 0, 0, 0, 1]})
ct = pd.crosstab(df.c, df.l)
oddsratio, pvalue = stats.fisher_exact(ct)

Second way:

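import numpy as np
from sklearn.linear_model import LogisticRegression

df = pd.get_dummies(df, drop_first=True)   # encodes c as the 0/1 column c_m (1 = 'm')
clf = LogisticRegression()
clf.fit(df[['c_m']], df['l'])              # 1-D target avoids sklearn's column-vector warning
odds_ratio = np.exp(clf.coef_)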

The first approach returns an odds ratio of 9, and the second returns 1.9.

I am relatively new to the concept of an odds ratio, and I am not sure how Fisher's exact test and logistic regression could both be used to obtain the same value. What is the difference, and which method is the correct way to get the odds ratio in this case?

I would appreciate any hints. Thanks.

Solution

Short answer:

In both cases, you should get the same odds ratio of 9.

By default, scikit-learn's LogisticRegression uses penalty='l2'. This regularization shrinks the coefficients toward zero and therefore distorts the odds ratio. If you turn the penalty off with penalty='none' (written penalty=None in scikit-learn 1.2 and later), you get the matching odds ratio.
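To see the effect of the L2 penalty without depending on a particular scikit-learn version, here is a minimal NumPy sketch that fits the same one-predictor model with Newton's method twice. The first fit adds an L2 penalty on the slope with C = 1.0 and leaves the intercept unpenalized. That setup is an assumption about how scikit-learn's default lbfgs objective behaves. The second fit has no penalty at all. The helper `fit_logistic` is hypothetical and exists only for this illustration.

```python
import numpy as np

def fit_logistic(x, y, C=None, n_iter=50):
    """Fit logit(p) = b + w*x by Newton's method (hypothetical helper).

    If C is given, minimise sum(log-loss) + w**2 / (2*C), i.e. an L2 penalty
    on the slope only (intercept unpenalised). If C is None, plain maximum
    likelihood.
    """
    X = np.column_stack([np.ones_like(x), x])   # design matrix [1, x]
    beta = np.zeros(2)                           # [b, w]
    # Penalty Hessian: 0 for the intercept, 1/C for the slope
    pen = np.diag([0.0, 0.0 if C is None else 1.0 / C])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))      # predicted P(l = 1)
        grad = X.T @ (p - y) + pen @ beta
        hess = X.T @ (X * (p * (1 - p))[:, None]) + pen
        beta -= np.linalg.solve(hess, grad)      # Newton step
    return beta

# c_m dummy (1 = 'm', 0 = 'f') and outcome l, same rows as the question's df
x = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=float)
y = np.array([1, 1, 1, 0, 0, 0, 0, 1], dtype=float)

b_l2, w_l2 = fit_logistic(x, y, C=1.0)
b_ml, w_ml = fit_logistic(x, y)
print("L2-penalised odds ratio:", np.exp(w_l2))  # roughly 1.95
print("unpenalised odds ratio:", np.exp(w_ml))   # approximately 9
```

The penalized slope settles near 0.67 (odds ratio about 1.95), which is close to the 1.9 reported in the question. The unpenalized slope is ln 9.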

So change the model to

clf = LogisticRegression(penalty=None)  # use penalty='none' on scikit-learn < 1.2

and recompute odds_ratio.

Long answer:

In the first case, the odds ratio is the prior (sample) odds ratio. It is computed directly from the contingency (cross-tabulation) table, as shown below.

The contingency table for df is:

l  0  1
c
f  3  1
m  1  3

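odds ratio = (odds of f being 0) / (odds of m being 0)

odds of f being 0 = P(l=0 | c=f) / P(l=1 | c=f) = (3/4) / (1/4) = 3

odds of m being 0 = P(l=0 | c=m) / P(l=1 | c=m) = (1/4) / (3/4) = 1/3

odds ratio = 3 / (1/3) = 9

Swapping both the outcome and the groups gives the same number. The odds of m being 1 divided by the odds of f being 1 is 3 / (1/3) = 9, and that is the quantity the logistic regression below estimates.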
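You can check the same arithmetic in a few lines of plain Python. This sketch counts the four cells of the 2×2 table directly, instead of calling pd.crosstab, and computes the cross-product ratio ad/bc. For a 2×2 table, that sample (unconditional) odds ratio is the statistic scipy.stats.fisher_exact reports. The helper name `sample_odds_ratio` is hypothetical.

```python
from collections import Counter

# Same data as the question's df: c is the group, l is the outcome
c = ['m', 'm', 'm', 'm', 'f', 'f', 'f', 'f']
l = [1, 1, 1, 0, 0, 0, 0, 1]

counts = Counter(zip(c, l))  # cell counts of the 2x2 table

def sample_odds_ratio(counts):
    """Cross-product ratio: odds of l == 0 for f over odds of l == 0 for m."""
    # (f0 / f1) / (m0 / m1) == (f0 * m1) / (f1 * m0)
    return (counts[('f', 0)] * counts[('m', 1)]) / (counts[('f', 1)] * counts[('m', 0)])

print(counts[('f', 0)], counts[('f', 1)], counts[('m', 0)], counts[('m', 1)])  # 3 1 1 3
print(sample_odds_ratio(counts))  # 9.0
```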

In the second case, you estimate the odds ratio by fitting a logistic regression model. You get an odds ratio of 9 if you turn regularization off (penalty='none', or penalty=None in scikit-learn 1.2 and later). By default, the LogisticRegression estimator uses penalty='l2'.

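import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.get_dummies(df, drop_first=True)   # encodes c as the 0/1 column c_m (1 = 'm')
clf = LogisticRegression(penalty=None)     # no regularization; penalty='none' on scikit-learn < 1.2
clf.fit(df[['c_m']], df['l'])
odds_ratio = np.exp(clf.coef_)
print(odds_ratio)
# [[9.0004094]]  (approximately 9; trailing digits depend on the solver tolerance)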

You can also get the odds ratio from the fitted intercept and coefficient, and this gives the same value:

# Method 2: c_m = 1 encodes 'm' and c_m = 0 encodes 'f'
odds_of_y_is_1_for_male = np.exp(clf.intercept_ + clf.coef_ * 1)    # exp(logit) = odds for male
odds_of_y_is_1_for_female = np.exp(clf.intercept_ + clf.coef_ * 0)  # odds for female
odds_ratio_2 = odds_of_y_is_1_for_male / odds_of_y_is_1_for_female
print(odds_ratio_2)
# [[9.0004094]]

To understand why both methods give the same value, see here.
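Here is a short sketch of why the two numbers agree, assuming an unpenalized fit. With a single binary predictor, the logistic model has one free parameter per group. The maximum-likelihood fit therefore reproduces each group's observed log-odds exactly. The intercept becomes the log-odds for f (c_m = 0), and the coefficient becomes the difference in log-odds between m and f. That makes exp(coef_) equal to the crosstab odds ratio. The code computes these closed-form values from the observed proportions and checks Method 2's algebra, exp(b + w) / exp(b) = exp(w). The names are illustrative only.

```python
import math

# Observed proportion of l == 1 in each group (from the crosstab)
p_f = 1 / 4   # f: one 1 out of four
p_m = 3 / 4   # m: three 1s out of four

def log_odds(p):
    return math.log(p / (1 - p))

# Closed-form unpenalised MLE for logit(p) = b + w * c_m
b = log_odds(p_f)                  # intercept: log-odds of the reference group f
w = log_odds(p_m) - log_odds(p_f)  # slope: log odds ratio, m vs f

print(math.exp(w))  # approximately 9
# Method 2: ratio of fitted odds at c_m = 1 (m) and c_m = 0 (f)
print(math.exp(b + w * 1) / math.exp(b + w * 0))  # same value, since exp(b + w) / exp(b) = exp(w)
```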
