I want to calculate conditional probabilities for the ratings ('A', 'B', 'C') in the rating column of this DataFrame:
  company    model rating   type
0    ford  mustang      A  coupe
1   chevy   camaro      B  coupe
2    ford   fiesta      C  sedan
3    ford    focus      A  sedan
4    ford   taurus      B  sedan
5  toyota    camry      B  sedan
Desired output:
Prob(rating=A) = 0.333333
Prob(rating=B) = 0.500000
Prob(rating=C) = 0.166667
Prob(type=coupe|rating=A) = 0.500000
Prob(type=sedan|rating=A) = 0.500000
Prob(type=coupe|rating=B) = 0.333333
Prob(type=sedan|rating=B) = 0.666667
Prob(type=coupe|rating=C) = 0.000000
Prob(type=sedan|rating=C) = 1.000000
Any help is appreciated. Thanks!
Solution
You can use .groupby() and the built-in .div():
rating_probs = df.groupby('rating').size().div(len(df))
rating
A    0.333333
B    0.500000
C    0.166667
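For reference, here is a self-contained sketch of the same marginal probabilities. It rebuilds the question's data by hand and uses value_counts(normalize=True) as an alternative to groupby().size().div(len(df)); this is a swapped-in shortcut, not the approach shown above, and the variable name alt_probs is only illustrative.

```python
import pandas as pd

# Rebuild the DataFrame from the question.
df = pd.DataFrame({
    'company': ['ford', 'chevy', 'ford', 'ford', 'ford', 'toyota'],
    'model': ['mustang', 'camaro', 'fiesta', 'focus', 'taurus', 'camry'],
    'rating': ['A', 'B', 'C', 'A', 'B', 'B'],
    'type': ['coupe', 'coupe', 'sedan', 'sedan', 'sedan', 'sedan'],
})

# Marginal probabilities as in the answer: group sizes divided by the row count.
rating_probs = df.groupby('rating').size().div(len(df))

# Alternative (an assumption of this sketch): relative frequencies directly.
alt_probs = df['rating'].value_counts(normalize=True).sort_index()

for rating, p in rating_probs.items():
    print(f"Prob(rating={rating}) = {p:.6f}")
```

Both series should hold the same values; the groupby version is kept because the next step divides by rating_probs.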
and the conditional probabilities, using P(type | rating) = P(type, rating) / P(rating):
df.groupby(['type', 'rating']).size().div(len(df)).div(rating_probs, axis=0, level='rating')
type   rating
coupe  A         0.500000
       B         0.333333
sedan  A         0.500000
       B         0.666667
       C         1.000000
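Note that the groupby result has no row for (coupe, C), because that combination never occurs in the data, while the desired output lists it as 0.000000. One possible way to get the zero entries is pd.crosstab with normalize='index'. This is a sketch of an alternative, not the method used above; the name cond_probs is illustrative.

```python
import pandas as pd

# Rebuild the DataFrame from the question.
df = pd.DataFrame({
    'company': ['ford', 'chevy', 'ford', 'ford', 'ford', 'toyota'],
    'model': ['mustang', 'camaro', 'fiesta', 'focus', 'taurus', 'camry'],
    'rating': ['A', 'B', 'C', 'A', 'B', 'B'],
    'type': ['coupe', 'coupe', 'sedan', 'sedan', 'sedan', 'sedan'],
})

# Rows are ratings, columns are types. normalize='index' divides each row
# by its total, so each cell is P(type | rating). Combinations that never
# occur show up as 0.0 instead of being dropped.
cond_probs = pd.crosstab(df['rating'], df['type'], normalize='index')

for rating, row in cond_probs.iterrows():
    for car_type, p in row.items():
        print(f"Prob(type={car_type}|rating={rating}) = {p:.6f}")
```

Each row of cond_probs sums to 1, which is a quick sanity check that the table is conditioned on rating rather than on type.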