Sorting by Wilson ci_lower_bound

genghaihua
Category: Machine Learning

Article link: https://blog.csdn.net/genghaihua/article/details/89921289

PROBLEM: You are a web programmer. You have users. Your users rate stuff on your site. You want to put the highest-rated stuff at the top and lowest-rated at the bottom. You need some sort of “score” to sort by.

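WRONG SOLUTION #1: Score = (Positive ratings) - (Negative ratings)

Why it is wrong: Suppose item A has 600 positive and 400 negative ratings (1,000 in total): 60% positive, difference 200. Suppose item B has 5,500 positive and 4,500 negative ratings: 55% positive, difference 1,000. This score puts B (1,000, but only 55% positive) above A (200, and 60% positive). WRONG.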
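To make the failure concrete, here is a small Ruby sketch that ranks the two example items by the difference score. The `items` hash and the `diff_score` lambda are illustrative names, not part of the original post.

```ruby
# Hypothetical item data from the example above: [positive, negative]
items = { "A" => [600, 400], "B" => [5500, 4500] }

# Wrong solution #1: score = positive - negative
diff_score = ->(pos, neg) { pos - neg }

# Sort descending by score and keep only the item names
ranked = items.sort_by { |_name, (pos, neg)| -diff_score.(pos, neg) }.map(&:first)
puts ranked.inspect # prints ["B", "A"]: B wins although A has the higher positive share
```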

WRONG SOLUTION #2: Score = Average rating = (Positive ratings) / (Total ratings)

Why it is wrong: Average rating works fine if you always have a ton of ratings, but suppose item 1 has 2 positive ratings and 0 negative ratings. Suppose item 2 has 100 positive ratings and 1 negative rating. This algorithm puts item two (tons of positive ratings) below item one (very few positive ratings). WRONG.
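The same kind of sketch shows the problem with the plain average. The names `items` and `avg_score` are illustrative only.

```ruby
# Hypothetical items from the example above: [positive, negative]
items = { "item 1" => [2, 0], "item 2" => [100, 1] }

# Wrong solution #2: score = positive / total
avg_score = ->(pos, neg) { pos.to_f / (pos + neg) }

# Sort descending by average and keep only the item names
ranked = items.sort_by { |_name, (pos, neg)| -avg_score.(pos, neg) }.map(&:first)
puts ranked.inspect # prints ["item 1", "item 2"]: 1.0 beats 100/101 (about 0.990)
```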

CORRECT SOLUTION: Score = Lower bound of Wilson score confidence interval for a Bernoulli parameter

Say what: We need to balance the proportion of positive ratings with the uncertainty of a small number of observations. Fortunately, the math for this was worked out in 1927 by Edwin B. Wilson. What we want to ask is: Given the ratings I have, there is a 95% chance that the “real” fraction of positive ratings is at least what? Wilson gives the answer. Considering only positive and negative ratings (i.e. not a 5-star scale), the lower bound on the proportion of positive ratings is given by:
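Here is the same bound written out as a formula, term by term as in the code that follows. The symbols are: \hat{p} = pos/n, the observed fraction of positive ratings; n, the total number of ratings; and z, the 1-(1-confidence)/2 quantile of the standard normal distribution (about 1.96 for 95% confidence).

```latex
\text{lower bound} =
\frac{\hat{p} + \dfrac{z^2}{2n} - z\sqrt{\dfrac{\hat{p}(1-\hat{p}) + z^2/(4n)}{n}}}
     {1 + \dfrac{z^2}{n}}
```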

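```ruby
require 'statistics2' # gem install statistics2

# Lower bound of the Wilson score interval for the fraction of positive ratings.
#   pos        - number of positive ratings
#   n          - total number of ratings
#   confidence - confidence level, e.g. 0.95
def ci_lower_bound(pos, n, confidence)
  return 0 if n == 0
  # Standard normal quantile (about 1.96 for confidence = 0.95)
  z = Statistics2.pnormaldist(1 - (1 - confidence) / 2)
  phat = 1.0 * pos / n
  (phat + z * z / (2 * n) - z * Math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)) / (1 + z * z / n)
end
```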
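The function above depends on the statistics2 gem. Below is a minimal dependency-free sketch that applies the same formula to the four example items. It assumes a hard-coded z = 1.96, the usual value for 95% confidence, in place of `Statistics2.pnormaldist`. The name `wilson_lower_bound` and its default argument are illustrative and do not come from the original.

```ruby
# Wilson score lower bound with a fixed z instead of Statistics2.pnormaldist.
# ASSUMPTION: z = 1.96 stands in for the 95% two-sided normal quantile.
def wilson_lower_bound(pos, n, z = 1.96)
  return 0.0 if n.zero?
  phat = pos.to_f / n
  (phat + z * z / (2 * n) - z * Math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)) / (1 + z * z / n)
end

# The four example items from the text: [positive, total]
items = {
  "item 1" => [2, 2],
  "item 2" => [100, 101],
  "A"      => [600, 1000],
  "B"      => [5500, 10000]
}

# Sort descending by the lower bound and print each score
ranked = items.sort_by { |_name, (pos, n)| -wilson_lower_bound(pos, n) }
ranked.each do |name, (pos, n)|
  printf("%-7s %.4f\n", name, wilson_lower_bound(pos, n))
end
# expected order: item 2, A, B, item 1
```

With this score, item 2 (100 positive, 1 negative) ranks above item 1 (2 positive, 0 negative), and A (60% of 1,000) ranks above B (55% of 10,000). Both wrong solutions got these orderings backwards.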

Source: http://www.evanmiller.org/how-not-to-sort-by-average-rating.html