A Probabilistic Interpretation of AUC

Plenty of blog posts already explain what AUC is and how to compute it.
I may write about that some other time.

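Today I want to write down a proof of the statistical meaning of AUC, namely: AUC equals the probability that, when one positive sample and one negative sample are drawn at random, the positive sample is ranked ahead of the negative one.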

The AUC has an important statistical property: the AUC of a classifier is equivalent to the probability that
the classifier will rank a randomly chosen positive instance
higher than a randomly chosen negative instance.

Let $X$ be the sample space and let $p(x)$ be the score the classifier assigns to a sample $x$. Let $T(t)$ be the TPR as a function of the threshold $t$:

$$T(t) = P[p(x) > t \mid label(x) = 1]$$

Let $F(t)$ be the FPR as a function of the threshold $t$:

$$F(t) = P[p(x) > t \mid label(x) = 0]$$

$F(t)$ decreases from 1 to 0 as $t$ goes from 0 to 1, so the probability density of the negative-class scores is the negative of its derivative (assuming the scores are continuous, so that this density exists):

$$f(t) = -\frac{\partial F(t)}{\partial t} = P[p(x) = t \mid label(x) = 0]$$

Here $P[p(x) = t \mid \cdot]$ is shorthand for the density of the score at $t$, not the probability of a single point.
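To make these definitions concrete, here is a minimal sketch that evaluates the empirical $T(t)$ and $F(t)$ on a toy data set. The function name `tpr_fpr` and the six scores and labels are made up for illustration; they are not from any library or from the referenced articles.

```python
import numpy as np

def tpr_fpr(scores, labels, t):
    """Empirical T(t) and F(t): share of positives / negatives scoring above t."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    tpr = float(np.mean(scores[labels == 1] > t))  # estimates P[p(x) > t | label(x) = 1]
    fpr = float(np.mean(scores[labels == 0] > t))  # estimates P[p(x) > t | label(x) = 0]
    return tpr, fpr

# Toy data (hypothetical): three positives and three negatives.
scores = np.array([0.9, 0.8, 0.6, 0.4, 0.3, 0.1])
labels = np.array([1, 1, 0, 1, 0, 0])

# Both rates can only fall (or stay flat) as the threshold rises.
for t in (0.0, 0.35, 0.5, 0.7, 1.0):
    print(t, tpr_fpr(scores, labels, t))
```

Raising the threshold can only remove samples from the "predicted positive" set, which is why both functions are non-increasing in $t$.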

If we regard $T(t)$ as a function of $F(t)$, then AUC is by definition the area under the curve of TPR against FPR, and we can derive the following. Let $x$ be a positive sample and $x'$ a negative sample, drawn independently. Sweeping the FPR from 0 to 1 corresponds to lowering $t$ from 1 to 0, which is where the minus sign in the third line comes from.

$$
\begin{aligned}
AUC &= \int_{F=0}^{F=1} T \, dF \\
&= \int_1^0 T(t) \cdot \frac{\partial F(t)}{\partial t} \, dt \\
&= \int_0^1 T(t) \cdot \left(-\frac{\partial F(t)}{\partial t}\right) dt \\
&= \int_0^1 P[p(x) > t \mid label(x) = 1] \cdot P[p(x') = t \mid label(x') = 0] \, dt \quad (\forall x, x' \in X) \\
&= \int_0^1 P[p(x) > t \ \&\ p(x') = t \mid label(x) = 1 \ \&\ label(x') = 0] \, dt \\
&= P[p(x) > p(x') \mid label(x) = 1 \ \&\ label(x') = 0]
\end{aligned}
$$

In the fourth line, $P[p(x') = t \mid label(x') = 0]$ denotes the density of the negative-class score at $t$, which equals $-\partial F(t) / \partial t$. The fifth line uses the independence of $x$ and $x'$ to merge the two factors into a joint density. The last line integrates that joint density over every possible value $t$ of $p(x')$, which by the law of total probability leaves the probability of the event $p(x) > p(x')$.

$P[p(x) > p(x') \mid label(x) = 1 \ \&\ label(x') = 0]$ is exactly the probability that a randomly drawn positive sample is ranked ahead of a randomly drawn negative sample, which completes the proof.
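The result can also be checked numerically. The sketch below is not part of the proof: it computes AUC once as the area under the empirical ROC curve and once as the fraction of (positive, negative) pairs in which the positive sample scores higher. The function names `auc_by_roc` and `auc_by_pairs` and the Beta-distributed scores are assumptions chosen only for illustration; ties between scores are assumed not to occur, which holds almost surely for continuous scores.

```python
import numpy as np

def auc_by_roc(scores, labels):
    """Area under the empirical ROC curve, integrated with the trapezoidal rule."""
    order = np.argsort(-scores, kind="stable")  # sweep the threshold from high to low
    sorted_labels = labels[order]
    n_pos = np.sum(labels == 1)
    n_neg = np.sum(labels == 0)
    # After admitting the k highest-scoring samples, TPR and FPR are cumulative counts.
    tpr = np.concatenate(([0.0], np.cumsum(sorted_labels == 1) / n_pos))
    fpr = np.concatenate(([0.0], np.cumsum(sorted_labels == 0) / n_neg))
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

def auc_by_pairs(scores, labels):
    """Fraction of (positive, negative) pairs where the positive scores higher."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    return float(np.mean(pos[:, None] > neg[None, :]))

# Hypothetical scores: positives tend to score higher than negatives.
rng = np.random.default_rng(0)
n_pos, n_neg = 300, 500
scores = np.concatenate([rng.beta(4, 2, n_pos), rng.beta(2, 4, n_neg)])
labels = np.concatenate([np.ones(n_pos, dtype=int), np.zeros(n_neg, dtype=int)])

print(auc_by_roc(scores, labels), auc_by_pairs(scores, labels))
```

On a finite sample without ties the two numbers agree up to floating-point rounding, because each horizontal step of the ROC staircase contributes exactly the share of positives that outrank one particular negative.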

References:
https://www.alexejgossmann.com/auc/

An introduction to ROC analysis
