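There are already plenty of blog posts explaining what AUC is and how to compute it.
I'll write about that myself if I find the time.
Today I'll write up the proof of AUC's statistical meaning, namely: AUC equals the probability that, when one positive sample and one negative sample are drawn at random, the positive sample is ranked ahead of the negative one.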
The AUC has an important statistical property: the AUC of a classifier is equivalent to the probability that
the classifier will rank a randomly chosen positive instance
higher than a randomly chosen negative instance.
Let $X$ be the sample space, $p(x)$ the score the classifier assigns to a sample $x \in X$, and $T(t)$ the $TPR$ as a function of the threshold $t$. Then:

$$T(t) = P[p(x) > t \mid label(x) = 1]$$
Let $F(t)$ be the $FPR$ as a function of the threshold $t$. Then:

$$F(t) = P[p(x) > t \mid label(x) = 0]$$
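The two definitions above can be estimated from a finite labelled sample by counting. The following is a minimal sketch, not a reference implementation: the helper name `tpr_fpr` and the toy scores are made up for illustration, and labels are assumed to be encoded as `1` (positive) and `0` (negative).

```python
from typing import Sequence, Tuple


def tpr_fpr(scores: Sequence[float], labels: Sequence[int], t: float) -> Tuple[float, float]:
    """Empirical T(t) and F(t): the fraction of positives / negatives scored strictly above t."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    tpr = sum(s > t for s in pos) / len(pos)  # estimate of P[p(x) > t | label(x) = 1]
    fpr = sum(s > t for s in neg) / len(neg)  # estimate of P[p(x) > t | label(x) = 0]
    return tpr, fpr


# Hypothetical toy data: three positives followed by three negatives.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]

for t in (0.0, 0.5, 1.0):
    print(t, tpr_fpr(scores, labels, t))
```

Raising the threshold can only shrink both fractions, which is why both $T(t)$ and $F(t)$ are non-increasing in $t$.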
The probability density function associated with $F(t)$, i.e. the density of the scores of negative samples, is:

$$f(t) = -\frac{\partial F(t)}{\partial t}$$

The minus sign is needed because $F(t)$ decreases as $t$ grows. Informally, $f(t)$ plays the role of $P[p(x) = t \mid label(x) = 0]$.
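As a small worked illustration of these definitions (the two score distributions are assumptions picked for convenience, not taken from any particular classifier): suppose negative samples receive scores uniform on $[0, 1]$, and positive samples receive scores with density $2s$ on $[0, 1]$. Then, for $t \in [0, 1]$:

$$T(t) = \int_t^1 2s \, ds = 1 - t^2, \qquad F(t) = \int_t^1 1 \, ds = 1 - t$$

Both curves fall from $1$ at $t = 0$ to $0$ at $t = 1$, and $F$ has slope $-1$ everywhere, matching the constant density $1$ of the negative scores.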
If we view $T(t)$ as a function of $F(t)$, then from the definition of AUC we can derive the following. Assume the scores lie in $[0, 1]$ and are continuously distributed, and that $x$ and $x'$ are drawn independently from $X$. As $F$ runs from $0$ to $1$, the threshold $t$ runs from $1$ down to $0$, so:

$$
\begin{aligned}
AUC &= \int_0^1 T \, dF \\
&= \int_{t=1}^{t=0} T(t) \cdot \frac{\partial F(t)}{\partial t} \, dt \\
&= \int_0^1 T(t) \cdot \left( -\frac{\partial F(t)}{\partial t} \right) dt \\
&= \int_0^1 P[p(x) > t \mid label(x) = 1] \cdot P[p(x') = t \mid label(x') = 0] \, dt \\
&= \int_0^1 P[p(x) > t \ \&\ p(x') = t \mid label(x) = 1 \ \&\ label(x') = 0] \, dt \\
&= P[p(x) > p(x') \mid label(x) = 1 \ \&\ label(x') = 0]
\end{aligned}
$$

Here $P[p(x') = t \mid label(x') = 0]$ is shorthand for the density of the negative scores at $t$, i.e. $-\partial F(t) / \partial t$. The fifth line uses the independence of $x$ and $x'$, and the last line follows from the law of total probability: integrating over $t$ covers every possible value of $p(x')$ exactly once.
$P[p(x) > p(x') \mid label(x) = 1 \ \&\ label(x') = 0]$ is exactly the probability that, when one positive sample and one negative sample are drawn at random, the positive sample is ranked ahead of the negative one. This completes the proof.
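The equivalence can also be checked numerically. The sketch below is only an illustration under stated assumptions: the function names `pairwise_auc` and `roc_auc` are invented here, ties are given half credit (a common convention that the proof above does not need, since it assumes continuous scores), and the simulated score distributions are arbitrary. Negatives are drawn uniformly on $[0, 1]$ and positives with density $2s$, for which the exact pairwise probability is $E[s] = 2/3$.

```python
import random
from typing import List


def pairwise_auc(pos: List[float], neg: List[float]) -> float:
    """Fraction of (positive, negative) pairs where the positive scores higher; ties count 1/2."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def roc_auc(pos: List[float], neg: List[float]) -> float:
    """Area under the empirical ROC curve, computed with the trapezoidal rule."""
    # Sweep the threshold downwards over every observed score. Using s >= t at an
    # observed score t yields the same ROC point as s > t' for t' just below t.
    thresholds = sorted(set(pos) | set(neg), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        fpr = sum(s >= t for s in neg) / len(neg)
        tpr = sum(s >= t for s in pos) / len(pos)
        points.append((fpr, tpr))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


rng = random.Random(0)  # fixed seed so the run is reproducible
pos = [rng.random() ** 0.5 for _ in range(500)]  # sqrt of a uniform has density 2s on [0, 1]
neg = [rng.random() for _ in range(500)]         # uniform on [0, 1]

print("area under ROC curve :", roc_auc(pos, neg))
print("pairwise probability :", pairwise_auc(pos, neg))
print("exact value (assumed distributions):", 2 / 3)
```

The two estimates agree up to floating-point rounding on any sample, while both differ from $2/3$ only by sampling noise.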
Reference:
https://www.alexejgossmann.com/auc/