Binary classification
For binary classification, the original paper uses the following log loss:
$$L(y, F) = \log\left(1 + \exp(-2yF)\right), \quad y \in \{-1, 1\}$$
where
$$F(x) = \frac{1}{2}\log\left[\frac{\Pr(y=1 \mid x)}{\Pr(y=-1 \mid x)}\right]$$
Following the algorithm above step by step, first compute the negative gradient (the pseudo-residual):
$$\tilde{y}_{i} = -\left[\frac{\partial L\left(y_{i}, F\left(x_{i}\right)\right)}{\partial F\left(x_{i}\right)}\right]_{F(x)=F_{m-1}(x)} = \frac{2 y_{i}}{1+\exp\left(2 y_{i} F_{m-1}\left(x_{i}\right)\right)}$$
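The loss and its negative gradient can be checked numerically. The following is a minimal sketch, assuming scalar inputs and plain Python; the function names `log_loss`, `pseudo_residual` and `numeric_negative_gradient` are illustrative and do not come from the original paper.

```python
import math


def log_loss(y, F):
    """Loss L(y, F) = log(1 + exp(-2 y F)) for a label y in {-1, +1}."""
    return math.log1p(math.exp(-2.0 * y * F))


def pseudo_residual(y, F):
    """Closed-form negative gradient of the loss: 2y / (1 + exp(2 y F))."""
    return 2.0 * y / (1.0 + math.exp(2.0 * y * F))


def numeric_negative_gradient(y, F, eps=1e-6):
    """Central finite difference of -dL/dF, used only as a sanity check."""
    return -(log_loss(y, F + eps) - log_loss(y, F - eps)) / (2.0 * eps)


# Compare the closed form with the finite-difference estimate.
for y in (-1, 1):
    for F in (-1.5, 0.0, 0.7):
        closed = pseudo_residual(y, F)
        numeric = numeric_negative_gradient(y, F)
        print(y, F, closed, numeric)
```

The two columns should agree to several decimal places, which is a quick way to catch a sign or factor-of-two mistake in the gradient.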
Then estimate the value of each leaf node:
$$\gamma_{jm} = \operatorname{argmin}_{\gamma} \sum_{x_{i} \in R_{jm}} \log\left(1+\exp\left(-2 y_{i}\left(F_{m-1}\left(x_{i}\right)+\gamma\right)\right)\right)$$
The original paper approximates the solution directly with the Newton-Raphson method:
$$\gamma_{jm} = \frac{\sum_{x_{i} \in R_{jm}} \tilde{y}_{i}}{\sum_{x_{i} \in R_{jm}} \left|\tilde{y}_{i}\right|\left(2-\left|\tilde{y}_{i}\right|\right)}$$
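As a rough illustration of the leaf update, here is a minimal sketch that takes the pseudo-residuals of the samples in one leaf and returns the approximate leaf value. The function name `leaf_value` and the example residuals are assumptions made for illustration.

```python
def leaf_value(residuals):
    """Approximate leaf value from the pseudo-residuals in one leaf.

    Implements sum(r) / sum(|r| * (2 - |r|)). In exact arithmetic every
    residual lies strictly between -2 and 2 and is nonzero, so the
    denominator is positive; no guard against floating-point underflow
    is included in this sketch.
    """
    numerator = sum(residuals)
    denominator = sum(abs(r) * (2.0 - abs(r)) for r in residuals)
    return numerator / denominator


# Hypothetical leaf: two positive samples and one negative sample, all with
# F_{m-1}(x_i) = 0, so each pseudo-residual equals its label.
print(leaf_value([1.0, 1.0, -1.0]))
```

For this toy leaf the numerator is 1 and the denominator is 3, so the update is 1/3.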
How to set the initial value
In the gradient boosting tree algorithm, the initial value is defined as:
$$F_0(x) = \arg\min_{F} \sum_{i=1}^{N} L(y_i, F(x_i))$$
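The initial model is a constant, so $F(x_i) = F$ for every sample. Take the derivative of the total loss with respect to $F$ and set it to zero to find the minimum: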
$$\begin{aligned} &\frac{\partial \sum_{i=1}^{N} L\left(y_{i}, F\left(x_{i}\right)\right)}{\partial F} = 0 \\ &\sum_{i=1}^{N} \frac{\left(-2 y_{i}\right) e^{-2 y_{i} F}}{e^{-2 y_{i} F}+1} = 0 \end{aligned}$$
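Because this is binary classification, each $y_i$ is either 1 or -1. Splitting the sum by class and multiplying through by -1 gives: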
$$\sum_{i:y_i=1} \frac{2e^{-2F}}{e^{-2F}+1} + \sum_{i:y_i=-1} \frac{-2e^{2F}}{e^{2F}+1} = 0$$
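Multiply the numerator and denominator of the first term by $e^{2F}$ so that both terms share the same denominator:
$$\sum_{i:y_i=1} \frac{2}{e^{2F}+1} + \sum_{i:y_i=-1} \frac{-2e^{2F}}{e^{2F}+1} = 0$$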
Let $m$ be the number of positive samples and $n$ the number of negative samples. Then:
$$m - n e^{2F} = 0$$
$$e^{2F} = \frac{m}{n} = \frac{1+\frac{m-n}{m+n}}{1-\frac{m-n}{m+n}} = \frac{1+\bar{y}}{1-\bar{y}}$$
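Here $m+n$ is the total number of samples and $m-n$ is the sum of the $y_i$, so $\bar{y} = \frac{m-n}{m+n}$ is the mean label.
This finally gives: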
$$F_0(x) = \frac{1}{2}\log\frac{1+\bar{y}}{1-\bar{y}}$$
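The closed form for the initial value is easy to sanity-check on a toy label set. The sketch below assumes labels in {-1, +1} with both classes present; `initial_value`, `total_loss` and the sample labels are illustrative names and data, not taken from the original paper.

```python
import math


def initial_value(labels):
    """Initial constant 0.5 * log((1 + y_bar) / (1 - y_bar)).

    Assumes both classes occur in `labels`; otherwise y_bar is +1 or -1
    and the expression is undefined.
    """
    y_bar = sum(labels) / len(labels)
    return 0.5 * math.log((1.0 + y_bar) / (1.0 - y_bar))


def total_loss(labels, F):
    """Sum of log(1 + exp(-2 y F)) over all labels for a constant F."""
    return sum(math.log1p(math.exp(-2.0 * y * F)) for y in labels)


labels = [1, 1, 1, -1]  # m = 3 positives, n = 1 negative
F0 = initial_value(labels)

# The closed form should match 0.5 * log(m / n) and should not be beaten
# by nearby constants.
print(F0, 0.5 * math.log(3.0 / 1.0))
print(total_loss(labels, F0 - 0.05), total_loss(labels, F0), total_loss(labels, F0 + 0.05))
```

The middle loss value in the last line should be the smallest of the three, consistent with $F_0$ being the minimizer.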
Solving with the Newton approximation
How is formula 1 turned into formula 2?
$$\gamma_{jm} = \operatorname{argmin}_{\gamma} \sum_{x_{i} \in R_{jm}} \log\left(1+\exp\left(-2 y_{i}\left(F_{m-1}\left(x_{i}\right)+\gamma\right)\right)\right) \tag{1}$$
$$\gamma_{jm} = \frac{\sum_{x_{i} \in R_{jm}} \tilde{y}_{i}}{\sum_{x_{i} \in R_{jm}} \left|\tilde{y}_{i}\right|\left(2-\left|\tilde{y}_{i}\right|\right)} \tag{2}$$
Newton's method is an iterative solver, and the paper performs a single iteration step. First define:
$$g(\gamma) = \sum_{x_i \in R_{jm}} \log\left(1+\exp\left(-2y_i\left(F_{m-1}(x_i)+\gamma\right)\right)\right)$$
Then apply Newton's method, starting the iteration from $\gamma_0 = 0$:
$$\gamma_{jm} = \gamma_{0} - \frac{g^{\prime}\left(\gamma_{0}\right)}{g^{\prime\prime}\left(\gamma_{0}\right)} = -\frac{g^{\prime}\left(\gamma_{0}\right)}{g^{\prime\prime}\left(\gamma_{0}\right)}$$
Next, compute the first and second derivatives with respect to $\gamma$:
$$g^{\prime}(\gamma) = \sum_{x_{i} \in R_{jm}} \frac{-2 y_{i}}{1+\exp\left(2 y_{i}\left(F_{m-1}\left(x_{i}\right)+\gamma\right)\right)}$$
$$g^{\prime\prime}(\gamma) = \sum_{x_{i} \in R_{jm}} \frac{4 y_{i}^{2} \exp\left(2 y_{i}\left(F_{m-1}\left(x_{i}\right)+\gamma\right)\right)}{\left[1+\exp\left(2 y_{i}\left(F_{m-1}\left(x_{i}\right)+\gamma\right)\right)\right]^{2}} = \sum_{x_{i} \in R_{jm}} \frac{4 y_{i}^{2}\left(\exp\left(2 y_{i}\left(F_{m-1}\left(x_{i}\right)+\gamma\right)\right)+1\right)-4 y_{i}^{2}}{\left[1+\exp\left(2 y_{i}\left(F_{m-1}\left(x_{i}\right)+\gamma\right)\right)\right]^{2}}$$
Recall that
$$\tilde{y}_{i} = -\left[\frac{\partial L\left(y_{i}, F\left(x_{i}\right)\right)}{\partial F\left(x_{i}\right)}\right]_{F(x)=F_{m-1}(x)} = \frac{2 y_{i}}{1+\exp\left(2 y_{i} F_{m-1}\left(x_{i}\right)\right)}$$
Evaluating both derivatives at the starting point $\gamma_0 = 0$ therefore gives
$$g^{\prime}(0) = \sum_{x_{i} \in R_{jm}} \frac{-2 y_{i}}{1+\exp\left(2 y_{i} F_{m-1}\left(x_{i}\right)\right)} = -\sum_{x_{i} \in R_{jm}} \tilde{y}_{i}$$
$$\begin{aligned} g^{\prime\prime}(0) &= \sum_{x_{i} \in R_{jm}} \frac{4 y_{i}^{2}\left(\exp\left(2 y_{i} F_{m-1}\left(x_{i}\right)\right)+1\right)-4 y_{i}^{2}}{\left[1+\exp\left(2 y_{i} F_{m-1}\left(x_{i}\right)\right)\right]^{2}} \\ &= \sum_{x_{i} \in R_{jm}} \left[\frac{2 \cdot 2 y_{i}^{2}}{1+\exp\left(2 y_{i} F_{m-1}\left(x_{i}\right)\right)} - \tilde{y}_{i}^{2}\right] \end{aligned}$$
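Since $y_i$ is either +1 or -1, $y_i^2 = |y_i| = 1$, so $\frac{2 y_i^2}{1+\exp\left(2 y_i F_{m-1}(x_i)\right)} = |\tilde{y}_i|$ and $\tilde{y}_i^2 = |\tilde{y}_i|^2$. Therefore: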
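$$g^{\prime\prime}(0) = \sum_{x_{i} \in R_{jm}} |\tilde{y}_i|\left(2-|\tilde{y}_i|\right)$$
Substituting these two derivatives into the Newton step $\gamma_{jm} = -g^{\prime}(\gamma_0)/g^{\prime\prime}(\gamma_0)$ yields formula 2.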
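The derivation can be cross-checked numerically: a Newton step built from finite-difference estimates of $g'(0)$ and $g''(0)$ should agree with the closed form in terms of the pseudo-residuals. This is a minimal sketch under that assumption; the helper names, the labels and the scores standing in for $F_{m-1}(x_i)$ are all made up for illustration.

```python
import math


def g(gamma, labels, scores):
    """Leaf objective: sum of log(1 + exp(-2 y (F + gamma)))."""
    return sum(
        math.log1p(math.exp(-2.0 * y * (F + gamma)))
        for y, F in zip(labels, scores)
    )


def newton_step_closed_form(labels, scores):
    """Single Newton step written with pseudo-residuals (formula 2)."""
    residuals = [
        2.0 * y / (1.0 + math.exp(2.0 * y * F)) for y, F in zip(labels, scores)
    ]
    return sum(residuals) / sum(abs(r) * (2.0 - abs(r)) for r in residuals)


def newton_step_numeric(labels, scores, eps=1e-4):
    """Same step, with g'(0) and g''(0) estimated by central differences."""
    g_plus = g(eps, labels, scores)
    g_zero = g(0.0, labels, scores)
    g_minus = g(-eps, labels, scores)
    first = (g_plus - g_minus) / (2.0 * eps)
    second = (g_plus - 2.0 * g_zero + g_minus) / (eps * eps)
    return -first / second


# Hypothetical leaf with four samples and their current scores F_{m-1}(x_i).
labels = [1, -1, 1, 1]
scores = [0.3, 0.1, -0.4, 0.0]

closed = newton_step_closed_form(labels, scores)
numeric = newton_step_numeric(labels, scores)
print(closed, numeric)
print(g(0.0, labels, scores), g(closed, labels, scores))
```

The two step sizes should match closely, and on this toy leaf the objective after the step is lower than at $\gamma = 0$. A single Newton step is only an approximation of the exact minimizer of formula 1.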
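Binary classification
Once $F(x)$ has been obtained, how is it used for classification? Writing $p = \Pr(y=1 \mid x)$, the definition of $F$ reads: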
$$F(x) = \frac{1}{2}\log\left(\frac{p}{1-p}\right)$$
Rearranging slightly gives
$$e^{2F(x)} = \frac{p}{1-p}$$
and solving for $p$ gives
$$P_{+}(x) = p = \frac{e^{2F(x)}}{1+e^{2F(x)}} = \frac{1}{1+e^{-2F(x)}}$$
$$P_{-}(x) = 1-p = \frac{1}{1+e^{2F(x)}}$$
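These two probabilities complete the binary classification: predict the positive class when $P_{+}(x) > P_{-}(x)$, that is, when $F(x) > 0$.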
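The mapping from the model output to class probabilities can be sketched in a few lines. The function names and the default `threshold` of 0.5 are assumptions for illustration; the source only specifies the probability formulas.

```python
import math


def positive_probability(F):
    """P_+(x) = 1 / (1 + exp(-2 F(x))).

    Note: math.exp raises OverflowError for very large negative F;
    this sketch does not guard against that.
    """
    return 1.0 / (1.0 + math.exp(-2.0 * F))


def negative_probability(F):
    """P_-(x) = 1 / (1 + exp(2 F(x)))."""
    return 1.0 / (1.0 + math.exp(2.0 * F))


def predict_label(F, threshold=0.5):
    """Return +1 when P_+(x) reaches the (assumed) threshold, else -1."""
    return 1 if positive_probability(F) >= threshold else -1


for F in (-0.3, 0.0, 1.2):
    print(F, positive_probability(F), negative_probability(F), predict_label(F))
```

With the default threshold, the predicted label is +1 exactly when $F(x) \ge 0$, which matches comparing $P_{+}(x)$ against $P_{-}(x)$ apart from the tie at zero.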