Multi-class classification
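Multi-class classification works much like the binary case; only the loss function differs. For the multi-class problem, the original paper uses the cross-entropy loss: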
$$
L\left(\left\{y_{k}, F_{k}(x)\right\}_{1}^{K}\right)=-\sum_{k=1}^{K} y_{k} \log p_{k}(x)
$$
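Class probabilities in the multi-class setting are usually computed with the softmax function, where $k$ is the current class and $K$ is the number of classes: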
$$
p_{k}(x)=\exp \left(F_{k}(x)\right) / \sum_{l=1}^{K} \exp \left(F_{l}(x)\right)
$$
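As a minimal sketch of the two formulas above, the following pure-Python snippet computes the softmax probabilities and the cross-entropy loss for a single sample. The function names `softmax` and `cross_entropy` are illustrative choices rather than anything from the original paper, and subtracting the maximum score is a numerical-stability assumption that does not change the result.

```python
import math

def softmax(scores):
    """Map raw scores F_1..F_K to probabilities p_1..p_K."""
    m = max(scores)  # assumption: shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(y_onehot, scores):
    """L = -sum_k y_k * log p_k(x) for one sample with a one-hot label."""
    probs = softmax(scores)
    return -sum(y * math.log(p) for y, p in zip(y_onehot, probs) if y > 0)

scores = [0.0, 0.0, 0.0]                 # F_k(x) = 0 for all K = 3 classes
probs = softmax(scores)                  # uniform probabilities, each 1/3
loss = cross_entropy([0, 1, 0], scores)  # equals log(3)
```

With all scores equal, every class gets probability $1/K$, so the loss is $\log K$ regardless of which class is the true one.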
As in the binary case, we then take the negative gradient (the residual):
$$
\tilde{y}_{i k}=-\left[\frac{\partial L\left(\left\{y_{i l}, F_{l}\left(x_{i}\right)\right\}_{l=1}^{K}\right)}{\partial F_{k}\left(x_{i}\right)}\right]_{\left\{F_{l}(x)=F_{l, m-1}(x)\right\}_{1}^{K}}=y_{i k}-p_{k, m-1}\left(x_{i}\right)
$$
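The closed form $y_{ik} - p_k(x_i)$ can be sanity-checked numerically. The sketch below compares it against a central finite-difference estimate of the negative gradient of the cross-entropy loss. All function names and the step size `eps` are illustrative assumptions.

```python
import math

def softmax(scores):
    m = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def loss(y_onehot, scores):
    """Cross-entropy loss -sum_k y_k * log p_k for one sample."""
    probs = softmax(scores)
    return -sum(y * math.log(p) for y, p in zip(y_onehot, probs))

def pseudo_residuals(y_onehot, scores):
    """Analytic negative gradient w.r.t. each F_k: y_k - p_k."""
    return [y - p for y, p in zip(y_onehot, softmax(scores))]

def numeric_negative_gradient(y_onehot, scores, eps=1e-6):
    """Central finite differences, used only as a sanity check."""
    grads = []
    for k in range(len(scores)):
        up = list(scores)
        up[k] += eps
        down = list(scores)
        down[k] -= eps
        grads.append(-(loss(y_onehot, up) - loss(y_onehot, down)) / (2 * eps))
    return grads

y = [0, 0, 1]            # one-hot label: the sample belongs to class 3
F = [0.5, -1.0, 2.0]     # current scores F_{k,m-1}(x_i)
analytic = pseudo_residuals(y, F)
numeric = numeric_negative_gradient(y, F)
```

Because both the labels and the probabilities sum to one, the $K$ residuals of a sample always sum to zero.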
Next, we need to estimate the leaf node values. Here $\phi\left(y_{k}, F_{k}\right)=-y_{k} \log p_{k}$ is the per-class term of the loss, and $R_{jkm}$ is the $j$-th leaf region of the tree fitted for class $k$ at iteration $m$:
$$
\left\{\gamma_{j k m}\right\}=\operatorname{argmin}_{\left\{\gamma_{j k}\right\}} \sum_{i=1}^{N} \sum_{k=1}^{K} \phi\left(y_{i k}, F_{k, m-1}\left(x_{i}\right)+\sum_{j=1}^{J} \gamma_{j k} I\left(x_{i} \in R_{j k m}\right)\right)
$$
An approximate solution can be obtained with a Newton-Raphson step:
$$
\gamma_{j k m}=\frac{K-1}{K} \frac{\sum_{x_{i} \in R_{j k m}} \tilde{y}_{i k}}{\sum_{x_{i} \in R_{j k m}}\left|\tilde{y}_{i k}\right|\left(1-\left|\tilde{y}_{i k}\right|\right)}
$$
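The leaf-value formula above translates almost directly into code. The following is a minimal sketch for a single leaf of the class-$k$ tree. The name `leaf_value` and the fallback to `0.0` when the denominator vanishes are assumptions made for illustration, not part of the original derivation.

```python
def leaf_value(residuals, num_classes):
    """Newton-Raphson estimate of gamma_jkm for one leaf.

    residuals: pseudo-residuals y~_ik of the samples that fall in the leaf.
    num_classes: K, the number of classes.
    """
    numerator = sum(residuals)
    denominator = sum(abs(r) * (1.0 - abs(r)) for r in residuals)
    if denominator == 0.0:
        # Assumption: skip the update when every residual is exactly 0 or +/-1.
        return 0.0
    return (num_classes - 1) / num_classes * numerator / denominator

# Three samples in the leaf, K = 3 classes.
gamma = leaf_value([0.5, -0.5, 0.5], num_classes=3)  # (2/3) * 0.5 / 0.75 = 4/9
```

The denominator plays the role of the second derivative: residuals close to 0 or to ±1 contribute little curvature, so leaves dominated by such samples take larger steps.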
Regression
The original paper uses the Huber loss for regression; for simplicity, we use the squared loss here:
$$
L(y, F)=\frac{(y-F)^{2}}{2}
$$
Its negative gradient, evaluated at the current model $F_{m-1}$, is the ordinary residual:
$$
\tilde{y}_{i}=-\left[\frac{\partial L\left(y_{i}, F\left(x_{i}\right)\right)}{\partial F\left(x_{i}\right)}\right]_{F(x)=F_{m-1}(x)}=y_{i}-F_{m-1}\left(x_{i}\right)
$$
Estimating the leaf node values:
$$
\begin{array}{c}
\gamma_{j m}=\operatorname{argmin}_{\gamma} \sum_{x_{i} \in R_{j m}} \frac{1}{2}\left(y_{i}-\left(F_{m-1}\left(x_{i}\right)+\gamma\right)\right)^{2} \\
\gamma_{j m}=\operatorname{argmin}_{\gamma} \sum_{x_{i} \in R_{j m}} \frac{1}{2}\left(y_{i}-F_{m-1}\left(x_{i}\right)-\gamma\right)^{2} \\
\gamma_{j m}=\operatorname{argmin}_{\gamma} \sum_{x_{i} \in R_{j m}} \frac{1}{2}\left(\tilde{y}_{i}-\gamma\right)^{2}
\end{array}
$$
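As a short worked step (standard calculus, not spelled out in the original), setting the derivative of this last objective with respect to $\gamma$ to zero gives

$$
\frac{\partial}{\partial \gamma} \sum_{x_{i} \in R_{j m}} \frac{1}{2}\left(\tilde{y}_{i}-\gamma\right)^{2}=-\sum_{x_{i} \in R_{j m}}\left(\tilde{y}_{i}-\gamma\right)=0 \quad \Longrightarrow \quad\left|R_{j m}\right| \gamma=\sum_{x_{i} \in R_{j m}} \tilde{y}_{i}
$$

where $\left|R_{j m}\right|$ denotes the number of training samples that fall in the leaf. The objective is a convex quadratic in $\gamma$, so this stationary point is the minimum.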
Therefore, taking the mean of $\tilde{y}_i$ over the samples in the leaf minimizes the loss:
$$
\gamma_{j m}=\operatorname{average}_{x_{i} \in R_{j m}} \tilde{y}_{i}
$$
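To make the regression case concrete, here is a minimal sketch of one boosting iteration under the squared loss. It assumes the tree has already been fitted, so the leaf assignment `leaf_ids` is a hypothetical input. The helper names `residuals`, `leaf_values`, and `boost_step` are likewise illustrative.

```python
def residuals(y, preds):
    """Negative gradient of (y - F)^2 / 2: the residual y_i - F_{m-1}(x_i)."""
    return [yi - fi for yi, fi in zip(y, preds)]

def leaf_values(res, leaf_ids):
    """gamma_jm = mean of the residuals that fall in leaf j."""
    sums, counts = {}, {}
    for r, j in zip(res, leaf_ids):
        sums[j] = sums.get(j, 0.0) + r
        counts[j] = counts.get(j, 0) + 1
    return {j: sums[j] / counts[j] for j in sums}

def boost_step(preds, leaf_ids, gammas):
    """F_m(x_i) = F_{m-1}(x_i) + gamma_jm for the leaf j containing x_i."""
    return [f + gammas[j] for f, j in zip(preds, leaf_ids)]

y = [1.0, 2.0, 3.0, 10.0]
preds = [4.0, 4.0, 4.0, 4.0]   # F_{m-1}: here simply the mean of y
leaf_ids = [0, 0, 0, 1]        # hypothetical partition R_jm from a fitted tree
res = residuals(y, preds)                        # [-3.0, -2.0, -1.0, 6.0]
gammas = leaf_values(res, leaf_ids)              # {0: -2.0, 1: 6.0}
new_preds = boost_step(preds, leaf_ids, gammas)  # [2.0, 2.0, 2.0, 10.0]
```

After the update, each prediction equals the mean target of its leaf, which is the best a piecewise-constant correction on this partition can achieve under the squared loss.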