# GBDT Principles and Sklearn Source Code Analysis: Classification

## Main Text

The basic principles of gradient boosting were already covered in the previous article, so let's get straight to the point. For binary classification, the per-sample loss is the logloss:

$\large L\left(y_i,F_m(x_i)\right)=-\left\{y_i\log p_i+(1-y_i)\log(1-p_i)\right\}$

Substituting $p_i=\frac{1}{1+e^{-F_m(x_i)}}$ and setting the leading negative sign aside for now:

$\large -y_i\log\left(1+e^{-F_m(x_i)}\right)+(1-y_i)\left\{\log\left(e^{-F_m(x_i)}\right)-\log\left(1+e^{-F_m(x_i)}\right)\right\}$

$\large \Rightarrow -y_i\log\left(1+e^{-F_m(x_i)}\right)+\log\left(e^{-F_m(x_i)}\right)-\log\left(1+e^{-F_m(x_i)}\right)-y_i\log\left(e^{-F_m(x_i)}\right)+y_i\log\left(1+e^{-F_m(x_i)}\right)$

$\large \Rightarrow y_iF_m(x_i)-\log\left(1+e^{F_m(x_i)}\right)$

Restoring the negative sign:

$\large L\left(y_i,F_m(x_i)\right)=-\left\{y_i\log p_i+(1-y_i)\log(1-p_i)\right\}=-\left\{y_iF_m(x_i)-\log\left(1+e^{F_m(x_i)}\right)\right\}$
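The identity is easy to verify numerically. A quick sketch in plain Python (the two helper names are just for illustration):

```python
import math

# Check: with p = 1 / (1 + exp(-F)),
#   -(y*log(p) + (1-y)*log(1-p))  ==  -(y*F - log(1 + exp(F)))
def logloss_via_p(y, F):
    p = 1.0 / (1.0 + math.exp(-F))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def logloss_via_F(y, F):
    return -(y * F - math.log(1.0 + math.exp(F)))

for y in (0, 1):
    for F in (-2.0, -0.4054, 0.0, 1.3):
        assert abs(logloss_via_p(y, F) - logloss_via_F(y, F)) < 1e-12
```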

## A Worked Example

Take the following toy dataset of 10 samples:

| $x_i$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| $y_i$ | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 |

1. logloss as the loss function
2. MSE as the split criterion
3. tree depth of 1
4. learning rate of 0.1

The initial model is the log odds of the positive class:

$\large F_0(x)=\log\left(\frac{\sum_{i=1}^N y_i}{\sum_{i=1}^N(1-y_i)}\right)=\log\left(\frac{4}{6}\right)=-0.4054$

$\large \tilde{y}_i=-\left[\frac{\partial L\left(y_i,F(\mathbf{x}_i)\right)}{\partial F(\mathbf{x}_i)}\right]_{F(x)=F_{m-1}(x)}=y_i-\frac{1}{1+e^{-F_{m-1}(x_i)}}=y_i-\frac{1}{1+e^{-F_{0}(x_i)}}$

$\large \tilde{y}_1=0-\frac{1}{1+e^{0.4054}}=-0.400$

| $x_i$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| $\tilde{y}_i$ | -0.4 | -0.4 | -0.4 | 0.6 | 0.6 | -0.4 | -0.4 | -0.4 | 0.6 | 0.6 |
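The numbers above can be reproduced in a few lines of plain Python:

```python
import math

y = [0, 0, 0, 1, 1, 0, 0, 0, 1, 1]

# F_0 is the log odds of the positive class: log(4/6)
F0 = math.log(sum(y) / (len(y) - sum(y)))

# pseudo-residuals: negative gradient of logloss = y_i - sigmoid(F_0)
residuals = [yi - 1.0 / (1.0 + math.exp(-F0)) for yi in y]
print(round(F0, 4))                      # ≈ -0.4055
print([round(r, 1) for r in residuals])  # -0.4 for y=0, 0.6 for y=1
```

(The article truncates $-0.40546\ldots$ to $-0.4054$.)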

Fitting a depth-1 tree to these residuals, the best split is at $x_i \le 8$, giving two leaf regions: $R_{11}: x_i \le 8$ and $R_{21}: x_i > 8$.

$\large \sum_{x_i \in R_{11}}\tilde{y}_i=\tilde{y}_1+\tilde{y}_2+\cdots+\tilde{y}_8=-1.2$

$\large \sum_{x_i \in R_{11}}(y_i-\tilde{y}_i)(1-y_i+\tilde{y}_i)=(y_1-\tilde{y}_1)(1-y_1+\tilde{y}_1)+\cdots+(y_8-\tilde{y}_8)(1-y_8+\tilde{y}_8)=1.92$

$\large \sum_{x_i \in R_{21}}\tilde{y}_i=\tilde{y}_9+\tilde{y}_{10}=1.2$

$\large \sum_{x_i \in R_{21}}(y_i-\tilde{y}_i)(1-y_i+\tilde{y}_i)=(y_9-\tilde{y}_9)(1-y_9+\tilde{y}_9)+(y_{10}-\tilde{y}_{10})(1-y_{10}+\tilde{y}_{10})=0.48$

$\large \gamma_{11}=\frac{-1.2}{1.92}=-0.625,\qquad \gamma_{21}=\frac{1.2}{0.48}=2.5$
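Each leaf value is a single Newton step: the sum of residuals over the leaf divided by the sum of $p_i(1-p_i)$. A minimal check with the numbers from the tables:

```python
y   = [0, 0, 0, 1, 1, 0, 0, 0, 1, 1]
res = [-0.4, -0.4, -0.4, 0.6, 0.6, -0.4, -0.4, -0.4, 0.6, 0.6]

def leaf_value(idx):
    # gamma = sum(residual) / sum((y - residual) * (1 - y + residual)),
    # i.e. sum(residual) / sum(p * (1 - p)), since y - residual = p
    num = sum(res[i] for i in idx)
    den = sum((y[i] - res[i]) * (1 - y[i] + res[i]) for i in idx)
    return num / den

R11 = range(0, 8)   # x_i <= 8
R21 = range(8, 10)  # x_i > 8
print(leaf_value(R11), leaf_value(R21))  # ≈ -0.625 and 2.5
```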

$\large F_m(x)=F_{m-1}(x)+\eta\sum_{j=1}^J\gamma_{jm}I(x\in R_{jm})$

$\large F_1(x_1)=F_0(x_1)+0.1\times(-0.625)=-0.4054-0.0625=-0.4679$

| $x_i$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| $F_1(x_i)$ | -0.46796511 | -0.46796511 | -0.46796511 | -0.46796511 | -0.46796511 | -0.46796511 | -0.46796511 | -0.46796511 | -0.15546511 | -0.15546511 |

$\large \tilde{y}_1=y_1-\frac{1}{1+e^{-F_1(x_1)}}=0-0.38509=-0.38509$

| $x_i$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| $\tilde{y}_i$ | -0.38509799 | -0.38509799 | -0.38509799 | 0.61490201 | 0.61490201 | -0.38509799 | -0.38509799 | -0.38509799 | 0.53878818 | 0.53878818 |
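One full boosting step can be sketched as: update the scores with the shrunken leaf values, then recompute the pseudo-residuals.

```python
import math

y = [0, 0, 0, 1, 1, 0, 0, 0, 1, 1]
eta = 0.1                 # learning rate
F0 = math.log(4 / 6)      # initial log odds
gamma = (-0.625, 2.5)     # leaf values for R11 (x<=8) and R21 (x>8)

# F_1(x) = F_0(x) + eta * gamma of the leaf x falls into
F1 = [F0 + eta * (gamma[0] if i < 8 else gamma[1]) for i in range(10)]

# round-2 pseudo-residuals: y_i - sigmoid(F_1(x_i))
residuals2 = [yi - 1.0 / (1.0 + math.exp(-f)) for yi, f in zip(y, F1)]
print(round(F1[0], 8))          # ≈ -0.46796511
print(round(residuals2[0], 8))  # ≈ -0.38509799
```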

## On Prediction

Running a second boosting round in the same way yields $F_2$:

| $x_i$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| $F_2(x_i)$ | -0.52501722 | -0.52501722 | -0.52501722 | -0.52501722 | -0.52501722 | -0.52501722 | -0.52501722 | -0.52501722 | 0.06135501 | 0.06135501 |

Applying the sigmoid to $F_2(x)$ yields the probabilities below:

| $x_i$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| $p_i$ | 0.37167979 | 0.37167979 | 0.37167979 | 0.37167979 | 0.37167979 | 0.37167979 | 0.37167979 | 0.37167979 | 0.51533394 | 0.51533394 |

(The probabilities in the table are those of the positive class, i.e. of $y_i=1$.)
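The score-to-probability conversion is just the sigmoid, applied to the $F_2$ values above:

```python
import math

# positive-class probability: p_i = sigmoid(F_2(x_i))
F2 = [-0.52501722] * 8 + [0.06135501] * 2
probs = [1.0 / (1.0 + math.exp(-f)) for f in F2]
print(round(probs[0], 4), round(probs[-1], 4))  # ≈ 0.3717 and 0.5153
```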

## A Brief Look at the Sklearn Source Code

```python
class BinomialDeviance(ClassificationLossFunction):
    """Binomial deviance loss function for binary classification.

    Binary classification is a special case; here, we only need to
    fit one tree instead of n_classes trees.
    """
    def __init__(self, n_classes):
        if n_classes != 2:
            raise ValueError("{0:s} requires 2 classes.".format(
                self.__class__.__name__))
        # we only need to fit one tree for binary clf.
        super(BinomialDeviance, self).__init__(1)

    def init_estimator(self):
        return LogOddsEstimator()

    def __call__(self, y, pred, sample_weight=None):
        """Compute the deviance (= 2 * negative log-likelihood). """
        # logaddexp(0, v) == log(1.0 + exp(v))
        pred = pred.ravel()
        if sample_weight is None:
            return -2.0 * np.mean((y * pred) - np.logaddexp(0.0, pred))
        else:
            return (-2.0 / sample_weight.sum() *
                    np.sum(sample_weight * ((y * pred) - np.logaddexp(0.0, pred))))

    def negative_gradient(self, y, pred, **kargs):
        """Compute the residual (= negative gradient). """
        return y - expit(pred.ravel())

    def _update_terminal_region(self, tree, terminal_regions, leaf, X, y,
                                residual, pred, sample_weight):
        """Make a single Newton-Raphson step.

        our node estimate is given by:

            sum(w * (y - prob)) / sum(w * prob * (1 - prob))

        we take advantage that: y - prob = residual
        """
        terminal_region = np.where(terminal_regions == leaf)[0]
        residual = residual.take(terminal_region, axis=0)
        y = y.take(terminal_region, axis=0)
        sample_weight = sample_weight.take(terminal_region, axis=0)

        numerator = np.sum(sample_weight * residual)
        denominator = np.sum(sample_weight * (y - residual) * (1 - y + residual))
        # prevents overflow and division by zero
        if abs(denominator) < 1e-150:
            tree.value[leaf, 0, 0] = 0.0
        else:
            tree.value[leaf, 0, 0] = numerator / denominator

    def _score_to_proba(self, score):
        proba = np.ones((score.shape[0], 2), dtype=np.float64)
        proba[:, 1] = expit(score.ravel())
        proba[:, 0] -= proba[:, 1]
        return proba

    def _score_to_decision(self, score):
        proba = self._score_to_proba(score)
        return np.argmax(proba, axis=1)
```

```python
class LogOddsEstimator(object):
    """An estimator predicting the log odds ratio."""
    scale = 1.0

    def fit(self, X, y, sample_weight=None):
        # pre-cond: pos, neg are encoded as 1, 0
        if sample_weight is None:
            pos = np.sum(y)
            neg = y.shape[0] - pos
        else:
            pos = np.sum(sample_weight * y)
            neg = np.sum(sample_weight * (1 - y))

        if neg == 0 or pos == 0:
            raise ValueError('y contains non binary labels.')
        self.prior = self.scale * np.log(pos / neg)

    def predict(self, X):
        check_is_fitted(self, 'prior')

        y = np.empty((X.shape[0], 1), dtype=np.float64)
        y.fill(self.prior)
        return y
```
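As a sanity check, the worked example above can be reproduced end to end through sklearn's public API. This assumes the default `friedman_mse` split criterion picks the same $x_i \le 8$ split as plain MSE, which holds on this data:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

X = np.arange(1, 11).reshape(-1, 1)
y = np.array([0, 0, 0, 1, 1, 0, 0, 0, 1, 1])

# same settings as the worked example: 2 rounds, depth-1 trees, eta = 0.1
clf = GradientBoostingClassifier(n_estimators=2, learning_rate=0.1, max_depth=1)
clf.fit(X, y)

proba = clf.predict_proba(X)[:, 1]
print(np.round(proba, 4))  # first eight ≈ 0.3717, last two ≈ 0.5153
```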


## References

http://docplayer.net/21448572-Generalized-boosted-models-a-guide-to-the-gbm-package.html (derivations for various loss functions)