依然是接着上一节的,我们关注每颗树是如何建立的。
n_stages = self._fit_stages(X, y, y_pred, sample_weight, random_state,
begin_at_stage, monitor, X_idx_sorted)
def _fit_stages(self, X, y, y_pred, sample_weight, random_state,
begin_at_stage=0, monitor=None, X_idx_sorted=None):
在计算residual的时候,GBDT提供了几种不同的loss function:
LOSS_FUNCTIONS = {'ls': LeastSquaresError,
'lad': LeastAbsoluteError,
'huber': HuberLossFunction,
'quantile': QuantileLossFunction,
'deviance': None, # for both, multinomial and binomial
'exponential': ExponentialLoss,
}
那么首先问题就来了,GBDT中第一次计算residual的时候需要y-y_pred,那么这个y_pred是怎么来的呢?
这个loss function就是默认的deviance,
if self.loss == 'deviance':
loss_class = (MultinomialDeviance
if len(self.classes_) > 2
else BinomialDeviance)
else:
loss_class = LOSS_FUNCTIONS[self.loss]