How does bagging work in LightGBM?
The official documentation describes the two relevant parameters as follows:
bagging_fraction, default = 1.0, type = double, aliases: sub_row, subsample, bagging, constraints: 0.0 < bagging_fraction <= 1.0

- like feature_fraction, but this will randomly select part of data without resampling
- can be used to speed up training
- can be used to deal with over-fitting
- Note: to enable bagging, bagging_freq should be set to a non zero value as well
bagging_freq, default = 0, type = int, aliases: subsample_freq

- frequency for bagging
- 0 means disable bagging; k means perform bagging at every k iteration
- Note: to enable bagging, bagging_fraction should be set to value smaller than 1.0 as well
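Putting the two parameters together: both conditions must hold at the same time, or bagging stays off. A minimal sketch of a parameter dict (the objective and the specific values here are illustrative placeholders, not recommendations):

```python
# Minimal LightGBM parameter dict with bagging enabled (sketch).
# Per the docs, BOTH conditions are required:
#   bagging_fraction < 1.0  AND  bagging_freq > 0
params = {
    "objective": "binary",    # placeholder objective
    "bagging_fraction": 0.8,  # sample 80% of rows, without resampling
    "bagging_freq": 5,        # re-draw that sample every 5 iterations
}

# Sanity checks mirroring the documented constraints.
assert 0.0 < params["bagging_fraction"] <= 1.0
assert params["bagging_freq"] > 0
```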
A Stack Overflow answer puts it this way:
The code executes what documentation says: it samples a subset of training examples of the size bagging_fraction * N_train_examples, and training of the i-th tree is performed on this subset. This sampling can be done for each tree (i.e. each iteration) or after each bagging_freq trees have been trained. For example, bagging_fraction=0.5, bagging_freq=10 means that sampling of new 0.5*N_train_examples entries will happen every 10 iterations.
In plain terms: every bagging_freq iterations, LightGBM draws a fresh random sample of bagging_fraction of the training rows, and the trees built until the next re-draw are all trained on that sample.