Bayesian Optimization
In Python, the bayesian-optimization library implements this method.
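It can be installed from PyPI under the same name:

```
pip install bayesian-optimization
```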
Bayesian hyperparameter tuning is rooted in the Bayesian notion of a posterior distribution: it treats the function being optimized as a black box and updates the posterior distribution over that objective as new sample points are evaluated.
Bayesian optimization involves an unknown function $f$, a dataset $D$, a hyperparameter search space $X$, a surrogate model $M$, an acquisition function $S$, and hyperparameters $x$. Each hyperparameter setting that is tried produces one output, and the number of loop iterations is usually fixed in advance. The acquisition function is what selects the next hyperparameter point $x$ to evaluate.
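To make the loop concrete, here is a minimal sketch on a 1-D toy problem. The objective, the candidate grid, the Gaussian-process surrogate, and the expected-improvement acquisition are all illustrative assumptions for this sketch; they are not the internals of the library used below:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def toy_objective(x):                                # black-box function f
    return -(x - 2.0) ** 2

X_grid = np.linspace(0.0, 5.0, 200).reshape(-1, 1)   # search space X
X_obs = np.array([[0.5], [4.0]])                     # initial dataset D
y_obs = toy_objective(X_obs).ravel()

for _ in range(10):                                  # fixed number of iterations
    # Surrogate model M: fit a Gaussian process to all points seen so far
    gp = GaussianProcessRegressor().fit(X_obs, y_obs)
    mu, sigma = gp.predict(X_grid, return_std=True)

    # Acquisition function S: expected improvement over the best point so far
    best = y_obs.max()
    with np.errstate(divide='ignore', invalid='ignore'):
        z = (mu - best) / sigma
        ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)
        ei[sigma == 0.0] = 0.0

    # Evaluate f at the next hyperparameter x and enlarge the dataset,
    # which updates the posterior on the next pass through the loop
    x_next = X_grid[np.argmax(ei)]
    X_obs = np.vstack([X_obs, [x_next]])
    y_obs = np.append(y_obs, toy_objective(x_next))

print(X_obs[np.argmax(y_obs)])                       # approximately 2.0
```

Each pass refits the surrogate to every point evaluated so far, so the posterior sharpens as the dataset $D$ grows.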
First, define the function to be optimized:
```python
import lightgbm as lgb
from sklearn.metrics import make_scorer, f1_score
from sklearn.model_selection import cross_val_score

def rf_cv_lgb(num_leaves, max_depth, bagging_fraction, feature_fraction, bagging_freq,
              min_data_in_leaf, min_child_weight, min_split_gain, reg_lambda, reg_alpha):
    # Build the model; the optimizer proposes continuous values, so
    # integer-valued hyperparameters must be cast with int()
    model_lgb = lgb.LGBMClassifier(boosting_type='gbdt', objective='multiclass', num_class=4,
                                   learning_rate=0.1, n_estimators=5000,
                                   num_leaves=int(num_leaves), max_depth=int(max_depth),
                                   bagging_fraction=round(bagging_fraction, 2),
                                   feature_fraction=round(feature_fraction, 2),
                                   bagging_freq=int(bagging_freq),
                                   min_data_in_leaf=int(min_data_in_leaf),
                                   min_child_weight=min_child_weight,
                                   min_split_gain=min_split_gain,
                                   reg_lambda=reg_lambda, reg_alpha=reg_alpha,
                                   n_jobs=8)
    # Return the mean 5-fold cross-validated micro-averaged F1 score;
    # X_train_split and y_train_split are the training split defined earlier
    f1 = make_scorer(f1_score, average='micro')
    val = cross_val_score(model_lgb, X_train_split, y_train_split, cv=5, scoring=f1).mean()
    return val
```
Next, define the optimizer:
```python
from bayes_opt import BayesianOptimization

bayes_lgb = BayesianOptimization(
    rf_cv_lgb,
    {
        'num_leaves': (10, 200),
        'max_depth': (3, 20),
        'bagging_fraction': (0.5, 1.0),
        'feature_fraction': (0.5, 1.0),
        'bagging_freq': (0, 100),
        'min_data_in_leaf': (10, 100),
        'min_child_weight': (0, 10),
        'min_split_gain': (0.0, 1.0),
        'reg_alpha': (0.0, 10),
        'reg_lambda': (0.0, 10),
    }
)
```
The first argument, rf_cv_lgb, is the function to be optimized; the second is a dictionary of hyperparameter names and their search ranges. The hyperparameter names must correspond one-to-one with the objective function's argument names.
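From here, the search would typically be run with the library's maximize method and the best result read from bayes_lgb.max; the init_points and n_iter values below are illustrative choices, not from the original tutorial:

```python
# Run the search: `init_points` random evaluations seed the surrogate,
# then `n_iter` Bayesian-optimization steps follow
bayes_lgb.maximize(init_points=5, n_iter=25)

# Best cross-validated F1 and the hyperparameters that achieved it
print(bayes_lgb.max['target'])
print(bayes_lgb.max['params'])
```

Note that the returned params are floats, so integer-valued ones (num_leaves, max_depth, bagging_freq, min_data_in_leaf) still need int() casts before training a final model.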
Code source: https://github.com/datawhalechina/team-learning-data-mining/blob/master/HeartbeatClassification/Task4%20%E6%A8%A1%E5%9E%8B%E8%B0%83%E5%8F%82.md