Specifying Hyperparameters Manually
Optuna provides automatic hyperparameter search, but sometimes there are specific hyperparameter sets you want to try first, such as an initial learning rate and the number of leaves. You may also have already evaluated some sets yourself before asking Optuna to find better ones.

Optuna provides two APIs for these scenarios:

1. Pass those hyperparameter sets to Optuna and have it evaluate them: enqueue_trial()
2. Mark the results of those sets as finished trials: add_trial()
First scenario: have Optuna evaluate your hyperparameters

If there are candidate values you want to try, Optuna provides an API, optuna.study.Study.enqueue_trial(), which lets you pass those hyperparameters to Optuna; Optuna will then evaluate them.
import lightgbm as lgb
import numpy as np
import sklearn.datasets
import sklearn.metrics
from sklearn.model_selection import train_test_split

import optuna


def objective(trial):
    data, target = sklearn.datasets.load_breast_cancer(return_X_y=True)
    train_x, valid_x, train_y, valid_y = train_test_split(data, target, test_size=0.25)
    dtrain = lgb.Dataset(train_x, label=train_y)
    dvalid = lgb.Dataset(valid_x, label=valid_y)

    param = {
        "objective": "binary",
        "metric": "auc",
        "verbosity": -1,
        "boosting_type": "gbdt",
        "bagging_fraction": min(trial.suggest_float("bagging_fraction", 0.4, 1.0 + 1e-12), 1),
        "bagging_freq": trial.suggest_int("bagging_freq", 0, 7),
        "min_child_samples": trial.suggest_int("min_child_samples", 5, 100),
    }

    # Add a callback for pruning.
    pruning_callback = optuna.integration.LightGBMPruningCallback(trial, "auc")
    gbm = lgb.train(
        param, dtrain, valid_sets=[dvalid], verbose_eval=False, callbacks=[pruning_callback]
    )

    preds = gbm.predict(valid_x)
    pred_labels = np.rint(preds)
    accuracy = sklearn.metrics.accuracy_score(valid_y, pred_labels)
    return accuracy


study = optuna.create_study(direction="maximize", pruner=optuna.pruners.MedianPruner())
Add the candidate values:
study.enqueue_trial(
    {
        "bagging_fraction": 1.0,
        "bagging_freq": 0,
        "min_child_samples": 20,
    }
)

study.enqueue_trial(
    {
        "bagging_fraction": 0.75,
        "bagging_freq": 5,
        "min_child_samples": 20,
    }
)
import logging
import sys

# Add a stream handler for stdout so that Optuna's messages are shown and we
# can confirm that it works as expected.
optuna.logging.get_logger("optuna").addHandler(logging.StreamHandler(sys.stdout))
study.optimize(objective, n_trials=100, timeout=600)
Second scenario: have Optuna use already-evaluated hyperparameters

If you have hyperparameter sets you have already tried, for example ones that turned out not to work well, Optuna provides an API, optuna.study.Study.add_trial(), which lets you register those results with the study; Optuna will then take them into account when sampling new hyperparameters.
study = optuna.create_study(direction="maximize", pruner=optuna.pruners.MedianPruner())
study.add_trial(
    optuna.trial.create_trial(
        params={
            "bagging_fraction": 1.0,
            "bagging_freq": 0,
            "min_child_samples": 20,
        },
        distributions={
            "bagging_fraction": optuna.distributions.UniformDistribution(0.4, 1.0 + 1e-12),
            "bagging_freq": optuna.distributions.IntUniformDistribution(0, 7),
            "min_child_samples": optuna.distributions.IntUniformDistribution(5, 100),
        },
        value=0.94,
    )
)

study.add_trial(
    optuna.trial.create_trial(
        params={
            "bagging_fraction": 0.75,
            "bagging_freq": 5,
            "min_child_samples": 20,
        },
        distributions={
            "bagging_fraction": optuna.distributions.UniformDistribution(0.4, 1.0 + 1e-12),
            "bagging_freq": optuna.distributions.IntUniformDistribution(0, 7),
            "min_child_samples": optuna.distributions.IntUniformDistribution(5, 100),
        },
        value=0.95,
    )
)
study.optimize(objective, n_trials=100, timeout=600)
Re-using the best hyperparameters

Perhaps you have found good hyperparameters with Optuna and want to run a similar objective function with those best hyperparameters to analyze the results further; or, to save time, you ran Optuna on a subset of the data, and after tuning you want to train a model on the whole dataset with the best hyperparameters you found.
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

import optuna


def objective(trial):
    X, y = make_classification(n_features=10, random_state=1)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

    C = trial.suggest_loguniform("C", 1e-7, 10.0)

    clf = LogisticRegression(C=C)
    clf.fit(X_train, y_train)

    return clf.score(X_test, y_test)


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=10)

print(study.best_trial.value)  # Show the best value.
Suppose that, after the hyperparameter optimization, you want to compute other evaluation metrics on the same dataset, such as recall, precision, and f1-score. You can define another objective function that closely resembles objective to reproduce the model with the best hyperparameters.
def detailed_objective(trial):
    # Use the same code as objective to reproduce the best model.
    X, y = make_classification(n_features=10, random_state=1)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

    C = trial.suggest_loguniform("C", 1e-7, 10.0)

    clf = LogisticRegression(C=C)
    clf.fit(X_train, y_train)

    # Calculate more evaluation metrics. Note that scikit-learn's metric
    # functions take the ground truth first: metric(y_true, y_pred).
    pred = clf.predict(X_test)
    acc = metrics.accuracy_score(y_test, pred)
    recall = metrics.recall_score(y_test, pred)
    precision = metrics.precision_score(y_test, pred)
    f1 = metrics.f1_score(y_test, pred)

    return acc, f1, recall, precision
# Pass the best trial (study.best_trial) to detailed_objective as its trial argument.
detailed_objective(study.best_trial) # calculate acc, f1, recall, and precision