Hyperparameter Optimization

Latest recommended article published on 2024-03-27 15:52:49

WilenWu

Tags: machine learning

Article link: https://blog.csdn.net/qq_41518277/article/details/136169521

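Hyperparameters are the parameter values that control the learning process, and they have a significant effect on the performance of a machine learning model. Examples include the number of estimators, the maximum depth, and the split criterion in a random forest. Hyperparameter optimization is the process of finding the combination of hyperparameter values that achieves the best performance on the data within a reasonable amount of time. Because it plays such an important role in a model's predictive accuracy, it is often considered one of the trickiest parts of building a machine learning model.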

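At present, sklearn offers two main types of hyperparameter optimization:

GridSearchCV: grid search, a widely used traditional method that exhaustively evaluates every parameter combination
RandomizedSearchCV: random search, which samples a given number of candidates from a parameter space with specified distributions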
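
To make the contrast concrete, the following is a minimal standard-library sketch of how the two strategies generate candidates. It is not how sklearn implements them, and the parameter names and values in `param_grid` are made up for illustration.

```python
import itertools
import random

# Hypothetical search space, chosen only for illustration.
param_grid = {
    "max_depth": [2, 4, 6],
    "n_estimators": [100, 200],
    "max_features": ["sqrt", "log2"],
}

def grid_candidates(grid):
    """Enumerate every combination, as an exhaustive grid search would."""
    names = list(grid)
    return [dict(zip(names, values))
            for values in itertools.product(*(grid[n] for n in names))]

def random_candidates(grid, n_iter, seed=0):
    """Draw n_iter candidates by sampling each parameter independently."""
    rng = random.Random(seed)
    return [{name: rng.choice(options) for name, options in grid.items()}
            for _ in range(n_iter)]

all_candidates = grid_candidates(param_grid)  # 3 * 2 * 2 = 12 combinations
sampled_candidates = random_candidates(param_grid, n_iter=5)
print(len(all_candidates), len(sampled_candidates))
```

The cost of the grid grows multiplicatively with every added parameter, while the cost of random search is fixed by `n_iter`, which is why random search is often preferred for larger spaces.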

Bayesian optimization is widely regarded as the state of the art (SOTA) in hyperparameter optimization and as one of the most advanced optimization frameworks currently available.

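Bayesian optimization works as follows. It first places a prior over the global behavior of the objective function (usually represented by a Gaussian process). It then observes the objective's output at different input points and uses these observations to update the prior into a posterior distribution. The next point to sample is chosen based on the posterior, balancing the best values observed so far (exploitation) against regions of the space that have not yet been explored (exploration). This choice is governed by an acquisition function, the most common being Expected Improvement (EI). As a result, Bayesian optimization not only searches the hyperparameter space efficiently but also uses what it has already learned to guide the search, avoiding a large number of wasted evaluations.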
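
As a rough illustration of the acquisition step, the sketch below computes Expected Improvement for a minimization problem, assuming the posterior at each candidate point is Gaussian with a known mean and standard deviation. The candidate names and their posterior values are hypothetical, and a real optimizer would obtain them from a fitted surrogate model.

```python
import math

def normal_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, best):
    """EI (minimization) at a point with posterior mean mu and std sigma."""
    if sigma <= 0.0:
        # No uncertainty left: the improvement is deterministic.
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    return (best - mu) * normal_cdf(z) + sigma * normal_pdf(z)

# Hypothetical posterior (mean, std) at three candidate points.
candidates = {"a": (0.9, 0.05), "b": (1.2, 0.5), "c": (1.0, 0.0)}
best_so_far = 1.0  # lowest loss observed so far

scores = {name: expected_improvement(mu, sigma, best_so_far)
          for name, (mu, sigma) in candidates.items()}
next_point = max(scores, key=scores.get)
print(scores, next_point)
```

In this toy setting, point b is selected even though its predicted mean is worse than that of point a, because its larger uncertainty leaves more room for improvement. That is the exploration side of the trade-off described above.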

For the algorithmic details, see: https://zhuanlan.zhihu.com/p/643095927?utm_id=0

This article introduces several practical hyperparameter optimization libraries:

Hyperopt
Scikit Optimize
Optuna

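# read the dataset
# NOTE: load_boston was removed in scikit-learn 1.2, so this snippet needs scikit-learn < 1.2;
# on newer versions, substitute another regression dataset (e.g. load_diabetes).
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split

X, y = load_boston(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)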

Hyperopt

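Hyperopt is one of the most widely used Bayesian optimizers. It bundles several optimization algorithms, including random search, simulated annealing, and TPE (Tree-structured Parzen Estimator).

Official documentation: https://hyperopt.github.io/hyperopt/

Install the library

pip install hyperopt

Optimizing with Hyperopt takes four main steps:

Step 1 Define the search space
Step 2 Define the objective function
Step 3 Run the optimization
Step 4 Evaluate the output

Step 1 Define the search space

We define the hyperparameter search space with a dict(). The keys can be chosen freely, while the values are built with hyperopt's hp functions: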

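hyperopt.hp	Description
hp.choice(label, options)	For categorical parameters; returns one of the elements of options
hp.pchoice(label, p_list)	Returns one of the options, where p_list is a list of (probability, option) pairs
hp.randint(label, low, high)	Returns a random integer in the interval [low, high)
hp.uniform(label, low, high)	Returns a float drawn uniformly between low and high
hp.quniform(label, low, high, q)	Returns round(uniform(low, high) / q) * q; suitable for discrete values
hp.uniformint(label, low, high)	Returns an integer drawn uniformly between low and high; suitable for discrete values
hp.loguniform(label, low, high)	Returns exp(uniform(low, high)), i.e. a log-uniform float between e^low and e^high
hp.qloguniform(label, low, high, q)	Returns round(exp(uniform(low, high)) / q) * q; suitable for discrete values
hp.normal(label, mu, sigma)	Returns a real value drawn from a normal distribution
hp.qnormal(label, mu, sigma, q)	Returns round(normal(mu, sigma) / q) * q; suitable for discrete values
hp.lognormal(label, mu, sigma)	Returns exp(normal(mu, sigma)), a positive real value
hp.qlognormal(label, mu, sigma, q)	Returns round(exp(normal(mu, sigma)) / q) * q; suitable for discrete values

Every hp function takes a label as its first argument. These labels identify the parameters in the values that the optimizer returns to the caller.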
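
The log-scaled and quantized variants are easy to misread, so here is a small standard-library sketch that mimics the documented formulas for hp.loguniform and hp.quniform. The helper functions are illustrative stand-ins, not hyperopt's own code. Note in particular that the bounds of a log-uniform draw are given in log space.

```python
import math
import random

rng = random.Random(0)

def loguniform(low, high):
    """Stand-in for hp.loguniform: exp(uniform(low, high))."""
    return math.exp(rng.uniform(low, high))

def quniform(low, high, q):
    """Stand-in for hp.quniform: round(uniform(low, high) / q) * q."""
    return round(rng.uniform(low, high) / q) * q

# To sample a learning rate between 0.001 and 1.0, pass the logs of the bounds.
learning_rates = [loguniform(math.log(0.001), math.log(1.0)) for _ in range(1000)]

# Quantized draws land on multiples of q (up to floating-point rounding).
subsamples = [quniform(0.1, 1.0, 0.1) for _ in range(1000)]

print(min(learning_rates), max(learning_rates))
print(sorted({round(s, 1) for s in subsamples}))
```

Passing the raw bounds 0.001 and 1.0 instead of their logarithms would sample between e^0.001 and e^1.0, which is rarely what is intended for a learning rate.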

# define a search space
import numpy as np
from hyperopt import hp
space = {
    'random_state': 42,
    'max_depth': hp.uniformint('max_depth', 2, 10),
    # hp.loguniform takes its bounds in log space: this samples from [0.001, 1.0]
    'learning_rate': hp.loguniform('learning_rate', np.log(0.001), np.log(1.0)),
    'n_estimators': hp.choice('n_estimators', [100, 200, 300, 400]),
    'subsample': hp.quniform('subsample', 0.1, 1.0, 0.1),
    'max_features': hp.choice('max_features', ['sqrt', 'log2'])
}

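Step 2 Define the objective function

Hyperopt currently supports only minimization of the objective function, so a score that should be maximized has to be negated.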

# define an objective function
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

def objective(params):
    reg = GradientBoostingRegressor(**params)
    # 'neg_mean_squared_error' yields negated MSE (higher is better)
    neg_mse = cross_val_score(reg, X_train, y_train, scoring='neg_mean_squared_error', cv=5).mean()
    # flip the sign so that fmin minimizes the MSE
    return -neg_mse

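Step 3 Run the optimization

hyperopt runs the optimization through the fmin function.

Two commonly used search algorithms can be passed to fmin:

tpe.suggest: the TPE (Tree-structured Parzen Estimator) method
rand.suggest: random search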

# minimize the objective over the space
from hyperopt import fmin, tpe, space_eval, Trials
from hyperopt.early_stop import no_progress_loss
trials = Trials() # Initialize trials object

best = fmin(
    fn=objective, # Objective Function to optimize
    space=space, # Hyperparameter's Search Space
    algo=tpe.suggest, # Optimization algorithm
    max_evals=100, # Number of optimization attempts
    trials=trials, # Records every evaluation
    verbose=True,
    early_stop_fn=no_progress_loss(5) # Stop after 5 evaluations without improvement
)

Output:

100%|██████| 1000/1000 [02:35<00:00,  6.44trial/s, best loss: 8.932729710763638]

The Trials object stores the hyperparameters, losses, and other information for every evaluation.

Step 4 Evaluate the output

print(space_eval(space, best))

Output:

{'learning_rate': 0.2, 'max_depth': 5, 'max_features': 'sqrt', 'n_estimators': 54, 'subsample': 0.9}

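Distributed optimization

Hyperparameter tuning often involves training hundreds or thousands of models, and Hyperopt supports distributed tuning. Passing a SparkTrials object to fmin through the trials argument runs the evaluations in parallel on a Spark cluster.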

# We can run Hyperopt locally (only on the driver machine)
# by calling `fmin` without an explicit `trials` argument.
best_hyperparameters = fmin(
  fn=train,
  space=search_space,
  algo=algo,
  max_evals=32)

# We can distribute tuning across our Spark cluster
# by calling `fmin` with a `SparkTrials` instance.
from hyperopt import SparkTrials
spark_trials = SparkTrials()
best_hyperparameters = fmin(
  fn=train,
  space=search_space,
  algo=algo,
  trials=spark_trials,
  max_evals=32)

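SparkTrials can be configured with three parameters, all of which are optional:

parallelism: the maximum number of trials to evaluate concurrently; defaults to SparkContext.defaultParallelism.
timeout: the maximum time allowed, in seconds; defaults to None.
spark_session: the SparkSession to use; if none is given, SparkTrials looks for an existing SparkSession.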

Scikit-optimize

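Scikit-optimize is built on top of SciPy, NumPy, and Scikit-Learn. It is easy to use and provides a general-purpose toolkit for Bayesian optimization that can be applied to hyperparameter tuning.

Official documentation: https://scikit-optimize.github.io/stable/

Install the library

pip install scikit-optimize

Optimizing with Scikit-optimize takes four main steps:

Step 1 Define the search space
Step 2 Define the objective function
Step 3 Run the optimization
Step 4 Evaluate the output

Step 1 Define the search space

Define the search space with the dimension classes that Scikit-optimize provides:

skopt.space	Description
space.Real(low, high, prior, name)	For floating-point parameters
space.Integer(low, high, prior, name)	For integer parameters
space.Categorical(categories, prior, name)	For categorical parameters

The optional prior argument lets integer and floating-point parameters be sampled on a log scale ('log-uniform'), and lets categorical parameters be given prior probabilities.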

# define a search space
import skopt 
search_space= [
    skopt.space.Integer(2, 10, name='max_depth'),
    skopt.space.Real(0.001, 1.0, prior='log-uniform', name='learning_rate'),
    skopt.space.Integer(10, 100, name='n_estimators'),
    skopt.space.Real(0.2, 0.9, name='subsample'),
    skopt.space.Categorical(['sqrt', 'log2'], name='max_features')
]

Step 2 Define the objective function

Scikit-optimize minimizes the objective function.

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score
from skopt.utils import use_named_args

# define the function used to evaluate a given configuration
@use_named_args(search_space)
def objective(**params):
    # configure the model with specific hyperparameters
    reg = GradientBoostingRegressor(random_state=42, **params)
    # 'neg_mean_squared_error' yields negated MSE, so flip the sign to minimize the MSE
    neg_mse = cross_val_score(reg, X_train, y_train, scoring='neg_mean_squared_error', cv=5).mean()
    return -neg_mse

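Cross-validation is typically used to avoid overfitting. The use_named_args decorator lets the objective function receive the parameters as keyword arguments.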
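
To show what such a decorator does, here is a simplified standard-library stand-in. The helper `named_args` and the toy objective are hypothetical and written only for illustration; the real use_named_args takes a list of named dimension objects rather than plain strings. The optimizer passes the parameters as a plain list of values, and the wrapper maps them onto keyword arguments by position.

```python
import functools

def named_args(names):
    """Simplified, hypothetical stand-in for a decorator like use_named_args."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(values):
            if len(values) != len(names):
                raise ValueError(
                    "expected %d values, got %d" % (len(names), len(values)))
            # Pair each positional value with its parameter name.
            return func(**dict(zip(names, values)))
        return wrapper
    return decorator

@named_args(["max_depth", "learning_rate"])
def toy_objective(max_depth, learning_rate):
    # Toy loss with its minimum at max_depth=5, learning_rate=0.1.
    return (max_depth - 5) ** 2 + (learning_rate - 0.1) ** 2

# The optimizer would call the objective with a list of values.
loss = toy_objective([5, 0.1])
print(loss)
```

The benefit of this pattern is that the objective can forward its keyword arguments straight into a model constructor, without having to index into a positional list.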

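Step 3 Run the optimization

Four optimization algorithms are available:

skopt.optimizer	Description
dummy_minimize	Random search
forest_minimize	Bayesian optimization using decision trees
gbrt_minimize	Bayesian optimization using gradient boosted regression trees (GBRT)
gp_minimize	Bayesian optimization using Gaussian processes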

from skopt import gp_minimize

# perform optimization
result = gp_minimize(
    func=objective,
    dimensions=search_space,
    n_calls=100,
    random_state=42,
    verbose=True
)

Step 4 Evaluate the output

Print the best score and the best parameters

# summarizing finding:
print(f'Best score: {result.fun}') 
print(f'Best parameters: {result.x}')

Print the objective values recorded during the optimization

print(result.func_vals)

Plot the convergence trace

# plot convergence traces
from skopt.plots import plot_convergence
plot_convergence(result)

Scikit-Learn API

Scikit-optimize provides BayesSearchCV, an interface similar to GridSearchCV and RandomizedSearchCV. It implements the fit and score methods, as well as other common methods such as predict, predict_proba, decision_function, transform, and inverse_transform.

from skopt.searchcv import BayesSearchCV
from sklearn.ensemble import GradientBoostingRegressor

# define search space
params = {
    "max_depth": (2, 10),  # integer valued parameter
    "learning_rate": (0.001, 1.0, 'log-uniform'),
    "n_estimators": [100, 200, 300, 400],
    "subsample": (0.2, 0.9, 'uniform'),
    "max_features": ["sqrt", "log2"],  # categorical parameter
}

# define the search
optimizer = BayesSearchCV(
    estimator=GradientBoostingRegressor(),
    search_spaces=params,
    cv=5,
    n_iter=100,
    scoring="neg_mean_squared_error",  # regression metric; "accuracy" applies only to classifiers
    verbose=4,
    random_state=42
)

# executes bayesian optimization
optimizer.fit(X_train, y_train)

# report the best result
print(optimizer.best_score_)
print(optimizer.best_params_)

Optuna

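Optuna is one of the most mature and extensible hyperparameter optimization frameworks, designed specifically for machine learning and deep learning. To meet the needs of machine learning developers, Optuna offers a powerful and stable API, so Optuna code is simple and highly modular.

Optuna integrates seamlessly with deep learning frameworks such as PyTorch and TensorFlow, and it can also be used together with scikit-optimize, the optimization library built on sklearn, so it can be applied to a wide range of optimization scenarios.

Official documentation: https://optuna.org/

Install the library

pip install optuna

Optimizing with Optuna takes three main steps:

Step 1 Build the objective function and search space
Step 2 Run the optimization
Step 3 Evaluate the output

Step 1 Build the objective function and search space

Optuna performs optimization through two components, Trial and Study. During optimization, Optuna calls the objective function repeatedly and evaluates it under different parameters. A Trial corresponds to a single execution of the objective function, and a new Trial is instantiated internally on each call. The suggest API (for example suggest_float()) is called inside the objective function to obtain the parameters for a single trial.

Optuna lets you define both the search space and the objective inside the objective function; the optimizer builds the search space through the methods carried by the trial.

optuna.trial.Trial	Description
trial.suggest_categorical(name, choices)	For categorical parameters
trial.suggest_int(name, low, high, *, step=1, log=False)	For integer parameters
trial.suggest_float(name, low, high, *, step=None, log=False)	For floating-point parameters
trial.suggest_uniform(name, low, high)	Uniform distribution (deprecated since Optuna 3.0 in favor of suggest_float)
trial.suggest_loguniform(name, low, high)	Log-uniform distribution (deprecated since Optuna 3.0 in favor of suggest_float with log=True)
trial.suggest_discrete_uniform(name, low, high, q)	Discrete uniform distribution (deprecated since Optuna 3.0 in favor of suggest_float with step=q)

The optional step and log arguments discretize an integer or floating-point parameter or sample it on a log scale.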

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

# define the search space and the objective function
def objective(trial):
    # Define the search space
    params = {
        'max_depth': trial.suggest_int('max_depth', 2, 10, step=1),
        'learning_rate': trial.suggest_float('learning_rate', 0.001, 1.0, log=True),
        'n_estimators': trial.suggest_int('n_estimators', 100, 400, step=100),
        'subsample': trial.suggest_float('subsample', 0.1, 1.0, step=0.1),
        'max_features': trial.suggest_categorical('max_features', ['sqrt', 'log2'])
    }
    reg = GradientBoostingRegressor(**params)
    # 'neg_mean_squared_error' yields negated MSE, so larger values are better
    neg_mse = cross_val_score(reg, X_train, y_train, scoring='neg_mean_squared_error', cv=5).mean()
    return neg_mse

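Cross-validation is typically used to avoid overfitting.

Step 2 Run the optimization

Some common terms:

Trial: a single call of the objective function
Study: an optimization session, consisting of a series of trials
Parameter: a variable whose value is to be optimized

In Optuna, a study object manages the optimization. The create_study() function returns a study object; its direction argument takes minimize or maximize to set the direction of optimization, and the .optimize method then runs the optimization.

The optimization algorithm is selected through the optuna.samplers module:

GridSampler: grid search
RandomSampler: random sampling
TPESampler: the TPE (Tree-structured Parzen Estimator) algorithm
CmaEsSampler: the CMA-ES algorithm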

import optuna
from optuna.samplers import TPESampler

# create a study object; the objective returns negated MSE, so maximize it
study = optuna.create_study(direction="maximize", sampler=TPESampler())

# Invoke optimization of the objective function.
study.optimize(objective, n_trials=100)

Get the number of trials:

len(study.trials)

Out: 100

Calling optimize() again continues the optimization:

study.optimize(objective, n_trials=100)

Get the updated number of trials:

len(study.trials)

Out: 200

Step 3 Evaluate the output

Print the best score and the best hyperparameter values

print(f'Best score: {study.best_value}')
print('Best parameters: ', *[f'- {k} = {v}' for k, v in study.best_params.items()], sep='\n')

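Optuna provides several functions for visualizing the optimization results:

Function	Description
plot_contour(study)	Plots parameter relationships as a contour plot
plot_intermediate_values(study)	Plots the learning curves of all trials
plot_optimization_history(study)	Plots the optimization history of all trials
plot_param_importances(study)	Plots the hyperparameter importances and their values
plot_edf(study)	Plots the empirical distribution function (EDF) of the study's objective values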

optuna.visualization.plot_optimization_history(study)

Advanced search-space usage

In Optuna, the search space is defined with ordinary Python syntax, which includes conditionals and loops.

Branches:

import sklearn.ensemble
import sklearn.svm


def objective(trial):
    classifier_name = trial.suggest_categorical("classifier", ["SVC", "RandomForest"])
    if classifier_name == "SVC":
        svc_c = trial.suggest_float("svc_c", 1e-10, 1e10, log=True)
        classifier_obj = sklearn.svm.SVC(C=svc_c)
    else:
        rf_max_depth = trial.suggest_int("rf_max_depth", 2, 32, log=True)
        classifier_obj = sklearn.ensemble.RandomForestClassifier(max_depth=rf_max_depth)

Loops:

import torch
import torch.nn as nn


def create_model(trial, in_size):
    n_layers = trial.suggest_int("n_layers", 1, 3)

    layers = []
    for i in range(n_layers):
        n_units = trial.suggest_int("n_units_l{}".format(i), 4, 128, log=True)
        layers.append(nn.Linear(in_size, n_units))
        layers.append(nn.ReLU())
        in_size = n_units
    layers.append(nn.Linear(in_size, 10))

    return nn.Sequential(*layers)

Multi-objective optimization

from sklearn.metrics import make_scorer, root_mean_squared_error
def objective(trial):
    # Define the search space
    params = {
        'max_depth': trial.suggest_int('max_depth', 2, 10, step=1),
        'learning_rate': trial.suggest_float('learning_rate', 0.001, 1.0, log=True),
        'n_estimators': trial.suggest_int('n_estimators', 100, 400, step=100),
        'subsample': trial.suggest_float('subsample', 0.1, 1.0, step=0.1),
        'max_features': trial.suggest_categorical('max_features', ['sqrt', 'log2'])
    }
    reg = GradientBoostingRegressor(**params)
    # make_scorer with default settings returns the (positive) RMSE, to be minimized
    rmse = cross_val_score(reg, X_train, y_train, scoring=make_scorer(root_mean_squared_error), cv=5).mean()
    r2 = cross_val_score(reg, X_train, y_train, scoring='r2', cv=5).mean()
    return rmse, r2

# minimize the RMSE and maximize R^2
study = optuna.create_study(directions=["minimize", "maximize"])
study.optimize(objective, n_trials=100, timeout=300)
print("Number of finished trials: ", len(study.trials))
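
With more than one objective there is usually no single best trial, only a set of trade-offs in which no trial is beaten on every objective at once (the Pareto front). The sketch below shows this idea in plain Python, assuming each result is an (rmse, r2) pair with the first minimized and the second maximized. The numbers are made up for illustration, and the helpers are not Optuna's implementation.

```python
def dominates(a, b, directions):
    """True if a is at least as good as b everywhere and strictly better once."""
    at_least_as_good = True
    strictly_better = False
    for x, y, direction in zip(a, b, directions):
        if direction == "maximize":
            x, y = -x, -y  # convert to a minimization problem
        if x > y:
            at_least_as_good = False
        elif x < y:
            strictly_better = True
    return at_least_as_good and strictly_better

def pareto_front(points, directions):
    """Keep the points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p, directions) for q in points)]

# Hypothetical (rmse, r2) results from five trials.
results = [(3.0, 0.85), (2.5, 0.80), (3.5, 0.90), (3.2, 0.84), (2.5, 0.78)]
front = pareto_front(results, ["minimize", "maximize"])
print(front)
```

Here (3.2, 0.84) is dropped because (3.0, 0.85) is better on both objectives, and (2.5, 0.78) is dropped because (2.5, 0.80) matches its RMSE with a higher R^2. In Optuna, the Pareto-optimal trials of a multi-objective study are available through study.best_trials.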