Getting Started with the Ray Distributed Execution Engine (4): Ray Tune

Latest recommended article published on 2024-05-23 23:12:38

薇酱


Article link: https://blog.csdn.net/qq_17246605/article/details/136433835


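This article explains how to use Ray's Tune library to tune the hyperparameters of Keras and PyTorch models. It covers the basic getting-started steps, search space configuration, common search algorithms and schedulers, and how to analyze the results.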


1. Getting Started

1.1 Basic Version

Install the dependencies: pip install "ray[tune]"

from ray import train, tune


def objective(config):  # ①
    score = config["a"] ** 2 + config["b"]
    return {"score": score}


search_space = {  # ②
    "a": tune.grid_search([0.001, 0.01, 0.1, 1.0]),
    "b": tune.choice([1, 2, 3]),
}

tuner = tune.Tuner(objective, param_space=search_space)  # ③

results = tuner.fit()
print(results.get_best_result(metric="score", mode="min").config)

Tuning with Ray takes only three steps:

① Define the objective function

② Define the search space

③ Start a Tune run and print the best result
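To make the three steps concrete, the sketch below mimics in plain Python what the basic example asks Tune to do: one trial per grid value of `a`, a random draw of `b` for each trial, and a final pick of the lowest score. It only illustrates the idea and is not how Tune is implemented; the helper `run_trials` and the fixed seed are assumptions added here.

```python
import random


def objective(config):
    """Same objective as in the basic example."""
    return {"score": config["a"] ** 2 + config["b"]}


def run_trials(grid_a, choices_b, seed=0):
    """Hypothetical helper: one trial per grid value of a, with b drawn at random."""
    rng = random.Random(seed)
    trials = []
    for a in grid_a:
        config = {"a": a, "b": rng.choice(choices_b)}
        trials.append({"config": config, **objective(config)})
    return trials


trials = run_trials([0.001, 0.01, 0.1, 1.0], [1, 2, 3])
best = min(trials, key=lambda t: t["score"])  # plays the role of mode="min"
print(best["config"], best["score"])
```

Because `b` is sampled rather than enumerated, the winning configuration depends on the random draws, which is also true of the real Tune run.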

1.2 Tuning Keras with Hyperopt

from ray import tune
from ray.tune.search.hyperopt import HyperOptSearch
import keras


def objective(config):  # ①
    model = keras.models.Sequential()
    model.add(keras.layers.Dense(784, activation=config["activation"]))
    model.add(keras.layers.Dense(10, activation="softmax"))

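    # A 10-way softmax calls for a categorical loss (this assumes one-hot labels;
    # use "sparse_categorical_crossentropy" for integer labels).
    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    # Placeholders: uncomment both lines and pass your own data.
    # `accuracy` is undefined until you do.
    # model.fit(...)
    # loss, accuracy = model.evaluate(...)
    return {"accuracy": accuracy}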


search_space = {"activation": tune.choice(["relu", "tanh"])}  # ②
algo = HyperOptSearch()

tuner = tune.Tuner(  # ③
    objective,
    tune_config=tune.TuneConfig(
        metric="accuracy",
        mode="max",
        search_alg=algo,
    ),
    param_space=search_space,
)
results = tuner.fit()

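As in the basic version, there are three steps. The only difference is that the first step wraps the Keras model in an objective function.

This example uses the HyperOptSearch search algorithm; you can try other search algorithms as well.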

1.3 PyTorch+Optuna

import torch
from ray import train, tune
from ray.tune.search.optuna import OptunaSearch


def objective(config):  # ①
    train_loader, test_loader = load_data()  # Load some data
    model = ConvNet().to("cpu")  # Create a PyTorch conv net
    optimizer = torch.optim.SGD(  # Tune the optimizer
        model.parameters(), lr=config["lr"], momentum=config["momentum"]
    )

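    # load_data, ConvNet, train_epoch, and test are user-defined helpers (not shown).
    # The training helper is named train_epoch so it does not clash with the
    # imported ray `train` module, which is not callable.
    while True:
        train_epoch(model, optimizer, train_loader)  # Train the model for one epoch
        acc = test(model, test_loader)  # Compute test accuracy
        train.report({"mean_accuracy": acc})  # Report to Tune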


search_space = {"lr": tune.loguniform(1e-4, 1e-2), "momentum": tune.uniform(0.1, 0.9)}
algo = OptunaSearch()  # ②

tuner = tune.Tuner(  # ③
    objective,
    tune_config=tune.TuneConfig(
        metric="mean_accuracy",
        mode="max",
        search_alg=algo,
    ),
    run_config=train.RunConfig(
        stop={"training_iteration": 5},
    ),
    param_space=search_space,
)
results = tuner.fit()
print("Best config is:", results.get_best_result().config)

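As in 1.2, the first step wraps the PyTorch model in an objective function.

This example uses the OptunaSearch search algorithm and stops each trial after 5 training iterations. Other stop conditions can be set as well.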
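As a rough picture of how a dictionary stop condition behaves: a trial ends as soon as any listed field of a reported result reaches its threshold. The helper `should_stop` below is a hypothetical stand-in written for this article, not a Ray function, and the accuracy values are made up for illustration.

```python
def should_stop(result, stop):
    """Return True once any criterion in `stop` is reached by `result`."""
    return any(result[key] >= limit for key, limit in stop.items())


# The second key is an extra criterion added here as an example.
stop = {"training_iteration": 5, "mean_accuracy": 0.95}

# Made-up accuracies reported by a trial, one per iteration.
accuracies = [0.60, 0.75, 0.88, 0.96, 0.97, 0.98]

for iteration, acc in enumerate(accuracies, start=1):
    result = {"training_iteration": iteration, "mean_accuracy": acc}
    if should_stop(result, stop):
        break

print("stopped at iteration", iteration)
```

In Tune itself, the same dictionary would be passed as `stop=` to `RunConfig`, in the same place where the example above passes `{"training_iteration": 5}`.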

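1.4 Advantages of Tune

Cutting-edge optimization algorithms
Improved productivity
- Only a few lines of code are needed to adapt existing code to Ray Tune
- Multiple storage options for experiment results (such as NFS and cloud storage)
Support for multi-GPU and distributed training
Good integration with other hyperparameter tuning tools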

2. Core Concepts

Search Space

config = {
    "uniform": tune.uniform(-5, -1),  # Uniform float between -5 and -1
    "quniform": tune.quniform(3.2, 5.4, 0.2),  # Round to multiples of 0.2
    "loguniform": tune.loguniform(1e-4, 1e-1),  # Uniform float in log space
    "qloguniform": tune.qloguniform(1e-4, 1e-1, 5e-5),  # Round to multiples of 0.00005
    "randn": tune.randn(10, 2),  # Normal distribution with mean 10 and sd 2
    "qrandn": tune.qrandn(10, 2, 0.2),  # Round to multiples of 0.2
    "randint": tune.randint(-9, 15),  # Random integer from -9 (inclusive) to 15 (exclusive)
    "qrandint": tune.qrandint(-21, 12, 3),  # Round to multiples of 3 (includes 12)
    "lograndint": tune.lograndint(1, 10),  # Random integer in log space
    "qlograndint": tune.qlograndint(1, 10, 2),  # Round to multiples of 2
    "choice": tune.choice(["a", "b", "c"]),  # Choose one of these options uniformly
    "func": tune.sample_from(
        lambda spec: spec.config.uniform * 0.01
    ),  # Depends on other value
    "grid": tune.grid_search([32, 64, 128]),  # Search over all these values
}
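To show what "uniform in log space" and "round to multiples of q" mean in practice, here is a plain-Python sketch of the two ideas. It illustrates the sampling semantics only and is not Tune's implementation; `loguniform` and `quantize` are hypothetical helpers defined for this example.

```python
import math
import random

rng = random.Random(0)


def loguniform(low, high):
    """Draw a float whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def quantize(value, q):
    """Round a value to the nearest multiple of q."""
    return round(value / q) * q


lr_samples = [loguniform(1e-4, 1e-1) for _ in range(1000)]
# Count the draws in each decade: 1e-4..1e-3, 1e-3..1e-2, 1e-2..1e-1.
# Log-uniform sampling puts roughly a third of them in each.
per_decade = [sum(10**e <= x < 10 ** (e + 1) for x in lr_samples) for e in (-4, -3, -2)]
print(per_decade)

print(quantize(rng.uniform(3.2, 5.4), 0.2))  # some multiple of 0.2
```

A plain uniform draw over the same range would put about 90% of the samples in the top decade, which is why learning rates are usually searched in log space.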

Trials (Ray Trial)

Pass a trainable and a search space to the Tuner, and Tune runs the trials for you so that the best parameters can be selected.

# Pass in a Trainable class or function, along with a search space "config".
tuner = tune.Tuner(trainable, param_space={"a": 2, "b": 4})
tuner.fit()
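The snippets in this section refer to a `trainable` without defining it. A minimal function trainable might look like the sketch below. The scoring formula `toy_score` is made up purely so that there is something to minimize, and `train.report` is used the same way as in section 1.3 (depending on your Ray version, the equivalent call may be `tune.report`).

```python
def toy_score(a, b, step):
    """Made-up objective: lowest near a = 0.5, shrinking as training proceeds."""
    return (a - 0.5) ** 2 + b / (step + 1)


def trainable(config):
    """Function trainable: reports one result per training iteration."""
    # Imported inside the function so this module loads even where Ray is
    # not installed; the import runs only when Tune executes a trial.
    from ray import train

    for step in range(20):
        train.report({"score": toy_score(config["a"], config["b"], step)})
```

Each call to `train.report` counts as one training iteration, which is what a stop condition such as `{"training_iteration": 20}` refers to.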

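Search Algorithms (Tune Search Algorithms)

If no search algorithm is specified, Tune defaults to random search. Ray supports common optimization libraries such as BayesOptSearch, HyperOpt, and Optuna. These libraries pick parameters according to a strategy instead of blindly, which makes the search more efficient.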

# BayesOptSearch needs the bayesian-optimization package to be installed.
from ray import train, tune
from ray.tune.search.bayesopt import BayesOptSearch

# Define the search space
search_space = {"a": tune.uniform(0, 1), "b": tune.uniform(0, 20)}

algo = BayesOptSearch(random_search_steps=4)

tuner = tune.Tuner(
    trainable,
    tune_config=tune.TuneConfig(
        metric="score",
        mode="min",
        search_alg=algo,
    ),
    run_config=train.RunConfig(stop={"training_iteration": 20}),
    param_space=search_space,
)
tuner.fit()

Schedulers (Tune Schedulers)

Based on each reported result, the scheduler decides whether to continue or stop the corresponding trial.

from ray.tune.schedulers import HyperBandScheduler

# Create HyperBand scheduler and maximize the score (mode="max")
hyperband = HyperBandScheduler(metric="score", mode="max")

config = {"a": tune.uniform(0, 1), "b": tune.uniform(0, 1)}

tuner = tune.Tuner(
    trainable,
    tune_config=tune.TuneConfig(
        num_samples=20,
        scheduler=hyperband,
    ),
    param_space=config,
)
tuner.fit()

Note that schedulers and search algorithms are not always compatible with each other, and some schedulers also require checkpointing.
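The early-stopping idea behind HyperBand can be pictured with successive halving: run all trials on a small budget, keep the better half, and repeat with a larger budget. The sketch below is a simplified plain-Python illustration with made-up learning curves. It is not Ray's `HyperBandScheduler`, and `successive_halving` and `learning_curve` are hypothetical names.

```python
def learning_curve(quality, step):
    """Made-up score (higher is better) that approaches `quality` over time."""
    return quality * (1 - 0.5**step)


def successive_halving(qualities, rounds=3, budget=1):
    """Keep the better half of the trials each round, doubling the budget."""
    alive = list(range(len(qualities)))
    history = []
    for _ in range(rounds):
        scores = {i: learning_curve(qualities[i], budget) for i in alive}
        keep = max(1, len(alive) // 2)
        alive = sorted(alive, key=scores.get, reverse=True)[:keep]
        history.append(list(alive))
        budget *= 2
    return alive, history


qualities = [0.2, 0.9, 0.5, 0.7, 0.1, 0.6, 0.3, 0.8]  # one hidden "quality" per trial
survivors, history = successive_halving(qualities)
print(history)  # trial indices still running after each round
```

Weak trials are dropped after very little training, so most of the compute goes to the configurations that look promising.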

Results (Tune ResultGrid)

The results of a Tune run can be analyzed through the Tune ResultGrid.

tuner = tune.Tuner(
    trainable,
    tune_config=tune.TuneConfig(
        metric="score",
        mode="min",
        search_alg=BayesOptSearch(random_search_steps=4),
    ),
    run_config=train.RunConfig(
        stop={"training_iteration": 20},
    ),
    param_space=config,
)
results = tuner.fit()

best_result = results.get_best_result()  # Get best result object
best_config = best_result.config  # Get best trial's hyperparameters
best_logdir = best_result.path  # Get best trial's result directory
best_checkpoint = best_result.checkpoint  # Get best trial's best checkpoint
best_metrics = best_result.metrics  # Get best trial's last results
best_result_df = best_result.metrics_dataframe  # Get best result as pandas dataframe
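Beyond the single best result, the whole grid can be walked trial by trial. The sketch below assumes only that the grid is iterable and that each result exposes `config`, `metrics`, and `error`, as Tune's `Result` objects do. `summarize` is a hypothetical helper, and the stand-in objects at the bottom exist only so the example runs without a cluster.

```python
from types import SimpleNamespace


def summarize(result_grid, metric="score"):
    """Collect (config, metric value) for finished trials and count failures."""
    rows, failed = [], 0
    for result in result_grid:
        if result.error is not None:
            failed += 1
            continue
        rows.append((result.config, result.metrics[metric]))
    rows.sort(key=lambda row: row[1])  # ascending, matching mode="min"
    return rows, failed


# Stand-in results; with Tune you would pass the object returned by tuner.fit().
fake_grid = [
    SimpleNamespace(config={"a": 0.2}, metrics={"score": 3.0}, error=None),
    SimpleNamespace(config={"a": 0.7}, metrics={"score": 1.5}, error=None),
    SimpleNamespace(config={"a": 0.9}, metrics=None, error=RuntimeError("trial crashed")),
]
rows, failed = summarize(fake_grid)
print(rows[0], failed)
```

Checking `error` first matters because a failed trial may not have reported the metric at all.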