Ray Tune
Ray Tune is a standard hyperparameter tuning tool: it ships with a variety of parameter search algorithms, supports distributed execution, and is simple to use. It works with training frameworks such as PyTorch and TensorFlow, and integrates with TensorBoard for visualization.
Hyperparameters
- Neural architecture (number of layers, nodes per layer, layer types, connectivity)
- Learning rate
- Optimizer
- Loss weights
- …
Usage
Install:
pip install ray torchvision
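A quick smoke test of the installation (a minimal sketch; the objective function and its "score" metric are toy stand-ins, not part of the MNIST examples below):

from ray import tune

def objective(config):
    # Toy objective: report a score derived from the sampled learning rate.
    tune.report(score=config["lr"] * 100)

analysis = tune.run(
    objective,
    metric="score",
    mode="max",
    config={"lr": tune.uniform(0.0, 1.0)},
    num_samples=4)
print("Best config:", analysis.best_config)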
Integrating Tune into a PyTorch pipeline; two APIs are available:
- class-based ray.tune.Trainable API
- function-based tune.run API
PyTorch class-based ray.tune.Trainable example:
# https://github.com/ray-project/ray/blob/master/python/ray/tune/examples/mnist_pytorch_trainable.py
from __future__ import print_function
import argparse
import os
import torch
import torch.optim as optim
import ray
from ray import tune
from ray.tune.schedulers import ASHAScheduler
from ray.tune.examples.mnist_pytorch import (train, test, get_data_loaders,
ConvNet)
# Change these values if you want the training to run quicker or slower.
EPOCH_SIZE = 512
TEST_SIZE = 256
# Training settings
parser = argparse.ArgumentParser(description="PyTorch MNIST Example")
parser.add_argument(
"--use-gpu",
action="store_true",
default=False,
help="enables CUDA training")
parser.add_argument(
"--ray-address", type=str, help="The Redis address of the cluster.")
parser.add_argument(
"--smoke-test", action="store_true", help="Finish quickly for testing")
class TrainMNIST(tune.Trainable):
def setup(self, config):
use_cuda = config.get("use_gpu") and torch.cuda.is_available()
self.device = torch.device("cuda" if use_cuda else "cpu")
self.train_loader, self.test_loader = get_data_loaders()
self.model = ConvNet().to(self.device)
self.optimizer = optim.SGD(
self.model.parameters(),
lr=config.get("lr", 0.01),
momentum=config.get("momentum", 0.9))
def step(self):
train(
self.model, self.optimizer, self.train_loader, device=self.device)
acc = test(self.model, self.test_loader, self.device)
return {"mean_accuracy": acc}
def save_checkpoint(self, checkpoint_dir):
checkpoint_path = os.path.join(checkpoint_dir, "model.pth")
torch.save(self.model.state_dict(), checkpoint_path)
return checkpoint_path
def load_checkpoint(self, checkpoint_path):
self.model.load_state_dict(torch.load(checkpoint_path))
if __name__ == "__main__":
args = parser.parse_args()
ray.init(address=args.ray_address, num_cpus=6 if args.smoke_test else None)
sched = ASHAScheduler()
    analysis = tune.run(
        TrainMNIST,
        metric="mean_accuracy",  # the metric trials are compared on
        mode="max",  # higher is better
        scheduler=sched,  # trial scheduler used for early stopping
        stop={
            "mean_accuracy": 0.95,
            "training_iteration": 3 if args.smoke_test else 20,
        },  # early-termination conditions
        resources_per_trial={
            "cpu": 3,
            "gpu": int(args.use_gpu)
        },  # resources required by each trial
        num_samples=1 if args.smoke_test else 20,  # number of trials to run
        checkpoint_at_end=True,
        checkpoint_freq=3,
        config={
            "use_gpu": args.use_gpu,  # read by setup() above
            "lr": tune.uniform(0.001, 0.1),
            "momentum": tune.uniform(0.1, 0.9),
        })  # the hyperparameter search space
    print("Best config is:", analysis.best_config)
PyTorch function-based tune.run example:
from __future__ import print_function
import argparse
import os
import torch
import torch.optim as optim
import ray
from ray import tune
from ray.tune.schedulers import AsyncHyperBandScheduler
from ray.tune.examples.mnist_pytorch import (train, test, get_data_loaders,
ConvNet)
def train_mnist(config):
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
train_loader, test_loader = get_data_loaders()
model = ConvNet().to(device)
optimizer = optim.SGD(
model.parameters(), lr=config["lr"], momentum=config["momentum"])
while True:
train(model, optimizer, train_loader, device)
acc = test(model, test_loader, device)
# Set this to run Tune.
tune.report(mean_accuracy=acc)
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="PyTorch MNIST Example")
parser.add_argument(
"--cuda",
action="store_true",
default=False,
help="Enables GPU training")
parser.add_argument(
"--smoke-test", action="store_true", help="Finish quickly for testing")
parser.add_argument(
"--ray-address",
help="Address of Ray cluster for seamless distributed execution.")
args = parser.parse_args()
if args.ray_address:
ray.init(address=args.ray_address)
else:
ray.init(num_cpus=2 if args.smoke_test else None)
# for early stopping
sched = AsyncHyperBandScheduler()
analysis = tune.run(
train_mnist,
metric="mean_accuracy",
mode="max",
name="exp",
scheduler=sched,
stop={
"mean_accuracy": 0.98,
"training_iteration": 5 if args.smoke_test else 100
},
resources_per_trial={
"cpu": 2,
"gpu": int(args.cuda) # set this for GPUs
},
num_samples=1 if args.smoke_test else 50,
config={
"lr": tune.loguniform(1e-4, 1e-2),
"momentum": tune.uniform(0.1, 0.9),
})
print("Best config is:", analysis.best_config)
Ray Tune runs multiple trials in parallel based on the machine's available resources and the configured resources_per_trial (each sampled parameter configuration runs as one trial). For example, with 6 CPUs available and resources_per_trial={"cpu": 3}, two trials run concurrently.
Ways to generate parameter values
- tune.grid_search([0.1, 0.2, 0.3])
- tune.sample_from(lambda spec: np.random.uniform(100)) (custom sampling via a user-defined lambda)
- tune.loguniform(1e-4, 1e-2)
- tune.uniform(0.1, 0.9)
- …
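These primitives can be mixed freely in a single search space passed as config. A minimal sketch (the "hidden" and "batch_size" keys are illustrative, not taken from the examples above):

import numpy as np
from ray import tune

config = {
    "lr": tune.loguniform(1e-4, 1e-2),           # log-uniform float
    "momentum": tune.uniform(0.1, 0.9),          # uniform float
    "hidden": tune.grid_search([64, 128, 256]),  # every listed value is tried
    "batch_size": tune.sample_from(              # arbitrary custom sampler
        lambda spec: int(np.random.choice([32, 64, 128]))),
}

Note that grid_search multiplies the trial count: each grid value is run num_samples times, with fresh draws for the sampled keys.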
Search algorithms
- Grid Search (brute force: exhaustively enumerates all parameter combinations)
- Random Search (preferred when the distribution of good hyperparameters is unknown; see https://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf)
- BayesOpt (fits a posterior over the objective from the parameter points observed so far, uses it to propose the parameters most likely to yield the next best value, evaluates them, and updates the posterior; a sketch follows this list)
- HyperOpt (Tree-structured Parzen Estimators, TPE; see the example below)
- SigOpt
- Nevergrad
- Scikit-Optimize
- Ax
- BOHB
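The Bayesian-optimization loop described above plugs into tune.run the same way. A minimal sketch, assuming the bayesian-optimization package is installed (pip install bayesian-optimization) and reusing train_mnist from the function-based example:

from ray import tune
from ray.tune.suggest.bayesopt import BayesOptSearch

# BayesOptSearch maintains a posterior over the objective and proposes the
# next configuration to evaluate from it.
bayesopt = BayesOptSearch(metric="mean_accuracy", mode="max")
analysis = tune.run(
    train_mnist,
    search_alg=bayesopt,
    num_samples=20,
    config={
        "lr": tune.uniform(0.001, 0.1),
        "momentum": tune.uniform(0.1, 0.9),
    })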
HyperOpt plugs in similarly; for example (assuming pip install hyperopt):

from hyperopt import hp
import numpy as np
from ray import tune
from ray.tune.suggest import ConcurrencyLimiter
from ray.tune.suggest.hyperopt import HyperOptSearch

space = {
    # hp.loguniform expects log-space bounds, hence np.log(...)
    "lr": hp.loguniform("lr", np.log(1e-10), np.log(0.1)),
    "momentum": hp.uniform("momentum", 0.1, 0.9),
}
hyperopt_search = HyperOptSearch(space, metric="mean_accuracy", mode="max")
# Cap concurrency at 2 trials (replaces the older max_concurrent argument).
analysis = tune.run(
    train_mnist,
    num_samples=10,
    search_alg=ConcurrencyLimiter(hyperopt_search, max_concurrent=2))
Distributed training
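Tune distributes the same scripts without code changes: start a Ray cluster, point ray.init at its head node (this is what the --ray-address flag in the examples does), and trials are scheduled onto whichever nodes have free resources, subject to resources_per_trial. A minimal sketch, assuming a head node was started elsewhere with `ray start --head`:

import ray

# Connect to an existing cluster instead of starting a local Ray instance.
# "auto" resolves a cluster started on this machine; otherwise pass the
# head node address explicitly, e.g. "192.168.1.10:6379".
ray.init(address="auto")
# The tune.run calls above then run unchanged across the cluster.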
Similar tools
Microsoft NNI (Neural Network Intelligence)
- hyper-parameter tuning and neural architecture search
- finds good models overall: good neural architectures, good hyperparameters, and good model-compression approaches
Ray Tune:
- hyper-parameter tuning, including tuning of reinforcement learning algorithms
- distributed framework (Tune uses a master-worker architecture that centralizes decision-making and communication)
HyperOpt
Hyperband
There are other tools as well, such as Google Vizier, Amazon SageMaker, and Facebook HiPlot. References:
https://analyticsindiamag.com/top-hyperparameter-optimisation-tools-neural-networks/
https://zhuanlan.zhihu.com/p/56730229
Tool | Hyperparameter search | Architecture search | Parallel execution | Multiple DL frameworks | Reinforcement learning
---|---|---|---|---|---
NNI | yes | yes | yes | yes |
Google Vizier | yes | yes | yes | yes |
Ray Tune | yes | no | yes | yes | yes
HyperOpt | yes | no | yes | yes |
NNI and Google Vizier lean toward automated search over both network hyperparameters and model architectures; Ray Tune additionally supports reinforcement learning; HyperOpt focuses on hyperparameter search.
On the machine learning and data mining side, there are also tools that support feature search and selection: AutoML, auto-sklearn, and Featuretools.