AutoML: Automatic Hyperparameter Tuning with Ray Tune

Ray Tune

Ray Tune is a standard hyperparameter tuning tool that ships with a variety of parameter search algorithms, supports distributed execution, and is simple to use. It works with training frameworks such as PyTorch and TensorFlow and integrates with TensorBoard for visualization.
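
Trial results are written to ~/ray_results by default and are logged in a TensorBoard-compatible format, so a finished experiment can be inspected with (assuming the default result directory is used):

tensorboard --logdir ~/ray_results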

Hyperparameters

  • Neural network architecture search (number of layers, number of nodes, layer types, connection patterns)
  • Learning rate
  • Optimizer
  • Loss weights

Usage

Installation:

pip install ray torchvision

Integrating Tune into a PyTorch pipeline, in one of two styles:

  • class-based ray.tune.Trainable API
  • function-based tune.run API

PyTorch class-based ray.tune.Trainable example:

# https://github.com/ray-project/ray/blob/master/python/ray/tune/examples/mnist_pytorch_trainable.py
from __future__ import print_function

import argparse
import os
import torch
import torch.optim as optim

import ray
from ray import tune
from ray.tune.schedulers import ASHAScheduler
from ray.tune.examples.mnist_pytorch import (train, test, get_data_loaders,
                                             ConvNet)

# Change these values if you want the training to run quicker or slower.
EPOCH_SIZE = 512
TEST_SIZE = 256

# Training settings
parser = argparse.ArgumentParser(description="PyTorch MNIST Example")
parser.add_argument(
    "--use-gpu",
    action="store_true",
    default=False,
    help="enables CUDA training")
parser.add_argument(
    "--ray-address", type=str, help="The Redis address of the cluster.")
parser.add_argument(
    "--smoke-test", action="store_true", help="Finish quickly for testing")


# Below comments are for documentation purposes only.
# yapf: disable
# __trainable_example_begin__
class TrainMNIST(tune.Trainable):
    def setup(self, config):
        use_cuda = config.get("use_gpu") and torch.cuda.is_available()
        self.device = torch.device("cuda" if use_cuda else "cpu")
        self.train_loader, self.test_loader = get_data_loaders()
        self.model = ConvNet().to(self.device)
        self.optimizer = optim.SGD(
            self.model.parameters(),
            lr=config.get("lr", 0.01), 
            momentum=config.get("momentum", 0.9))

    def step(self):
        train(
            self.model, self.optimizer, self.train_loader, device=self.device)
        acc = test(self.model, self.test_loader, self.device)
        return {"mean_accuracy": acc}

    def save_checkpoint(self, checkpoint_dir):
        checkpoint_path = os.path.join(checkpoint_dir, "model.pth")
        torch.save(self.model.state_dict(), checkpoint_path)
        return checkpoint_path

    def load_checkpoint(self, checkpoint_path):
        self.model.load_state_dict(torch.load(checkpoint_path))


# __trainable_example_end__
# yapf: enable

if __name__ == "__main__":
    args = parser.parse_args()
    ray.init(address=args.ray_address, num_cpus=6 if args.smoke_test else None)
    sched = ASHAScheduler()
    analysis = tune.run(
        TrainMNIST,
        metric="mean_accuracy", # 最后比较的指标
        mode="max", # 指标越大越好
        scheduler=sched, #指定超参优化器
        stop={
            "mean_accuracy": 0.95,
            "training_iteration": 3 if args.smoke_test else 20,
        },# 设定提前终止条件 
        resources_per_trial={
            "cpu": 3,
            "gpu": int(args.use_gpu)
        }, # 每个trial 需要的资源
        num_samples=1 if args.smoke_test else 20, #运行Trails的数目 
        checkpoint_at_end=True,
        checkpoint_freq=3,
        config={
            "use_gpu": args.use_gpu,  # read by setup() above
            "lr": tune.uniform(0.001, 0.1),
            "momentum": tune.uniform(0.1, 0.9),
        })  # search space to sample parameters from

    print("Best config is:", analysis.best_config)

PyTorch function-based tune.run API example:


from __future__ import print_function

import argparse
import os
import torch
import torch.optim as optim

import ray
from ray import tune
from ray.tune.schedulers import AsyncHyperBandScheduler
from ray.tune.examples.mnist_pytorch import (train, test, get_data_loaders,
                                             ConvNet)

def train_mnist(config):
    use_cuda = torch.cuda.is_available()
    device = torch.device("cuda" if use_cuda else "cpu")
    train_loader, test_loader = get_data_loaders()
    model = ConvNet().to(device)

    optimizer = optim.SGD(
        model.parameters(), lr=config["lr"], momentum=config["momentum"])

    while True:
        train(model, optimizer, train_loader, device)
        acc = test(model, test_loader, device)
        # Set this to run Tune.
        tune.report(mean_accuracy=acc)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="PyTorch MNIST Example")
    parser.add_argument(
        "--cuda",
        action="store_true",
        default=False,
        help="Enables GPU training")
    parser.add_argument(
        "--smoke-test", action="store_true", help="Finish quickly for testing")
    parser.add_argument(
        "--ray-address",
        help="Address of Ray cluster for seamless distributed execution.")
    args = parser.parse_args()
    if args.ray_address:
        ray.init(address=args.ray_address)
    else:
        ray.init(num_cpus=2 if args.smoke_test else None)

    # for early stopping
    sched = AsyncHyperBandScheduler()

    analysis = tune.run(
        train_mnist,
        metric="mean_accuracy",
        mode="max",
        name="exp",
        scheduler=sched,
        stop={
            "mean_accuracy": 0.98,
            "training_iteration": 5 if args.smoke_test else 100
        },
        resources_per_trial={
            "cpu": 2,
            "gpu": int(args.cuda)  # set this for GPUs
        },
        num_samples=1 if args.smoke_test else 50,
        config={
            "lr": tune.loguniform(1e-4, 1e-2),
            "momentum": tune.uniform(0.1, 0.9),
        })

    print("Best config is:", analysis.best_config)

Ray Tune runs as many trials in parallel as the machine's resources and the per-trial resource request allow (each sampled parameter configuration runs as one trial). For example, with 6 CPUs available and resources_per_trial={"cpu": 3}, at most two trials run at the same time.

Ways of sampling parameter values

  • tune.grid_search([0.1, 0.2, 0.3])
  • tune.sample_from(lambda spec: np.random.uniform(100)) — custom sampling via a lambda
  • tune.loguniform(1e-4, 1e-2)
  • tune.uniform(0.1, 0.9)
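
For example, these primitives can be mixed in the config passed to tune.run. A small illustrative sketch (the parameter names batch_size and hidden are made up for illustration):

import numpy as np
from ray import tune

# Illustrative search space combining the sampling primitives listed above.
config = {
    "batch_size": tune.grid_search([32, 64, 128]),  # exhaustively try the listed values
    "hidden": tune.sample_from(lambda spec: np.random.randint(16, 256)),  # custom lambda
    "lr": tune.loguniform(1e-4, 1e-2),  # log-uniform between 1e-4 and 1e-2
    "momentum": tune.uniform(0.1, 0.9),  # uniform between 0.1 and 0.9
}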

Search algorithms

Tune supports a range of search algorithms:

  • Grid Search (brute force: exhaustively enumerates every combination of the listed values)
  • Random Search (randomly samples the space; useful when the hyperparameter distribution is unknown, see https://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf)
  • BayesOpt (fits a posterior over the objective from the configurations observed so far, uses it to propose the next configuration likely to improve the result, evaluates it, and updates the posterior)
  • HyperOpt (Tree-structured Parzen Estimators, TPE)
  • SigOpt
  • Nevergrad
  • Scikit-Optimize
  • Ax
  • BOHB
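
For example, HyperOpt can be plugged into tune.run through HyperOptSearch (using the train_mnist function defined above):
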
from hyperopt import hp
from ray.tune.suggest import ConcurrencyLimiter
from ray.tune.suggest.hyperopt import HyperOptSearch

space = {
    "lr": hp.loguniform("lr", 1e-10, 0.1),
    "momentum": hp.uniform("momentum", 0.1, 0.9),
}

hyperopt_search = ConcurrencyLimiter(
    HyperOptSearch(space, metric="mean_accuracy", mode="max"), max_concurrent=2)

analysis = tune.run(train_mnist, num_samples=10, search_alg=hyperopt_search)

Distributed training
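
Running distributed only changes how Ray is initialized: the tuning script connects to an existing Ray cluster (which is what the --ray-address flag does in the examples above), and Tune then schedules trials across all nodes according to resources_per_trial. A minimal sketch, assuming a cluster has already been started with ray start:

import ray

# Connect to an existing Ray cluster instead of starting a local instance.
# "auto" picks up a cluster started on this machine with `ray start --head`;
# a concrete head-node address (as passed via --ray-address) also works.
ray.init(address="auto")

# From here on, tune.run(...) is called exactly as in the examples above, and
# trials are placed on every node that has free resources.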

Similar tools

Microsoft NNI (Neural Network Intelligence):

  • hyper-parameter tuning and neural architecture search
  • finds good models, including good neural architectures, good hyper-parameters, and good model-compression approaches

Ray Tune:

  • hyper-parameter tuning and reinforcement learning algorithms
  • distributed framework (Tune uses a master-worker architecture to centralise decision-making and communication)

Other tools in the same space include HyperOpt, Hyperband, Google Vizier, Amazon SageMaker, and Facebook HiPlot. References:
https://analyticsindiamag.com/top-hyperparameter-optimisation-tools-neural-networks/
https://zhuanlan.zhihu.com/p/56730229

Tool            Hyperparameter search   Architecture search   Parallel execution   Multiple DL frameworks   Reinforcement learning
NNI             Yes                     Yes                   Yes                  Yes                      —
Google Vizier   Yes                     Yes                   Yes                  Yes                      —
Ray Tune        Yes                     No                    Yes                  Yes                      Yes
Hyperopt        Yes                     No                    Yes                  Yes                      —

NNI and Google Vizier lean towards automated search over both neural-network hyperparameters and model architectures. Ray Tune additionally supports reinforcement learning. Hyperopt focuses on hyperparameter search.

On the machine learning and data mining side, there are also tools that support automated feature generation and selection: AutoML, auto-sklearn, and Featuretools.
