Ray Tune
Ray Tune is a standard hyperparameter tuning tool: it ships with a variety of parameter search algorithms, supports distributed execution, and is simple to use. It works with training frameworks such as PyTorch and TensorFlow, and integrates with TensorBoard for visualization.
Hyperparameters
- Neural architecture (number of layers, nodes per layer, layer types, connectivity)
- Learning rate
- Optimizer
- Loss weights
- …
Usage
Install:
pip install ray torchvision
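A quick smoke test of the installation (a minimal sketch; the objective function and its "score" metric are toy stand-ins, not part of the MNIST examples below):

from ray import tune

def objective(config):
    # Toy objective: report a score derived from the sampled learning rate.
    tune.report(score=config["lr"] * 100)

analysis = tune.run(
    objective,
    metric="score",
    mode="max",
    config={"lr": tune.uniform(0.0, 1.0)},
    num_samples=4)
print("Best config:", analysis.best_config)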
Integrating Tune into a PyTorch pipeline; two APIs are available:
- class-based ray.tune.Trainable API
- function-based tune.run API
PyTorch class-based ray.tune.Trainable example:
# https://github.com/ray-project/ray/blob/master/python/ray/tune/examples/mnist_pytorch_trainable.py
from __future__ import print_function
import argparse
import os
import torch
import torch.optim as optim
import ray
from ray import tune
from ray.tune.schedulers import ASHAScheduler
from ray.tune.examples.mnist_pytorch import (train, test, get_data_loaders,
ConvNet)
# Change these values if you want the training to run quicker or slower.
EPOCH_SIZE = 512
TEST_SIZE = 256
# Training settings
parser = argparse.ArgumentParser(description="PyTorch MNIST Example")
parser.add_argument(
"--use-gpu",
action="store_true",
default=False,
help="enables CUDA training")
parser.add_argument(
"--ray-address", type=str, help="The Redis address of the cluster.")
parser.add_argument(
"--smoke-test", action="store_true", help="Finish quickly for testing")
class TrainMNIST(tune.Trainable):
def setup(self, config):
use_cuda = config.get("use_gpu") and torch.cuda.is_available()
self.device = torch.device("cuda" if use_cuda else "cpu")
self.train_loader, self.test_loader = get_data_loaders()
self.model = ConvNet().to(self.device)
self.optimizer = optim.SGD(
self.model.parameters(),
lr=config.get("lr", 0.01),
momentum=config.get("momentum", 0.9))
def step(self):
train(
self.model, self.optimizer, self.train_loader, device=self.device)
acc = test(self.model, self.test_loader, self.device)
return {"mean_accuracy": acc}
def save_checkpoint(self, checkpoint_dir):
checkpoint_path = os.path.join(checkpoint_dir, "model.pth")
torch.save(self.model.state_dict(), checkpoint_path)
return checkpoint_path
def load_checkpoint(self, checkpoint_path):
self.model.load_state_dict(torch.load(checkpoint_path))
if __name__ == "__main__":
args = parser.parse_args()
ray.init(address=args.ray_address, num_cpus=6 if args.smoke_test else None)
sched = ASHAScheduler()
    analysis = tune.run(
        TrainMNIST,
        metric="mean_accuracy",  # the metric trials are compared on
        mode="max",  # higher is better
        scheduler=sched,  # trial scheduler used for early stopping
        stop={
            "mean_accuracy": 0.95,
            "training_iteration": 3 if args.smoke_test else 20,
        },  # early-termination conditions
        resources_per_trial={
            "cpu": 3,
            "gpu": int(args.use_gpu)
        },  # resources required by each trial
        num_samples=1 if args.smoke_test else 20,  # number of trials to run
        checkpoint_at_end=True,
        checkpoint_freq=3,
        config={
            "use_gpu": args.use_gpu,  # read by setup() above
            "lr": tune.uniform(0.001, 0.1),
            "momentum": tune.uniform(0.1, 0.9),
        })  # the hyperparameter search space
    print("Best config is:", analysis.best_config)
PyTorch function-based tune.run example:
from __future__ import print_function
import argparse
import os
import torch
import torch.optim as optim
import ray
from ray import tune
from ray.tune.schedulers import AsyncHyperBandScheduler
from ray.tune.examples.mnist_pytorch import (train, test, get_data_loaders,
ConvNet)
def train_mnist(config):
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
train_loader, test_loader = get_data_loaders()
model = ConvNet().to(device)
optimizer = optim.SGD(
model.parameters(), lr=config["lr"], momentum=config["momentum"])
while True:
train(model, optimizer, train_loader, device)
acc = test(model, test_loader, device)
# Set this to run Tune.
tune.report(mean_accuracy=acc)
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="PyTorch MNIST Example")
parser.add_argument(
"--cuda",
action="store_true",
default=False,
help="Enables GPU training")
parser.add_argument(
"--smoke-test", action="store_true", help="Finish quickly for testing")
parser.add_argument(
"--ray-address",
help="Address of Ray cluster for seamless distributed execution.")
args = parser.parse_args()
if args.ray_address:
ray.init(address=args.ray_address)
else:
ray.init(num_cpus=2 if args.smoke_test else None)
# for early stopping
sched = AsyncHyperBandScheduler()
analysis = tune.run(
train_mnist,
metric="mean_accuracy",
mode="max",
name="exp",
scheduler=sched,
stop={
"mean_accuracy": 0.98,
"training_iteration": 5 if args.smoke_test else 100
},
resources_per_trial={
"cpu": 2,
"gpu": int(args.cuda) # set this for GPUs
},
num_samples=1 if args.smoke_test else 50,
config={
"lr": tune.loguniform(1e-4, 1e-2),
"momentum": tune.uniform(0.1, 0.9),
})
print("Best config is:", analysis.best_config)
Ray Tune runs multiple trials in parallel based on the machine's available resources and the configured resources_per_trial (each sampled parameter configuration runs as one trial). For example, with 6 CPUs available and resources_per_trial={"cpu": 3}, two trials run concurrently.
Ways to generate parameter values
- tune.grid_search([0.1, 0.2, 0.3])
- tune.sample_from(lambda spec: np.random.uniform(100)) (custom sampling via a user-defined lambda)
- tune.loguniform(1e-4, 1e-2)
- tune.uniform(0.1, 0.9)
- …
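These primitives can be mixed freely in a single search space passed as config. A minimal sketch (the "hidden" and "batch_size" keys are illustrative, not taken from the examples above):

import numpy as np
from ray import tune

config = {
    "lr": tune.loguniform(1e-4, 1e-2),           # log-uniform float
    "momentum": tune.uniform(0.1, 0.9),          # uniform float
    "hidden": tune.grid_search([64, 128, 256]),  # every listed value is tried
    "batch_size": tune.sample_from(              # arbitrary custom sampler
        lambda spec: int(np.random.choice([32, 64, 128]))),
}

Note that grid_search multiplies the trial count: each grid value is run num_samples times, with fresh draws for the sampled keys.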
Search algorithms
- Grid Search (brute force: exhaustively enumerates all parameter combinations)
- Random Search (preferred when the distribution of good hyperparameters is unknown; see https://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf)
- BayesOpt (fits a posterior over the objective from the parameter points observed so far, uses it to propose the parameters most likely to yield the next best value, evaluates them, and updates the posterior; a sketch follows this list)
- HyperOpt (Tree-structured Parzen Estimators, TPE; see the example below)
- SigOpt
- Nevergrad
- Scikit-Optimize
- Ax
- BOHB
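The Bayesian-optimization loop described above plugs into tune.run the same way. A minimal sketch, assuming the bayesian-optimization package is installed (pip install bayesian-optimization) and reusing train_mnist from the function-based example:

from ray import tune
from ray.tune.suggest.bayesopt import BayesOptSearch

# BayesOptSearch maintains a posterior over the objective and proposes the
# next configuration to evaluate from it.
bayesopt = BayesOptSearch(metric="mean_accuracy", mode="max")
analysis = tune.run(
    train_mnist,
    search_alg=bayesopt,
    num_samples=20,
    config={
        "lr": tune.uniform(0.001, 0.1),
        "momentum": tune.uniform(0.1, 0.9),
    })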
HyperOpt plugs in similarly; for example (assuming pip install hyperopt):

from hyperopt import hp
import numpy as np
from ray import tune
from ray.tune.suggest import ConcurrencyLimiter
from ray.tune.suggest.hyperopt import HyperOptSearch

space = {
    # hp.loguniform expects log-space bounds, hence np.log(...)
    "lr": hp.loguniform("lr", np.log(1e-10), np.log(0.1)),
    "momentum": hp.uniform("momentum", 0.1, 0.9),
}
hyperopt_search = HyperOptSearch(space, metric="mean_accuracy", mode="max")
# Cap concurrency at 2 trials (replaces the older max_concurrent argument).
analysis = tune.run(
    train_mnist,
    num_samples=10,
    search_alg=ConcurrencyLimiter(hyperopt_search, max_concurrent=2))
Distributed training
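Tune distributes the same scripts without code changes: start a Ray cluster, point ray.init at its head node (this is what the --ray-address flag in the examples does), and trials are scheduled onto whichever nodes have free resources, subject to resources_per_trial. A minimal sketch, assuming a head node was started elsewhere with `ray start --head`:

import ray

# Connect to an existing cluster instead of starting a local Ray instance.
# "auto" resolves a cluster started on this machine; otherwise pass the
# head node address explicitly, e.g. "192.168.1.10:6379".
ray.init(address="auto")
# The tune.run calls above then run unchanged across the cluster.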
Similar tools
Microsoft NNI (Neural Network Intelligence)
- hyper-parameter tuning and neural architecture search
- finds good models overall: good neural architectures, good hyperparameters, and good model-compression approaches
Ray Tune:
- hyper-parameter tuning, including tuning of reinforcement learning algorithms
- distributed framework (Tune uses a master-worker architecture that centralizes decision-making and communication)
HyperOpt
Hyperband
There are other tools as well, such as Google Vizier, Amazon SageMaker, and Facebook HiPlot. References:
https://analyticsindiamag.com/top-hyperparameter-optimisation-tools-neural-networks/
https://zhuanlan.zhihu.com/p/56730229
Tool | Hyperparameter search | Architecture search | Parallel execution | Multiple DL frameworks | Reinforcement learning
---|---|---|---|---|---
NNI | yes | yes | yes | yes |
Google Vizier | yes | yes | yes | yes |
Ray Tune | yes | no | yes | yes | yes
HyperOpt | yes | no | yes | yes |
NNI and Google Vizier lean toward automated search over both network hyperparameters and model architectures; Ray Tune additionally supports reinforcement learning; HyperOpt focuses on hyperparameter search.
On the machine learning and data mining side, there are also tools that support feature search and selection: AutoML, auto-sklearn, and Featuretools.