ray.tune Hyperparameter Tuning Study Notes 0: The Basic Tuning Workflow

Mr_111000

Last modified: 2024-04-25 20:45:44

First published: 2024-03-05 17:46:24

Article link: https://blog.csdn.net/mr_111000/article/details/136482156

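I have recently been learning to tune neural network hyperparameters with Python's ray.tune as part of my research. These notes record what I picked up along the way, in the hope that they help others with the same need. I mainly followed the official Ray documentation. However, I use Ray 2.2.0 while the official documentation targets newer releases, so my code differs from the official examples in places; defer to the documentation for the version you actually have installed.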

This post summarizes the overall workflow for tuning with tune in Python; later posts cover each step in detail.

Next post: ray.tune Hyperparameter Tuning Study Notes 1: Configuring the Tuner

1 Introduction to Ray Tune

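tune is a Python library for scheduling machine learning experiments and tuning hyperparameters. It supports state-of-the-art hyperparameter optimization methods (PBT, ASHA, etc.) and can train and tune models from the mainstream frameworks (PyTorch, TensorFlow, Keras, etc.) in parallel.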
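
To give a feel for what early-stopping search methods in the HyperBand family do, here is a minimal sketch of synchronous successive halving. It is a simplification for illustration only and not how Ray implements its schedulers; `successive_halving`, `toy_score`, and the scoring formula are all hypothetical names and values I made up.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Keep the top 1/eta of the configs each round and multiply the budget by eta."""
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        # Rank the remaining configs by their score at the current budget.
        scored = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        survivors = scored[: max(1, len(scored) // eta)]
        budget *= eta
    return survivors[0]


def toy_score(config, budget):
    """Hypothetical objective: improves with budget and peaks at hiddenLayer == 32."""
    return budget / (budget + 1) - abs(config["hiddenLayer"] - 32) / 100


configs = [{"hiddenLayer": h} for h in (8, 16, 32, 64)]
winner = successive_halving(configs, toy_score)
print(winner)
```

The point is that weak configurations are dropped after a small training budget, so most of the compute goes to the promising ones.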

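In tune, one complete hyperparameter search is called an experiment. An experiment contains multiple trials, and each trial is one model training run with a fixed set of hyperparameters.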
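
The relationship between an experiment and its trials can be sketched without Ray at all. The snippet below expands a grid-style search space into one config per trial; `expand_grid` and the `lr` values are hypothetical and only illustrate the idea, they are not part of tune's API.

```python
from itertools import product


def expand_grid(space):
    """Expand a dict of candidate lists into one config dict per combination."""
    names = list(space)
    return [dict(zip(names, values)) for values in product(*(space[n] for n in names))]


# Hypothetical search space: 3 x 2 = 6 combinations, so 6 trials in one experiment.
space = {"hiddenLayer": [16, 32, 64], "lr": [1e-3, 1e-2]}
trials = expand_grid(space)
for i, config in enumerate(trials):
    print(f"trial {i}: {config}")
```

Each printed config corresponds to one trial, and the whole list corresponds to the experiment.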

2 The basic tuning workflow

As long as a suitable API is defined, the same tune workflow can be used to tune different models. The basic workflow is as follows:

(1) Define the model training API

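Define the function that trains the model (the trainable). Taking the function-style API as an example, it takes a configuration dictionary (the hyperparameters to tune and their values) as its input and uses it to set up the model, then returns the training metrics through report. The code skeleton is shown below. The data-loading function load_data(), the model-building function create_model(), the single-epoch training function train(), the evaluation function test(), and the metric it returns can all be defined freely to suit your needs and machine learning framework.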

from ray import tune

def trainable(config):
    train_data, test_data = load_data()  # load the training and test data
    model, optimizer = create_model(config)  # build the model and optimizer from config
    while True:
        train(model, optimizer, train_data)  # train for one epoch
        metric = test(model, test_data)  # evaluate the model
        tune.report(metric=metric)  # report the metric back to tune
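
The `while True` loop above may look like it never ends. In tune, the loop is ended from outside the function once the trial is stopped. The Ray-free sketch below is only an analogy for that hand-off: a generator plays the trainable, `yield` plays the role of report, and a hypothetical `run_trial` driver decides when to stop. All names and the fake accuracy values are made up for illustration.

```python
import random


def toy_trainable(config):
    """Stand-in for a trainable: hands back one metric per 'epoch', forever."""
    rng = random.Random(config["seed"])
    acc = 0.0
    while True:
        acc = min(1.0, acc + rng.uniform(0.0, 0.1))  # fake, non-decreasing accuracy
        yield {"metric": acc}  # plays the role of the report call


def run_trial(trainable, config, max_iterations):
    """Stand-in for the driver: consumes results and decides when the trial ends."""
    history = []
    for iteration, result in enumerate(trainable(config), start=1):
        history.append(result["metric"])
        if iteration >= max_iterations:
            break  # the driver, not the trainable, ends the loop
    return history


history = run_trial(toy_trainable, {"seed": 0}, max_iterations=5)
print(len(history), history[-1])
```

The trainable itself never checks a termination condition, which is why the skeleton above can loop unconditionally.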

(2) Create the tuner and run the hyperparameter search

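Create the tuner used for the search. Its inputs are the training function trainable, the hyperparameter search space param_space, the search algorithm configuration tune_config, and the run configuration run_config. tune_config specifies the optimization algorithm, the metric, and other search-related settings. run_config specifies the stopping criteria, checkpoint configuration, where results are stored, and so on. Later posts cover the configuration in detail.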

Calling tuner.fit() then runs the hyperparameter search with the given configuration.

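tuner = tune.Tuner(trainable, param_space=param_space, tune_config=tune_config, run_config=run_config)  # define the tuner; arguments after trainable are keyword-only
tuner.fit()  # run the hyperparameter search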

(3) Analyze the results

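tuner.fit() prints run information automatically as it executes. The API provided by tune can also be used to analyze the search results further; later posts cover this in detail.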

3 A tune example

This section gives a simple tuning example to compare against the basic workflow above.

(1) Define the model training API

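This example trains a simple fully connected neural network to classify the iris dataset. The network is built with PyTorch, and the hyperparameter being tuned is the number of neurons in the hidden layers.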

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from ray import tune
from ray.air.config import RunConfig
from ray.tune.schedulers import AsyncHyperBandScheduler


# Fully connected neural network built with PyTorch
class myANN(nn.Module):
    def __init__(self, inputSize, hiddenLayer, outputSize):
        super(myANN, self).__init__()
        self.fc1 = nn.Linear(inputSize, hiddenLayer)
        self.fc2 = nn.Linear(hiddenLayer, hiddenLayer)
        self.output = nn.Linear(hiddenLayer, outputSize)
        self.active = nn.ReLU()

    def forward(self, x):
        out = self.active(self.fc1(x))
        out = self.active(self.fc2(out))
        out = self.output(out)
        return out


# Load the iris dataset from sklearn and wrap it in DataLoaders
def get_data_loaders():
    iris_data = load_iris()
    x = iris_data.data
    y = iris_data.target
    test_size = 0.2
    train_x, test_x, train_y, test_y = train_test_split(x, y, test_size=test_size, stratify=y)  # split into training and validation data
    scalar = MinMaxScaler(feature_range=(0, 1))
    train_x_N = scalar.fit_transform(train_x)
    test_x_N = scalar.transform(test_x)          # normalize with the range fitted on the training data
    train_x_N = torch.Tensor(train_x_N)
    train_y = torch.LongTensor(train_y)
    train_dataset = TensorDataset(train_x_N, train_y)
    trainDL = DataLoader(train_dataset, batch_size=16)
    test_x_N = torch.Tensor(test_x_N)
    test_y = torch.LongTensor(test_y)
    test_dataset = TensorDataset(test_x_N, test_y)
    testDL = DataLoader(test_dataset, batch_size=16)
    return trainDL, testDL


# Train the model for one epoch
def train_func(model, optimizer, train_loader, device):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.cross_entropy(output, target)
        loss.backward()
        optimizer.step()


# Evaluate the model and return its accuracy
def test_func(model, data_loader, device):
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for batch_idx, (data, target) in enumerate(data_loader):
            data, target = data.to(device), target.to(device)
            outputs = model(data)
            _, predicted = torch.max(outputs.data, 1)
            total += target.size(0)
            correct += (predicted == target).sum().item()
    return correct / total



# Trainable: the hyperparameter being tuned is hiddenLayer, the number of neurons in each hidden layer
def train_iris(config: dict):
    use_cuda = torch.cuda.is_available()
    device = torch.device("cuda" if use_cuda else "cpu")
    train_loader, test_loader = get_data_loaders()   # load the dataset
    model = myANN(inputSize=4, hiddenLayer=config["hiddenLayer"], outputSize=3)  # build the model; hiddenLayer is the tuned hyperparameter, read from config
    model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.5, 0.999))

    while True:
        train_func(model, optimizer, train_loader, device)
        acc = test_func(model, test_loader, device)
        tune.report(acc=acc)   # report accuracy as the metric

(2) Create the tuner and run the search

if __name__ == '__main__':
    # Search space: choose hiddenLayer from 16, 32, and 64
    myConfig = {
        "hiddenLayer": tune.grid_search([16, 32, 64])
    }

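    sched = AsyncHyperBandScheduler()   # scheduler used for the search (ASHA)
    resources_per_trial = {"cpu": 2, "gpu": 1}  # resources allocated to each trial; set "gpu" to 0 on a machine without a GPU
    # Create the tuner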
    tuner = tune.Tuner(
        tune.with_resources(train_iris, resources=resources_per_trial),
        tune_config=tune.TuneConfig(
            metric="acc",
            mode="max",
            scheduler=sched,
        ),
        run_config=RunConfig(
            name="TuneTest",
            local_dir="./rayResults",  # Ray 2.2.0 argument; newer Ray releases use storage_path instead
            stop={
                "acc": 0.98,
                "training_iteration": 50,
            },
        ),
        param_space=myConfig,
    )
    # Run the hyperparameter search
    results = tuner.fit()

The results are printed automatically after the run.
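
The `stop` dictionary in the RunConfig above lists two criteria. As far as I understand, a trial ends as soon as any one of them is reached, that is, when a reported value is greater than or equal to its threshold. The sketch below restates that rule in plain Python; `should_stop` is a hypothetical helper written for illustration, not a tune function.

```python
def should_stop(result, stop):
    """Return True as soon as any reported value reaches its threshold."""
    return any(
        key in result and result[key] >= threshold
        for key, threshold in stop.items()
    )


stop = {"acc": 0.98, "training_iteration": 50}
print(should_stop({"acc": 0.91, "training_iteration": 12}, stop))  # False
print(should_stop({"acc": 0.99, "training_iteration": 12}, stop))  # True
print(should_stop({"acc": 0.91, "training_iteration": 50}, stop))  # True
```

In this example a trial therefore runs for at most 50 iterations and ends earlier if its accuracy reaches 0.98.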

(3) Analyze the results

Load the saved results, print the hyperparameters of the most accurate model, and plot that model's training curve.

from ray import tune
import matplotlib.pyplot as plt

if __name__ == '__main__':
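    storagePath = "./rayResults/TuneTest"  # local_dir/name from the RunConfig used for the search
    tuner = tune.Tuner.restore(path=storagePath)  # reload the saved experiment
    res = tuner.get_results()
    bestResult = res.get_best_result(metric="acc", mode="max")  # trial with the highest accuracy
    print(bestResult.config)  # hyperparameters of the best trial
    bestResult.metrics_dataframe.plot("training_iteration", "acc")  # accuracy per training iteration
    plt.show()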

Hyperparameters of the best model
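
To make the selection step concrete, here is a Ray-free sketch of picking the best trial. I am assuming that trials are compared by their last reported value, which I believe is the default behaviour of get_best_result; check this against your Ray version. `best_config` is a hypothetical helper and the accuracy histories are made-up numbers, not results from the run above.

```python
# Hypothetical accuracy history per hiddenLayer value (made-up numbers).
histories = {
    16: [0.60, 0.80, 0.90],
    32: [0.70, 0.93, 0.97],
    64: [0.65, 0.96, 0.95],
}


def best_config(histories, mode="max"):
    """Pick the key whose last reported value is best under the given mode."""
    pick = max if mode == "max" else min
    return pick(histories, key=lambda k: histories[k][-1])


best = best_config(histories)
print(best, histories[best][-1])
```

Note that with this rule the trial with 64 neurons loses even though it reached 0.96 at one point, because only its final value of 0.95 is compared.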