Tuning process parameters with AutoML (NNI) after training a LightGBM model


1. Abstract

This article covers training a LightGBM model, saving it, treating the training features as the parameters to be tuned, and using AutoML (NNI) to optimize those feature parameters.
Main idea:

  1. Train a LightGBM model and save it.
  2. Treat the training features as tunable parameters and use the saved model to predict on each candidate parameter set.
  3. Report the prediction as NNI's metric; when the metric reaches its optimum, the corresponding feature values are the optimal process parameters.

Note: this approach only applies to regression problems where the label is a meaningful quality target, and it should be used for process-parameter optimization problems.
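The core of the idea above — searching the feature space of a frozen regressor for the input that minimizes its prediction — can be sketched without NNI at all. In this sketch the predictor is a hypothetical stand-in for the saved LightGBM model and the bounds are made up; the real workflow replaces both with `gbm_model.txt` and the feature ranges taken from the training data.

```python
import random

# Hypothetical stand-in for the saved model's predict(); the real workflow
# uses lgb.Booster(model_file='gbm_model.txt').predict instead.
def predict(x):
    # toy surrogate with its minimum at x = (1.0, -2.0)
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

# Feature ranges (made up here; in practice the min/max of each training feature)
BOUNDS = [(-5.0, 5.0), (-5.0, 5.0)]

def random_search(n_trials=5000, seed=42):
    rng = random.Random(seed)
    best_x, best_y = None, float('inf')
    for _ in range(n_trials):
        x = [rng.uniform(lo, hi) for lo, hi in BOUNDS]
        y = predict(x)  # with NNI, this value would go to report_final_result
        if y < best_y:
            best_x, best_y = x, y
    return best_x, best_y

best_x, best_y = random_search()
print('best prediction %.4f at %s' % (best_y, best_x))
```

NNI replaces this naive random loop with a smarter tuner (TPE in the config used later), but the contract is identical: receive a candidate x, report predict(x), keep the best.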

2. Data

For a description of the dataset, see my other article, which uses the same data; the learning links at the end of this article include it, and that article contains the download link.

3. Related techniques

Process parameter optimization
Process parameter optimization means tuning the parameters a manufacturing process requires. It arises throughout manufacturing; in injection molding, for example, holding pressure, holding time, and similar parameters are optimized to improve part quality and prevent defective parts.

Future work
The current approach fits the process with a machine-learning model and searches for an optimum with NNI, but it has two limitations: the tuning algorithms are stochastic, so each search returns a different result, and it cannot optimize the process parameters (the inputs) from a time series of product dimensions (the outputs).

Fundamentally we are solving for x (the inputs) given y (the outputs). Such inverse problems usually have many solutions, and a unique optimum is hard to find without adding extra constraints.
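One simple way to add such a constraint is to penalize distance from the current operating point, so the search prefers solutions that hit the target output while changing the process as little as possible. The sketch below uses a hypothetical linear stand-in for the trained model; `Y_TARGET`, `X_REF`, and the weight `LAM` are all assumed values, not part of the original workflow.

```python
import random

def model_predict(x):
    # hypothetical stand-in for lgb.Booster(model_file='gbm_model.txt').predict
    return 3.0 * x[0] - 2.0 * x[1]

Y_TARGET = 4.0      # desired output, e.g. a target product dimension (assumed)
X_REF = [1.0, 0.5]  # current operating point we prefer to stay near (assumed)
LAM = 0.1           # weight of the closeness penalty (assumed)

def objective(x):
    # |prediction - target| plus a penalty for moving far from X_REF
    fit = abs(model_predict(x) - Y_TARGET)
    penalty = sum((a - b) ** 2 for a, b in zip(x, X_REF))
    return fit + LAM * penalty

def constrained_search(n_trials=20000, seed=0):
    rng = random.Random(seed)
    best_x, best_obj = None, float('inf')
    for _ in range(n_trials):
        x = [rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0)]
        obj = objective(x)  # with NNI this value would go to report_final_result
        if obj < best_obj:
            best_x, best_obj = x, obj
    return best_x, best_obj

best_x, best_obj = constrained_search()
print('objective %.4f, prediction %.4f' % (best_obj, model_predict(best_x)))
```

With NNI, the same combined objective would simply be what `nni.report_final_result` receives, so among the many x that reproduce the target y, the tuner converges on one close to the current operating point.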

If you are working on the same problem, feel free to message me so we can solve it together with AI.

The specific parameters and optimization method are described in the links at the end of this article.

4. Complete code and steps

The NNI web UI for the process-parameter tuning run is shown below; the best metric is y, and we can look up the process parameters corresponding to it.
(screenshot: NNI web UI showing the best metric and its trial parameters)

The code to train and save the model:

'''
This project automatically tunes hyperparameters for GBDT.

A trial is one independent run of one parameter combination (e.g., a set of
hyperparameters) against the model. To define an NNI trial, first define the
parameter set (the search space), then update the model code.
Launch with: nnictl create --config config.yml -p 8888
'''
import logging

import lightgbm as lgb
import nni
import pandas as pd
from sklearn.metrics import mean_squared_error


LOG = logging.getLogger('auto-gbdt')

# specify your configurations as a dict (tuning already completed)
def get_default_parameters():
    # params = {
    #     'boosting_type': 'gbdt',
    #     'objective': 'regression',
    #     'metric': {'l2', 'auc'},
    #     'num_leaves': 31,
    #     'learning_rate': 0.05,
    #     'feature_fraction': 0.9,
    #     'bagging_fraction': 0.8,
    #     'bagging_freq': 5,
    #     'verbose': 0
    # }
    # The rmse of prediction is: 0.450357
    params = {
        'boosting_type': 'gbdt',
        'objective': 'regression',
        'metric': {'l2', 'auc'},
        'num_leaves': 28,
        'learning_rate': 0.2,
        'feature_fraction': 0.930,
        'bagging_fraction': 0.7656,
        'bagging_freq': 4,
        'verbose': 0
    }
    # The rmse of prediction is: 0.419561
    return params


def load_data(train_path='./data/regression.train', test_path='./data/regression.test'):
    '''
    Load or create dataset
    '''
    print('Load data...')
    df_train = pd.read_csv(train_path, header=None, sep='\t')
    df_test = pd.read_csv(test_path, header=None, sep='\t')
    num = len(df_train)
    split_num = int(0.9 * num)

    y_train = df_train[0].values
    y_test = df_test[0].values
    y_eval = y_train[split_num:]
    y_train = y_train[:split_num]

    X_train = df_train.drop(0, axis=1).values
    X_test = df_test.drop(0, axis=1).values
    X_eval = X_train[split_num:, :]
    X_train = X_train[:split_num, :]

    # create dataset for lightgbm
    lgb_train = lgb.Dataset(X_train, y_train)
    lgb_eval = lgb.Dataset(X_eval, y_eval, reference=lgb_train)

    return lgb_train, lgb_eval, X_test, y_test

def run(lgb_train, lgb_eval, params, X_test, y_test):
    print('Start training...')

    params['num_leaves'] = int(params['num_leaves'])

    # train
    gbm = lgb.train(params,
                    lgb_train,
                    num_boost_round=20,
                    valid_sets=[lgb_eval],
                    # early stopping via callback works across LightGBM versions
                    callbacks=[lgb.early_stopping(stopping_rounds=5)])

    print('Start predicting...')
    # predict
    y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration)
    gbm.save_model('gbm_model.txt')

    # eval
    rmse = mean_squared_error(y_test, y_pred) ** 0.5
    print('The rmse of prediction is:', rmse)

    nni.report_final_result(rmse)

if __name__ == '__main__':
    lgb_train, lgb_eval, X_test, y_test = load_data()

    try:
        # get parameters from tuner
        RECEIVED_PARAMS = nni.get_next_parameter()
        LOG.debug(RECEIVED_PARAMS)
        PARAMS = get_default_parameters()
        PARAMS.update(RECEIVED_PARAMS)
        LOG.debug(PARAMS)

        # train
        run(lgb_train, lgb_eval, PARAMS, X_test, y_test)
    except Exception as exception:
        LOG.exception(exception)
        raise

How the training-feature data is turned into tunable parameters:
build_search_space.py

import json

import pandas as pd

'''
{
    "num_leaves":{"_type":"randint","_value":[20, 31]},
    "learning_rate":{"_type":"choice","_value":[0.01, 0.05, 0.1, 0.2]},
    "bagging_fraction":{"_type":"uniform","_value":[0.7, 1.0]},
    "feature_fraction":{"_type":"uniform","_value":[0.7, 1.0]},
    "bagging_freq":{"_type":"choice","_value":[1, 2, 4, 8, 10]}
}

{
        'boosting_type': 'gbdt',
        'objective': 'regression',
        'metric': {'l2', 'auc'},
        'num_leaves': 28,
        'learning_rate': 0.2,
        'feature_fraction': 0.930,
        'bagging_fraction': 0.7656,
        'bagging_freq': 4,
        'verbose': 0
    }
'''

def writeJson(relate_record, src):
    # dump the dict as JSON; keep non-ASCII characters readable
    with open(src, 'w') as json_file:
        json.dump(relate_record, json_file, ensure_ascii=False)

# Use the full range of every feature as the tuning range
def load_data(train_path='./data/regression.train'):
    columns = ['label']
    for i in range(28):
        columns.append('col' + str(i))
    df_train = pd.read_csv(train_path, header=None, names=columns, sep='\t')
    search = {}
    # skip the label column: only the 28 features become tunable parameters
    for name in columns[1:]:
        search[name] = {"_type": "uniform", "_value": [min(df_train[name]), max(df_train[name])]}
    writeJson(search, 'search_space1.json')


# load_data()


# Write a default value for every parameter (the minimum of each feature)
def load_default_parameters(train_path='./data/regression.train'):
    columns = ['label']
    for i in range(28):
        columns.append('col' + str(i))
    df_train = pd.read_csv(train_path, header=None, names=columns, sep='\t')
    search = {}
    for name in columns[1:]:
        search[name] = min(df_train[name])
    writeJson(search, 'default_parameters.json')


load_default_parameters()

The process-parameter tuning code:
main1.py

import logging

import lightgbm as lgb
import nni
import numpy as np

LOG = logging.getLogger('auto-gbdt')

def get_default_parameters():
    params = {"col0": 0.275, "col1": -2.417, "col2": -1.743, "col3": 0.019, "col4": -1.743, "col5": 0.159,
              "col6": -2.941000, "col7": -1.7409999, "col8": 0.0, "col9": 0.19, "col10": -2.904,
              "col11": -1.742, "col12": 0.0, "col13": 0.264, "col14": -2.728, "col15": -1.742, "col16": 0.0,
              "col17": 0.365, "col18": -2.495, "col19": -1.74, "col20": 0.0, "col21": 0.172, "col22": 0.419,
              "col23": 0.461, "col24": 0.384, "col25": 0.093000, "col26": 0.389, "col27": 0.489}
    return params


def run(params):
    # assemble the tuned feature values into a single sample
    X_test = np.array(list(params.values())).reshape(1, 28)

    print('Start predicting...')
    # load the model saved by the training script and predict
    gbm = lgb.Booster(model_file='gbm_model.txt')
    y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration)

    # the model's prediction is the metric NNI optimizes
    nni.report_final_result(round(y_pred[0], 5))


if __name__ == '__main__':
    try:
        # get parameters from tuner
        RECEIVED_PARAMS = nni.get_next_parameter()
        LOG.debug(RECEIVED_PARAMS)
        PARAMS = get_default_parameters()
        PARAMS.update(RECEIVED_PARAMS)
        LOG.debug(PARAMS)

        # predict with the tuned feature values
        run(PARAMS)
    except Exception as exception:
        LOG.exception(exception)
        raise

config.yml

authorName: default
experimentName: example_auto-gbdt
trialConcurrency: 1
maxExecDuration: 10h
maxTrialNum: 100
#choice: local, remote, pai
trainingServicePlatform: local
searchSpacePath: search_space1.json
#choice: true, false
useAnnotation: false
tuner:
  #choice: TPE, Random, Anneal, Evolution, BatchTuner, MetisTuner, GPTuner
  #SMAC (SMAC should be installed through nnictl)
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: minimize
trial:
  command: python main1.py
  codeDir: .
  gpuNum: 0

search_space1.json

{
  "col0": {"_type": "uniform", "_value": [0.275, 6.695]},
  "col1": {"_type": "uniform", "_value": [-2.417, 2.43]},
  "col2": {"_type": "uniform", "_value": [-1.743, 1.743]},
  "col3": {"_type": "uniform", "_value": [0.019, 5.7]},
  "col4": {"_type": "uniform", "_value": [-1.743, 1.743]},
  "col5": {"_type": "uniform", "_value": [0.159, 4.19]},
  "col6": {"_type": "uniform", "_value": [-2.9410000000000003, 2.97]},
  "col7": {"_type": "uniform", "_value": [-1.7409999999999999, 1.7409999999999999]},
  "col8": {"_type": "uniform", "_value": [0.0, 2.173]},
  "col9": {"_type": "uniform", "_value": [0.19, 5.193]},
  "col10": {"_type": "uniform", "_value": [-2.904, 2.909]},
  "col11": {"_type": "uniform", "_value": [-1.742, 1.743]},
  "col12": {"_type": "uniform", "_value": [0.0, 2.215]},
  "col13": {"_type": "uniform", "_value": [0.264, 6.523]},
  "col14": {"_type": "uniform", "_value": [-2.728, 2.727]},
  "col15": {"_type": "uniform", "_value": [-1.742, 1.742]},
  "col16": {"_type": "uniform", "_value": [0.0, 2.548]},
  "col17": {"_type": "uniform", "_value": [0.365, 6.068]},
  "col18": {"_type": "uniform", "_value": [-2.495, 2.496]},
  "col19": {"_type": "uniform", "_value": [-1.74, 1.743]},
  "col20": {"_type": "uniform", "_value": [0.0, 3.102]},
  "col21": {"_type": "uniform", "_value": [0.172, 13.097999999999999]},
  "col22": {"_type": "uniform", "_value": [0.419, 7.392]},
  "col23": {"_type": "uniform", "_value": [0.461, 3.682]},
  "col24": {"_type": "uniform", "_value": [0.384, 6.582999999999999]},
  "col25": {"_type": "uniform", "_value": [0.09300000000000001, 7.86]},
  "col26": {"_type": "uniform", "_value": [0.389, 4.543]},
  "col27": {"_type": "uniform", "_value": [0.489, 4.316]}
}
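Since search_space1.json is generated programmatically, it is worth sanity-checking before launching the experiment. The snippet below is a small validator (not part of the original workflow) that assumes the file layout shown above.

```python
import json

def validate_search_space(path='search_space1.json'):
    # every entry must have a known _type and, for 'uniform', ordered bounds
    with open(path) as f:
        space = json.load(f)
    for name, spec in space.items():
        assert spec['_type'] in {'uniform', 'choice', 'randint'}, name
        if spec['_type'] == 'uniform':
            lo, hi = spec['_value']
            assert lo < hi, f'{name}: empty range [{lo}, {hi}]'
    return len(space)
```

A degenerate range (min equal to max, which happens when a feature is constant in the training data) would make the tuner sample a fixed value, so catching it before a 10-hour experiment is cheap insurance.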

Run command:

nnictl create --config config.yml -p 8888 --debug

Stop command (an experiment left running after tuning keeps consuming compute):

nnictl stop

5. Learning links

Setting and optimizing injection-molding process parameters

Online learning and optimization of human-like intelligent controller parameters based on reinforcement learning

Tuning LightGBM hyperparameters with TPE in AutoML-NNI
