Automatic hyperparameter tuning for LightGBM

Use the Optuna auto-tuning framework to optimize LightGBM's hyperparameters:
Define the search ranges for the hyperparameters
Define the objective function the tuner will run
Run the search with Optuna
Print the best parameters and train a final model with them
Save the tuning record

The main tuning principles are:
1 Sampling algorithm
Using the record of suggested parameter values and the evaluated objective values, the sampler keeps narrowing the search space until it finds a region whose parameters yield better objective values.
2 Pruning algorithm
Automatically terminates unpromising trials early in training (i.e. automated early stopping).
Full code and steps
import lightgbm as lgb
import numpy as np
import optuna
import pandas as pd
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold


def pdReadCsv(file, sep):
    # Try several encodings in turn, skipping malformed lines.
    # (on_bad_lines='skip' replaces the removed error_bad_lines=False.)
    for encoding in ('utf-8', 'gb18030', 'gbk'):
        try:
            return pd.read_csv(file, sep=sep, encoding=encoding,
                               on_bad_lines='skip', engine='python')
        except UnicodeDecodeError:
            continue
    raise ValueError('could not decode ' + file)

src = 'models/'

K = 5
seed = 1234
skf = KFold(n_splits=K, shuffle=True, random_state=seed)

train_path = r'data.csv'
df_train = pdReadCsv(train_path, ',')

def load_data():
    X_train = df_train.drop('度', axis=1)
    y_train = df_train['度']
    # The same data is returned as both train and test here.
    return X_train, y_train, X_train, y_train

def objective(trial):
    # Search ranges for the tuned hyperparameters; fixed values for the rest.
    # Lower bounds are kept above 0: a learning rate or fraction of 0 is invalid.
    params = {
        'nthread': -1,
        'max_depth': trial.suggest_int('max_depth', 10, 100),
        'learning_rate': trial.suggest_float('learning_rate', 1e-3, 1.0),
        'bagging_fraction': trial.suggest_float('bagging_fraction', 0.1, 1.0),
        'num_leaves': trial.suggest_int('num_leaves', 10, 100),
        'objective': 'regression',
        'feature_fraction': trial.suggest_float('feature_fraction', 0.1, 1.0),
        'lambda_l1': 0,
        'lambda_l2': 0,
        'bagging_seed': 100,
        'metric': ['rmse'],
    }
    oof1 = np.zeros(len(X_train))
    for n, (train_index, val_index) in enumerate(skf.split(X_train, y_train)):
        print("fold {}".format(n))
        X_tr, X_val = X_train.iloc[train_index], X_train.iloc[val_index]
        y_tr, y_val = y_train.iloc[train_index], y_train.iloc[val_index]
        lgb_train = lgb.Dataset(X_tr, y_tr)
        lgb_val = lgb.Dataset(X_val, y_val)
        clf = lgb.train(params, lgb_train, num_boost_round=88,
                        valid_sets=[lgb_train, lgb_val])
        oof1[val_index] = clf.predict(X_val, num_iteration=clf.best_iteration)
    mse = mean_squared_error(y_train, oof1)
    print('train_score : ', mse)
    return mse

X_train, y_train, X_test, y_test = load_data()
study = optuna.create_study(direction='minimize')
n_trials = 89
study.optimize(objective, n_trials=n_trials)
print(study.best_value)
best = study.best_params
df = study.trials_dataframe(attrs=('number', 'value', 'params', 'state'))
df.to_csv('params.csv')
print(best)
# Retrain on a simple holdout split using the best parameters found.
params = {
    'nthread': -1,
    'max_depth': best['max_depth'],
    'learning_rate': best['learning_rate'],
    'bagging_fraction': best['bagging_fraction'],
    'num_leaves': best['num_leaves'],
    'objective': 'regression',
    'feature_fraction': best['feature_fraction'],
    'lambda_l1': 0,
    'lambda_l2': 0,
    'bagging_seed': 100,
    'metric': ['rmse'],
}
train_index = np.arange(0, X_train.shape[0] - 10)
val_index = np.arange(X_train.shape[0] - 10, X_train.shape[0])
X_tr, X_val = X_train.iloc[train_index], X_train.iloc[val_index]
y_tr, y_val = y_train.iloc[train_index], y_train.iloc[val_index]
lgb_train = lgb.Dataset(X_tr, y_tr)
lgb_val = lgb.Dataset(X_val, y_val)
clf = lgb.train(params, lgb_train, num_boost_round=88,
                valid_sets=[lgb_train, lgb_val])
oof1 = np.zeros(len(X_train))
oof1[val_index] = clf.predict(X_val, num_iteration=clf.best_iteration)
# Score only the holdout rows; oof1 is zero everywhere else.
mse = mean_squared_error(y_val, oof1[val_index])
print('val_score : ', mse)
clf.save_model(src + 'ane.pkl')
To save the hyperparameter-tuning record, add the following after study.optimize has finished:
df = study.trials_dataframe(attrs=('number', 'value', 'params', 'state'))
df.to_csv('params.csv')
