Feasible Optimization Algorithms for LGBM

Latest recommended article published on 2024-07-20 09:21:06

王摇摆


Tags: algorithms, artificial intelligence, python

Article link: https://blog.csdn.net/weixin_44943389/article/details/140135182


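Besides the Entropy Weight Method (EWM), many other methods can be used to optimize a LightGBM (LGBM) model, most of them by tuning its hyperparameters. Some common approaches are:

1. Grid Search

Grid search exhaustively evaluates every combination in a predefined hyperparameter grid and keeps the best one. It is computationally expensive, and it only guarantees the best configuration within that grid, not a global optimum over the whole hyperparameter space.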

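from sklearn.model_selection import GridSearchCV
import lightgbm as lgb

# X_train and y_train are assumed to be an already prepared training set.
# 3 x 3 x 3 x 3 = 81 combinations, each evaluated with 5-fold cross-validation.
param_grid = {
    'learning_rate': [0.01, 0.05, 0.1],
    'n_estimators': [100, 200, 500],
    'num_leaves': [31, 61, 127],
    'max_depth': [-1, 10, 20]  # -1 means no depth limit
}

gbm = lgb.LGBMClassifier()
grid_search = GridSearchCV(estimator=gbm, param_grid=param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

# best_score_ is the mean cross-validated accuracy of the best combination.
print(f'Best parameters: {grid_search.best_params_}')
print(f'Best score: {grid_search.best_score_}')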
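
To get a feel for the cost before launching a search, the grid can be expanded by hand. The following is a minimal sketch using only the standard library; `expand_grid` and `cv_folds` are hypothetical names introduced for illustration, and the enumeration order is not necessarily the one scikit-learn uses internally.

```python
from itertools import product

# Same grid as in the grid search example.
param_grid = {
    'learning_rate': [0.01, 0.05, 0.1],
    'n_estimators': [100, 200, 500],
    'num_leaves': [31, 61, 127],
    'max_depth': [-1, 10, 20],
}
cv_folds = 5  # assumed to match cv=5 in the search


def expand_grid(grid):
    """Return every combination in the grid as a list of dicts."""
    names = list(grid)
    return [dict(zip(names, values))
            for values in product(*(grid[name] for name in names))]


combinations = expand_grid(param_grid)
total_fits = len(combinations) * cv_folds

print(len(combinations))  # 81
print(total_fits)         # 405
print(combinations[0])
```

Adding one more value to each of the four lists would raise the count from 81 to 256 combinations, which is why grid search scales poorly as the grid grows.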

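2. Random Search

Random search samples a fixed number of hyperparameter combinations at random from the search space and evaluates only those. For the same search space it is usually much cheaper than grid search.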

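from scipy.stats import loguniform, randint
from sklearn.model_selection import RandomizedSearchCV
import lightgbm as lgb

# X_train and y_train are assumed to be an already prepared training set.
# Sampling from distributions rather than fixed lists lets every draw explore new values.
# If every entry were a list (81 combinations here), n_iter could not usefully exceed 81.
param_dist = {
    'learning_rate': loguniform(0.01, 0.2),
    'n_estimators': randint(100, 501),  # integers in [100, 500]
    'num_leaves': randint(20, 151),     # integers in [20, 150]
    'max_depth': [-1, 10, 20]           # a list is sampled uniformly
}

gbm = lgb.LGBMClassifier()
random_search = RandomizedSearchCV(estimator=gbm, param_distributions=param_dist, n_iter=30, cv=5, scoring='accuracy', random_state=42)
random_search.fit(X_train, y_train)

print(f'Best parameters: {random_search.best_params_}')
print(f'Best score: {random_search.best_score_}')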
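
The sampling step itself is simple. The sketch below shows, with the standard library only, how a fixed budget of candidates might be drawn; `sample_params` is a hypothetical helper, and the ranges and the log-uniform choice for the learning rate are assumptions for illustration, not requirements of any library.

```python
import math
import random


def sample_params(rng):
    """Draw one hyperparameter combination from the assumed search space."""
    log_low, log_high = math.log(0.01), math.log(0.2)
    return {
        # Log-uniform: each order of magnitude is sampled equally often.
        'learning_rate': math.exp(rng.uniform(log_low, log_high)),
        'n_estimators': rng.randint(100, 500),   # inclusive on both ends
        'num_leaves': rng.randint(20, 150),      # inclusive on both ends
        'max_depth': rng.choice([-1, 10, 20]),
    }


rng = random.Random(42)  # fixed seed so the draw is reproducible
candidates = [sample_params(rng) for _ in range(30)]

print(len(candidates))
print(candidates[0])
```

The budget (30 candidates here) is chosen up front and does not grow with the number of hyperparameters, unlike the size of a grid.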

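3. Bayesian Optimization

Bayesian optimization is a global optimization method built on a probabilistic surrogate model of the objective. It can often find a good hyperparameter configuration in relatively few iterations.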

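from bayes_opt import BayesianOptimization
import lightgbm as lgb
from sklearn.model_selection import cross_val_score

# X_train and y_train are assumed to be an already prepared training set.
def lgb_eval(learning_rate, num_leaves, max_depth, n_estimators):
    # The optimizer proposes floats, so integer parameters are cast here.
    # LightGBM treats max_depth <= 0 as "no limit".
    params = {
        'learning_rate': learning_rate,
        'num_leaves': int(num_leaves),
        'max_depth': int(max_depth),
        'n_estimators': int(n_estimators),
        'objective': 'binary'
    }
    gbm = lgb.LGBMClassifier(**params)
    # Mean cross-validated accuracy is the value being maximized.
    return cross_val_score(gbm, X_train, y_train, cv=5, scoring='accuracy').mean()

lgb_bo = BayesianOptimization(
    f=lgb_eval,
    pbounds={'learning_rate': (0.01, 0.2),
             'num_leaves': (20, 150),
             'max_depth': (-1, 50),
             'n_estimators': (50, 500)},
    random_state=42
)

lgb_bo.maximize(init_points=5, n_iter=25)

# Best score and the (still continuous) parameters that produced it.
print(lgb_bo.max)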
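
Because the optimizer works in a continuous space, the parameters it reports are floats and need converting before a final model is trained. A minimal sketch of such a conversion follows; `to_lgbm_params` is a hypothetical helper, the `raw` values are made up for illustration, and rounding (rather than truncating) is a design choice, not something the library prescribes.

```python
def to_lgbm_params(suggestion):
    """Convert continuous suggestions into values LightGBM accepts."""
    int_names = ('num_leaves', 'max_depth', 'n_estimators')
    params = dict(suggestion)  # copy so the caller's dict is untouched
    for name in int_names:
        params[name] = int(round(params[name]))
    # LightGBM requires num_leaves > 1.
    params['num_leaves'] = max(2, params['num_leaves'])
    # Any non-positive depth means "no limit"; normalise it to -1.
    if params['max_depth'] <= 0:
        params['max_depth'] = -1
    return params


# Made-up continuous values of the kind an optimizer might report.
raw = {'learning_rate': 0.0731, 'num_leaves': 63.6,
       'max_depth': -0.4, 'n_estimators': 249.7}

print(to_lgbm_params(raw))
# {'learning_rate': 0.0731, 'num_leaves': 64, 'max_depth': -1, 'n_estimators': 250}
```

Applying the same conversion inside the objective and when reading back the best result keeps the evaluated model and the final model consistent.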

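4. Genetic Algorithm

A genetic algorithm is an optimization algorithm based on natural selection and genetic mechanisms, and it can be used to tune hyperparameters. The example below uses TPOT, which evolves candidate pipelines with genetic programming.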

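from tpot import TPOTClassifier

# Assumes the classic TPOT API (pre-1.0). A custom config_dict limits the search to
# LightGBM; the built-in 'TPOT sparse' config is meant for sparse input matrices.
lgbm_config = {'lightgbm.LGBMClassifier': {
    'learning_rate': [0.01, 0.05, 0.1], 'n_estimators': [100, 200, 500],
    'num_leaves': [31, 61, 127], 'max_depth': [-1, 10, 20]}}

tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2, config_dict=lgbm_config)
tpot.fit(X_train, y_train)

print(tpot.fitted_pipeline_)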
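
To make the mechanics concrete, here is a minimal hand-written sketch of a genetic algorithm over a small discrete LightGBM search space. It is not TPOT's implementation: `GRID`, `toy_fitness`, and `evolve` are hypothetical names, and `toy_fitness` is only a stand-in for a real cross-validated score.

```python
import random

GRID = {
    'learning_rate': [0.01, 0.05, 0.1],
    'n_estimators': [100, 200, 500],
    'num_leaves': [31, 61, 127],
    'max_depth': [-1, 10, 20],
}
NAMES = list(GRID)


def toy_fitness(ind):
    """Stand-in for a CV score (higher is better); purely illustrative."""
    target = {'learning_rate': 0.05, 'n_estimators': 200,
              'num_leaves': 61, 'max_depth': 10}
    return sum(ind[n] == target[n] for n in NAMES) / len(NAMES)


def random_individual(rng):
    return {n: rng.choice(GRID[n]) for n in NAMES}


def crossover(a, b, rng):
    """Uniform crossover: each gene comes from one of the two parents."""
    return {n: (a[n] if rng.random() < 0.5 else b[n]) for n in NAMES}


def mutate(ind, rng, rate=0.2):
    """With probability `rate`, resample a gene from its allowed values."""
    return {n: (rng.choice(GRID[n]) if rng.random() < rate else ind[n])
            for n in NAMES}


def evolve(fitness, generations=5, population_size=20, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(population_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[:population_size // 2]  # truncation selection
        children = [ranked[0]]                   # elitism: keep the best unchanged
        while len(children) < population_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = evolve(toy_fitness)
print(best)
print(history)
```

In a real run, `toy_fitness` would be replaced by a function that trains an `LGBMClassifier` with the individual's parameters and returns its mean cross-validation score. Because the best individual is always carried over, the best fitness never decreases from one generation to the next.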

5. Hyperparameter Optimization Library (Hyperopt)

Hyperopt is a library for hyperparameter optimization. The example below uses its Tree-structured Parzen Estimator (TPE) algorithm.

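from hyperopt import fmin, tpe, hp, Trials
import lightgbm as lgb
from sklearn.model_selection import cross_val_score

# X_train and y_train are assumed to be an already prepared training set.
def objective(params):
    # hp.quniform returns floats, but LightGBM expects integers for these parameters.
    params = dict(params)
    for name in ('num_leaves', 'max_depth', 'n_estimators'):
        params[name] = int(params[name])
    gbm = lgb.LGBMClassifier(**params)
    score = cross_val_score(gbm, X_train, y_train, cv=5, scoring='accuracy').mean()
    # fmin minimizes, so return the negated accuracy.
    return -score

space = {
    'learning_rate': hp.uniform('learning_rate', 0.01, 0.2),
    'num_leaves': hp.quniform('num_leaves', 20, 150, 1),
    'max_depth': hp.quniform('max_depth', -1, 50, 1),
    'n_estimators': hp.quniform('n_estimators', 50, 500, 1)
}

trials = Trials()
best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=100, trials=trials)

# best holds the raw sampled values, so the integer parameters are still floats here.
print(best)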
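
The idea behind TPE can be illustrated in one dimension. The sketch below is a deliberately simplified version and not Hyperopt's implementation: it splits past observations into a "good" and a "bad" group, models each with a Gaussian kernel density, draws candidates from the good density, and evaluates the candidate with the highest good-to-bad density ratio. The names `kde` and `tpe_minimize`, the fixed bandwidth, and `toy_loss` are all assumptions made for illustration.

```python
import math
import random


def kde(x, points, bandwidth):
    """Average of Gaussian kernels centred on `points`, evaluated at x."""
    norm = len(points) * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm


def tpe_minimize(loss, low, high, n_init=5, n_iter=25,
                 gamma=0.25, n_candidates=24, seed=0):
    rng = random.Random(seed)
    bandwidth = (high - low) / 10  # fixed bandwidth, chosen for simplicity
    history = [(x, loss(x)) for x in
               (rng.uniform(low, high) for _ in range(n_init))]
    for _ in range(n_iter):
        ranked = sorted(history, key=lambda t: t[1])
        n_good = max(1, int(gamma * len(ranked)))
        good = [x for x, _ in ranked[:n_good]]   # best gamma fraction
        bad = [x for x, _ in ranked[n_good:]]    # everything else
        # Sample candidates from the "good" density, clipped to the range.
        candidates = [min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
                      for _ in range(n_candidates)]
        # Evaluate the candidate with the highest good/bad density ratio.
        x_next = max(candidates,
                     key=lambda x: kde(x, good, bandwidth)
                     / (kde(x, bad, bandwidth) + 1e-12))
        history.append((x_next, loss(x_next)))
    return min(history, key=lambda t: t[1]), history


def toy_loss(learning_rate):
    """Stand-in for a negated CV accuracy; minimum at 0.1 by construction."""
    return (learning_rate - 0.1) ** 2


(best_x, best_loss), history = tpe_minimize(toy_loss, 0.01, 0.2)
print(best_x, best_loss)
```

Real TPE handles several parameters, mixed types, and adaptive bandwidths, but the split-and-compare step is the part that distinguishes it from plain random sampling.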

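6. Hyperparameter Optimization Framework (Optuna)

Optuna is an efficient, general-purpose hyperparameter optimization framework. By default it searches with the TPE algorithm.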

import optuna
import lightgbm as lgb
from sklearn.model_selection import cross_val_score

# X_train and y_train are assumed to be an already prepared training set.
def objective(trial):
    param = {
        'objective': 'binary',
        # suggest_float(..., log=True) replaces the deprecated suggest_loguniform.
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.2, log=True),
        'num_leaves': trial.suggest_int('num_leaves', 20, 150),
        'max_depth': trial.suggest_int('max_depth', -1, 50),  # <= 0 means no limit
        'n_estimators': trial.suggest_int('n_estimators', 50, 500),
    }
    gbm = lgb.LGBMClassifier(**param)
    score = cross_val_score(gbm, X_train, y_train, cv=5, scoring='accuracy').mean()
    return score

# direction='maximize' because the objective returns accuracy.
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)

print(study.best_params)
print(study.best_value)