LightGBM
- Developed by Microsoft
- Advantages: builds on and optimizes XGBoost
  - Very fast training
  - Very low memory consumption
  - Very high accuracy
  - Supports parallel training and GPU acceleration
  - Handles missing values directly
  - Scales to very large datasets
- fit parameters
  - eval_set: validation data whose score is reported at each iteration
  - early_stopping_rounds=50: stop training if the validation score has not improved within 50 iterations
  - verbose=30: print the score every 30 iterations
- Important attributes
  - best_iteration_: the best iteration number over the whole training run
  - feature_importances_: returns the feature importances
  - feature_names: the feature names
  - num_iteration: the number of iterations
- Important model parameters
  - subsample: fraction of samples drawn for each tree
  - learning_rate: learning rate
  - boosting_type:
    - 'gbdt': traditional Gradient Boosting Decision Tree
    - 'dart': Dropouts meet Multiple Additive Regression Trees
    - 'rf': Random Forest
  - n_estimators: number of boosting iterations
- Other parameters
Example
import lightgbm as lgb
import pandas as pd
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
cancer=load_breast_cancer()
feature=cancer.data
target=cancer.target
x_train,x_test,y_train,y_test=train_test_split(feature,target,random_state=2020)
lgb_model=lgb.LGBMClassifier(n_estimators=150)
lgb_model.fit(x_train,y_train)
y_pred=lgb_model.predict_proba(x_test)[:,1]
roc_auc_score(y_test,y_pred) #0.9940023990403839
# fit parameters:
# eval_set: validation data whose score is reported at each iteration
# early_stopping_rounds=50: stop if the validation score has not improved within 50 iterations
# verbose=30: print the score every 30 iterations
lgb_model = lgb.LGBMClassifier(n_estimators=150)
lgb_model.fit(x_train,y_train,eval_set=[(x_test,y_test)],eval_metric='auc')
# [1] valid_0's auc: 0.97451 valid_0's binary_logloss: 0.610631
# [2] valid_0's auc: 0.977109 valid_0's binary_logloss: 0.545753
#...
#early_stopping_rounds=50: stop training once the validation score has not improved within 50 iterations
lgb_model = lgb.LGBMClassifier(n_estimators=150)
lgb_model.fit(x_train,y_train,eval_set=[(x_test,y_test)],eval_metric='auc',early_stopping_rounds=50)
# [1] valid_0's auc: 0.97451 valid_0's binary_logloss: 0.610631
# Training until validation scores don't improve for 50 rounds
# [2] valid_0's auc: 0.977109 valid_0's binary_logloss: 0.545753
# [3] valid_0's auc: 0.978709 valid_0's binary_logloss: 0.488691
#...
#verbose=30: print the score every 30 iterations
lgb_model.fit(x_train,y_train,eval_set=[(x_test,y_test)],eval_metric='auc',early_stopping_rounds=50,verbose=30)
# Training until validation scores don't improve for 50 rounds
# [30] valid_0's auc: 0.992003 valid_0's binary_logloss: 0.116233
# [60] valid_0's auc: 0.989804 valid_0's binary_logloss: 0.0967597
# Early stopping, best iteration is:
# [37] valid_0's auc: 0.992603 valid_0's binary_logloss: 0.0995741
#best_iteration_: the best iteration number over the whole training run
lgb_model.best_iteration_ #37
#feature_importances_: returns the feature importances
lgb_model.feature_importances_
# array([ 24, 121, 29, 15, 25, 20, 26, 67, 18, 50, 16, 17, 10,
# 33, 31, 42, 24, 41, 41, 31, 51, 104, 64, 49, 35, 24,
# 43, 69, 42, 15], dtype=int32)
#feature names of the dataset
cancer.feature_names
# array(['mean radius', 'mean texture', 'mean perimeter', 'mean area',
# 'mean smoothness', 'mean compactness', 'mean concavity',
# 'mean concave points', 'mean symmetry', 'mean fractal dimension',
# 'radius error', 'texture error', 'perimeter error', 'area error',
# 'smoothness error', 'compactness error', 'concavity error',
# 'concave points error', 'symmetry error',
# 'fractal dimension error', 'worst radius', 'worst texture',
# 'worst perimeter', 'worst area', 'worst smoothness',
# 'worst compactness', 'worst concavity', 'worst concave points',
# 'worst symmetry', 'worst fractal dimension'], dtype='<U23')
#predict using the best iteration found by early stopping
lgb_model.predict(x_test,num_iteration=lgb_model.best_iteration_)