【lightgbm】Light Gradient Boosting Machine (LGBM)

1. sklearn API

LGBM parameters and usage

'''
- categorical_feature
    Specifies which columns are categorical features.
    Categorical values must be encoded in advance as non-negative integers
    (ideally consecutive, starting from 0: 0, 1, 2, ...); negative values are
    treated as missing.
    One of LightGBM's improvements over XGBoost is its handling of categorical
    features: they no longer need to be one-hot encoded.
'''
from lightgbm import LGBMClassifier

clf = LGBMClassifier(n_estimators=150, 
                     categorical_feature=[1, 3, 6, 9, 13, 15, 19, 20, 33], 
                     metric='auc')
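The up-front encoding mentioned above can be done with pandas category codes. A minimal sketch, assuming a hypothetical string-valued column `city` (not from the dataset above):

```python
import pandas as pd

# Hypothetical raw frame with a string-valued categorical column.
df = pd.DataFrame({"city": ["sh", "bj", "sh", "gz"], "price": [1.0, 2.0, 3.0, 4.0]})

# cat.codes maps each category to a non-negative integer (0, 1, 2, ...);
# NaN becomes -1, which LightGBM treats as a missing value.
df["city"] = df["city"].astype("category").cat.codes
print(df["city"].tolist())  # → [2, 0, 2, 1]  (categories sorted: bj, gz, sh)
```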
'''
eval_set
    Validation data whose score is reported at each boosting iteration.
early_stopping_rounds=50
    Stop training if the validation score has not improved within 50 iterations.
verbose=30
    Print the validation score every 30 iterations.
    (In lightgbm >= 4.0 these two fit() arguments were removed; use the
    lgb.early_stopping(50) and lgb.log_evaluation(30) callbacks instead.)
'''
clf.fit(X_train.values, y_train, eval_set=[(X_val, y_val)], verbose=False)
'''
Feature importance
    clf.feature_importances_ returns an np.array of length n_features, where each
    element is the importance score of the corresponding feature.
    clf.feature_importances_.argsort() returns the indices that sort the array in
    ascending order.
    clf.feature_importances_.argsort()[::-1] gives descending order.
'''
for i in clf.feature_importances_.argsort()[::-1]:
    print(features[i], clf.feature_importances_[i] / clf.feature_importances_.sum())
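The ranking logic above can be checked on a plain array, independent of any fitted model. A minimal sketch with hypothetical importance scores:

```python
import numpy as np

# Hypothetical importance scores, as clf.feature_importances_ would return.
importances = np.array([10, 50, 5, 35])

order = importances.argsort()[::-1]       # indices, most important first
shares = importances / importances.sum()  # normalized so the shares sum to 1

print(order.tolist())   # → [1, 3, 0, 2]
print(shares[order])    # importance shares in descending order
```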

2. Native API

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
import lightgbm as lgb
data = load_digits()
X = data.data
y = data.target
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, y, test_size=0.2, random_state=1107)
print(Xtrain.shape, Ytrain.shape)
'''
    (1437, 64) (1437,)
'''
traindata = lgb.Dataset(Xtrain, Ytrain)
# Training
params = {
    'learning_rate': 0.1,
    'boosting_type': 'gbdt',
    'objective': 'regression',  # the digits target is treated as numeric here, matching the RMSE metric below
    'metric': 'rmse',
    'seed': 2222,
    'n_jobs': 4,
    'verbose': -1,
}
# Training info is printed automatically; num_boost_round defaults to 100 trees.
# lgb.early_stopping(100): stop if the validation error has not improved within 100 iterations.
# lgb.log_evaluation(period): log the validation score every `period` iterations
# (a period <= 0 disables logging).
valdata = lgb.Dataset(Xtest, Ytest, reference=traindata)
reg = lgb.train(params, traindata, valid_sets=[valdata], num_boost_round=10000, 
                callbacks=[lgb.early_stopping(100), lgb.log_evaluation(100)])
y_pre = reg.predict(Xtest)
from sklearn.metrics import mean_squared_error as MSE

MSE(Ytest, y_pre, squared=False) # RMSE
'''
    0.9434823962603885
'''
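Note that `mean_squared_error(..., squared=False)` is deprecated in newer scikit-learn releases in favor of `root_mean_squared_error`. The same value can be computed directly with numpy; a minimal sketch:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error, equivalent to mean_squared_error(..., squared=False)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

print(rmse([0.0, 0.0], [3.0, 4.0]))  # sqrt((9 + 16) / 2) ≈ 3.5355
```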
# Cross-validation
# seed controls the randomness of tree growth and splitting
param = {'seed':1107, 'metric':'rmse', 'force_col_wise':True}
'''
stratified defaults to True
    https://blog.csdn.net/qq_45249685/article/details/128690834
    For classification it keeps each fold's label distribution consistent with the
    full dataset. lgb.cv() uses stratified splits by default, which is meaningless
    for regression, hence stratified=False below.
seed
    controls the randomness of the cross-validation splits
'''
result = lgb.cv(param, traindata, nfold=5, num_boost_round=10, seed=1107, stratified=False)

'''
    [LightGBM] [Info] Total Bins 835
    [LightGBM] [Info] Number of data points in the train set: 1148, number of used features: 53
    [LightGBM] [Info] Total Bins 835
    [LightGBM] [Info] Number of data points in the train set: 1148, number of used features: 53
    [LightGBM] [Info] Total Bins 835
    [LightGBM] [Info] Number of data points in the train set: 1148, number of used features: 53
    [LightGBM] [Info] Total Bins 835
    [LightGBM] [Info] Number of data points in the train set: 1148, number of used features: 53
    [LightGBM] [Info] Total Bins 835
    [LightGBM] [Info] Number of data points in the train set: 1148, number of used features: 53
    [LightGBM] [Info] Start training from score 4.452091
    [LightGBM] [Info] Start training from score 4.554007
    [LightGBM] [Info] Start training from score 4.521777
    [LightGBM] [Info] Start training from score 4.527003
    [LightGBM] [Info] Start training from score 4.499129
'''    
import pandas as pd
'''
RMSE: root mean squared error; STDV: standard deviation.
10 rows: while growing the 10 trees, the mean RMSE and stdv over the 5 folds
after each iteration.
'''
pd.DataFrame(result)
'''
       rmse-mean  rmse-stdv
    0   2.684664   0.088352
    1   2.522574   0.080620
    2   2.370367   0.069264
    3   2.245582   0.063250
    4   2.134850   0.064275
    5   2.034140   0.062627
    6   1.934587   0.062498
    7   1.846355   0.064433
    8   1.768070   0.063164
    9   1.695830   0.062633
'''
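The dict returned by lgb.cv can be used to pick the number of boosting rounds with the lowest mean validation error. A minimal sketch using the values from the table above (older lightgbm versions use the key 'rmse-mean'; recent versions prefix it as 'valid rmse-mean', so the key is matched by suffix here):

```python
# Mimics the dict returned by lgb.cv: one list entry per boosting round.
result = {
    "rmse-mean": [2.684664, 2.522574, 2.370367, 2.245582, 2.134850,
                  2.034140, 1.934587, 1.846355, 1.768070, 1.695830],
    "rmse-stdv": [0.088352, 0.080620, 0.069264, 0.063250, 0.064275,
                  0.062627, 0.062498, 0.064433, 0.063164, 0.062633],
}

# Pick the mean-metric key by suffix, then the 1-based round with the lowest mean RMSE.
mean_key = next(k for k in result if k.endswith("-mean"))
best_round = min(range(len(result[mean_key])), key=result[mean_key].__getitem__) + 1
print(best_round, result[mean_key][best_round - 1])  # → 10 1.69583
```

With a monotonically decreasing curve like this one the best round is simply the last, which suggests training longer than 10 rounds.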
