Ensemble Learning
1. Overview of ensemble learning methods
There are three main ensemble learning approaches: Bagging, Boosting (the most commonly used), and Stacking (model stacking).
1. Bagging
Random forest (an extended variant of Bagging)
Advantages of random forests:
1. Performs well out of the box on many datasets, with a clear edge over other algorithms;
2. Easy to parallelize, because each tree is trained independently;
3. Handles high-dimensional data without requiring feature selection or data normalization.
Bagging turns many weak classifiers into a strong classifier through ensembling.
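As a minimal sketch of the Bagging idea (not the exact random-forest algorithm, which additionally subsamples features at each split), scikit-learn's BaggingClassifier can train decision trees on bootstrap samples and combine them by majority vote; the hyperparameter values below are illustrative.
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Each base tree sees a bootstrap sample (drawn with replacement);
# the ensemble predicts by majority vote over the trees.
bagging = BaggingClassifier(
    DecisionTreeClassifier(),  # the weak learner to be bagged
    n_estimators=100,          # number of bootstrap models
    max_samples=1.0,           # each bootstrap sample matches the training-set size
    bootstrap=True,            # sample with replacement
    n_jobs=-1,                 # trees are independent, so train them in parallel
)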
2. Boosting
The AdaBoost algorithm
Adaptive Boosting:
Algorithm idea: train weak learners one after another; after each round, increase the weights of the samples the current learner misclassified and decrease the weights of those it got right, so the next learner concentrates on the hard cases. The final strong classifier is a weighted vote of all weak learners, with more accurate learners receiving larger weights.
(The original notes illustrated this with a figure: misclassified points grow larger as their weights rise, correctly classified points shrink; the figure is omitted here.)
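To make the reweighting concrete, here is a minimal, self-contained AdaBoost sketch built on decision stumps. The function names and hyperparameters are illustrative, not part of the original notes; labels are assumed to be in {-1, +1}.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    """Minimal AdaBoost sketch; y must be in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)  # start from uniform sample weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = w[pred != y].sum()  # weighted error of this weak learner
        if err >= 0.5:  # no better than random guessing: stop
            break
        alpha = 0.5 * np.log((1 - err) / (err + 1e-10))  # learner weight
        w *= np.exp(-alpha * y * pred)  # up-weight mistakes, down-weight hits
        w /= w.sum()  # renormalize to a probability distribution
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    agg = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(agg)  # weighted vote of the weak learners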
The GBDT algorithm
GBDT (Gradient Boosting Decision Tree) is an iterative decision-tree algorithm. The model consists of many decision trees, and the core of GBDT is that the final prediction is the sum of the outputs of all trees. Because outputs must be summed (and the summed values are continuous), the trees in GBDT are all regression trees. GBDT follows the Boosting strategy and is widely regarded as generalizing very well.
GBDT is built from three concepts: the Regression Decision Tree (DT), Gradient Boosting (GB), and Shrinkage.
GBDT algorithm idea: use the value of the negative gradient of the loss function at the current model as an approximation of the residual, and fit a regression tree to it at each boosting step. A minimal sketch follows.
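For squared-error loss the negative gradient is exactly the residual y - F(x), which makes the idea easy to sketch. The toy implementation below (all names illustrative) fits each new regression tree to the current residuals and accumulates the shrunken tree outputs.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbdt_fit(X, y, n_trees=100, lr=0.1):
    """Minimal GBDT sketch for squared-error loss."""
    f0 = y.mean()  # initial constant prediction
    F = np.full(len(y), f0)
    trees = []
    for _ in range(n_trees):
        residual = y - F  # negative gradient of 0.5*(y - F)^2
        tree = DecisionTreeRegressor(max_depth=3)
        tree.fit(X, residual)  # each tree fits the current residuals
        F += lr * tree.predict(X)  # shrinkage scales every tree's contribution
        trees.append(tree)
    return f0, trees

def gbdt_predict(X, f0, trees, lr=0.1):
    return f0 + lr * sum(t.predict(X) for t in trees)  # sum over all trees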
XGBoost
A very commonly used boosting method, and one of the fastest and most capable open-source boosted-tree toolkits available.
LightGBM
Advantages: faster training, lower memory usage, slightly better accuracy, distributed support, and the ability to process massive datasets quickly.
Main improvements: LightGBM = XGBoost + GOSS + EFB + Histogram
1. Gradient-based One-Side Sampling (GOSS):
Main idea: reduce the cost of computing split gains by subsampling the training instances. GOSS keeps all samples with large gradients and randomly subsamples those with small gradients; to avoid distorting the data distribution, the small-gradient samples are multiplied by a compensating constant when the gain is computed. The intuition is that a sample with a small gradient has a small training error, i.e., it is already well learned.
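A minimal sketch of the sampling step (the function name and the constants are illustrative; following the LightGBM paper, a is the fraction of large-gradient samples kept and b is the sampling rate for the rest):
import numpy as np

def goss_sample(grad, a=0.2, b=0.1):
    """Keep the top-a samples by |gradient|, subsample a b fraction of the
    rest, and weight the subsampled part by (1 - a) / b so the estimated
    gain (and the data distribution) stays approximately unchanged."""
    n = len(grad)
    top_k = int(a * n)
    rand_k = int(b * n)
    order = np.argsort(-np.abs(grad))  # sort indices by |gradient|, descending
    top_idx = order[:top_k]  # large-gradient samples: keep them all
    rand_idx = np.random.choice(order[top_k:], rand_k, replace=False)
    idx = np.concatenate([top_idx, rand_idx])
    weights = np.ones(len(idx))
    weights[top_k:] = (1 - a) / b  # compensate the under-sampled small-gradient part
    return idx, weights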
2. Exclusive Feature Bundling (EFB):
In high-dimensional data, features are often mutually exclusive (e.g., two features that never take non-zero values at the same time) or nearly so. Such features can be bundled into a single feature, reducing the number of features. A small sketch follows.
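A toy illustration of the bundling idea for two already-binned, integer-valued features (the names and the binning assumption are mine, not LightGBM's actual implementation): because the features are (nearly) mutually exclusive, they can share one column, with an offset keeping their bin ranges apart.
import numpy as np

def bundle_exclusive(f1, f2, f1_bins):
    """Bundle two (nearly) mutually exclusive binned features into one column."""
    bundled = f1.copy()
    nonzero = f2 != 0
    bundled[nonzero] = f2[nonzero] + f1_bins  # shift f2 into its own bin range
    return bundled

# Example: f1 uses bins 1..3 and f2 uses bins 1..2, so the bundle uses 1..5.
f1 = np.array([1, 0, 3, 0])
f2 = np.array([0, 2, 0, 1])
print(bundle_exclusive(f1, f2, f1_bins=3))  # [1 5 3 4]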
3. The histogram algorithm:
Basic idea: discretize each continuous feature into k discrete values and build a histogram of width k (i.e., with k bins). Finding the best split point then requires scanning only the k bins rather than every data point.
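A minimal single-feature sketch (the function name, bin count, and the simple squared-gradient gain proxy are illustrative): bin the feature once, accumulate per-bin gradient statistics, then scan only the k bins for the best split.
import numpy as np

def best_split_by_histogram(x, grad, k=256):
    """Find a split point for one feature by scanning k bins, not n samples."""
    edges = np.quantile(x, np.linspace(0, 1, k + 1)[1:-1])  # k-1 bin boundaries
    bin_id = np.digitize(x, edges)  # discretize x into bins 0..k-1
    grad_sum = np.bincount(bin_id, weights=grad, minlength=k)  # per-bin gradient sums
    cnt = np.bincount(bin_id, minlength=k).astype(float)  # per-bin sample counts
    g_total, c_total = grad_sum.sum(), cnt.sum()
    best_gain, best_bin = -np.inf, None
    g_left = c_left = 0.0
    for b in range(k - 1):  # the scan is O(k), independent of the sample count
        g_left += grad_sum[b]
        c_left += cnt[b]
        if c_left == 0 or c_left == c_total:
            continue
        g_right, c_right = g_total - g_left, c_total - c_left
        gain = g_left**2 / c_left + g_right**2 / c_right  # variance-gain proxy
        if gain > best_gain:
            best_gain, best_bin = gain, b
    return best_bin, best_gain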
4. Leaf-wise growth with a maximum-depth limit:
Instead of growing trees level by level (level-wise, as XGBoost traditionally does), LightGBM always splits the leaf with the largest gain (leaf-wise), which reduces the loss faster for the same number of splits; the maximum-depth limit keeps the resulting deep, narrow trees from overfitting.
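In practice this strategy is exposed through LightGBM's parameters (the values below are illustrative):
from lightgbm import LGBMClassifier

# Leaf-wise growth always splits the leaf with the largest gain, so trees
# can grow deep and narrow; capping max_depth (and keeping num_leaves well
# below 2**max_depth) limits the resulting risk of overfitting.
clf = LGBMClassifier(num_leaves=31, max_depth=8)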
3. Stacking
Stacking trains several different level-0 base models, then trains a level-1 meta-model on their out-of-fold predictions, letting the meta-model learn how best to combine the base models.
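A minimal sketch with scikit-learn's StackingClassifier (the particular base models and meta-model are illustrative):
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# The level-0 models produce out-of-fold predictions, which become the
# input features of the level-1 meta-model.
stack = StackingClassifier(
    estimators=[('rf', RandomForestClassifier()), ('svc', LinearSVC())],
    final_estimator=LogisticRegression(),  # meta-model that combines base outputs
    cv=5,  # out-of-fold predictions avoid leaking the training labels
)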
2. Ensemble learning code
import warnings
warnings.filterwarnings("ignore")  # suppress warnings that pop up but do not affect execution
import pandas as pd
from sklearn.model_selection import train_test_split
# Generate 12,000 rows of data and split into training and test sets at a 3:1 ratio
from sklearn.datasets import make_hastie_10_2
data, target = make_hastie_10_2()
X_train, X_test, y_train, y_test = train_test_split(data, target, random_state=123)
print(X_train.shape, X_test.shape)
# ((9000, 10), (3000, 10)) -- 10 features
Comparing six models
Compare six models, all with default parameters, and measure their cross-validation scores.
from sklearn.linear_model import LogisticRegression  # logistic regression
from sklearn.ensemble import RandomForestClassifier  # random forest
from sklearn.ensemble import AdaBoostClassifier # AdaBoost
from sklearn.ensemble import GradientBoostingClassifier # GBDT
from xgboost import XGBClassifier #XGBoost
from lightgbm import LGBMClassifier # LightGBM
from sklearn.model_selection import cross_val_score
import time
# the classifiers (clf)
clf1 = LogisticRegression()
clf2 = RandomForestClassifier()
clf3 = AdaBoostClassifier()
clf4 = GradientBoostingClassifier()
clf5 = XGBClassifier()
clf6 = LGBMClassifier()
for clf, label in zip([clf1, clf2, clf3, clf4, clf5, clf6],
                      ['Logistic Regression', 'Random Forest', 'AdaBoost', 'GBDT', 'XGBoost', 'LightGBM']):
    start = time.time()  # start time
    scores = cross_val_score(clf, X_train, y_train, scoring='accuracy', cv=5)
    end = time.time()  # end time
    running_time = end - start
    print("Accuracy: %0.8f (+/- %0.2f), time %0.2fs. Model: [%s]" % (scores.mean(), scores.std(), running_time, label))
Accuracy: 0.49411111 (+/- 0.01), time 0.06s. Model: [Logistic Regression]
Accuracy: 0.88533333 (+/- 0.01), time 13.73s. Model: [Random Forest]
Accuracy: 0.87533333 (+/- 0.01), time 2.79s. Model: [AdaBoost]
Accuracy: 0.91122222 (+/- 0.00), time 9.22s. Model: [GBDT]
[15:30:36] WARNING: C:/Users/Administrator/workspace/xgboost-win64_release_1.5.1/src/learner.cc:1115: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.
(the same warning is printed once per CV fold; four repeats omitted)
Accuracy: 0.92366667 (+/- 0.00), time 5.75s. Model: [XGBoost]
Accuracy: 0.92800000 (+/- 0.00), time 0.60s. Model: [LightGBM]
Comparing the six models: logistic regression is the fastest but least accurate, while LightGBM is both fast and the most accurate here. This is why LightGBM is currently one of the most popular choices for structured (tabular) data.
Using XGBoost
1. The native XGBoost API
import xgboost as xgb
# record the program's running time
import time
start_time = time.time()
# build the xgb DMatrix objects
xgb_train = xgb.DMatrix(X_train, y_train)
xgb_test = xgb.DMatrix(X_test, label=y_test)
## parameters
params = {
    'booster': 'gbtree',
    # 'silent': 1,  # 1 silences run-time messages, 0 is usually better (newer XGBoost uses 'verbosity' instead)
    # 'nthread': 7,  # number of CPU threads; defaults to the maximum
    'eta': 0.007,  # analogous to the learning rate
    'min_child_weight': 3,
    # Defaults to 1: the minimum sum of instance hessians (h) in a leaf. For 0-1 classification
    # with unbalanced classes, if h is around 0.01, min_child_weight=1 means a leaf must
    # contain at least 100 samples. This parameter strongly affects the result: it controls
    # the minimum sum of second derivatives per leaf, and smaller values overfit more easily.
    'max_depth': 6,  # tree depth; larger values overfit more easily
    'gamma': 0.1,  # minimum loss reduction required to split a leaf further; larger is more conservative, typically 0.1 or 0.2
    'subsample': 0.7,  # row subsampling of the training instances
    'colsample_bytree': 0.7,  # column subsampling when building each tree
    'lambda': 2,  # L2 regularization on leaf weights; larger values make the model harder to overfit
    # 'alpha': 0,  # L1 regularization term
    # 'scale_pos_weight': 1,  # values > 0 help convergence when classes are imbalanced
    # 'objective': 'multi:softmax',  # for multi-class problems
    # 'num_class': 10,  # number of classes, used together with multi:softmax
    'seed': 1000,  # random seed
    # 'eval_metric': 'auc'
}
plst = list(params.items())  # convert the dict to a list of (key, value) pairs
num_rounds = 500  # number of boosting rounds
watchlist = [(xgb_train, 'train'), (xgb_test, 'val')]  # evaluation sets whose results are displayed
# Train the model and save it
# When num_rounds is large, early_stopping_rounds stops training once the
# validation score has not improved within that many consecutive rounds.
model = xgb.train(plst, xgb_train, num_rounds, watchlist, early_stopping_rounds=100)
# model.save_model('./model/xgb.model')  # persist the trained model
print("best_ntree_limit", model.best_ntree_limit)
y_pred = model.predict(xgb_test, ntree_limit=model.best_ntree_limit)
# NB: make_hastie_10_2 labels are +/-1, while int(y_pred > 0.5) yields 0/1, so this
# "error" is inflated; thresholding at 0 and mapping to +/-1 would give the true error rate.
print('error=%f' % (sum(1 for i in range(len(y_pred)) if int(y_pred[i] > 0.5) != y_test[i]) / float(len(y_pred))))
# print the running time
cost_time = time.time() - start_time
print("xgboost success!", '\n', "cost time:", cost_time, "(s)......")
[0] train-rmse:1.11000 val-rmse:1.10422
[1] train-rmse:1.10734 val-rmse:1.10182
[2] train-rmse:1.10465 val-rmse:1.09932
...
(rounds 3-496 omitted: train and validation RMSE decrease steadily)
...
[497] train-rmse:0.62135 val-rmse:0.68680
[498] train-rmse:0.62096 val-rmse:0.68650
[499] train-rmse:0.62056 val-rmse:0.68624
best_ntree_limit 500
error=0.837333
xgboost success!
cost time: 7.657426118850708 (s)......
2. Using the scikit-learn interface
Parameter names that change between the native and scikit-learn APIs:
eta -> learning_rate
lambda -> reg_lambda
alpha -> reg_alpha
from sklearn.model_selection import train_test_split
from sklearn import metrics
from xgboost import XGBClassifier
clf = XGBClassifier(
    # silent=0,  # 1 silences run-time messages, 0 is usually better; deprecated in favor of verbosity in newer versions
    # nthread=4,  # number of CPU threads; defaults to the maximum
    learning_rate=0.3,  # analogous to eta in the native API
    min_child_weight=1,
    # Defaults to 1: the minimum sum of instance hessians (h) in a leaf. For 0-1 classification
    # with unbalanced classes, if h is around 0.01, min_child_weight=1 means a leaf must
    # contain at least 100 samples. Strongly affects the result; smaller values overfit more easily.
    max_depth=6,  # tree depth; larger values overfit more easily
    gamma=0,  # minimum loss reduction required to split a leaf further; larger is more conservative, typically 0.1 or 0.2
    subsample=1,  # row subsampling rate of the training instances
    max_delta_step=0,  # maximum delta step allowed for each tree's weight estimate
    colsample_bytree=1,  # column subsampling when building each tree
    reg_lambda=1,  # L2 regularization on leaf weights; larger values make the model harder to overfit
    # reg_alpha=0,  # L1 regularization term
    # scale_pos_weight=1,  # values > 0 help convergence when classes are imbalanced; balances positive/negative weights
    # objective='multi:softmax',  # learning task and objective, e.g. for multi-class problems
    # num_class=10,  # number of classes, used together with multi:softmax
    n_estimators=100,  # number of trees
    seed=1000  # random seed
    # eval_metric='auc'
)
clf.fit(X_train, y_train)
y_true, y_pred = y_test, clf.predict(X_test)
print("Accuracy : %.4g" % metrics.accuracy_score(y_true, y_pred))
[16:03:06] WARNING: C:/Users/Administrator/workspace/xgboost-win64_release_1.5.1/src/learner.cc:1115: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.
Accuracy : 0.9273
Using LightGBM
1. The native interface
import lightgbm as lgb
from sklearn.metrics import mean_squared_error
# Load your data
# print('Load data...')
# df_train = pd.read_csv('../regression/regression.train', header=None, sep='\t')
# df_test = pd.read_csv('../regression/regression.test', header=None, sep='\t')
#
# y_train = df_train[0].values
# y_test = df_test[0].values
# X_train = df_train.drop(0, axis=1).values
# X_test = df_test.drop(0, axis=1).values
# Create the LightGBM Dataset format
lgb_train = lgb.Dataset(X_train, y_train)  # saving data as a LightGBM binary file makes loading faster
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)  # the validation set
# Write the parameters as a dict
params = {
    'task': 'train',
    'boosting_type': 'gbdt',  # boosting type
    'objective': 'regression',  # objective function
    'metric': {'l2', 'auc'},  # evaluation metrics
    'num_leaves': 31,  # number of leaves per tree
    'learning_rate': 0.05,  # learning rate
    'feature_fraction': 0.9,  # fraction of features sampled per tree
    'bagging_fraction': 0.8,  # fraction of rows sampled per tree
    'bagging_freq': 5,  # k means bagging is performed every k iterations
    'verbose': 1  # <0: fatal only, =0: errors (warnings), >0: info
}
print('Start training...')
# Train (cv and train)
gbm = lgb.train(params, lgb_train, num_boost_round=500, valid_sets=lgb_eval, early_stopping_rounds=5)  # training takes the param dict and datasets
print('Save model...')
gbm.save_model('model.txt')  # save the trained model to a file
print('Start predicting...')
# Predict on the test set
y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration)  # with early stopping enabled, best_iteration selects the best round
# Evaluate the model
# NB: as above, the labels are +/-1 while int(y_pred > 0.5) yields 0/1,
# so this "error" figure is inflated; threshold at 0 for the true error rate.
print('error=%f' %
      (sum(1
           for i in range(len(y_pred)) if int(y_pred[i] > 0.5) != y_test[i]) /
       float(len(y_pred))))
Start training...
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000448 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 2550
[LightGBM] [Info] Number of data points in the train set: 9000, number of used features: 10
[LightGBM] [Info] Start training from score 0.012000
[1] valid_0's auc: 0.814399 valid_0's l2: 0.965563
Training until validation scores don't improve for 5 rounds
[2] valid_0's auc: 0.84729 valid_0's l2: 0.934647
[3] valid_0's auc: 0.872805 valid_0's l2: 0.905265
...
(rounds 4-191 omitted: AUC climbs steadily toward ~0.9828)
...
[192] valid_0's auc: 0.982751 valid_0's l2: 0.319971
[193] valid_0's auc: 0.982685 valid_0's l2: 0.320043
Early stopping, best iteration is:
[188] valid_0's auc: 0.982794 valid_0's l2: 0.319746
Save model...
Start predicting...
error=0.664000
2. The scikit-learn interface
from sklearn import metrics
from lightgbm import LGBMClassifier
clf = LGBMClassifier(
    boosting_type='gbdt',  # boosting type: gbdt, dart, goss, rf
    num_leaves=31,  # maximum leaves per tree; compare xgboost's level-wise 2^(max_depth)
    max_depth=-1,  # maximum tree depth (-1 means no limit)
    learning_rate=0.1,  # learning rate
    n_estimators=100,  # number of trees to fit, i.e. the number of boosting rounds
    subsample_for_bin=200000,
    objective=None,
    class_weight=None,
    min_split_gain=0.0,  # minimum gain required to perform a split
    min_child_weight=0.001,  # minimum child-node weight
    min_child_samples=20,
    subsample=1.0,  # row subsampling rate of the training samples
    subsample_freq=0,  # subsampling frequency
    colsample_bytree=1.0,  # column (feature) subsampling rate
    reg_alpha=0.0,  # L1 regularization coefficient
    reg_lambda=0.0,  # L2 regularization coefficient
    random_state=None,
    n_jobs=-1,
    silent=True,
)
clf.fit(X_train, y_train, eval_metric='auc')
# To monitor a validation set, pass eval_set; verbose=False suppresses the per-round output
# clf.fit(X_train, y_train)
y_true, y_pred = y_test, clf.predict(X_test)
print("Accuracy : %.4g" % metrics.accuracy_score(y_true, y_pred))
Accuracy : 0.9347
eval_metric
This is the evaluation metric. It does not change the training objective itself; it measures the model after each boosting round (and drives early stopping, if enabled). For example, logloss is often used as the objective, commonly paired with evaluation metrics such as auc or acc.
Usage: eval_metric='error'. A short usage sketch follows after the list below.
Regression tasks (default rmse)
rmse -- root mean squared error
mae -- mean absolute error
Classification tasks (default error)
auc -- area under the ROC curve
error -- error rate (binary classification)
merror -- error rate (multi-class)
logloss -- negative log-likelihood (binary classification)
mlogloss -- negative log-likelihood (multi-class)
map -- mean average precision
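A hedged usage sketch (the placement of eval_metric differs by XGBoost version: recent releases take it in the constructor, older ones accepted it in fit(); labels are assumed to be encoded as 0/1):
from xgboost import XGBClassifier

# Monitor AUC on a held-out set during training; the training objective
# itself is still binary:logistic.
clf = XGBClassifier(objective='binary:logistic', eval_metric='auc')
clf.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=False)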
Source note: the eval_metric section above is adapted from a CSDN post by 「缘 源 园」: https://blog.csdn.net/weixin_48135624/article/details/115173785