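XGBoost learns a default direction for missing values at each split during training. At prediction time, a sample with a missing value follows that direction, which falls back to the left subtree when the feature had no missing values in training.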
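The routing of a missing value can be sketched on a single hand-written tree. The tree below is hypothetical (it is not taken from a trained model); its keys mirror the shape of XGBoost's JSON tree dump, where each split node names the child to follow for "yes", "no", and "missing". The helper name predict_leaf is an assumption made for illustration only.

```python
import math

# Hypothetical tree, shaped like one entry of XGBoost's JSON dump.
# "missing": 1 means a missing value follows the "yes" (left) child.
TREE = {
    "nodeid": 0, "split": "f0", "split_condition": 0.5,
    "yes": 1, "no": 2, "missing": 1,
    "children": [
        {"nodeid": 1, "leaf": -0.2},
        {"nodeid": 2, "leaf": 0.3},
    ],
}


def predict_leaf(node, features):
    """Walk the tree for one sample and return the leaf value it reaches."""
    while "leaf" not in node:
        value = features.get(node["split"])
        if value is None or math.isnan(value):
            next_id = node["missing"]   # default direction for missing values
        elif value < node["split_condition"]:
            next_id = node["yes"]       # left child
        else:
            next_id = node["no"]        # right child
        node = next(c for c in node["children"] if c["nodeid"] == next_id)
    return node["leaf"]


print(predict_leaf(TREE, {"f0": 0.9}))            # 0.3  (right child)
print(predict_leaf(TREE, {"f0": float("nan")}))   # -0.2 (default: left child)
```

Changing "missing" to 2 in the dictionary would send the same NaN sample to the right leaf, which is what a learned default direction amounts to.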
Binary classification with XGBoost (plotting the CART regression trees)
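GBDT can handle binary classification. The key point is that each GBDT round is trained on the residuals left by the previous rounds, where "residual" means the negative gradient of the loss with respect to the current model's output. This requires that adding and subtracting the weak learners' outputs is meaningful, so GBDT always uses CART regression trees, whether the task is classification or regression. For how the classification task is carried out in detail, see these two answers: 1, 2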
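The idea above can be sketched in plain Python. This is a minimal illustration, not XGBoost's algorithm: it uses first-order gradient boosting with depth-1 regression trees on a single feature, and the names fit_stump, boost, predict_proba and LEARNING_RATE are all made up for this sketch. Each round fits a regression tree to y - p, the negative gradient of the log loss with respect to the raw score, and the summed tree outputs are passed through a sigmoid to get a probability.

```python
import math

LEARNING_RATE = 0.5  # illustrative value, chosen arbitrarily


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, targets):
    """Fit a depth-1 regression tree (least squares) on one feature."""
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [t for x, t in zip(xs, targets) if x < threshold]
        right = [t for x, t in zip(xs, targets) if x >= threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x < threshold else right_mean


def boost(xs, ys, rounds=20):
    """Train a list of regression stumps on the negative gradient of log loss."""
    trees = []
    scores = [0.0] * len(xs)  # raw scores (log-odds); 0 means p = 0.5
    for _ in range(rounds):
        # Negative gradient of log loss w.r.t. the raw score is y - p.
        residuals = [y - sigmoid(s) for y, s in zip(ys, scores)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        scores = [s + LEARNING_RATE * tree(x) for s, x in zip(scores, xs)]
    return trees


def predict_proba(trees, x):
    """Sum the real-valued tree outputs, then squash to a probability."""
    return sigmoid(sum(LEARNING_RATE * tree(x) for tree in trees))


xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
trees = boost(xs, ys)
print(predict_proba(trees, 0.15), predict_proba(trees, 0.85))
```

Every tree here outputs a real number rather than a class label, which is why the outputs can be summed across rounds; the class probability only appears after the final sigmoid.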
XGBoost code for binary classification that also plots the CART regression trees
Running the code below requires graphviz to be installed.
import xgboost as xgb
from xgboost import plot_tree
import matplotlib.pyplot as plt
import numpy as np

# Toy binary classification dataset: random features and random 0/1 labels
np.random.seed(1234)
train_x = np.random.rand(100, 5)
train_y = np.random.randint(0, 2, 100)
test_x = np.random.rand(20, 5)
test_y = np.random.randint(0, 2, 20)
dtrain = xgb.DMatrix(train_x, label=train_y)
dtest = xgb.DMatrix(test_x, label=test_y)

print('*' * 25, 'Start model training', '*' * 25)
model = xgb.train(
    params={
        'booster': 'gbtree',
        'objective': 'binary:logistic',
        'eval_metric': ['logloss', 'auc'],
        'max_depth': 4,
    },
    dtrain=dtrain,
    num_boost_round=1000,
    evals=[(dtrain, "train"), (dtest, "valid")],
    # With several metrics, early stopping uses the last one (auc)
    # on the last evals entry (valid)
    early_stopping_rounds=10,
    verbose_eval=True,
)
print('*' * 25, 'Model training finished', '*' * 25)

print('*' * 25, 'Start plotting model structure', '*' * 25)
# best_ntree_limit is the number of trees up to the best iteration.
# It was removed in XGBoost 2.0; on newer versions use model.best_iteration + 1.
print("model.best_ntree_limit:", model.best_ntree_limit)
for i in range(model.best_ntree_limit):
    # Plot the i-th CART regression tree
    plot_tree(model, num_trees=i)
    # Save the figure
    plt.savefig(f"tree{i}.png", dpi=900)
    # plt.show()
    # Close the figure so that figures do not accumulate across iterations
    plt.close()
print('*' * 25, 'Model structure plotting finished', '*' * 25)
Output
************************* Start model training *************************
[0] train-logloss:0.63596 train-auc:0.74390 valid-logloss:0.71601 valid-auc:0.45312
[1] train-logloss:0.60108 train-auc:0.75217 valid-logloss:0.74161 valid-auc:0.48958
[2] train-logloss:0.55854 train-auc:0.86606 valid-logloss:0.74433 valid-auc:0.53125
[3] train-logloss:0.52843 train-auc:0.89004 valid-logloss:0.76378 valid-auc:0.53646
[4] train-logloss:0.50814 train-auc:0.91381 valid-logloss:0.77606 valid-auc:0.53125
[5] train-logloss:0.47967 train-auc:0.92476 valid-logloss:0.80379 valid-auc:0.46875
[6] train-logloss:0.46174 train-auc:0.93365 valid-logloss:0.79062 valid-auc:0.48438
[7] train-logloss:0.44558 train-auc:0.94729 valid-logloss:0.80327 valid-auc:0.47917
[8] train-logloss:0.42358 train-auc:0.96693 valid-logloss:0.80089 valid-auc:0.50000
[9] train-logloss:0.41019 train-auc:0.97127 valid-logloss:0.81481 valid-auc:0.46875
[10] train-logloss:0.39895 train-auc:0.97127 valid-logloss:0.82640 valid-auc:0.45833
[11] train-logloss:0.38496 train-auc:0.97520 valid-logloss:0.84798 valid-auc:0.45833
[12] train-logloss:0.36452 train-auc:0.97933 valid-logloss:0.87036 valid-auc:0.41667
[13] train-logloss:0.35660 train-auc:0.97974 valid-logloss:0.87867 valid-auc:0.42708
************************* Model training finished *************************
************************* Start plotting model structure *************************
model.best_ntree_limit: 4
************************* Model structure plotting finished *************************
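To read the plotted trees as a classifier, it helps to see how their leaf values turn into a probability. Under the binary:logistic objective, a sample's raw score is the sum of the leaf values it reaches in each tree plus the logit of base_score, and the probability is the sigmoid of that sum. The sketch below assumes base_score = 0.5 (newer XGBoost versions may estimate it from the training data instead), and the leaf values are made up for illustration rather than read from the trees above. The function name predict_probability is likewise hypothetical.

```python
import math


def predict_probability(leaf_values, base_score=0.5):
    """Combine one leaf value per tree into a probability (binary:logistic)."""
    # base_score is a probability; convert it to a raw score (logit).
    base_margin = math.log(base_score / (1.0 - base_score))
    margin = base_margin + sum(leaf_values)
    return 1.0 / (1.0 + math.exp(-margin))


# Hypothetical leaf values that one sample reaches in trees 0..3
leaf_values = [0.12, -0.05, 0.20, 0.08]
print(predict_probability(leaf_values))
```

With no trees at all the function returns base_score itself, which is why each tree can be read as a correction to the log-odds rather than as a vote for a class.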