12 Days of Summer: Advanced Algorithms Review - XGBoost Algorithm Review

beautiful_well

Category: DataWhale Advanced Algorithms Review; Tags: XGBoost

Article link: https://blog.csdn.net/beautiful_well/article/details/99595364


I. The Idea Behind Ensemble Methods

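In a decision tree, a sample is routed left or right at each internal node until it reaches a leaf, and the leaf determines the predicted class. The same structure also works for regression, with each leaf holding a numeric value instead of a class label.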

Official documentation (parameters): https://xgboost.readthedocs.io/en/latest/parameter.html

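When working on a classification or regression task, a single classifier often does not perform well enough on its own, and that is when an ensemble is worth considering. The figure in the original post shows only two classifiers, but an ensemble can combine many more, and more complex, weak classifiers into one strong classifier.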
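To make the idea concrete, here is a minimal sketch of an additive ensemble: two hand-written decision trees each route a sample left or right to a leaf, and the ensemble prediction is the sum of the leaf scores. The features (`age`, `is_male`, `uses_computer_daily`), the split points, and the leaf scores are hypothetical values chosen for illustration.

```python
# Toy additive ensemble: each tree routes a sample to a leaf and returns
# that leaf's score. All features, thresholds, and scores are hypothetical.

def tree_one(sample):
    """Split on age first, then on gender."""
    if sample["age"] < 15:
        return 2.0 if sample["is_male"] else 0.1
    return -1.0


def tree_two(sample):
    """Split on a single yes/no feature."""
    if sample["uses_computer_daily"]:
        return 0.9
    return -0.9


def ensemble_predict(sample, trees):
    """Sum the leaf scores that each tree assigns to the sample."""
    return sum(tree(sample) for tree in trees)


trees = [tree_one, tree_two]
boy = {"age": 10, "is_male": True, "uses_computer_daily": True}
grandpa = {"age": 70, "is_male": True, "uses_computer_daily": False}

print("boy:", ensemble_predict(boy, trees))
print("grandpa:", ensemble_predict(grandpa, trees))
```

Neither tree is very expressive alone, but their summed scores separate the two samples more clearly than either tree does by itself.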

II. The Basic Idea of XGBoost

Reference: https://blog.csdn.net/huacha__/article/details/81029680

III. XGBoost Example

This example uses the iris dataset.

from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Load the iris dataset (150 samples, 4 features, 3 classes)
X, y = load_iris(return_X_y=True)

# Split the data into training and test sets
seed = 2019  # random seed
test_size = 0.30  # fraction of the data held out for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=seed)

# Fit the XGBoost model
model = XGBClassifier()
model.fit(X_train, y_train)

# Predict on the test set (the classifier already returns integer class labels)
y_pred = model.predict(X_test)

# Evaluate the predictions
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: %.2f%%" % (accuracy * 100.0))

Ranking feature importance with the XGBoost model

XGBoost has a built-in plotting function, plot_importance(), that draws the features sorted by their importance scores.

from xgboost import plot_importance
from matplotlib import pyplot
plot_importance(model)
pyplot.show()
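By default, `plot_importance()` ranks features by `importance_type="weight"`, which is the number of times a feature is used to split across all trees. The sketch below reproduces that count in plain Python on two made-up trees. The nested-dict tree layout and the feature names `f0` to `f3` are assumptions for illustration, not XGBoost's internal format.

```python
from collections import Counter

# Hypothetical trees: an internal node is a dict with a split feature and
# two children; a leaf is just a float score.
toy_trees = [
    {"feature": "f2",
     "left": {"feature": "f3", "left": 0.4, "right": -0.2},
     "right": 0.7},
    {"feature": "f2",
     "left": -0.1,
     "right": {"feature": "f0", "left": 0.3, "right": 0.5}},
]


def count_splits(node, counts):
    """Recursively count how often each feature is used as a split."""
    if isinstance(node, dict):
        counts[node["feature"]] += 1
        count_splits(node["left"], counts)
        count_splits(node["right"], counts)


def weight_importance(trees):
    """Return a Counter mapping feature name to its number of splits."""
    counts = Counter()
    for tree in trees:
        count_splits(tree, counts)
    return counts


importance = weight_importance(toy_trees)
# Sort from most-used to least-used feature, as an importance plot would.
ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
for feature, n_splits in ranked:
    print(feature, n_splits)
```

With a real model, other definitions such as `importance_type="gain"` can be passed to `plot_importance()` and may produce a different ranking.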

Reference: https://blog.csdn.net/waitingzby/article/details/81610495

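Features can be selected by testing multiple thresholds on the feature importances. Specifically, sorting the per-feature importance scores and using each one in turn as a threshold lets us evaluate a sequence of feature subsets ranked by importance. Model performance generally declines as the number of selected features shrinks.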

import numpy as np
from sklearn.feature_selection import SelectFromModel

# Use each feature's importance score, in ascending order, as a selection threshold
thresholds = np.sort(model.feature_importances_)
for thresh in thresholds:
    # Keep only the features whose importance is >= thresh
    selection = SelectFromModel(model, threshold=thresh, prefit=True)
    select_X_train = selection.transform(X_train)
    # Train a new model on the reduced feature set
    selection_model = XGBClassifier()
    selection_model.fit(select_X_train, y_train)
    # Evaluate on the test set reduced to the same features
    select_X_test = selection.transform(X_test)
    y_pred = selection_model.predict(select_X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print("Thresh=%.3f, n=%d, Accuracy: %.2f%%" % (thresh, select_X_train.shape[1], accuracy * 100.0))
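The loop above relies on how `SelectFromModel` applies a threshold: a feature is kept when its importance is greater than or equal to the threshold. A plain-Python sketch of that rule, using made-up importance scores (not results from the iris model), shows why each successive threshold drops one more feature. The helper `select_features` is a hypothetical name, not part of scikit-learn.

```python
# Hypothetical importance scores for four features (not from a real model).
importances = [0.02, 0.55, 0.13, 0.30]


def select_features(scores, threshold):
    """Return the indices of the features whose score is >= threshold."""
    return [i for i, score in enumerate(scores) if score >= threshold]


# Try every importance score as a threshold, from smallest to largest.
for thresh in sorted(importances):
    kept = select_features(importances, thresh)
    print("Thresh=%.3f, n=%d, kept=%s" % (thresh, len(kept), kept))
```

The smallest threshold keeps every feature and the largest keeps only the most important one, so the loop trains one model per nested feature subset.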
