XGBoost Optimization
Introduction
Like many data scientists, I now count XGBoost as part of my toolkit. This algorithm is among the most popular in the data science world, whether in real-world projects or competitions. Its versatility allows it to be used for both regression and classification, and it can be applied to tabular, structured, and unstructured data.
A notebook containing the code is available on GitHub. The notebook is intended for document (text) classification.
XGBoost
XGBoost, or eXtreme Gradient Boosting, is a tree-based algorithm (Chen and Guestrin, 2016 [2]). It belongs to the family of tree methods (decision trees, random forests, bagging, boosting, gradient boosting).
Boosting is an ensemble method whose primary objective is to reduce bias and variance. The idea is to build weak trees sequentially so that each new tree (or learner) focuses on the weaknesses (misclassified data) of the previous one. After a weak learner is added, the sample weights are readjusted, a step known as "re-weighting". Together, these weak learners form a single strong model.
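The re-weighting loop described above can be sketched with a simplified AdaBoost-style procedure, using decision stumps as the weak learners. This is an illustration of the general boosting idea, not XGBoost's own (gradient-based) update:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy data; labels mapped to {-1, +1} for the weight-update rule
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
y_pm = np.where(y == 1, 1, -1)

n = len(y)
w = np.full(n, 1.0 / n)  # start with uniform sample weights
stumps, alphas = [], []
for _ in range(10):
    # Fit a weak learner (depth-1 tree) on the current weights
    stump = DecisionTreeClassifier(max_depth=1, random_state=0)
    stump.fit(X, y, sample_weight=w)
    pred = np.where(stump.predict(X) == 1, 1, -1)

    err = w[pred != y_pm].sum()  # weighted error of this weak learner
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))

    # "Re-weighting": misclassified samples gain weight, so the
    # next learner focuses on the previous one's mistakes
    w *= np.exp(-alpha * y_pm * pred)
    w /= w.sum()

    stumps.append(stump)
    alphas.append(alpha)

# The ensemble prediction is the weighted vote of all weak learners
scores = sum(a * np.where(s.predict(X) == 1, 1, -1)
             for a, s in zip(alphas, stumps))
ensemble_acc = np.mean(np.sign(scores) == y_pm)
```

XGBoost follows the same sequential principle but fits each new tree to the gradient of the loss rather than re-weighting samples directly.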