Read Papers with Me: LightGBM

Theory

TITLE

LightGBM: A Highly Efficient Gradient Boosting Decision Tree

AUTHOR

Guolin Ke, Qi Meng, et al.

ABSTRACT

Gradient Boosting Decision Tree (GBDT) is a popular machine learning algorithm, and has quite a few effective implementations such as XGBoost and pGBRT.

Although many engineering optimizations have been adopted in these implementations, the efficiency and scalability are still unsatisfactory when the feature dimension is high and data size is large.

A major reason is that for each feature, they need to scan all the data instances to estimate the information gain of all possible split points, which is very time-consuming.

To tackle this problem, we propose two novel techniques: Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB).

[Reduces the sample size] With GOSS, we exclude a significant proportion of data instances with small gradients, and only use the rest to estimate the information gain. We prove that, since the data instances with larger gradients play a more important role in the computation of information gain, GOSS can obtain quite accurate estimation of the information gain with a much smaller data size.

[Reduces the number of features] With EFB, we bundle mutually exclusive features (i.e., they rarely take nonzero values simultaneously) to reduce the number of features. We prove that finding the optimal bundling of exclusive features is NP-hard, but a greedy algorithm can achieve quite a good approximation ratio (and thus can effectively reduce the number of features without hurting the accuracy of split point determination by much).
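
To make the greedy step concrete, here is a minimal sketch of exclusive feature bundling, not the paper's implementation: a feature joins an existing bundle only if it conflicts (takes a nonzero value at the same time) with that bundle on at most a small budget of instances. The name `greedy_bundle` and the `max_conflicts` parameter are illustrative; ordering features by their nonzero counts is the simplification the paper itself suggests in place of ordering by conflict-graph degree.

```python
import numpy as np

def greedy_bundle(X, max_conflicts):
    """Greedy approximation to the (NP-hard) exclusive feature bundling:
    a feature joins an existing bundle only if it is simultaneously
    nonzero with that bundle on at most `max_conflicts` instances."""
    nonzero = X != 0                              # (n_instances, n_features) mask
    # process denser features first; the paper notes that ordering by
    # nonzero counts behaves similarly to ordering by conflict-graph degree
    order = np.argsort(-nonzero.sum(axis=0))
    bundles, bundle_masks = [], []                # feature ids / usage masks
    for j in order:
        placed = False
        for b, mask in enumerate(bundle_masks):
            conflicts = np.sum(mask & nonzero[:, j])
            if conflicts <= max_conflicts:        # tolerable conflict budget
                bundles[b].append(j)
                bundle_masks[b] = mask | nonzero[:, j]
                placed = True
                break
        if not placed:                            # start a new bundle
            bundles.append([j])
            bundle_masks.append(nonzero[:, j].copy())
    return bundles
```

Each resulting bundle can then be merged into a single feature by offsetting the value ranges (or histogram bins) of its members so they do not overlap, which is how the bundled feature stays unambiguous.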

We call our new GBDT implementation with GOSS and EFB LightGBM. Our experiments on multiple public datasets show that, LightGBM speeds up the training process of conventional GBDT by up to over 20 times while achieving almost the same accuracy.

Introduction

In recent years, with the emergence of big data (in terms of both the number of features and the number of instances), GBDT is facing new challenges, especially in the tradeoff between accuracy and efficiency. Conventional implementations of GBDT need to, for every feature, scan all the data instances to estimate the information gain of all the possible split points. Therefore, their computational complexities will be proportional to both the number of features and the number of instances. This makes these implementations very time consuming when handling big data.
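
To see where this cost comes from, below is a hedged sketch of the exhaustive split search in a conventional GBDT implementation; the function name and the variance-style gain criterion are illustrative, not taken from any particular library. Every feature triggers a full pass over the (sorted) instances, so evaluating all candidate splits costs time proportional to #features × #instances.

```python
import numpy as np

def best_split_naive(X, g):
    """Exhaustive split search over an (n_instances, n_features) matrix X
    and per-instance gradients g: every feature is scanned over every
    instance, which is the O(#features x #instances) bottleneck."""
    n, d = X.shape
    best_feat, best_thr, best_gain = None, None, 0.0
    total = g.sum()
    for j in range(d):                          # every feature ...
        order = np.argsort(X[:, j])             # pre-sorted feature values
        g_sorted = g[order]
        left_sum = 0.0
        for i in range(n - 1):                  # ... scans every instance
            left_sum += g_sorted[i]
            right_sum = total - left_sum
            # variance-style gain: larger means a better separation of
            # large- and small-gradient instances (ties ignored for brevity)
            gain = (left_sum**2 / (i + 1)
                    + right_sum**2 / (n - i - 1)
                    - total**2 / n)
            if gain > best_gain:
                best_feat, best_thr, best_gain = j, X[order[i], j], gain
    return best_feat, best_thr, best_gain
```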

To tackle this challenge, a straightforward idea is to reduce the number of data instances and the number of features. However, this turns out to be highly non-trivial. For example, it is unclear how to perform data sampling for GBDT.

In this paper, we propose two novel techniques towards this goal, as elaborated below.

Gradient-based One-Side Sampling (GOSS). While there is no native weight for data instance in GBDT, we notice that data instances with different gradients play different roles in the computation of information gain. In particular, according to the definition of information gain, those instances with larger gradients (i.e., under-trained instances) will contribute more to the information gain. Therefore, when down sampling the data instances, in order to retain the accuracy of information gain estimation, we should better keep those instances with large gradients (e.g., larger than a pre-defined threshold, or among the top percentiles), and only randomly drop those instances with small gradients.
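
A minimal sketch of this down-sampling step, using the a (large-gradient fraction) and b (small-gradient sampling fraction) notation from the paper's GOSS procedure; the function name and default values are illustrative. The sampled small-gradient instances are amplified by the constant (1 − a)/b when the information gain is later computed, which keeps the estimate approximately unbiased.

```python
import numpy as np

def goss_sample(grad, a=0.2, b=0.1, seed=0):
    """Keep the top `a` fraction of instances by |gradient|, uniformly
    sample a `b` fraction of the remaining small-gradient instances, and
    up-weight the sampled ones by (1 - a) / b so that the information
    gain estimated on the subset stays approximately unbiased."""
    rng = np.random.default_rng(seed)
    n = len(grad)
    top_k = int(a * n)
    rand_k = int(b * n)
    order = np.argsort(-np.abs(grad))       # descending by |gradient|
    top_idx = order[:top_k]                 # always-kept large gradients
    sampled = rng.choice(order[top_k:], size=rand_k, replace=False)
    idx = np.concatenate([top_idx, sampled])
    weights = np.ones(len(idx))
    weights[top_k:] = (1.0 - a) / b         # amplify small-gradient part
    return idx, weights
```

The returned idx selects the training instances for the next tree, and weights would multiply their gradient statistics wherever the split gain is computed.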
