CSDN - Reading the LightGBM Source + Theory (Implementation Details of Categorical Feature and Missing-Value Handling)
Debug setup
Change -O3 to -O0; with instruction-level optimization on, some values cannot be inspected in the debugger.
Comment out the #pragma omp parallel for
preprocessor directives to turn the parallel loops serial; otherwise the debugger cannot step into them.
Start with binary classification as the example, using the official sample data.
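For reference, the build tweak above can be sketched as a CMake fragment. This is an assumed typical CMake setup, not copied from LightGBM's actual CMakeLists.txt; check the version you have for the exact option names:

```cmake
# Debug build: no optimization, keep symbols (-O0 -g instead of -O3)
set(CMAKE_BUILD_TYPE Debug)
set(CMAKE_CXX_FLAGS_DEBUG "-O0 -g")
# If the LightGBM version exposes an OpenMP switch, turning it off
# makes the parallel loops serial without editing the pragmas by hand.
# option(USE_OPENMP "Enable OpenMP" OFF)
```

Rebuilding from a clean build directory avoids stale optimized object files.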
From main to GBDT::Train
Execution path
- execution path of the main function
Application::Train
src/application/application.cpp:200
boosting_->Train(config_.snapshot_freq, config_.output_model);
Looking at boosting
include/LightGBM/boosting.h
Boosting is an abstract class; the boosting_type parameter selects one of its four concrete subclasses.
(TODO: draw the inheritance diagram sometime.)
Find the GBDT::Train function
src/boosting/gbdt.cpp:246
fun_timer appears to be a RAII object:
Common::FunctionTimer fun_timer("GBDT::Train", global_timer);
The training loop itself is plain iteration; look at TrainOneIter.
Note that the caller invokes it as is_finished = TrainOneIter(nullptr, nullptr); both gradient and hessian are null pointers.
GBDT has two member variables holding the per-iteration gradients and Hessians (backed by a custom aligned allocator).
Storing smart pointers in a vector is more robust than storing raw pointers (compare the sLSM source).
/*! \brief First order derivative of training data */
std::vector<score_t, Common::AlignmentAllocator<score_t, kAlignedSize>> gradients_;
/*! \brief Second order derivative of training data */
std::vector<score_t, Common::AlignmentAllocator<score_t, kAlignedSize>> hessians_;
/*! \brief Trained models(trees) */
std::vector<std::unique_ptr<Tree>> models_;
/*! \brief Tree learner, will use this class to learn trees */
std::unique_ptr<TreeLearner> tree_learner_;
/*! \brief Objective function */
const ObjectiveFunction* objective_function_;
/*! \brief Pointer to training data */
const Dataset* train_data_;
Focus on TrainOneIter
GBDT::BoostFromAverage
Focus on how the initial score is set
- BinaryLogLoss
double init_score = ObtainAutomaticInitialScore(objective_function_, class_id);
which immediately prints:
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.530877 -> initscore=0.123666
Look into ObtainAutomaticInitialScore
Jump to src/objective/binary_objective.hpp:134
Of the 7,000 training samples in total, 3,716 are positive, so pavg ≈ 0.53.
$initscore = \log\left(\frac{pavg}{1 - pavg}\right)$
The computed value is a logit (the inverse sigmoid of pavg).