Introduction to LightGBM:
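GBDT (Gradient Boosting Decision Tree) is a long-standing favorite among machine learning models. Its core idea is to train weak learners (decision trees) iteratively, with each new tree correcting the errors of the ensemble built so far, so that the combined model becomes strong. The model trains well and is relatively resistant to overfitting. GBDT is widely used in industry, typically for tasks such as click-through rate prediction and search ranking. It is also a go-to weapon in data mining competitions: reportedly, more than half of the winning solutions on Kaggle are based on GBDT.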
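The iterative idea can be illustrated with a minimal sketch. It assumes squared loss, a single input feature, and one-split decision stumps as the weak learners; the function names (`fit_stump`, `fit_gbdt`, and so on) are made up for this illustration and are not LightGBM's API or its actual algorithm.

```python
# Minimal gradient boosting sketch for regression with squared loss.
# Weak learner: a one-split decision stump on a single feature.

def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) minimizing squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:
        # All x values are identical: fall back to a constant prediction.
        mean = sum(residuals) / len(residuals)
        return (float("inf"), mean, mean)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value

def fit_gbdt(xs, ys, n_rounds=50, learning_rate=0.3):
    """Fit stumps one after another, each on the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is the residual y - prediction.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, stumps, learning_rate

def predict_gbdt(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)

def mse(model, xs, ys):
    return sum((predict_gbdt(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)
```

Each round fits a stump to what the current ensemble still gets wrong, so the training error shrinks as rounds are added. Real GBDT frameworks use deeper trees, many features, and other loss functions.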
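LightGBM (Light Gradient Boosting Machine) is a framework that implements the GBDT algorithm. It supports efficient parallel training and offers the following advantages:
Faster training speed
Lower memory consumption
Better accuracy
Distributed training support, so it can process massive datasets quickly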
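As shown in the figure below, on the Higgs dataset LightGBM is nearly 10 times faster than XGBoost, uses roughly 1/6 of the memory that XGBoost does, and is also more accurate.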
On a Mac, after installing with pip, the following error shows up in practice when importing the package.
Error message:
import lightgbm
  File "/opt/venv3/lib/python3.7/site-packages/lightgbm/__init__.py", line 8, in <module>
    from .basic import Booster, Dataset
  File "/opt/venv3/lib/python3.7/site-packages/lightgbm/basic.py", line 32, in <module>
    _LIB = _load_lib()
  File "/opt/venv3/lib/python3.7/site-packages/lightgbm/basic.py", line 27, in _load_lib
    lib = ctypes.cdll.LoadLibrary(lib_path[0])
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ctypes/__init__.py", line 442, in LoadLibrary
    return self._dlltype(name)
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ctypes/__init__.py", line 364, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: dlopen(/opt/venv3/lib/python3.7/site-packages/lightgbm/lib_lightgbm.so, 6): Library not loaded: /usr/local/opt/gcc/lib/gcc/7/libgomp.1.dylib
  Referenced from: /opt/venv3/lib/python3.7/site-packages/lightgbm/lib_lightgbm.so
  Reason: image not found
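The key line of the traceback is the `Library not loaded:` part, which names the library that could not be found. As a rough diagnostic sketch, the path and the gcc major version it expects can be pulled out of the message with regular expressions. The helper names here are hypothetical, and the patterns assume the macOS dlopen message format shown above.

```python
import os
import re

def missing_library(error_text):
    """Return the path after 'Library not loaded:' in a dlopen error, or None."""
    match = re.search(r"Library not loaded:\s*(\S+)", error_text)
    return match.group(1) if match else None

def expected_gcc_major(library_path):
    """Return the numeric directory after a 'gcc/' component, e.g. '7', or None."""
    match = re.search(r"/gcc/(\d+)/", library_path)
    return match.group(1) if match else None

def diagnose(error_text):
    """Summarize which library is wanted and whether it exists on this machine."""
    path = missing_library(error_text)
    if path is None:
        return "no missing library reported"
    state = "exists" if os.path.exists(path) else "is missing"
    return f"{path} {state}"
```

For the error above, `missing_library` yields the libgomp path under `gcc/7`, so the prebuilt library expects the OpenMP runtime from gcc 7 at that location.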
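Fix: the prebuilt lib_lightgbm.so looks for libgomp under the gcc 7 directory, which does not exist on this machine, so build LightGBM from source with the locally installed gcc: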
brew install cmake
brew install gcc
cd /opt  # not a fixed location; any directory works
git clone --recursive https://github.com/Microsoft/LightGBM
cd LightGBM
# check which gcc version is installed on your machine
ls -lh /usr/local/opt/gcc/lib/gcc/
total 0
drwxr-xr-x 47 mafei staff 1.5K 3 8 11:52 9
On my machine the gcc version is 9, so the command is:
export CXX=g++-9 CC=gcc-9
mkdir build ; cd build
cmake ..
make -j4
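The version check and the matching export line above can also be sketched in Python. This is only an illustration: the function names are hypothetical, and the default path assumes the Homebrew layout used in this post (`/usr/local/opt/gcc/lib/gcc`), which may differ on other machines.

```python
import os

def installed_gcc_versions(root="/usr/local/opt/gcc/lib/gcc"):
    """List numeric subdirectory names under root as ints; empty if root is absent."""
    if not os.path.isdir(root):
        return []
    versions = [
        int(name)
        for name in os.listdir(root)
        if name.isdigit() and os.path.isdir(os.path.join(root, name))
    ]
    return sorted(versions)

def compiler_env(version):
    """Map a gcc major version to the CXX/CC values used before running cmake."""
    return {"CXX": f"g++-{version}", "CC": f"gcc-{version}"}

def export_line(version):
    """Build the shell export line for the given gcc major version."""
    env = compiler_env(version)
    return "export " + " ".join(f"{key}={value}" for key, value in env.items())
```

With version 9, `export_line(9)` produces the same line as above: `export CXX=g++-9 CC=gcc-9`.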
The current LightGBM directory:
pwd
/opt/LightGBM/build
Go into python-package, which contains a setup.py file:
cd /opt/LightGBM/python-package
Run:
python setup.py install --precompile
Problem solved.
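To confirm the fix, the import that failed earlier can be retried. The helper below is a small sketch (the name `check_import` is made up for this note) that reports an import failure as a message instead of a traceback. It catches `OSError` as well as `ImportError`, since the original failure surfaced as an `OSError` from dlopen.

```python
import importlib

def check_import(module_name):
    """Try to import a module; return (ok, detail) instead of raising."""
    try:
        module = importlib.import_module(module_name)
    except (ImportError, OSError) as exc:
        # A missing shared library surfaces as OSError, a missing package as ImportError.
        return False, f"{type(exc).__name__}: {exc}"
    return True, getattr(module, "__version__", "unknown version")

if __name__ == "__main__":
    ok, detail = check_import("lightgbm")
    print("lightgbm import", "succeeded" if ok else "failed", "-", detail)
```

If the rebuild worked, the script should report that the import succeeded, along with the installed version when the package exposes one.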