Running LightGBM on the GPU and Benchmarking the Speed
2019/11/4
Reference:
"The simplest way to install and verify LightGBM GPU support" - lccever's blog - CSDN
https://blog.csdn.net/lccever/article/details/80535058
Working directory: /home/hexi/boost
Steps:
- Install the system dependencies
sudo apt-get install --no-install-recommends git cmake build-essential libboost-dev libboost-system-dev libboost-filesystem-dev
- Install the Python libraries
pip install setuptools wheel numpy scipy scikit-learn -U
- Install the GPU build of LightGBM
sudo pip3.6 install lightgbm --install-option=--gpu --install-option="--opencl-include-dir=/usr/local/cuda/include/" --install-option="--opencl-library=/usr/local/cuda/lib64/libOpenCL.so"
- Fetch the benchmark data and scripts
git clone https://github.com/guolinke/boosting_tree_benchmarks.git
cd boosting_tree_benchmarks/data
wget "https://archive.ics.uci.edu/ml/machine-learning-databases/00280/HIGGS.csv.gz"
gunzip HIGGS.csv.gz
python higgs2libsvm.py
The compressed dataset is about 2.6 GB, so the download takes a while; uncompressed, the data file is about 8 GB!
After downloading, run higgs2libsvm.py to convert the data to LibSVM format.
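higgs2libsvm.py ships with the benchmark repo; roughly speaking (this sketch is an assumption, not the repo's actual code), it turns each CSV row "label,f1,...,f28" into the LibSVM line "label 1:f1 2:f2 ...":

```python
def csv_line_to_libsvm(line):
    """Convert one HIGGS.csv row (label first, features after) to LibSVM format."""
    fields = line.strip().split(',')
    label = fields[0]
    # LibSVM features are 1-indexed "index:value" pairs.
    feats = ' '.join('{}:{}'.format(i, v) for i, v in enumerate(fields[1:], 1))
    return '{} {}'.format(label, feats)

print(csv_line_to_libsvm('1.0,0.5,0.25'))  # -> 1.0 1:0.5 2:0.25
```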
Write the test script and run it:
cd /home/hexi/boost/boosting_tree_benchmarks/data
python lgbm-gpu.py
It fails with:
lightgbm.basic.LightGBMError: b'GPU Tree Learner was not enabled in this build. Recompile with CMake option -DUSE_GPU=1'
Following another article, rebuild LightGBM from source:
(Reference: an original article by CSDN blogger "·清尘·", licensed CC 4.0 BY-SA.
Original link: https://blog.csdn.net/u012969412/article/details/71433960)
$ git clone --recursive https://github.com/Microsoft/LightGBM
$ cd LightGBM
$ mkdir build
$ cd build
$ cmake -DUSE_GPU=1 ..
$ make -j
$ cd ../python-package
$ sudo python setup.py install
After the rebuild, run the test program again:
cd /home/hexi/boost/boosting_tree_benchmarks/data
python lgbm-gpu.py
The output:
hexi@ubuntu:~/boost/boosting_tree_benchmarks/data$ python lgbm-gpu.py
[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 1524
[LightGBM] [Info] Number of data: 10500000, number of used features: 28
[LightGBM] [Info] Using requested OpenCL platform 0 device 0
[LightGBM] [Info] Using GPU Device: GeForce GTX 1080 Ti, Vendor: NVIDIA Corporation
[LightGBM] [Info] Compiling OpenCL Kernel with 64 bins...
[LightGBM] [Info] GPU programs have been built
[LightGBM] [Info] Size of histogram bin entry: 12
[LightGBM] [Info] 28 dense feature groups (280.38 MB) transferred to GPU in 0.183250 secs. 0 sparse feature groups
[LightGBM] [Info] Start training from score 0.529963
gpu version elapse time: 15.894050121307373
[LightGBM] [Info] Total Bins 1524
[LightGBM] [Info] Number of data: 10500000, number of used features: 28
[LightGBM] [Info] Start training from score 0.529963
cpu version elapse time: 10.747112035751343
Note the timing gap: the CPU run reuses the Dataset already constructed for the GPU run, so the CPU model pays no construction cost and its time looks artificially short.
To make the comparison fair, the two runs were split into two separate Python scripts, launched by a wrapper script.
Output:
hexi@ubuntu:~/boost/boosting_tree_benchmarks/data$ python lgbmtest.py
Dataset loading...
GPU train beginning...
[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 1524
[LightGBM] [Info] Number of data: 10500000, number of used features: 28
[LightGBM] [Info] Using requested OpenCL platform 0 device 0
[LightGBM] [Info] Using GPU Device: GeForce GTX 1080 Ti, Vendor: NVIDIA Corporation
[LightGBM] [Info] Compiling OpenCL Kernel with 64 bins...
[LightGBM] [Info] GPU programs have been built
[LightGBM] [Info] Size of histogram bin entry: 12
[LightGBM] [Info] 28 dense feature groups (280.38 MB) transferred to GPU in 0.197229 secs. 0 sparse feature groups
[LightGBM] [Info] Start training from score 0.529963
gpu version elapse time: 12.653455257415771
Dataset loading...
CPU train beginning...
[LightGBM] [Info] Total Bins 1524
[LightGBM] [Info] Number of data: 10500000, number of used features: 28
[LightGBM] [Info] Start training from score 0.529963
cpu version elapse time: 19.767507553100586
hexi@ubuntu:~/boost/boosting_tree_benchmarks/data$
Now the difference shows.
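From the timings above, the GPU run is roughly 1.56x faster over these 10 boosting rounds:

```python
# Timings copied from the output above.
gpu_t = 12.653455257415771
cpu_t = 19.767507553100586

speedup = cpu_t / gpu_t
print('speedup: {:.2f}x'.format(speedup))  # -> speedup: 1.56x
```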
** Addendum **
Many people find the GPU version slower in their tests. That is usually because the GPU and CPU code run in the same file, so the two models do not pay the same dataset-construction cost.
The correct approach is to split the GPU test and the CPU test into two files,
then launch them from a wrapper script:
lgbm-cpu.py:
#!/usr/bin/env python3
#coding:utf-8
__author__ = 'xmxoxo<xmxoxo@qq.com>'

import lightgbm as lgb
import time

params = {'max_bin': 63,
          'num_leaves': 255,
          'learning_rate': 0.1,
          'tree_learner': 'serial',
          'task': 'train',
          'is_training_metric': 'false',
          'min_data_in_leaf': 1,
          'min_sum_hessian_in_leaf': 100,
          'ndcg_eval_at': [1, 3, 5, 10],
          'sparse_threshold': 1.0,
          'device': 'cpu'
          }

print('Dataset loading...')
dtrain = lgb.Dataset('./higgs.train')

print('CPU train beginning...')
t0 = time.time()
gbm = lgb.train(params, train_set=dtrain, num_boost_round=10,
                valid_sets=None, valid_names=None,
                fobj=None, feval=None, init_model=None,
                feature_name='auto', categorical_feature='auto',
                early_stopping_rounds=None, evals_result=None,
                verbose_eval=True,
                keep_training_booster=False, callbacks=None)
t1 = time.time()
print('cpu version elapse time: {}'.format(t1 - t0))
lgbm-gpu.py:
#!/usr/bin/env python3
#coding:utf-8
__author__ = 'xmxoxo<xmxoxo@qq.com>'

import lightgbm as lgb
import time

print('Dataset loading...')
dtrain = lgb.Dataset('./higgs.train')

def lgbm_gpu():
    params = {'max_bin': 63,
              'num_leaves': 255,
              'learning_rate': 0.1,
              'tree_learner': 'serial',
              'task': 'train',
              'is_training_metric': 'false',
              'min_data_in_leaf': 1,
              'min_sum_hessian_in_leaf': 100,
              'ndcg_eval_at': [1, 3, 5, 10],
              'sparse_threshold': 1.0,
              'device': 'gpu',
              'gpu_platform_id': 0,
              'gpu_device_id': 0}
    print('GPU train beginning...')
    t0 = time.time()
    gbm = lgb.train(params, train_set=dtrain, num_boost_round=10,
                    valid_sets=None, valid_names=None,
                    fobj=None, feval=None, init_model=None,
                    feature_name='auto', categorical_feature='auto',
                    early_stopping_rounds=None, evals_result=None,
                    verbose_eval=True,
                    keep_training_booster=False, callbacks=None)
    t1 = time.time()
    print('gpu version elapse time: {}'.format(t1 - t0))

if __name__ == '__main__':
    lgbm_gpu()
Then use a launcher script to run the two test programs (you can also run them by hand).
lgbmtest.py:
#!/usr/bin/env python3
#coding:utf-8
__author__ = 'xmxoxo<xmxoxo@qq.com>'

import os

os.system("python lgbm-gpu.py")
os.system("python lgbm-cpu.py")
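os.system ignores the child's exit status, so a failing test script goes unnoticed. A sketch of an alternative launcher using subprocess (my own variation, not the original code) that reports each script's exit code:

```python
import subprocess
import sys

# Run each benchmark script with the current interpreter and report its exit code.
for script in ('lgbm-gpu.py', 'lgbm-cpu.py'):
    result = subprocess.run([sys.executable, script])
    print('{} exited with code {}'.format(script, result.returncode))
```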