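I. LightGBM distributed support overview
Distributed training runs with the lightgbm binary compiled from source.
Distributed workers can communicate over sockets or MPI (MPI is faster and is recommended).
II. Installing the LightGBM distributed environment
The distributed training environment is Ubuntu.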
I. Socket support
On Linux LightGBM can be built using CMake and gcc or Clang.
- Install CMake.
- Run the following commands:
git clone --recursive https://github.com/microsoft/LightGBM
cd LightGBM
mkdir build
cd build
cmake ..
make -j4
Note: glibc >= 2.14 is required.
Also, you may want to read gcc Tips.
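The glibc requirement above can be checked before starting a build. The following is a minimal sketch, assuming a glibc-based system where `getconf GNU_LIBC_VERSION` is available and GNU `sort -V` is installed; the helper names `version_ge`, `glibc_version` and `check_glibc` are made up for this example.

```shell
# version_ge A B: succeeds when version A >= version B.
# Assumes GNU sort with -V (version sort), as shipped on Ubuntu.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# glibc_version: prints the glibc version (second field of getconf output);
# prints nothing on systems that do not use glibc.
glibc_version() {
    getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}'
}

# check_glibc: compares the installed glibc against the 2.14 minimum.
check_glibc() {
    required="2.14"
    found="$(glibc_version)"
    if [ -z "$found" ]; then
        echo "glibc version could not be determined"
        return 2
    fi
    if version_ge "$found" "$required"; then
        echo "glibc $found >= $required: OK"
    else
        echo "glibc $found < $required: too old"
        return 1
    fi
}

check_glibc || true
```

A plain string comparison would rank 2.9 above 2.14, which is why the sketch relies on version sort.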
II. MPI support
The default build version of LightGBM is based on socket. LightGBM also supports MPI. MPI is a high performance communication approach with RDMA support.
If you need to run a distributed learning application with high performance communication, you can build the LightGBM with MPI support.
On Linux an MPI version of LightGBM can be built using Open MPI, CMake and gcc or Clang.
1. Installing Open MPI
1. Install OpenSSH
apt-get update
apt-get install openssh-server
If MPI runs across multiple machines, configure passwordless SSH login between the nodes and set StrictHostKeyChecking=no.
# Add the following to ~/.ssh/config
Host *
StrictHostKeyChecking no
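The SSH preparation described above could be scripted roughly as follows. This is only a sketch: the node addresses are the ones used in the launch command later in this document, the function names are invented for illustration, and it assumes the same user account exists on every node. The key-copy step prompts for each node's password once.

```shell
# Worker addresses (assumption: replace with your own nodes).
NODES="172.200.24.6 172.200.25.10"

# ensure_no_strict_checking FILE: appends the host-key setting to an ssh
# config file unless a matching line is already present.
ensure_no_strict_checking() {
    cfg="$1"
    mkdir -p "$(dirname "$cfg")"
    touch "$cfg"
    if ! grep -q '^[[:space:]]*StrictHostKeyChecking[[:space:]][[:space:]]*no' "$cfg"; then
        printf 'Host *\n    StrictHostKeyChecking no\n' >> "$cfg"
    fi
    chmod 600 "$cfg"
}

# setup_passwordless_ssh: creates a key pair if none exists, then copies
# the public key to every node in NODES.
setup_passwordless_ssh() {
    [ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
    for node in $NODES; do
        ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "$node"
    done
}

# Typical use on the machine that will launch mpiexec:
#   ensure_no_strict_checking "$HOME/.ssh/config"
#   setup_passwordless_ssh
```

Disabling strict host key checking for every host is convenient on an isolated cluster but weakens protection against impersonated hosts, so limiting the `Host` pattern to the cluster nodes may be preferable.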
2. Install Open MPI (version 4.1.0 is recommended)
wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.0.tar.gz
tar zxvf openmpi-4.1.0.tar.gz
cd openmpi-4.1.0/
./configure --prefix=/usr/local/openmpi
# If Open MPI is installed directly under /usr/local, the environment variables below do not need to be set
#./configure --prefix=/usr/local
make -j8
make install
3. Configure environment variables (~/.bashrc)
export PATH=$PATH:/usr/local/openmpi/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/openmpi/lib/
source ~/.bashrc
sudo ldconfig
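Because ~/.bashrc may be sourced more than once, the exports above can add the same directory repeatedly, and an unset LD_LIBRARY_PATH yields a leading empty entry, which the dynamic loader treats as the current directory. A possible variant for ~/.bashrc that avoids both is sketched below; `append_path` is a helper name invented for this example and the prefix matches the `--prefix` used earlier.

```shell
# append_path CURRENT DIR: prints CURRENT with DIR appended, unless DIR is
# already one of its colon-separated entries. Handles an empty CURRENT.
append_path() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;
        *)
            if [ -n "$1" ]; then
                printf '%s:%s\n' "$1" "$2"
            else
                printf '%s\n' "$2"
            fi
            ;;
    esac
}

# Install prefix passed to ./configure above.
OMPI_PREFIX=/usr/local/openmpi

PATH="$(append_path "$PATH" "$OMPI_PREFIX/bin")"
LD_LIBRARY_PATH="$(append_path "${LD_LIBRARY_PATH:-}" "$OMPI_PREFIX/lib")"
export PATH LD_LIBRARY_PATH
```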
4. Verify the installation
cd examples
make
./hello_c
#mpirun -np 8 hello_c
Hello, world, I am 0 of 1, (Open MPI v3.1.0, package: Open MPI root@ssli_centos7 Distribution, ident: 3.1.0, repo rev: v3.1.0, May 07, 2018, 112)
If the example program runs correctly, the installation succeeded.
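When the example is started with several processes, its output can also be checked automatically. The sketch below assumes each rank prints one line in the "Hello, world, I am R of N" format shown above; `check_hello` is a hypothetical helper, not part of Open MPI.

```shell
# check_hello N: reads hello_c output on stdin and succeeds only if N
# distinct ranks reported themselves as part of a job of size N.
check_hello() {
    awk -v n="$1" '
        /^Hello, world, I am [0-9]+ of [0-9]+/ {
            rank = $5
            size = $7
            sub(/,$/, "", size)          # drop the trailing comma after the size
            if (size + 0 == n + 0 && !(rank in seen)) {
                seen[rank] = 1
                count++
            }
        }
        END { exit (count + 0 == n + 0) ? 0 : 1 }
    '
}

# Typical use from the examples directory:
#   mpirun -np 8 ./hello_c | check_hello 8 && echo "Open MPI OK"
```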
2. Install CMake.
3. Run the following commands:
git clone --recursive https://github.com/microsoft/LightGBM
cd LightGBM
mkdir build
cd build
cmake -DUSE_MPI=ON ..
make -j4
Note: glibc >= 2.14 is required.
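After the build it can be useful to confirm that the binary was really linked against MPI. One way to check, sketched here under the assumption that Open MPI was built with shared libraries (its default) so that `ldd` lists a libmpi entry; the function names are invented for this example and the path to the binary is whatever your build produced.

```shell
# links_mpi BINARY: succeeds if ldd reports a libmpi dependency.
links_mpi() {
    ldd "$1" 2>/dev/null | grep -q 'libmpi'
}

# report_build BINARY: prints a one-line summary for the given executable.
report_build() {
    bin="$1"
    if [ ! -x "$bin" ]; then
        echo "not found or not executable: $bin"
        return 2
    fi
    if links_mpi "$bin"; then
        echo "$bin: MPI build"
    else
        echo "$bin: no libmpi dependency (socket build, or MPI linked statically)"
    fi
}

# Typical use:
#   report_build ./lightgbm
```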
III. Testing the distributed environment
For the data and parameter configuration, see:
https://github.com/microsoft/LightGBM/blob/master/examples/parallel_learning/README.md
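As a rough illustration of what such a configuration might contain, the sketch below writes a small config file. The data file names and the output model name are assumptions based on the upstream example and should be adapted; `write_train_conf` is a helper name made up here. The machine list and listen port settings are read by the socket build.

```shell
# write_train_conf FILE: writes an illustrative distributed-training config.
write_train_conf() {
    cat > "$1" <<'EOF'
# Illustrative config; adjust data paths to your own files.
task = train
objective = binary
data = binary.train
valid_data = binary.test
# Data-parallel learner; "feature" and "voting" are the other parallel options.
tree_learner = data
num_machines = 2
# Used by the socket build.
machine_list_file = mlist.txt
local_listen_port = 12400
output_model = LightGBM_model.txt
EOF
}

# Typical use:
#   write_train_conf train.conf
```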
Node1:
Node2:
Copy the data file, executable file, config file and mlist.txt to all machines.
Note: MPI needs to be run in the same path on all machines.
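The copy step can be scripted so that every machine ends up with the same files under the same path. The following sketch only prints the commands it would run, so they can be reviewed first. It assumes rsync is installed on all nodes, reuses the two addresses from the launch command later in this document, and writes mlist.txt with one host per line as used for MPI (the socket version expects an IP and a port on each line). The function names are hypothetical.

```shell
# Worker addresses (assumption: replace with your own nodes).
NODES="172.200.24.6 172.200.25.10"

# write_mlist FILE: writes one host per line.
write_mlist() {
    for n in $NODES; do
        printf '%s\n' "$n"
    done > "$1"
}

# print_copy_commands DIR: prints one rsync command per node that mirrors
# DIR to the same absolute path on that node.
print_copy_commands() {
    dir="$1"
    for n in $NODES; do
        printf 'rsync -a %s/ %s:%s/\n' "$dir" "$n" "$dir"
    done
}

# Typical use, from the directory holding lightgbm, the config and the data:
#   write_mlist mlist.txt
#   print_copy_commands "$PWD"          # review the commands
#   print_copy_commands "$PWD" | sh     # then execute them
```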
Run the following command on one machine only (there is no need to run it on every machine), replacing your_config_file with your actual config file.
mpiexec --machinefile mlist.txt ./lightgbm config=your_config_file
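In our experiments (Open MPI 4.1.0), the command above did not place one process on each machine. The following command can be used to start distributed training instead: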
mpiexec -np 2 -H 172.200.24.6,172.200.25.10 lightgbm config=train.conf
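To avoid typing the host list by hand, the `-np` and `-H` arguments can be derived from mlist.txt. This is a sketch under the assumption that mlist.txt has one host per line (any further columns, such as a port, are ignored); the helper names are invented, and the function only prints the command so it can be inspected before running. It uses ./lightgbm, as in the machinefile command above.

```shell
# hosts_csv FILE: prints the first column of every non-blank line, comma-joined.
hosts_csv() {
    grep -v '^[[:space:]]*$' "$1" | awk '{print $1}' | paste -sd, -
}

# host_count FILE: prints the number of non-blank lines.
host_count() {
    grep -cv '^[[:space:]]*$' "$1"
}

# launch_command MLIST CONFIG: prints an mpiexec command line with one
# process per listed host.
launch_command() {
    printf 'mpiexec -np %s -H %s ./lightgbm config=%s\n' \
        "$(host_count "$1")" "$(hosts_csv "$1")" "$2"
}

# Typical use:
#   launch_command mlist.txt train.conf        # review the command
#   launch_command mlist.txt train.conf | sh   # then run it
```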