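Purpose
This setup addresses the bottleneck of running LDA on a single machine: plda runs LDA in parallel, and its parallel architecture is built on MPI.
MPI installation
Target machines: 10.210.228.63 10.210.228.64 10.210.228.65
Installation steps: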
- Download mpich2-1.0.8.tar.gz from http://www.mpich.org/static/downloads/1.0.8/
- mkdir -p /data2/local/mpich2-1.0.8
- tar -zxvf mpich2-1.0.8.tar.gz
- cd mpich2-1.0.8
- ./configure --prefix=/data2/local/mpich2-1.0.8
- make && make install
- add /data2/local/mpich2-1.0.8/bin to $PATH
- touch /etc/mpd.conf
- echo "secretword=sina_algo_mpi" > /etc/mpd.conf
- chmod 600 /etc/mpd.conf
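The three mpd.conf steps above can be collected into one small shell function, which makes it easier to repeat them identically on every machine. This is only a minimal sketch: the function name `write_mpd_conf` and its two arguments are hypothetical and not part of MPICH2; the file format and the 600 permissions are the ones used in the steps above.

```shell
#!/bin/sh
# Hypothetical helper (not part of MPICH2): create an mpd.conf that only
# its owner can read or write, matching the chmod 600 step above.
# Usage: write_mpd_conf <path> <secretword>
write_mpd_conf() {
    conf_path=$1
    secret=$2
    # Create or truncate the file with a restrictive umask, so the secret
    # is never written to a file that other users can read.
    ( umask 077 && : > "$conf_path" ) || return 1
    # Tighten permissions explicitly in case the file already existed.
    chmod 600 "$conf_path" || return 1
    # Straight quotes only: curly quotes would end up inside the file.
    printf 'secretword=%s\n' "$secret" > "$conf_path"
}
```

With this helper, the last three steps would become `write_mpd_conf /etc/mpd.conf sina_algo_mpi`, run as root on each machine.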
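Repeat the steps above on each of the three target machines.
Use 10.210.228.63 as the master and the other two machines as slaves.
Start the master: on the master machine 10.210.228.63, run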
mpd --daemon --listenport=55555
Start the slaves: on the slave machines 10.210.228.64 and 10.210.228.65, run
mpd --daemon -h 10.210.228.63 -p 55555
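Test that MPI works: run mpdtrace on the master machine. If all three machines are listed, MPI is installed correctly.
Note: the plda wiki requires the MPI version to be mpich2-1.0.8; the latest MPI release at the time of writing is mpich-3.0.4.
Running the plda test example
In the plda directory (/data2/local/plda), run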
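The mpdtrace check can also be scripted instead of counting hosts by eye. The sketch below assumes that mpdtrace prints one host per line; the function name `check_ring_size` is hypothetical and not an MPICH2 command.

```shell
#!/bin/sh
# Hypothetical helper: read a host list on stdin (one host per line, as
# mpdtrace is assumed to print it) and compare the number of hosts with
# the expected ring size.
# Usage: mpdtrace | check_ring_size 3
check_ring_size() {
    expected=$1
    # Count only lines that contain a non-whitespace character.
    actual=$(grep -c '[^[:space:]]')
    if [ "$actual" -eq "$expected" ]; then
        echo "ring OK: $actual hosts"
    else
        echo "ring incomplete: $actual of $expected hosts" >&2
        return 1
    fi
}
```

On the master, `mpdtrace | check_ring_size 3` would then return a non-zero status if one of the slaves failed to join.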
mpiexec -n 5 ./mpi_lda --num_topics 2 --alpha 0.1 --beta 0.01 --training_data_file testdata/test_data.txt --model_file /tmp/lda_model.txt --total_iterations 150
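Note: this command can be run from any of the target machines, because the MPI machines are connected in a ring.
MPI optimization
All current installations use the default configuration.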
Performance Options:
- --enable-fast: Turns off error checking and collection of internal timing information.
- --enable-timing=no: Turns off just the collection of internal timing information.
- --enable-ndebug: Turns on NDEBUG, which disables asserts. This is a subset of the optimizations provided by enable-fast, but is useful in environments where the user wishes to retain the debug symbols, e.g., this can be combined with the --enable-g option.
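To try one of these options, the configure step from the installation would be rerun with the extra flag. The sketch below only prints the configure command for a chosen mode; the function name `mpich_configure_cmd` and the mode names `fast` and `ndebug` are hypothetical. It assumes that `./configure --help` for the installed version lists these options, which should be checked for mpich2-1.0.8 before relying on them.

```shell
#!/bin/sh
# Hypothetical helper: print a configure command for the chosen build mode.
#   fast   -> --enable-fast (no error checking, no internal timing)
#   ndebug -> --enable-ndebug --enable-g (asserts off, debug symbols kept)
mpich_configure_cmd() {
    # Same install prefix as in the installation steps.
    prefix=/data2/local/mpich2-1.0.8
    case $1 in
        fast)   flags="--enable-fast" ;;
        ndebug) flags="--enable-ndebug --enable-g" ;;
        *)      echo "unknown mode: $1" >&2; return 1 ;;
    esac
    echo "./configure --prefix=$prefix $flags"
}
```

For example, `mpich_configure_cmd fast` prints the command to run inside the mpich2-1.0.8 source directory, followed by `make && make install` on each of the three machines.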
Reference: