1. Environment
Hostname | IP | Memory | OS | Disk | CPU cores |
hw-master | 10.2.152.230 | 512 GB | CentOS 7.5 | 2 x 6 TB | 64 |
hw-node01 | 10.2.152.231 | 512 GB | CentOS 7.5 | 2 x 6 TB | 64 |
hw-node02 | 10.2.152.232 | 512 GB | CentOS 7.5 | 2 x 6 TB | 64 |
hw-node03 | 10.2.152.233 | 512 GB | CentOS 7.5 | 2 x 6 TB | 64 |
hw-node04 | 10.2.152.234 | 512 GB | CentOS 7.5 | 2 x 6 TB | 64 |
2. Install MPICH
(1) Download the source: http://www.mpich.org/static/downloads/3.1.4/mpich-3.1.4.tar.gz
(2) Build and install
$:tar -xzvf mpich-3.1.4.tar.gz -C /home/share/mpich
$:cd /home/share/mpich/mpich-3.1.4
$:./configure --prefix=/usr/local/mpich
$:make && make install
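The download and build steps above can be collected into one script. This is a minimal sketch, not a tested installer: the names mpich_url and build_mpich and the three variables are made up for illustration, it assumes wget and a compiler toolchain are already installed, and it downloads the tarball into /home/share/mpich rather than the current directory.

```shell
MPICH_VERSION=3.1.4
SRC_DIR=/home/share/mpich      # where the source is unpacked (same as above)
PREFIX=/usr/local/mpich        # install prefix (same as above)

# Build the download URL for a given MPICH version (hypothetical helper).
mpich_url() {
    echo "http://www.mpich.org/static/downloads/$1/mpich-$1.tar.gz"
}

# Download, unpack, configure, build and install. The body runs in a
# subshell "( ... )" so the cd calls do not change the caller's directory.
build_mpich() (
    mkdir -p "$SRC_DIR" || exit 1          # tar/cd need an existing directory
    cd "$SRC_DIR" || exit 1
    wget "$(mpich_url "$MPICH_VERSION")" || exit 1
    tar -xzf "mpich-$MPICH_VERSION.tar.gz" || exit 1
    cd "mpich-$MPICH_VERSION" || exit 1
    ./configure --prefix="$PREFIX" || exit 1
    make -j"$(nproc)" || exit 1            # parallel build on all cores
    make install
)
```

Calling build_mpich as root on a node should leave the binaries under /usr/local/mpich/bin, which is the directory the next step adds to PATH.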
(3) After installing, add the following environment variables to /etc/profile, then run source /etc/profile
PATH=$PATH:/usr/local/mpich/bin
MANPATH=$MANPATH:/usr/local/mpich/man
export PATH MANPATH
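If this step is repeated, the same lines pile up in /etc/profile. One way to avoid that is to append each line only when it is missing. The sketch below assumes that approach; append_once and add_mpich_env are illustrative names, not part of MPICH.

```shell
# Append LINE to FILE only if that exact line is not already present.
append_once() {
    file=$1
    line=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Add the three MPICH lines to a profile file.
# The target defaults to /etc/profile; pass a scratch file to try it out.
add_mpich_env() {
    profile=${1:-/etc/profile}
    # Single quotes keep $PATH and $MANPATH literal in the file.
    append_once "$profile" 'PATH=$PATH:/usr/local/mpich/bin'
    append_once "$profile" 'MANPATH=$MANPATH:/usr/local/mpich/man'
    append_once "$profile" 'export PATH MANPATH'
}
```

After running add_mpich_env as root, source /etc/profile is still needed in the current shell, as described above.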
(4) Single-node test
- Copy the examples directory from the source tree into the installation directory
$:cd /home/share/mpich/mpich-3.1.4
$:cp -r examples/ /usr/local/mpich
- Run the test
$:cd /usr/local/mpich
$:mpirun -np 10 ./examples/cpi
Output:
Process 1 of 10 is on hw-master
Process 2 of 10 is on hw-master
Process 3 of 10 is on hw-master
Process 4 of 10 is on hw-master
Process 5 of 10 is on hw-master
Process 6 of 10 is on hw-master
Process 7 of 10 is on hw-master
Process 8 of 10 is on hw-master
Process 9 of 10 is on hw-master
Process 0 of 10 is on hw-master
pi is approximately 3.1415926544231256, Error is 0.0000000008333325
wall clock time = 0.000058
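The rank lines arrive in arbitrary order, so it is easy to overlook a missing one. A small filter can check that every rank from 0 to N-1 reported exactly once. This is a sketch that assumes the "Process R of N is on HOST" line format shown above; check_ranks is a made-up name.

```shell
# Read mpirun/mpiexec output on stdin.
# Succeed only if ranks 0..N-1 each appear exactly once; N is the first argument.
check_ranks() {
    awk -v n="$1" '
        /^Process [0-9]+ of [0-9]+ is on / { seen[$2]++ }
        END {
            for (r = 0; r < n; r++)
                if (seen[r] != 1) {
                    printf "rank %d seen %d time(s)\n", r, seen[r]
                    bad = 1
                }
            exit (bad ? 1 : 0)
        }'
}
```

For the run above it would be used as mpirun -np 10 ./examples/cpi | check_ranks 10, and the exit status tells whether all ten ranks showed up.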
3. Configure the cluster
(1) Set up passwordless SSH login
$:ssh-keygen -t rsa
$:ssh-copy-id root@10.2.152.230
$:ssh-copy-id root@10.2.152.231
$:ssh-copy-id root@10.2.152.232
$:ssh-copy-id root@10.2.152.233
$:ssh-copy-id root@10.2.152.234
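Before moving on, it helps to confirm that every node really accepts the key without a password prompt. The following sketch assumes the root account and the five IP addresses used above; check_passwordless_ssh and the SSH variable are illustrative names.

```shell
NODES="10.2.152.230 10.2.152.231 10.2.152.232 10.2.152.233 10.2.152.234"
SSH=${SSH:-ssh}    # the command used to connect; can be overridden for a dry run

# Try a no-op command on every node and report which ones fail.
check_passwordless_ssh() {
    failed=0
    for ip in $NODES; do
        # BatchMode=yes makes ssh fail instead of asking for a password.
        if $SSH -o BatchMode=yes -o ConnectTimeout=5 "root@$ip" true; then
            echo "ok   $ip"
        else
            echo "FAIL $ip"
            failed=1
        fi
    done
    return $failed
}
```

Any line starting with FAIL points at a node where ssh-copy-id should be run again.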
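(2) Disable the firewall and SELinux
Disable the firewall (stop it now and disable it at boot; the other commands are listed for reference)
Start: systemctl start firewalld
Stop: systemctl stop firewalld
Check status: systemctl status firewalld
Disable at boot: systemctl disable firewalld
Enable at boot: systemctl enable firewalld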
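Disable SELinux
Check the current state
$: getenforce
Disabled means SELinux is off; Enforcing means it is on
$:/usr/sbin/sestatus -v
SELinux status: disabled
Disable temporarily (until the next reboot)
setenforce 1 // put SELinux into enforcing mode (on)
setenforce 0 // put SELinux into permissive mode (violations are logged but not blocked)
Disable permanently
$:vi /etc/selinux/config
Change SELINUX=enforcing to SELINUX=disabled
A reboot is required for this change to take effect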
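Editing /etc/selinux/config by hand on five machines is tedious, so the change can also be made with sed. This is a sketch under the assumption that the file contains a single uncommented SELINUX= line; disable_selinux_in_config is a made-up name.

```shell
# Set SELINUX=disabled in the given config file (default: /etc/selinux/config).
disable_selinux_in_config() {
    config=${1:-/etc/selinux/config}
    # Only lines starting with "SELINUX=" are rewritten, so SELINUXTYPE=
    # and commented lines stay as they are. A backup is kept as "$config.bak".
    sed -i.bak 's/^SELINUX=.*/SELINUX=disabled/' "$config"
}
```

Run as root on each node, this has the same effect as the vi edit; the reboot is still required.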
(3) Edit the /etc/hosts file
$:cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.2.152.230 hw-master
10.2.152.231 hw-node01
10.2.152.232 hw-node02
10.2.152.233 hw-node03
10.2.152.234 hw-node04
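A missing or mistyped entry here only shows up later as a confusing mpiexec error, so a quick check of the file can save time. The sketch below compares a hosts file against the address list from the environment table; CLUSTER_HOSTS and missing_host_entries are illustrative names.

```shell
# Expected "IP:hostname" pairs, taken from the environment table.
CLUSTER_HOSTS="10.2.152.230:hw-master 10.2.152.231:hw-node01 10.2.152.232:hw-node02 10.2.152.233:hw-node03 10.2.152.234:hw-node04"

# Print every expected "IP hostname" pair that is absent from the hosts file.
missing_host_entries() {
    hosts_file=${1:-/etc/hosts}
    for pair in $CLUSTER_HOSTS; do
        ip=${pair%%:*}
        name=${pair##*:}
        # A pair counts as present when a line has the IP as its first field
        # and the hostname as one of the remaining fields.
        awk -v ip="$ip" -v name="$name" '
            $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
            END { exit (found ? 0 : 1) }
        ' "$hosts_file" || echo "$ip $name"
    done
}
```

Running missing_host_entries on each node should print nothing once the file matches the listing above.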
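(4) On the master node, create a servers file in /home/share/mpich that lists each machine name in the cluster and the number of processes to run on it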
$:cd /home/share/mpich
$:cat servers
hw-master:2
hw-node01:2
hw-node02:2
hw-node03:2
hw-node04:2
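The servers file can also be generated instead of typed, which is handy when the process count per node changes. This is a sketch using the host:count format shown above; write_hostfile is a made-up helper name.

```shell
# Usage: write_hostfile FILE COUNT HOST...
# Writes one "HOST:COUNT" line per host, replacing any existing FILE.
write_hostfile() {
    out=$1
    count=$2
    shift 2
    : > "$out"                          # create the file or truncate it
    for host in "$@"; do
        printf '%s:%s\n' "$host" "$count" >> "$out"
    done
}
```

For example, write_hostfile /home/share/mpich/servers 2 hw-master hw-node01 hw-node02 hw-node03 hw-node04 reproduces the file above. With 5 hosts and 2 processes each, that gives the 10 processes requested with -n 10 in the final step.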
(5) On each of the 5 nodes, copy /usr/local/mpich/examples/cpi, the executable that computes pi, into the /home/share/mpich directory
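With passwordless SSH in place, the copy can be driven from the master node in one loop. The sketch below rests on two assumptions that the steps above do not state outright: MPICH and its examples directory were installed on every node in the same way as on the master, and the file sits at /usr/local/mpich/examples/cpi as produced by the cp -r examples/ step. NODES, RUN and copy_cpi_to_nodes are illustrative names.

```shell
NODES="hw-master hw-node01 hw-node02 hw-node03 hw-node04"
RUN=${RUN:-ssh}    # set RUN=echo to print the commands instead of running them

# On every node, create /home/share/mpich if needed and copy cpi into it.
copy_cpi_to_nodes() {
    for host in $NODES; do
        $RUN "root@$host" \
            "mkdir -p /home/share/mpich && cp /usr/local/mpich/examples/cpi /home/share/mpich/" \
            || return 1
    done
}
```

A dry run with RUN=echo copy_cpi_to_nodes prints the five remote commands without touching any node.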
(6) On the master node, run the following from the /home/share/mpich directory
$:mpiexec -n 10 -f servers ./cpi
Output:
Process 6 of 10 is on hw-node03
Process 0 of 10 is on hw-master
Process 7 of 10 is on hw-node03
Process 1 of 10 is on hw-master
Process 8 of 10 is on hw-node04
Process 9 of 10 is on hw-node04
Process 2 of 10 is on hw-node01
Process 3 of 10 is on hw-node01
Process 4 of 10 is on hw-node02
Process 5 of 10 is on hw-node02
pi is approximately 3.1415926544231256, Error is 0.0000000008333325
wall clock time = 0.002362
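To see at a glance whether the processes were spread across the nodes as the servers file asks, the rank lines can be tallied per host. This is a sketch that assumes the "Process R of N is on HOST" format shown above; ranks_per_host is a made-up name.

```shell
# Read mpiexec output on stdin and print "HOST COUNT" for each host, sorted by name.
ranks_per_host() {
    awk '
        /^Process [0-9]+ of [0-9]+ is on / { count[$NF]++ }   # $NF is the hostname
        END { for (h in count) print h, count[h] }
    ' | sort
}
```

Used as mpiexec -n 10 -f servers ./cpi | ranks_per_host, the output listed above would be summarized as five lines, hw-master 2 through hw-node04 2, matching the 2 processes per host in the servers file.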
This output confirms that the cluster has been set up successfully.