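Linux Cluster PBS Management: Deploying a PBS/Torque Cluster

PBS is a software package for batch job processing and system resource management. Its main functions are resource management and job scheduling on compute clusters. PBS has three main branches: OpenPBS, PBS Pro, and Torque.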

I. Base Environment

1. Hostnames and IP addresses

Management node: 192.168.1.11 m1

Compute node: 192.168.1.12 c1

Compute node: 192.168.1.13 c2
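
The later steps refer to the nodes by name (m1, c1, c2), so each name has to resolve on every node. A minimal sketch, assuming there is no DNS and the mappings go into /etc/hosts; the helper name `add_host_entry` is hypothetical and not part of any package:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "IP hostname" to a hosts file unless the
# hostname is already mapped on a non-comment line.
add_host_entry() {
    local ip="$1" name="$2" file="${3:-/etc/hosts}"
    if ! awk -v n="$name" '
            /^[[:space:]]*#/ { next }                       # skip comments
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$file" 2>/dev/null; then
        printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
}

# Usage (as root, on every node):
#   add_host_entry 192.168.1.11 m1
#   add_host_entry 192.168.1.12 c1
#   add_host_entry 192.168.1.13 c2
```

Running it a second time leaves the file unchanged, so it is safe to include in a provisioning script.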

2. Host configuration

OS: CentOS 7.6 x86_64

CPU: 4 cores

Memory: 4 GB

3. Disable the firewall

# systemctl stop firewalld

# systemctl disable firewalld

# systemctl stop iptables

# systemctl disable iptables

4. Adjust resource limits

# vim /etc/security/limits.conf

* hard nofile 1000000
* soft nofile 1000000
* soft core unlimited
* soft stack 10240
* soft memlock unlimited
* hard memlock unlimited
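
After editing the file, it can help to confirm what is actually configured for the `*` domain. The sketch below only parses the file; the function name `limit_value` is hypothetical, and it assumes the simple four-column layout shown above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value configured for domain "*" with the
# given type (soft/hard) and item (nofile, memlock, ...). If the item is
# listed more than once, the last occurrence is printed.
limit_value() {
    local type="$1" item="$2" file="${3:-/etc/security/limits.conf}"
    awk -v t="$type" -v i="$item" '
        $1 == "*" && $2 == t && $3 == i { v = $4 }
        END { if (v != "") print v }' "$file"
}

# Usage:
#   limit_value soft nofile
#   limit_value hard memlock
```

The limits apply to new login sessions, so the effective values can be compared after logging in again with `ulimit -Sn` (soft) and `ulimit -Hn` (hard).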

5. Set the time zone to CST (China Standard Time)

# ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime

Synchronize with an NTP server

# ntpdate 210.72.145.44

# yum install ntp -y

# systemctl start ntpd

# systemctl enable ntpd

Install the EPEL repository

# yum install http://mirrors.sohu.com/fedora-epel/epel-release-latest-7.noarch.rpm

6. Install NFS

# yum install -y nfs-utils rpcbind

Create the shared directory and edit /etc/exports

# mkdir /software

# cat /etc/exports

/software *(rw,async,insecure,no_root_squash)

Start NFS

# systemctl start nfs

# systemctl start rpcbind

# systemctl enable nfs

# systemctl enable rpcbind

Mount the NFS share on the compute nodes

# yum install -y nfs-utils

# mkdir /software

# mount m1:/software /software
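
A mount made with `mount` alone does not survive a reboot. One common way to make it persistent is an /etc/fstab entry; the sketch below is an assumption about how you might do that, not part of the original procedure, and the helper name `ensure_nfs_mount_entry` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: add an NFS line to an fstab file unless the mount
# point is already listed on a non-comment line.
ensure_nfs_mount_entry() {
    local src="$1" mount_point="$2" file="${3:-/etc/fstab}"
    if ! awk -v m="$mount_point" '
            !/^[[:space:]]*#/ && $2 == m { found = 1 }
            END { exit !found }' "$file" 2>/dev/null; then
        # _netdev delays the mount until the network is up.
        printf '%s %s nfs defaults,_netdev 0 0\n' "$src" "$mount_point" >> "$file"
    fi
}

# Usage (as root, on each compute node):
#   ensure_nfs_mount_entry m1:/software /software
#   mount -a
```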

7. Configure passwordless SSH login from the management node

# ssh-keygen

# ssh-copy-id -i .ssh/id_rsa.pub c1

# ssh-copy-id -i .ssh/id_rsa.pub c2
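
To confirm that key-based login works before moving on, a small check like the following can be run from the management node. This is a sketch; the function name `check_nodes` is hypothetical. `BatchMode=yes` makes ssh fail instead of prompting for a password, so a node that still needs a password is reported as failed:

```shell
#!/usr/bin/env bash
# Hypothetical helper: try a non-interactive SSH login to each node and
# report the result. Returns non-zero if any node fails.
check_nodes() {
    local node failed=0
    for node in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$node" true; then
            echo "$node: ok"
        else
            echo "$node: FAILED"
            failed=1
        fi
    done
    return "$failed"
}

# Usage:
#   check_nodes c1 c2
```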

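II. Deploy the Torque Management Node

Torque consists of four services:

pbs_server: the resource manager's server; it dispatches and reclaims jobs based on the list of available node resources provided by the scheduler;

pbs_mom: the compute-node daemon; it runs jobs and monitors resource usage on each compute node;

trqauthd: the authorization daemon; Torque client commands use it to establish trusted connections to pbs_server;

pbs_sched: the job scheduler.

1. Install dependencies

# yum install -y libtool openssl-devel libxml2-devel boost-devel gcc gcc-c++ hwloc hwloc-devel

2. Install Torque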

# wget http://wpfilebase.s3.amazonaws.com/torque/torque-6.1.3.tar.gz

# tar zxvf torque-6.1.3.tar.gz

# cd torque-6.1.3

# ./configure --prefix=/usr/local/torque --with-scp

# make -j4

# make install

Generate the install packages needed by the compute nodes; this produces 5 executable scripts

# make packages

# libtool --finish /usr/local/torque/lib

3. Configure the Torque server

Load the environment variables

# . /etc/profile.d/torque.sh

Initialize serverdb

# qterm

# ./torque.setup root

4. Start the Torque server

# qterm

# systemctl enable pbs_server

# systemctl start pbs_server

# systemctl enable trqauthd

# systemctl start trqauthd

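III. Deploy the Torque Compute Nodes

1. Install the client

Copy the install packages from the torque source directory to the compute nodes, or to the NFS directory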

# ./torque-package-mom-linux-x86_64.sh --install

# ./torque-package-clients-linux-x86_64.sh --install
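
The copy step mentioned above can go through the NFS share so that every compute node sees the same files. A minimal sketch, assuming the build directory is the torque-6.1.3 source tree and that /software/torque is a free directory on the share (both the target path and the function name `stage_packages` are assumptions):

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the torque-package-*.sh installers produced
# by "make packages" from the build directory into a shared directory.
stage_packages() {
    local src="$1" dest="$2"
    mkdir -p "$dest" || return 1
    cp "$src"/torque-package-*.sh "$dest"/ &&
        chmod +x "$dest"/torque-package-*.sh
}

# Usage (on m1, from the directory that contains torque-6.1.3):
#   stage_packages torque-6.1.3 /software/torque
# Then on each compute node:
#   cd /software/torque
#   ./torque-package-mom-linux-x86_64.sh --install
#   ./torque-package-clients-linux-x86_64.sh --install
```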

2. Configure the client

# vim /var/spool/torque/mom_priv/config

$pbsserver m1
$logevent 225
$loglevel 4
$usecp m1:/data /data
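
With more than a couple of compute nodes, editing this file by hand on each one gets tedious. The sketch below writes the same four settings from a function; the function name `write_mom_config` is hypothetical, and the values are simply the ones used in this article:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a pbs_mom config with the settings used in
# this article. "\$" keeps the leading dollar sign literal in the heredoc.
write_mom_config() {
    local server="$1" file="${2:-/var/spool/torque/mom_priv/config}"
    cat > "$file" <<EOF
\$pbsserver $server
\$logevent 225
\$loglevel 4
\$usecp $server:/data /data
EOF
}

# Usage (as root, on each compute node):
#   write_mom_config m1
```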

3. Start the client

# systemctl enable pbs_mom

# systemctl start pbs_mom

# systemctl enable trqauthd

# systemctl start trqauthd

Make sure the server_name file contains the management node's hostname

# cat /var/spool/torque/server_name

Check the status of each node

# qnodes

c1
     state = free
     power_state = Running
     np = 4
     ntype = cluster
     status = opsys=linux,uname=Linux c1 4.19.0-6.el7.ucloud.x86_64 #1 SMP Wed Feb 12 07:32:16 UTC 2020 x86_64,sessions=44684,nsessions=1,nusers=1,idletime=142984,totmem=3873444kb,availmem=3429928kb,physmem=3873444kb,ncpus=4,loadave=0.00,gres=,netload=358955336,state=free,varattr= ,cpuclock=Fixed,macaddr=52:54:00:ba:9e:8b,version=6.1.3,rectime=1597632442,jobs=
     mom_service_port = 15002
     mom_manager_port = 15003

c2
     state = free
     power_state = Running
     np = 4
     ntype = cluster
     status = opsys=linux,uname=Linux c2 4.19.0-6.el7.ucloud.x86_64 #1 SMP Wed Feb 12 07:32:16 UTC 2020 x86_64,nsessions=0,nusers=0,idletime=2262,totmem=3873444kb,availmem=3553212kb,physmem=3873444kb,ncpus=4,loadave=0.01,gres=,netload=61440494,state=free,varattr= ,cpuclock=Fixed,macaddr=52:54:00:e9:4a:a6,version=6.1.3,rectime=1597632467,jobs=
     mom_service_port = 15002
     mom_manager_port = 15003
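
On a larger cluster it is easier to count free nodes than to read the full listing. A small sketch that filters the `qnodes` output; the function name `count_free_nodes` is hypothetical. It only matches lines that begin with the `state` attribute, so the `state=free` field embedded in the long `status` line is not counted twice:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read qnodes output on stdin and print how many
# nodes report "state = free" (with or without spaces around "=").
count_free_nodes() {
    awk '/^[[:space:]]*state[[:space:]]*=[[:space:]]*free[[:space:]]*$/ { n++ }
         END { print n + 0 }'
}

# Usage:
#   qnodes | count_free_nodes
```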

IV. Configure the Scheduler

1. Start the scheduler

# cp contrib/systemd/pbs_sched.service /usr/lib/systemd/system/

# systemctl enable pbs_sched

# systemctl start pbs_sched

2. Configure the queue

# qmgr -c 'create queue batch'

# qmgr -c 'set server default_queue=batch'

# qmgr -c 'set server query_other_jobs=true'

# qmgr -c 'set queue batch queue_type=execution'

# qmgr -c 'set queue batch started=true'

# qmgr -c 'set queue batch enabled=true'

# qmgr -c 'set queue batch resources_default.nodes=1'

# qmgr -c 'set server scheduling=true'
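
The same queue setup can be kept in one place and fed to `qmgr` on standard input, which makes it easier to repeat for another queue. This is a sketch under that assumption; the function name `queue_directives` is hypothetical, and the directives are the ones listed above with the queue name made a parameter:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the qmgr directives used in this article for
# a queue of the given name (default: batch).
queue_directives() {
    local q="${1:-batch}"
    cat <<EOF
create queue $q
set server default_queue=$q
set server query_other_jobs=true
set queue $q queue_type=execution
set queue $q started=true
set queue $q enabled=true
set queue $q resources_default.nodes=1
set server scheduling=true
EOF
}

# Usage (as root, on m1):
#   queue_directives batch | qmgr
# Review the result afterwards:
#   qmgr -c 'print server'
```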

3. Test

# echo "sleep 30" | qsub

View job information

# qstat -a
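
For anything beyond a one-line test, jobs are usually submitted as a script with `#PBS` directives. The sketch below writes such a script for the same `sleep 30` test; the function name `write_job_script`, the job name, and the walltime are illustrative assumptions, while the `batch` queue is the one created above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a small Torque job script to the given file.
# The quoted 'EOF' keeps $(...) and ${...} unexpanded until the job runs.
write_job_script() {
    local file="${1:-sleep_test.pbs}"
    cat > "$file" <<'EOF'
#!/bin/bash
#PBS -N sleep_test
#PBS -q batch
#PBS -l nodes=1
#PBS -l walltime=00:05:00
#PBS -j oe
# Jobs start in the home directory; go back to the submit directory.
cd "${PBS_O_WORKDIR:-.}" || exit 1
echo "Running on $(hostname)"
sleep 30
EOF
}

# Usage:
#   write_job_script sleep_test.pbs
#   qsub sleep_test.pbs
#   qstat -a
```

Torque normally rejects jobs submitted by root unless the server is explicitly configured to allow them, so it is safer to run the test as a regular user that exists on all nodes.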
