Introduction to Torque
Official site: http://www.adaptivecomputing.com/products/open-source/torque/
Description from the official site:
TORQUE Resource Manager provides control over batch jobs and distributed computing resources. It is an advanced open-source product based on the original PBS project* and incorporates the best of both community and professional development. It incorporates significant advances in the areas of scalability, reliability, and functionality and is currently in use at tens of thousands of leading government, academic, and commercial sites throughout the world. TORQUE may be freely used, modified, and distributed under the constraints of the included license.
TORQUE can integrate with Moab Workload Manager to improve overall utilization, scheduling and administration on a cluster. Customers who purchase Moab family products also receive free support for TORQUE.
Download:
http://www.adaptivecomputing.com/support/download-center/torque-download/
Installing Torque
Preparation
1 Time synchronization
ntpdate cn.pool.ntp.org
If the ntpdate command is missing, install it with: yum install ntpdate -y
2 Hostnames
Check three places
Current value: hostname
File: cat /etc/sysconfig/network
hosts mapping: vim /etc/hosts
IP1 hostname1
IP2 hostname2
IP3 hostname3
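The hosts mapping check can be scripted. The following is a minimal sketch, not part of Torque: missing_hosts is a hypothetical helper that prints every given hostname that has no entry in a file in /etc/hosts format.

```shell
# missing_hosts FILE NAME...
# Print every NAME that does not appear as a hostname or alias in FILE,
# where FILE uses the /etc/hosts format ("IP name [alias...]").
# missing_hosts is a hypothetical helper name, not a Torque command.
missing_hosts() {
    hosts_file=$1
    shift
    for name in "$@"; do
        if ! awk -v n="$name" '
            {
                sub(/#.*/, "")                 # drop comments
                for (i = 2; i <= NF; i++)      # field 1 is the IP address
                    if ($i == n) found = 1
            }
            END { exit !found }
        ' "$hosts_file"; then
            echo "$name"
        fi
    done
}
```

For example, missing_hosts /etc/hosts hostname1 hostname2 hostname3 prints nothing when all three names are mapped.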
3 Dependencies
yum install openssl-devel -y
yum install libxml2-devel -y
yum install boost-devel -y
4 Create the user
groupadd torque -g 2005
useradd -u 2004 -g 2005 torque
passwd torque
5 Passwordless SSH login for the torque user
ssh-keygen
ssh-copy-id -i /home/torque/.ssh/id_rsa.pub IP2
Installation
Master node:
Download:
Components
pbs_server: master node
pbs_mom: compute node
PBS clients: submit node
Terminology
1 pbs_server
The Torque server (pbs_server) contains all the information about a cluster. It
knows about all of the MOM nodes in the cluster based on the information in the
TORQUE_HOME/server_priv/nodes file (See Configuring Torque on Compute
Nodes). It also maintains the status of each MOM node through updates from
the MOMs in the cluster (see pbsnodes). All jobs are submitted via qsub to the
server, which maintains a master database of all jobs and their states.
Schedulers such as Moab Workload Manager receive job, queue, and node
information from pbs_server and submit all jobs to be run to pbs_server.
The server configuration is maintained in a file named serverdb, located in
TORQUE_HOME/server_priv. The serverdb file contains all parameters
pertaining to the operation of Torque plus all of the queues which are in the
configuration. For pbs_server to run, serverdb must be initialized.
2
Starting the installation
As the torque user:
tar -zxf torque-6.0.0.1-1449528029_21cc3d8.tar.gz
mv torque-6.0.0.1-1449528029_21cc3d8 torque
cd torque
./configure
make
Switch to root:
make install
The installed files are placed under /var/spool/torque
Switch back to the torque user and generate the files the clients need
make packages
This generates the following files
-rwxr-xr-x. 1 torque torque 158411 Mar 16 15:30 torque-package-doc-linux-x86_64.sh
drwxrwxr-x. 7 torque torque 4096 Mar 16 15:30 tpackages
-rwxr-xr-x. 1 torque torque 4699194 Mar 16 15:30 torque-package-devel-linux-x86_64.sh
-rwxr-xr-x. 1 torque torque 2241049 Mar 16 15:30 torque-package-clients-linux-x86_64.sh
-rwxr-xr-x. 1 torque torque 3226067 Mar 16 15:30 torque-package-mom-linux-x86_64.sh
-rwxr-xr-x. 1 torque torque 5023807 Mar 16 15:29 torque-package-server-linux-x86_64.sh
Copy them to the other nodes
scp torque-package-mom-linux-x86_64.sh torque-package-clients-linux-x86_64.sh IP2:/home/torque/
scp torque-package-mom-linux-x86_64.sh torque-package-clients-linux-x86_64.sh IP3:/home/torque/
==============================================
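If the client runs the same platform as the master node, run the two scripts directly
Switch to the root user
./torque-package-mom-linux-x86_64.sh --install
./torque-package-clients-linux-x86_64.sh --install
If the client platform differs, the two scripts cannot be used directly;
instead, build on the client itself:
extract the source archive,
./configure
make
Switch to root
make install_mom install_clients
(Not tested here; see http://blog.chinaunix.net/uid-261392-id-2138892.html )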
================================================
Next, continue on the master node:
Confirm the master node name
/var/spool/torque/server_name
Set up the configuration file
vim /var/spool/torque/mom_priv/config
Add
$pbsserver IP1
$logevent 255
Run as root
/home/torque/torque/torque.setup root
Add the compute nodes
cd /var/spool/torque/server_priv
vim nodes
IP1 np=12
IP2 np=12
IP3 np=12
Note: np=12 means the node offers 12 CPU cores for running jobs
Set up the configuration file
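When the cluster has many nodes, the nodes file can be generated rather than typed by hand. A minimal sketch, assuming every node offers the same number of cores; make_nodes_file is a hypothetical helper name, not a Torque command:

```shell
# make_nodes_file NP NAME...
# Print one "NAME np=NP" line per node, the line format used in
# server_priv/nodes above. Assumes all nodes share the same core count.
make_nodes_file() {
    np=$1
    shift
    for name in "$@"; do
        printf '%s np=%s\n' "$name" "$np"
    done
}
```

Run as root, make_nodes_file 12 IP1 IP2 IP3 > /var/spool/torque/server_priv/nodes would produce the three lines shown above.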
cd /var/spool/torque/mom_priv
vim config
$pbsserver IP1
$logevent 255
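Because the same two-line file is needed on every compute node, it can be written by a small function. This is a sketch under the assumption that pbs_mom reads its settings as $pbsserver and $logevent directives; write_mom_config is a hypothetical helper name.

```shell
# write_mom_config SERVER [FILE]
# Write a two-line pbs_mom configuration pointing at SERVER.
# FILE defaults to the path used in this post; pass another path to
# preview the result without touching the real configuration.
write_mom_config() {
    server=$1
    file=${2:-/var/spool/torque/mom_priv/config}
    # Single quotes keep the leading "$" of each directive literal.
    printf '$pbsserver %s\n$logevent 255\n' "$server" > "$file"
}
```

For instance, write_mom_config IP1 /tmp/config.preview writes the file to a scratch location for inspection first.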
On the master node (which also acts as a compute node)
pbs_mom -c /var/spool/torque/mom_priv/config
or
/usr/local/sbin/pbs_mom -c /var/spool/torque/mom_priv/config
qterm -t quick
pbs_server
pbs_sched    # start the scheduler process
Compute nodes:
Start the compute nodes (run on every server; the master node was already started above)
/usr/local/sbin/pbs_mom -c /var/spool/torque/mom_priv/config
Testing Torque
Torque jobs cannot be run as root
echo "echo a ;sleep 100"|qsub
echo "echo a ;sleep 100"|qsub
echo "echo a ;sleep 100"|qsub
echo "echo a ;sleep 100"|qsub
qstat
Check the status of each node
pbsnodes -a
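On a larger cluster the full listing is long, so it can help to filter it down to the nodes that are down. A minimal sketch, assuming the listing prints each node name on an unindented line followed by indented "state = ..." attribute lines; down_nodes is a hypothetical helper name:

```shell
# down_nodes
# Read a `pbsnodes -a` style listing on stdin and print the name of every
# node whose state line contains "down". The assumed layout is a node
# name at column 0 followed by indented "key = value" lines.
down_nodes() {
    awk '
        /^[^[:space:]]/ { node = $1 }                     # node name line
        /^[[:space:]]+state = / && /down/ { print node }  # its state line
    '
}
```

Usage: pbsnodes -a | down_nodes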
Adding a compute node to another cluster
Edit the config file and the server_name file
shell# cat /var/spool/torque/server_name
IP1
shell#
shell# cat /var/spool/torque/mom_priv/config
$pbsserver IP1
$logevent 255
Restart the mom process on the compute node
Restart pbs_sched and pbs_server on the master node
ps -ef|grep mom
ps -ef|grep pbs_sched
ps -ef|grep pbs_server
Find the PIDs
kill <pid>
Then start the processes
/usr/local/sbin/pbs_mom -c /var/spool/torque/mom_priv/config
pbs_sched
pbs_server
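Checking the status
Checking with pbsnodes showed one node down. Looking at the mom processes showed two of them: there was an extra child process.
Kill them and restart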
shell# ps -ef|grep mom
root 44636 1 0 17:17 ? 00:00:01 pbs_mom -c /var/spool/torque/mom_priv/config
root 45925 44636 0 18:22 ? 00:00:00 pbs_mom -c /var/spool/torque/mom_priv/config
root 45975 45638 0 18:22 pts/4 00:00:00 grep mom
shell# kill -9 44636
shell# kill -9 45899
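Picking the PIDs out of the ps listing by eye is error-prone, so the step can be scripted. A minimal sketch, assuming the eight-column ps -ef layout shown above; mom_pids is a hypothetical helper name:

```shell
# mom_pids
# Read `ps -ef` output on stdin and print the PID (column 2) of every
# process whose command (column 8) is pbs_mom, with or without a leading
# path. The "grep mom" line is skipped because its command is grep.
mom_pids() {
    awk '$8 ~ /(^|\/)pbs_mom$/ { print $2 }'
}
```

Usage: ps -ef | mom_pids, then pass the printed PIDs to kill.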
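Using the Torque service from any client machine
1 Copy one of the scripts generated on the server over with scp
torque-package-clients-linux-x86_64.sh
Install it
2 Then start /usr/local/sbin/trqauthd
After that the client is ready to use
Basic Torque usage
Possible errors and how to fix them
Installation errors
1 ERROR: pbs_server already running... run 'qterm' to stop pbs_server and rerun
Fix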
qterm
./torque.setup root
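2 The pbsnodes -a command reports an error
pbsnodes: Server has no node list MSG=node list is empty -check 'server_priv/nodes' file
One cause is that the nodes file does not exist
Another is that the node list was lost when pbs_server was run again
Restart the server: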
qterm
pbs_server
3
References for this post
1 http://blog.csdn.net/educast/article/details/7167135
2 http://blog.chinaunix.net/uid-261392-id-2138892.html