LSF --- How to Set Up SGE

I. Preparation before installation

1. Make sure every machine runs RedHat/CentOS 6.5 or later, installed with the full package set.

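2. Make sure every machine that will join the compute farm has an IP address and a hostname configured, and that all machines are on the same subnet and can reach one another by hostname (ping succeeds).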
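The reachability requirement can be checked with a small loop. This is only a sketch: the function name `unreachable_hosts` and the `PING_CMD` override are hypothetical helpers introduced here, and the default assumes the Linux `ping` options `-c` (count) and `-W` (timeout in seconds).

```shell
# Print every host name that does not answer a single ping.
# PING_CMD is a hypothetical hook that lets the check be dry-run;
# by default one echo request is sent with a 2-second timeout.
unreachable_hosts() (
    for host in "$@"; do
        if ! ${PING_CMD:-ping -c 1 -W 2} "$host" >/dev/null 2>&1; then
            echo "$host"
        fi
    done
)
# Example: unreachable_hosts serverA serverX
```

Run it on each machine with the full host list; an empty result suggests that every name resolves and answers from that machine.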

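3. Choose one machine as the SGE master (referred to below as serverA). On serverA, choose a directory to serve as SGE_ROOT; this directory must be shared with the other machines. Assuming SGE_ROOT is /data/sge, share it from serverA as follows: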

  1. Edit /etc/exports and add the line:
/data/sge *(rw,sync,no_root_squash)
  2. Reload the export table:
[root@serverA ~]# exportfs -r
  3. Restart the NFS service:
[root@serverA ~]# service nfs restart
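If this setup is scripted or re-run, it may help to add the export line only when it is missing. A minimal sketch, assuming a hypothetical helper named `add_line_once`:

```shell
# Append a line to a file only if that exact line is not already there.
add_line_once() (
    file=$1
    line=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
)
# Example (as root on serverA):
#   add_line_once /etc/exports '/data/sge *(rw,sync,no_root_squash)'
#   exportfs -r
```

`grep -qxF` matches the whole line as a fixed string, so the `*` and parentheses in the export options are not treated as a pattern.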

4. On every other machine (referred to as serverX), mount /data/sge. (Note: serverX must not already have a local directory with the same name as /data.)

  1. Edit /etc/fstab and add the line:
    serverA:/data/sge /data/sge nfs defaults 0 0
  2. Create the directory /data/sge:
[root@serverX ~]# mkdir -p /data/sge
  3. Mount /data/sge manually:
[root@serverX ~]# mount -a
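After `mount -a`, one way to confirm that /data/sge really comes from serverA is to read the kernel's mount table. The sketch below assumes a hypothetical helper `mount_source`; it reads /proc/mounts by default and accepts another file in the same format as an optional second argument.

```shell
# Print the device mounted on a given mount point, reading a
# /proc/mounts-style table (second argument, default /proc/mounts).
# Exits with status 1 when nothing is mounted there.
mount_source() (
    awk -v mp="$1" '
        $2 == mp { src = $1 }
        END { if (src != "") print src; else exit 1 }
    ' "${2:-/proc/mounts}"
)
# Example:
#   [ "$(mount_source /data/sge)" = "serverA:/data/sge" ] || echo "not mounted" >&2
```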

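5. Decide whether the accounts that will submit jobs through SGE are local accounts or NIS accounts. With many machines, NIS is recommended, since it avoids creating the same account on every machine. If you use local accounts, create the account on every machine (see the useradd documentation); the account name, UID, and home directory must be identical everywhere.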
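For local accounts, the "same name, UID, and home directory" rule can be checked by comparing the passwd entries collected from each machine. A minimal sketch: `accounts_consistent` is a hypothetical helper, and the usage comment assumes passwordless ssh and an example user named alice.

```shell
# Read passwd-format lines on stdin (one per host) and succeed only if
# the name, UID, and home directory (fields 1, 3, 6) agree on every line.
accounts_consistent() (
    awk -F: '
        { seen[$1 ":" $3 ":" $6] = 1 }
        END { n = 0; for (k in seen) n++; exit (n == 1 ? 0 : 1) }
    '
)
# Example:
#   for h in serverA serverX; do ssh "$h" getent passwd alice; done | accounts_consistent \
#       || echo "alice differs between hosts" >&2
```

The login shell (field 7) is deliberately ignored, since the requirement above only covers the name, UID, and home directory.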

6. Prepare the SGE 8.1.9 installation files. From https://arc.liv.ac.uk/downloads/SGE/releases/8.1.9 download all gridengine-*8.1.9-1.el6.x86_64.rpm packages.
From https://opsx.alibaba.com/mirror download the dependency packages:

hwloc-1.5-3.el6_5.x86_64.rpm
hwloc-devel-1.5-3.el6_5.x86_64.rpm
jemalloc-3.6.0-1.el6.x86_64.rpm
lesstif-0.95.2-1.el6.x86_64.rpm
munge-libs-0.5.10-1.el6.x86_64.rpm
openmotif22-2.2.3-19.el6.x86_64.rpm
perl-XML-Simple-2.18-6.el6.noarch.rpm

Place all of these files in /data/sge/ on serverA.
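Before installing, it can be worth confirming that every dependency package listed above actually reached /data/sge. A small sketch with a hypothetical helper `missing_rpms`; the file names are copied from the list above.

```shell
# Print each expected dependency RPM that is absent from a directory
# (first argument, default /data/sge).
missing_rpms() (
    dir=${1:-/data/sge}
    for rpm in \
        hwloc-1.5-3.el6_5.x86_64.rpm \
        hwloc-devel-1.5-3.el6_5.x86_64.rpm \
        jemalloc-3.6.0-1.el6.x86_64.rpm \
        lesstif-0.95.2-1.el6.x86_64.rpm \
        munge-libs-0.5.10-1.el6.x86_64.rpm \
        openmotif22-2.2.3-19.el6.x86_64.rpm \
        perl-XML-Simple-2.18-6.el6.noarch.rpm
    do
        [ -f "$dir/$rpm" ] || echo "$rpm"
    done
)
# Example: missing_rpms /data/sge
```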

II. Installing SGE

2.1 Install the SGE master

1. Log in to serverA as root and first install the dependency packages downloaded in step 6 of the preparation:

[root@serverA ~]# cd /data/sge
[root@serverA ~]# rpm -ivh hwloc-1.5-3.el6_5.x86_64.rpm jemalloc-3.6.0-1.el6.x86_64.rpm munge-libs-0.5.10-1.el6.x86_64.rpm perl-XML-Simple-2.18-6.el6.noarch.rpm

2. Install the SGE master (setenv is csh/tcsh syntax; under bash use export SGE_ROOT=/data/sge instead):

[root@serverA ~]# setenv SGE_ROOT /data/sge
[root@serverA ~]# cd $SGE_ROOT
[root@serverA:/data/sge]% rpm -ivh gridengine-8.1.9-1.el6.x86_64.rpm
[root@serverA:/data/sge]% rpm -iv gridengine-qmaster-8.1.9-1.el6.x86_64.rpm

•press enter at the intro screen 
•press "y" and then specify root as the user id 
•leave the install dir as /data/sge 

•You will now be asked about port configuration for the master; normally you would choose the default (2), which uses the /etc/services file 

•accept the sge_qmaster info 

•You will then be asked about port configuration for the execution daemon; normally you would again choose the default 
•accept the sge_execd info 
•leave the cell name as "default" 
•Enter an appropriate cluster name when requested 
•leave the spool dir as is 
•press "n" for no windows hosts! 
•press "y" (permissions are set correctly) 
•press "y" for all hosts in one domain 
•If you have Java available on your Qmaster and wish to use SGE Inspect or SDM then enable the JMX MBean server and provide the requested information - probably answer "n" at this point! 
•press enter to accept the directory creation notification 

•enter "classic" for classic spooling 

•press enter to accept the next notice 
•enter "20000-20100" as the GID range (increase this range if you have execution nodes capable of running more than 100 concurrent jobs) 
•accept the default spool dir or specify a different folder (for example if you wish to use a shared or local folder outside of SGE_ROOT) 

•enter an email address to which problem reports will be sent 
•press "n" to refuse to change the parameters you have just configured 
•press enter to accept the next notice 
•press "y" to install the startup scripts 
•press enter twice to confirm the following messages 
•press "n" for a file with a list of hosts 
•enter the names of the hosts that will be able to administer and submit jobs (press enter on an empty line to finish adding hosts) 
•skip shadow hosts for now (press "n") 
•choose "1" for normal configuration and agree with "y" 
•press enter to accept the next message and "n" to refuse to see the previous screen again and then finally enter to exit the installer
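The GID range item in the list above gives 20000-20100 for nodes running up to 100 concurrent jobs. As a rough sketch of how to scale it, the hypothetical helper below follows the same convention (last GID = first GID + number of jobs); it is an illustration of the arithmetic, not part of the installer.

```shell
# Print a GID range "first-last" sized for a number of concurrent jobs
# per execution node, using the 20000-20100-for-100-jobs convention.
gid_range() (
    first=${1:-20000}
    jobs=${2:-100}
    echo "$first-$((first + jobs))"
)
# Example: gid_range 20000 256
```

The chosen range should not overlap group IDs already in use on the execution hosts.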

3. Enable the sgemaster service to start automatically at boot:

[root@serverA:/data/sge]% chkconfig sgemaster.$SGE_CLUSTER_NAME on

4. Check whether the sgemaster process is running; if it is not, start it manually:

[root@serverA:/data/sge]% /etc/init.d/sgemaster.$SGE_CLUSTER_NAME start
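The "check, then start if needed" step can be wrapped in one function. This is a sketch under assumptions: `start_unless_running` and the `PGREP_CMD` override are hypothetical, `pgrep` must be installed, and the master daemon is assumed to appear under the process name sge_qmaster.

```shell
# Run the given start command only when no process with the exact
# name $1 is found. PGREP_CMD is a hypothetical hook for dry runs.
start_unless_running() (
    proc=$1
    shift
    if ${PGREP_CMD:-pgrep -x} "$proc" >/dev/null 2>&1; then
        echo "$proc is already running"
    else
        "$@"
    fi
)
# Example:
#   start_unless_running sge_qmaster /etc/init.d/sgemaster."$SGE_CLUSTER_NAME" start
```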

2.2 Install the execution daemon

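  1. Log in to serverX as root:
[root@serverX ~]# setenv SGE_ROOT /data/sge
[root@serverX ~]# cd $SGE_ROOT
[root@serverX:/data/sge]% rpm -iv gridengine-execd-8.1.9-1.el6.x86_64.rpm    (accept the default for every option to complete the installation)
  2. Enable sgeexecd at boot and start it:
[root@serverX:/data/sge]% chkconfig sgeexecd.$SGE_CLUSTER_NAME on
[root@serverX:/data/sge]% /etc/init.d/sgeexecd.$SGE_CLUSTER_NAME start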

Summary

For more on configuring and using SGE, see: http://www.softpanorama.org/HPC/Grid_engine/Implementations/Son_of_grid_engine/installation_of_soge818_rpms_for_master_host.shtml
