Dependencies
NIS/LDAP (user accounts), NFS (shared /tools file system)
Install
Add LSF Administrator
# useradd -u 50001 lsfadmin
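Because user accounts come from NIS/LDAP and the LSF tree lives on NFS, lsfadmin should resolve to the same UID (50001 here) on every host, otherwise file ownership under LSF_TOP will not match. A minimal check you can run on each host; check_uid is a hypothetical helper, not part of LSF:

```shell
# check_uid USER EXPECTED_UID: succeed only if USER resolves (local files,
# NIS or LDAP, as configured in nsswitch.conf) to EXPECTED_UID on this host.
check_uid() {
    local user="$1" expected="$2" actual
    actual=$(id -u "$user" 2>/dev/null) || { echo "$user: not found" >&2; return 1; }
    if [ "$actual" != "$expected" ]; then
        echo "$user: uid $actual, expected $expected" >&2
        return 1
    fi
}
```

For example, run `check_uid lsfadmin 50001` on lsf01, lsf02 and server1 before installing.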
Configure install.config
# vim /tools/tmp/lsf/install.config
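LSF_TOP="/tools/env/lsf"
LSF_ADMINS="lsfadmin"
LSF_CLUSTER_NAME="Platform"
LSF_MASTER_LIST="lsf01 lsf02"
LSF_ENTITLEMENT_FILE="/tools/tmp/lsf/xxx.dat"
LSF_TARDIR="/tools/tmp/lsf/"
LSF_TOP: installation directory.
LSF_ADMINS: LSF administrator account.
LSF_CLUSTER_NAME: cluster name; the cluster configuration files are named after it (lsf.cluster.<cluster_name>, lsbatch/<cluster_name>/).
LSF_MASTER_LIST: lsf01 is the master host and lsf02 the master candidate. If lsf01 goes down, lsf02 takes over the cluster after about 2-5 minutes.
LSF_ENTITLEMENT_FILE: entitlement (license) file.
LSF_TARDIR: directory containing the installation packages.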
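Before running lsfinstall it can help to sanity-check install.config. The sketch below is a hypothetical helper (check_install_config is not an LSF tool): it sources the file in a subshell, so it assumes plain KEY="value" lines with no trailing annotations, and it only checks the keys used on this page. lsfinstall does its own parsing; this is just an early typo catcher. Requires bash (it uses ${!key} indirection).

```shell
# check_install_config FILE: verify the keys used on this page are set,
# the entitlement file exists and LSF_TARDIR is a directory.
check_install_config() {
    (
        . "$1" || exit 1
        rc=0
        for key in LSF_TOP LSF_ADMINS LSF_CLUSTER_NAME LSF_MASTER_LIST \
                   LSF_ENTITLEMENT_FILE LSF_TARDIR; do
            if [ -z "${!key:-}" ]; then
                echo "missing: $key" >&2
                rc=1
            fi
        done
        if [ -n "${LSF_ENTITLEMENT_FILE:-}" ] && [ ! -f "$LSF_ENTITLEMENT_FILE" ]; then
            echo "entitlement file not found: $LSF_ENTITLEMENT_FILE" >&2
            rc=1
        fi
        if [ -n "${LSF_TARDIR:-}" ] && [ ! -d "$LSF_TARDIR" ]; then
            echo "LSF_TARDIR is not a directory: $LSF_TARDIR" >&2
            rc=1
        fi
        exit "$rc"
    )
}
```

Usage: `check_install_config /tools/tmp/lsf/install.config && ./lsfinstall -f install.config`.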
Run the installer (as root)
# ./lsfinstall -f install.config
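Start LSF and enable automatic startup
Load the LSF environment, start LIM, RES and sbatchd on this host, then configure LSF to start at boot: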
# source /tools/env/lsf/conf/profile.sh
# lsadmin limstartup; lsadmin resstartup; badmin hstartup
# /tools/env/lsf/10.1/install/hostsetup --top="/tools/env/lsf" --boot="y"
Manage Cluster
Define server hosts and host groups
Add each server host to lsf.cluster.platform and lsb.hosts
# vim /tools/env/lsf/conf/lsf.cluster.platform
...
Begin Host
HOSTNAME model type server r1m mem swp RESOURCES #Keywords
#apple Sparc5S SUNSOL 1 3.5 1 2 (sparc bsd) #Example
#peach DEC3100 DigitalUNIX 1 3.5 1 2 (alpha osf1)
#banana HP9K778 HPPA 1 3.5 1 2 (hp68k hpux)
#mango HP735 HPPA 1 3.5 1 2 (hpux cs)
#grape SGI4D35 SGI5 1 3.5 1 2 (irix)
#lemon PC200 LINUX 1 3.5 1 2 (linux)
#pear IBM350 IBMAIX4 1 3.5 1 2 (aix cs)
#plum PENT_100 NTX86 1 3.5 1 2 (nt)
#berry DEC3100 ! 1 3.5 1 2 (ultrix fs bsd mips dec)
#orange ! SUNSOL 1 3.5 1 2 (sparc bsd) #Example
#prune ! ! 1 3.5 1 2 (convex)
lsf01 ! ! 1 3.5 () () (mg)
lsf02 ! ! 1 3.5 () () (mg)
server1 ! ! 1 3.5 () () (mg)
...
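When adding hosts by script rather than with vim, the new line has to go inside the Host section, just before its "End Host" line. A minimal sketch with awk; add_cluster_host is a hypothetical helper, it keeps a .bak copy and skips hosts that are already listed. Because it matches only "Begin Host"/"End Host" exactly, it also works on lsb.hosts without touching the HostGroup section.

```shell
# add_cluster_host FILE LINE: insert LINE before "End Host" in FILE,
# unless the Host section already has a line for that host name.
add_cluster_host() {
    local file="$1" line="$2" host
    host=${line%% *}    # first field of LINE is the host name
    if awk -v h="$host" '
        /^Begin[ \t]+Host[ \t]*$/ { in_host = 1; next }
        /^End[ \t]+Host[ \t]*$/   { in_host = 0 }
        in_host && $1 == h        { found = 1 }
        END { exit !found }' "$file"; then
        echo "$host already present in $file" >&2
        return 0
    fi
    cp -p "$file" "$file.bak"
    awk -v newline="$line" '
        /^Begin[ \t]+Host[ \t]*$/          { in_host = 1 }
        in_host && /^End[ \t]+Host[ \t]*$/ { print newline; in_host = 0 }
        { print }' "$file.bak" > "$file"
}
```

For example: `add_cluster_host /tools/env/lsf/conf/lsf.cluster.platform "server2 ! ! 1 3.5 () () (mg)"` (server2 is a placeholder), followed by `lsadmin reconfig`.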
# vim /tools/env/lsf/conf/lsbatch/platform/configdir/lsb.hosts
...
Begin Host
HOST_NAME MXJ r1m pg ls tmp DISPATCH_WINDOW AFFINITY # Keywords
#hostA () 3.5/4.5 15/ 12/15 0 () (Y) # Example
#hostB ! 3.5 15/18 12/ 0/ (5:19:00-1:8:30 20:00-8:30) (Y)
#hostC 1 3.5/5 18 15 () () (Y) # Example
#hostD ! () () () () () (Y) # Example
#hostE 4 () () () () () (Y) # Example
#SPARCIPC () 4.0/5.0 18 16 () () (Y) # Example
default ! () () () () () (Y) # Example
lsf01 0 () () () () () (Y)
lsf02 0 () () () () () (Y)
server1 96 () () () () () (Y)
...
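Note: MXJ is the maximum number of job slots on the host, usually set to the number of CPU cores. "!" sets it to the number of CPUs LSF detects, and 0 (as for lsf01 and lsf02) means no jobs are dispatched to that host.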
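Following the note above, the MXJ value for a new host can be taken from its CPU count. A small sketch, run on the host itself; lsb_hosts_line is a hypothetical helper, and it assumes `nproc` (GNU coreutils), which reports the processing units available to the current process, so hyperthreads count as CPUs:

```shell
# lsb_hosts_line HOST [SLOTS]: print an lsb.hosts Host-section line with MXJ
# set to SLOTS, or to this machine's CPU count if SLOTS is omitted.
lsb_hosts_line() {
    local host="$1" slots="${2:-$(nproc)}"
    printf '%s %s () () () () () (Y)\n' "$host" "$slots"
}
```

`lsb_hosts_line server1 96` prints the server1 line shown above.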
Define hostgroup in lsb.hosts
# vim /tools/env/lsf/conf/lsbatch/platform/configdir/lsb.hosts
...
# This example is commented out
Begin HostGroup
GROUP_NAME GROUP_MEMBER # Key words
#hgroup1 (hostA hostD ) # Define a host group
test_group (server1)
...
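On a running cluster, `bmgroup` lists host groups (and `bugroup` user groups). To check a group in the file itself before reconfiguring, a sketch like the following can help; group_members is a hypothetical helper and assumes the simple one-line form used here, NAME (member member ...):

```shell
# group_members FILE SECTION NAME: print the members of group NAME from the
# SECTION block ("HostGroup" in lsb.hosts, "UserGroup" in lsb.users), one per line.
group_members() {
    awk -v section="$2" -v name="$3" '
        $1 == "Begin" && $2 == section { inside = 1; next }
        $1 == "End" && $2 == section   { inside = 0 }
        inside && $1 == name {
            line = $0
            sub(/^[^(]*\(/, "", line)   # drop everything up to the first "("
            sub(/\).*$/, "", line)      # drop the first ")" and the rest
            n = split(line, m, /[ \t]+/)
            for (i = 1; i <= n; i++) if (m[i] != "") print m[i]
        }' "$1"
}
```

For example: `group_members /tools/env/lsf/conf/lsbatch/platform/configdir/lsb.hosts HostGroup test_group` prints server1.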
Define user groups
Define user groups in lsb.users
# vim /tools/env/lsf/conf/lsbatch/platform/configdir/lsb.users
...
Begin UserGroup
GROUP_NAME GROUP_MEMBER USER_SHARES #GROUP_ADMIN
test_user (user01 user02) ()
#ugroup1 (user1 user2 user3 user4) ([user1, 4] [others, 10]) #(user1 user2[full])
...
Define queue
Define queue in lsb.queues
Attention: a queue with INTERACTIVE = NO accepts only batch jobs. Set INTERACTIVE = YES if you want to submit interactive jobs (for example bsub -Is, or an xterm).
# vim /tools/env/lsf/conf/lsbatch/platform/configdir/lsb.queues
...
Begin Queue
QUEUE_NAME = test_queue
PRIORITY = 30
INTERACTIVE = YES
FAIRSHARE = USER_SHARES[[default,1]]
#RUN_WINDOW = 5:19:00-1:8:30 20:00-8:30
#r1m = 0.7/2.0 # loadSched/loadStop
#r15m = 1.0/2.5
#pg = 4.0/8
#ut = 0.2
#io = 50/240
#CPULIMIT = 180/hostA # 3 hours of host hostA
#FILELIMIT = 20000
#DATALIMIT = 20000 # jobs data segment limit
#CORELIMIT = 20000
#TASKLIMIT = 5 # job task limit
USERS = test_user # users who can submit jobs to this queue
HOSTS = test_group # hosts on which jobs in this queue can run
#PRE_EXEC = /usr/local/lsf/misc/testq_pre >> /tmp/pre.out
#POST_EXEC = /usr/local/lsf/misc/testq_post |grep -v "Hey"
#REQUEUE_EXIT_VALUES = 55 34 78
#APS_PRIORITY = WEIGHT[[RSRC, 10.0] [MEM, 20.0] [PROC, 2.5] [QPRIORITY, 2.0]] \
# LIMIT[[RSRC, 3.5] [QPRIORITY, 5.5]] \
# GRACE_PERIOD[[QPRIORITY, 200s] [MEM, 10m] [PROC, 2h]]
DESCRIPTION = For normal low priority jobs, running only if hosts are \
lightly loaded.
End Queue
...
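Since a copied queue definition may still carry INTERACTIVE = NO, it is worth confirming what a given queue actually sets. After reconfiguring, `bqueues -l test_queue` shows the live settings; to check the file beforehand, a sketch like this works. queue_param is a hypothetical helper; it ignores commented-out parameters and strips trailing "# ..." comments, and prints nothing if the key is not set in that queue.

```shell
# queue_param FILE QUEUE KEY: print the value of KEY in the Begin Queue ... End Queue
# block whose QUEUE_NAME is QUEUE.
queue_param() {
    awk -v queue="$2" -v key="$3" '
        /^[ \t]*Begin[ \t]+Queue/ { inq = 1; name = ""; val = ""; found = 0; next }
        /^[ \t]*End[ \t]+Queue/ {
            if (inq && name == queue && found) print val
            inq = 0; next
        }
        inq && /^[ \t]*#/ { next }   # skip commented-out parameters
        inq && index($0, "=") {
            k = $0; sub(/=.*/, "", k); gsub(/[ \t]/, "", k)
            v = $0; sub(/^[^=]*=[ \t]*/, "", v)
            sub(/[ \t]*#.*$/, "", v); sub(/[ \t]+$/, "", v)
            if (k == "QUEUE_NAME") name = v
            if (k == key) { val = v; found = 1 }
        }' "$1"
}
```

For example: `queue_param /tools/env/lsf/conf/lsbatch/platform/configdir/lsb.queues test_queue INTERACTIVE` should print YES.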
Change default queue
The default queues are "normal" and "interactive"; you can change them with DEFAULT_QUEUE in lsb.params.
# vim /tools/env/lsf/conf/lsbatch/platform/configdir/lsb.params
...
Begin Parameters
DEFAULT_QUEUE = test_queue # Default job queue names
...
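The same change can be made non-interactively. A minimal sketch with sed; set_default_queue is a hypothetical helper that assumes an uncommented DEFAULT_QUEUE line already exists (as in the shipped lsb.params), keeps a .bak copy, and drops any trailing comment on that line:

```shell
# set_default_queue FILE QUEUE...: rewrite the DEFAULT_QUEUE line in FILE.
set_default_queue() {
    local file="$1"
    shift
    grep -q '^[[:space:]]*DEFAULT_QUEUE[[:space:]]*=' "$file" || {
        echo "no DEFAULT_QUEUE line in $file" >&2
        return 1
    }
    sed -i.bak "s/^[[:space:]]*DEFAULT_QUEUE[[:space:]]*=.*/DEFAULT_QUEUE = $*/" "$file"
}
```

For example: `set_default_queue /tools/env/lsf/conf/lsbatch/platform/configdir/lsb.params test_queue`, then `badmin reconfig`.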
Reconfigure
# lsadmin reconfig
Checks the LIM configuration files (including lsf.cluster.platform) and restarts LIM on all hosts. If a file contains errors, it reports them and asks whether to continue with the restart.
# badmin reconfig
Run this after lsadmin reconfig succeeds. It checks the lsb.* files and reconfigures mbatchd; if a file contains errors, it reports them and asks whether to continue.
# badmin mbdrestart
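Restarts mbatchd, which some lsb.* changes require. It also checks the lsb.* files first. To check syntax only, without applying anything, use "badmin ckconfig" (and "lsadmin ckconfig" for the LIM files).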
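The reconfigure order above (LIM first, then mbatchd) can be wrapped so that a failing step stops the rest. This is a sketch: lsf_reconfig is a hypothetical wrapper, -f is used to skip the interactive confirmation of reconfig, and it assumes ckconfig returns a non-zero exit status when it finds errors, so verify that behaviour on your LSF version before relying on it.

```shell
# lsf_reconfig: check, then apply, LIM and batch configuration in order.
# Stops at the first step that returns non-zero.
lsf_reconfig() {
    lsadmin ckconfig -v || return 1   # check lsf.shared, lsf.conf, lsf.cluster.*
    lsadmin reconfig -f || return 1   # restart LIM on all hosts, no prompt
    badmin ckconfig -v || return 1    # check lsb.* files
    badmin reconfig -f                # reconfigure mbatchd, no prompt
}
```

Run it as the LSF administrator after `source /tools/env/lsf/conf/profile.sh`.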
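Add a host to the cluster
After defining the host in lsf.cluster.platform and lsb.hosts and reconfiguring, run the following on the new host: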
# source /tools/env/lsf/conf/profile.sh
# lsadmin limstartup; lsadmin resstartup; badmin hstartup
# /tools/env/lsf/10.1/install/hostsetup --top="/tools/env/lsf" --boot="y"
# bhosts
HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV
lsf01 closed - 0 0 0 0 0 0
lsf02 closed - 0 0 0 0 0 0
server1 ok - 96 0 0 0 0 0
...
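In the bhosts output, lsf01 and lsf02 are "closed" because their MXJ is 0, which is expected; any other host that is not "ok" needs attention. A sketch that filters bhosts output for such hosts; hosts_not_ok is a hypothetical helper that takes the names of hosts expected to be closed as arguments:

```shell
# hosts_not_ok [IGNORE_HOST...]: read bhosts output on stdin and print
# "HOST STATUS" for every host whose STATUS is not "ok", except ignored hosts.
hosts_not_ok() {
    awk -v skip="$*" '
        BEGIN { n = split(skip, s, " "); for (i = 1; i <= n; i++) ignore[s[i]] = 1 }
        NR > 1 && $2 != "ok" && !($1 in ignore) { print $1, $2 }'
}
```

Usage: `bhosts | hosts_not_ok lsf01 lsf02` prints nothing when every execution host is ok.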