Personal study notes, for reference only. If anything here is described incorrectly, discussion and corrections are welcome!
1. LSF Hosts
Hosts in your cluster perform different functions.
Master host
LSF server host that acts as the overall coordinator for the cluster, doing all job scheduling and dispatch.
Server host
A host that submits and runs jobs.
Client host
A host that only submits jobs and tasks.
Execution host
A host that runs jobs and tasks.
Submission host
A host from which jobs and tasks are submitted.
2. LSF Cluster
The badmin command controls the operation of the batch daemons mbatchd and sbatchd.
The lsadmin command controls the operation of lim and res.
3. LSF Jobs
4. Job States
LSF jobs have the following states:
• PEND — Waiting in a queue for scheduling and dispatch
• RUN — Dispatched to a host and running
• DONE — Finished normally with zero exit value
• EXIT — Finished with non-zero exit value
• PSUSP — Suspended while pending
• USUSP — Suspended by user
• SSUSP — Suspended by the LSF system
• POST_DONE — Post-processing completed without errors
• POST_ERR — Post-processing completed with errors
• WAIT — Members of a chunk job that are waiting to run
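The state list above can be kept as a small lookup table in a script that post-processes job status. The following is a minimal sketch: the grouping into "waiting", "running", "suspended" and "finished" and the classify helper are illustrative assumptions, not LSF definitions.

```python
# Job states and descriptions as listed in these notes.
JOB_STATES = {
    "PEND": "Waiting in a queue for scheduling and dispatch",
    "RUN": "Dispatched to a host and running",
    "DONE": "Finished normally with zero exit value",
    "EXIT": "Finished with non-zero exit value",
    "PSUSP": "Suspended while pending",
    "USUSP": "Suspended by user",
    "SSUSP": "Suspended by the LSF system",
    "POST_DONE": "Post-processing completed without errors",
    "POST_ERR": "Post-processing completed with errors",
    "WAIT": "Members of a chunk job that are waiting to run",
}

# Hypothetical grouping for illustration only.
SUSPENDED = {"PSUSP", "USUSP", "SSUSP"}
FINISHED = {"DONE", "EXIT", "POST_DONE", "POST_ERR"}


def classify(state: str) -> str:
    """Return a coarse category for a job state name."""
    if state not in JOB_STATES:
        raise ValueError(f"unknown job state: {state}")
    if state in SUSPENDED:
        return "suspended"
    if state in FINISHED:
        return "finished"
    if state == "RUN":
        return "running"
    return "waiting"  # PEND and WAIT


if __name__ == "__main__":
    for name in ("PEND", "SSUSP", "EXIT"):
        print(name, "->", classify(name), "|", JOB_STATES[name])
```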
5. LSF Directories
The following directories are owned by the primary LSF administrator (lsfadmin) and are readable by all cluster users:
• LSF_CONFDIR
LSF configuration directory: /usr/share/lsf/conf
• LSB_CONFDIR
LSF Batch configuration directory: /usr/share/lsf/conf/lsbatch
• LSB_SHAREDIR
LSF Batch job history directory: /usr/share/lsf/work
• LSF_LOGDIR
Server daemon error logs, one for each LSF daemon: /usr/share/lsf/log
The following directories are owned by root and are readable by all cluster users:
• LSF_BINDIR
LSF user commands, shared by all hosts of the same type. For example:
/usr/share/lsf/9.1/sparc-sol10/bin/
• LSF_INCLUDEDIR
Header files lsf/lsf.h and lsf/lsbatch.h:
/usr/share/lsf/9.1/include
• LSF_LIBDIR
LSF libraries, shared by all hosts of the same type. For example:
/usr/share/lsf/9.1/sparc-sol10/lib/
• LSF_MANDIR
LSF man pages: /usr/share/lsf/9.1/man
• LSF_MISC
Examples and other miscellaneous files: /usr/share/lsf/9.1/misc
• LSF_SERVERDIR
Server daemon binaries, scripts and other utilities, shared by all hosts of the same type. For example:
/usr/share/lsf/9.1/sparc-sol10/etc/
• LSF_TOP
Top-level installation directory: /usr/share/lsf
Other configuration directories are specified in /usr/share/lsf/conf/lsf.conf.
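Since the directory locations come from lsf.conf, a script can read them from that file instead of hard-coding paths. The following is a minimal sketch, assuming the file consists of NAME=VALUE lines with "#" comments. The parse_lsf_conf helper and the SAMPLE excerpt are hypothetical, and the sample values are simply the defaults listed in these notes.

```python
def parse_lsf_conf(text: str) -> dict[str, str]:
    """Parse NAME=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not a NAME=VALUE line, ignore it
        # Strip optional surrounding double quotes from the value.
        settings[name.strip()] = value.strip().strip('"')
    return settings


# Hypothetical excerpt for illustration, not a real lsf.conf.
SAMPLE = """\
# directories
LSF_CONFDIR=/usr/share/lsf/conf
LSB_CONFDIR=/usr/share/lsf/conf/lsbatch
LSB_SHAREDIR=/usr/share/lsf/work
LSF_LOGDIR="/usr/share/lsf/log"
"""

conf = parse_lsf_conf(SAMPLE)

if __name__ == "__main__":
    for key in ("LSF_CONFDIR", "LSB_CONFDIR", "LSB_SHAREDIR", "LSF_LOGDIR"):
        print(key, "=", conf[key])
```

On a real cluster the text would come from reading /usr/share/lsf/conf/lsf.conf, which may contain syntax this sketch does not handle.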
6. LSF Cluster Configuration Files
The following files are owned by the primary LSF administrator (lsfadmin) and are readable by all cluster users:
• LSF global configuration files describing the configuration and operation of the cluster saozi:
/usr/share/lsf/conf/ego/saozi/kernel/ego.conf
/usr/share/lsf/conf/lsf.conf
• LSF keyword definition file shared by all clusters. Defines cluster name, host types, host models, and site-specific resources:
/usr/share/lsf/conf/lsf.shared
• LSF cluster configuration file that defines hosts, administrators, and location of site-defined shared resources:
/usr/share/lsf/conf/lsf.cluster.saozi
• LSF mapping files for task names and their default resource requirements:
/usr/share/lsf/conf/lsf.task
/usr/share/lsf/conf/lsf.task.saozi
7. LSF Batch Configuration Files
The following files are owned by the primary LSF administrator (lsfadmin) and are readable by all cluster users:
• LSF server hosts and their attributes, such as scheduling load thresholds, dispatch windows, and job slot limits:
/usr/share/lsf/conf/lsbatch/saozi/configdir/lsb.hosts
If no hosts are defined in this file, then all LSF server hosts listed in /usr/share/lsf/conf/lsf.cluster.saozi are assumed to be LSF Batch server hosts.
• LSF scheduler and resource broker plugin modules. If no scheduler or resource broker modules are configured, LSF uses the default scheduler plugin module named schmod_default:
/usr/share/lsf/conf/lsbatch/saozi/configdir/lsb.modules
• LSF Batch system parameter file:
/usr/share/lsf/conf/lsbatch/saozi/configdir/lsb.params
• LSF job queue definitions:
/usr/share/lsf/conf/lsbatch/saozi/configdir/lsb.queues
• Resource allocation limits, exports, and resource usage limits:
/usr/share/lsf/conf/lsbatch/saozi/configdir/lsb.resources
• LSF user groups, hierarchical fairshare for users and user groups, and job slot limits for users and user groups:
/usr/share/lsf/conf/lsbatch/saozi/configdir/lsb.users
This file is also used to configure account mappings in a MultiCluster environment.
• Application profiles, which define common parameters for the same type of jobs, including the execution requirements of the applications, the resources they require, and how they should be run and managed:
/usr/share/lsf/conf/lsbatch/saozi/configdir/lsb.applications
This file is optional. Use the DEFAULT_APPLICATION parameter in lsb.params to specify a default application profile for all jobs. LSF does not automatically assign a default application profile.
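All of the batch configuration files above follow the same layout: LSB_CONFDIR, then the cluster name, then configdir. The following is a minimal sketch that builds those paths. The batch_config_paths helper is a hypothetical name, and the cluster name saozi is the one used in these notes.

```python
from pathlib import PurePosixPath

# File names as listed in this section.
BATCH_CONFIG_FILES = [
    "lsb.hosts",
    "lsb.modules",
    "lsb.params",
    "lsb.queues",
    "lsb.resources",
    "lsb.users",
    "lsb.applications",  # optional file
]


def batch_config_paths(lsb_confdir: str, cluster: str) -> dict[str, str]:
    """Map each batch configuration file name to its full path."""
    configdir = PurePosixPath(lsb_confdir) / cluster / "configdir"
    return {name: str(configdir / name) for name in BATCH_CONFIG_FILES}


paths = batch_config_paths("/usr/share/lsf/conf/lsbatch", "saozi")

if __name__ == "__main__":
    for name, path in paths.items():
        print(f"{name:18} {path}")
```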
8. LSF Batch Log Files
• LSF Batch events log:
/usr/share/lsf/work/saozi/logdir/lsb.events
• LSF Batch accounting log:
/usr/share/lsf/work/saozi/logdir/lsb.acct
9. LSF Daemon Log Files
LSF server daemon log files are stored in /usr/share/lsf/log:
• Load Information Manager (lim)
/usr/share/lsf/log/lim.log.host_name
• Remote Execution Server (res)
/usr/share/lsf/log/res.log.host_name
• Master Batch Daemon (mbatchd)
/usr/share/lsf/log/mbatchd.log.mgmt1
• Master Scheduler Daemon (mbschd)
/usr/share/lsf/log/mbschd.log.mgmt1
• Slave Batch Daemon (sbatchd)
/usr/share/lsf/log/sbatchd.log.host_name
• Process Information Manager (pim)
/usr/share/lsf/log/pim.log.host_name
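The daemon log names above share one pattern: the daemon name, then ".log.", then the host name. The following is a minimal sketch for locating a given daemon's log. The daemon_log_path helper is a hypothetical name, and the default log directory is the one given in these notes.

```python
from pathlib import PurePosixPath

# Daemons and their full names as listed in this section.
DAEMONS = {
    "lim": "Load Information Manager",
    "res": "Remote Execution Server",
    "mbatchd": "Master Batch Daemon",
    "mbschd": "Master Scheduler Daemon",
    "sbatchd": "Slave Batch Daemon",
    "pim": "Process Information Manager",
}


def daemon_log_path(daemon: str, host_name: str,
                    logdir: str = "/usr/share/lsf/log") -> str:
    """Build the log file path for one daemon on one host."""
    if daemon not in DAEMONS:
        raise ValueError(f"unknown daemon: {daemon}")
    return str(PurePosixPath(logdir) / f"{daemon}.log.{host_name}")


if __name__ == "__main__":
    # mgmt1 is the host name that appears in the mbatchd and mbschd entries above.
    for daemon in ("mbatchd", "mbschd"):
        print(DAEMONS[daemon], "->", daemon_log_path(daemon, "mgmt1"))
```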