How to batch-submit serial tasks with TACC launcher

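What is TACC launcher?

It is a simple, practical tool that lets users run many single-threaded or multi-threaded tasks from a single batch script.

For a detailed introduction, see the official site: link

Download: link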

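How do you use TACC launcher?

I strongly recommend reading the usage instructions on the official site. They are very detailed, so I will not repeat them here. If English is a barrier, a web page translation tool will help.

In short:

  1. Download the tool
  2. Unpack it
  3. No compilation needed!
  4. Set the environment variables
  5. Write a joblist file listing every task to run
  6. Submit with the launcher command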

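TACC launcher + Slurm example

Prepare the test case

Prepare a joblist file named myjoblist containing the tasks to run, one command per line. To keep the test simple, use 12 lines of hello world: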

echo "hello, world"
echo "hello, world"
echo "hello, world"
echo "hello, world"
echo "hello, world"
echo "hello, world"
echo "hello, world"
echo "hello, world"
echo "hello, world"
echo "hello, world"
echo "hello, world"
echo "hello, world"
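
For a longer job list, typing every line by hand gets tedious. Below is a minimal sketch of generating the file with a loop. The helper name make_joblist is my own and is not part of launcher; the only assumption taken from the text above is that each line of the file is one job.

```shell
#!/bin/bash
# make_joblist FILE COUNT: write COUNT identical test commands to FILE,
# one command per line (each line becomes one launcher job).
# NOTE: make_joblist is a hypothetical helper, not part of launcher.
make_joblist() {
    local file=$1 count=$2 i
    : > "$file"                      # create or truncate the file
    for ((i = 1; i <= count; i++)); do
        printf 'echo "hello, world"\n' >> "$file"
    done
}

# Write the 12-line test file used in this post.
make_joblist myjoblist 12
```

For real work, replace the printf line with the actual command, for example one that takes the loop index i as an input number.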

Write the submission script

Next, write a submission script, sub.sh, containing the launcher settings:

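#!/bin/bash
# Job list, one command per line (change to the real path)
export LAUNCHER_JOB_FILE=/path/to/myjoblist
# Launcher installation directory (change to the real path)
export LAUNCHER_DIR=$HOME/launcher/launcher-3.1.1
export PATH=$LAUNCHER_DIR:$PATH
export LAUNCHER_PLUGIN_DIR=$LAUNCHER_DIR/plugins
# Resource manager and scheduling method
export LAUNCHER_RMI=SLURM
export LAUNCHER_SCHED=interleaved
# Run the jobs from the directory the script was submitted from
export LAUNCHER_WORKDIR=$(pwd)
# Start the launcher
$LAUNCHER_DIR/paramrun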

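Notes:
1. LAUNCHER_JOB_FILE is the path to myjoblist; change it to the real path
2. LAUNCHER_DIR is the launcher installation path; change it to the real path
3. The other variables do not need changing for now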
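
Since the two paths are the most likely thing to get wrong, a quick check before starting the launcher can save a wasted submission. This is only a sketch: check_launcher_env is a hypothetical helper of mine, not something launcher provides, and it tests nothing beyond "the job file is readable" and "paramrun is executable".

```shell
#!/bin/bash
# check_launcher_env: return 0 if the two paths set in sub.sh look usable.
# NOTE: hypothetical helper, not part of launcher.
check_launcher_env() {
    if [ ! -r "$LAUNCHER_JOB_FILE" ]; then
        echo "job file not readable: $LAUNCHER_JOB_FILE" >&2
        return 1
    fi
    if [ ! -x "$LAUNCHER_DIR/paramrun" ]; then
        echo "paramrun not found or not executable in: $LAUNCHER_DIR" >&2
        return 1
    fi
    return 0
}
```

In sub.sh, the function would be defined after the export lines and called as `check_launcher_env || exit 1` just before the paramrun line.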

Submit the script

yhbatch -N 2 -n 6 -p debug sub.sh

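Notes:
1. -N 2 requests 2 nodes
2. -n 6 requests 6 CPU cores (6 in total, not 6 per node; also note that n must be divisible by N, otherwise an error is reported)
3. -p debug selects the debug partition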
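
The divisibility rule in note 2 is easy to check before submitting. A small sketch follows; procs_per_host is a name I made up for illustration, and the arithmetic simply mirrors "n divided by N".

```shell
#!/bin/bash
# procs_per_host NODES TASKS: print TASKS/NODES if TASKS divides evenly
# across NODES; otherwise print a message to stderr and return 1.
# NOTE: hypothetical helper, not part of launcher or Slurm.
procs_per_host() {
    local nodes=$1 tasks=$2
    if (( nodes <= 0 || tasks % nodes != 0 )); then
        echo "-n $tasks is not a multiple of -N $nodes" >&2
        return 1
    fi
    echo $(( tasks / nodes ))
}

procs_per_host 2 6    # the values used above; prints 3
```

With -N 2 -n 5 the function returns a non-zero status instead, which is the case the note warns about.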

Check the results

A job submitted through the Slurm scheduler writes to a default output file, slurm-<jobid>.out. Its contents:

Launcher: Setup complete.

------------- SUMMARY ---------------
   Number of hosts:    2
   Working directory:  $HOME/workdir/test
   Processes per host: 3
   Total processes:    6
   Total jobs:         12
   Scheduling method:  interleaved

-------------------------------------
Launcher: Starting parallel tasks...
Launcher: Task 1 running job 2 on cn95 (echo "hello, world")
Launcher: Task 0 running job 1 on cn95 (echo "hello, world")
hello, world
hello, world
Launcher: Task 2 running job 3 on cn95 (echo "hello, world")
hello, world
Launcher: Job 1 completed in 0 seconds.
Launcher: Task 5 running job 6 on cn96 (echo "hello, world")
Launcher: Task 4 running job 5 on cn96 (echo "hello, world")
hello, world
hello, world
Launcher: Task 3 running job 4 on cn96 (echo "hello, world")
Launcher: Job 3 completed in 0 seconds.
hello, world
Launcher: Job 2 completed in 0 seconds.
Launcher: Job 6 completed in 0 seconds.
Launcher: Job 5 completed in 0 seconds.
Launcher: Job 4 completed in 0 seconds.
Launcher: Task 0 running job 7 on cn95 (echo "hello, world")
hello, world
Launcher: Task 2 running job 9 on cn95 (echo "hello, world")
hello, world
Launcher: Task 1 running job 8 on cn95 (echo "hello, world")
hello, world
Launcher: Task 5 running job 12 on cn96 (echo "hello, world")
hello, world
Launcher: Task 3 running job 10 on cn96 (echo "hello, world")
hello, world
Launcher: Task 4 running job 11 on cn96 (echo "hello, world")
hello, world
Launcher: Job 7 completed in 0 seconds.
Launcher: Job 9 completed in 0 seconds.
Launcher: Job 8 completed in 0 seconds.
Launcher: Job 12 completed in 0 seconds.
Launcher: Job 10 completed in 0 seconds.
Launcher: Job 11 completed in 0 seconds.
Launcher: Task 0 done. Exiting.
Launcher: Task 2 done. Exiting.
Launcher: Task 1 done. Exiting.
Launcher: Task 5 done. Exiting.
Launcher: Task 3 done. Exiting.
Launcher: Task 4 done. Exiting.
Launcher: Done. Job exited without errors

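Notes:

Parameter            Value               Description
Number of hosts      2                   Set by -N 2, so 2 nodes
Working directory    $HOME/workdir/test  The directory the job was actually submitted from
Processes per host   3                   Processes per node, from 6/2 = 3, so n must be divisible by N!
Total processes      6                   Set by -n 6, so 6 processes in total
Total jobs           12                  myjoblist has 12 lines, so 12 jobs
Scheduling method    interleaved         The scheduling method; there are 3, see the official site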
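
With more than a handful of jobs, reading the log by eye is impractical. The sketch below counts the "Job N completed" lines and compares the count with the number of lines in the job file. It assumes the log format shown above and a job file with one job per line and no blank lines; all_jobs_done is a hypothetical helper, not a launcher command.

```shell
#!/bin/bash
# all_jobs_done LOGFILE JOBFILE: succeed if LOGFILE contains as many
# "Launcher: Job N completed" lines as JOBFILE has lines.
# NOTE: hypothetical helper; relies on the log format shown in this post.
all_jobs_done() {
    local done_count total
    done_count=$(grep -c '^Launcher: Job [0-9][0-9]* completed' "$1" || true)
    total=$(grep -c '' "$2" || true)     # number of lines in the job file
    echo "completed $done_count of $total jobs"
    [ "$done_count" -eq "$total" ]
}
```

A typical call would be `all_jobs_done slurm-<jobid>.out myjoblist`, with the real job ID filled in.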

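Notes to self

  1. During testing, the default LAUNCHER_SCHED=dynamic kept running and never finished, so I am setting it aside for now.
  2. Support for OpenMP programs? What about MPI programs? Still to be tested (time permitting).
  3. If a library is reported missing, add its directory to LD_LIBRARY_PATH.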
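
For item 3, one way to do this is a line in sub.sh placed before paramrun. The directory /path/to/missing/lib is a placeholder to be replaced with wherever the library actually lives.

```shell
# Prepend the library directory to LD_LIBRARY_PATH.
# /path/to/missing/lib is a placeholder, not a real location.
# The ${VAR:+...} form avoids a trailing ":" when the variable was empty.
export LD_LIBRARY_PATH=/path/to/missing/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
```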