LSF Tips: What Do Application Exit Codes Mean?

Meaning of application exit codes in LSF

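Exit code: meaning

0: The application ran without errors and ended normally.

1 to 125: Exit codes defined by the application itself. Consult the application's manual to determine what they mean; for some applications a nonzero exit code also indicates normal completion.

126: The user does not have permission to execute the command (the command was found but could not be executed).

127: The command to execute was not found.

Greater than 128: The job was terminated by a signal, and the signal number is the exit code minus 128. Look up the meaning of that signal on the relevant operating system. For example, for exit code 130, 130 - 128 = 2, and on Linux signal 2 is SIGINT, the interrupt signal.

255: The job exited with -1. Only the low 8 bits of the exit status are kept, so -1 is reported as 255; even though 255 is greater than 128, it usually does not indicate a signal.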

Example 1: exit code 255

Write a C program that exits with -1. Its source (cat /tmp/calibre.c):


#include <stdio.h>
int main(void){
   printf("Hello world.\n");
   return(-1);
}

Compile it and run it from the command line; the exit code is 255:

[lsfadmin@master tmp]$ gcc calibre.c -o calibre

[lsfadmin@master tmp]$ chmod +x calibre

[lsfadmin@master tmp]$ ./calibre

Hello world.

[lsfadmin@master tmp]$ echo $?

255

Submit the same command to LSF:

[lsfadmin@master ~]$ bsub -I calibre

Job <1349> is submitted to default queue <interactive>.

<<Waiting for dispatch ...>>

<<Starting on shugb>>

Hello world.

[lsfadmin@master ~]$ bjobs -UF 1349

Job <1349>, User <lsfadmin>, Project <default>, Status <EXIT>, Queue <interactive>, Interactive mode, Command <calibre>, Share group charged </lsfadmin>

Sat May 21 21:48:27: Submitted from host <master>, CWD <$HOME>;

Sat May 21 21:48:27: Started 1 Task(s) on Host(s) <shugb>, Allocated 1 Slot(s) on Host(s) <shugb>;

Sat May 21 21:48:32: Exited with exit code 255. The CPU time used is 0.0 seconds.

Sat May 21 21:48:32: Completed <exit>.

Example 2: exit code 127, command not found

Submit a command that does not exist to LSF:

[lsfadmin@master configdir]$ bsub -I pt_shell 
Job <1346> is submitted to default queue <interactive>. 
<<Waiting for dispatch ...>> 
<<Starting on shugb>> 
/opt/ibm/lsfsuite/lsf/10.1/linux2.6-glibc2.3-x86_64/etc/myjs.sh: line 17: pt_shell: command not found 
[lsfadmin@master configdir]$ bjobs -UF 1346 
Job <1346>, User <lsfadmin>, Project <default>, Status <EXIT>, Queue <interactive>, Interactive mode, Command <pt_shell>, Share group charged </lsfadmin> 
Sat May 21 21:38:06: Submitted from host <master>, CWD </opt/ibm/lsfsuite/lsf/conf/lsbatch/lsf-demo/configdir>; 
Sat May 21 21:38:06: Started 1 Task(s) on Host(s) <shugb>, Allocated 1 Slot(s) on Host(s) <shugb>; 
Sat May 21 21:38:11: Exited with exit code 127. The CPU time used is 0.0 seconds. 
Sat May 21 21:38:11: Completed <exit>.

Example 3: exit code 126, permission denied

As user lsfadmin, create a program and set its permissions so that only the owner can access it.

[lsfadmin@master /]$ ls -l /tmp/gen

-rwx------ 1 lsfadmin lsfadmin 34 Jul 4 18:06 /tmp/gen

[lsfadmin@openlava-master /]$ bsub -Ip -m master /tmp/gen

Job <208> is submitted to default queue <interactive>.

<<Waiting for dispatch ...>>

<<Starting on master>>

Hello World!

Switch to user account shugb and submit the same command to LSF:

[shugb@master ~]$ bsub -Ip -m master /tmp/gen

Job <209> is submitted to default queue <interactive>.

<<Waiting for dispatch ...>>

<<Starting on master>>

/home/shugb/.lsbatch/1656929425.209: line 8: /tmp/gen: Permission denied

[shugb@master ~]$ bjobs -UF 209

Job <209>, User <shugb>, Project <default>, Status <EXIT>, Queue <interactive>, Interactive pseudo-terminal mode, Command </tmp/gen>, Share group charged </shugb>

Mon Jul 4 18:10:25: Submitted from host <master>, CWD <$HOME>, Specified Hosts <master>;

Mon Jul 4 18:10:25: Started 1 Task(s) on Host(s) <master>, Allocated 1 Slot(s) on Host(s) <master>;

Mon Jul 4 18:10:31: Exited with exit code 126. The CPU time used is 0.0 seconds.

Mon Jul 4 18:10:31: Completed <exit>.

Example 4: exit code 130, program interrupted

Submit a job to LSF:

[shugb@master ~]$ bsub -m cmp1 sleep 1000

Job <210> is submitted to default queue <normal>.

On the execution host, interrupt the program (kill -2 sends SIGINT):

[root@cmp1 log]# ps -elf|grep sleep

0 S shuguan+ 97555 97553 0 80 0 - 27015 hrtime 18:17 ? 00:00:00 sleep 1000

0 S root 97572 1399 0 80 0 - 27014 hrtime 18:17 ? 00:00:00 sleep 60

0 S root 97578 94237 0 80 0 - 28204 pipe_w 18:17 pts/0 00:00:00 grep --color=auto sleep

[root@openlava-cmp1 log]# kill -2 97555

[root@openlava-cmp1 log]#

Check the job's exit code:

[shugb@master ~]$ bjobs -UF 210

Job <210>, User <shugb>, Project <default>, Status <EXIT>, Queue <normal>, Command <sleep 1000>, Share group charged </shugb>

Mon Jul 4 18:17:19: Submitted from host <master>, CWD <$HOME>, Specified Hosts <cmp1>;

Mon Jul 4 18:17:19: Started 1 Task(s) on Host(s) <cmp1>, Allocated 1 Slot(s) on Host(s) <cmp1>, Execution Home </home/shugb>, Execution CWD </home/shugb>;

Mon Jul 4 18:17:56: Exited with exit code 130. The CPU time used is 0.1 seconds.

Mon Jul 4 18:17:56: Completed <exit>.

Example 5: job terminated by the user or an administrator with an LSF command

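If the job was terminated by the user or an administrator through an LSF command, the job information shows not only the exit code but also a reason such as TERM_OWNER or TERM_ADMIN. The exit code in both runs below is 130, which is consistent with bkill's default behavior of sending SIGINT first (followed by SIGTERM and then SIGKILL).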

[shugb@master ~]$ bsub -m openlava-cmp1 sleep 1000

Job <211> is submitted to default queue <normal>.

[shugb@master ~]$ bkill 211

Job <211> is being terminated

[shugb@master ~]$ bjobs -UF 211

Job <211>, User <shugb>, Project <default>, Status <EXIT>, Queue <normal>, Command <sleep 1000>, Share group charged </shugb>

Mon Jul 4 18:24:12: Submitted from host <master>, CWD <$HOME>, Specified Hosts <cmp1>;

Mon Jul 4 18:24:13: Started 1 Task(s) on Host(s) <cmp1>, Allocated 1 Slot(s) on Host(s) <cmp1>, Execution Home </home/shugb>, Execution CWD </home/shugb>;

Mon Jul 4 18:24:22: Exited with exit code 130. The CPU time used is 0.0 seconds.

Mon Jul 4 18:24:22: Completed <exit>; TERM_OWNER: job killed by owner.

Submit a job as shugb, have the administrator lsfadmin terminate it with bkill, and then check the job information:

[shugb@master ~]$ bsub -m cmp1 sleep 1000

Job <212> is submitted to default queue <normal>.

[shugb@master ~]$ bjobs -UF 212

Job <212>, User <shugb>, Project <default>, Status <EXIT>, Queue <normal>, Command <sleep 1000>, Share group charged </shugb>

Mon Jul 4 18:27:42: Submitted from host <master>, CWD <$HOME>, Specified Hosts <cmp1>;

Mon Jul 4 18:27:43: Started 1 Task(s) on Host(s) <cmp1>, Allocated 1 Slot(s) on Host(s) <cmp1>, Execution Home </home/shugb>, Execution CWD </home/shugb>;

Mon Jul 4 18:28:30: Exited with exit code 130. The CPU time used is 0.1 seconds.

Mon Jul 4 18:28:30: Completed <exit>; TERM_ADMIN: job killed by root or an administrator.
