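[SGE] Jobs stuck in the t state; qstat -j reports "can not find an unused add_grp_id"

On a colleague's freshly installed SGE, only one submitted job would run; the other jobs stayed in the t state, as shown below: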

# qstat -u '*'
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
     55 0.60500 I_ZC170000 robots       r     02/23/2017 11:27:45 all.q@###                     6        
     56 0.50500 II_ZC17000 robots       t     02/23/2017 11:29:45 all.q@###                     1        
     57 0.50500 II_ZC17000 robots       t     02/23/2017 11:29:45 all.q@###                    1        
     58 0.60500 II_ZC17000 robots       t     02/23/2017 11:29:45 all.q@###                     6        
     59 0.50500 II_ZC17000 robots       t     02/23/2017 11:29:45 all.q@###                    1        
     60 0.60500 II_ZC17000 robots       t     02/23/2017 11:29:45 all.q@Analysis                     6        
     61 0.50500 II_ZC17000 robots       t     02/23/2017 11:29:45 all.q@Analysis                     1        
     63 0.60500 II_ZC17000 robots       t     02/23/2017 11:29:45 all.q@Analysis                     6        
     65 0.60500 II_ZC17000 robots       t     02/23/2017 11:29:45 all.q@Analysis                     6        
     68 0.60500 II_ZC17000 robots       t     02/23/2017 11:29:45 all.q@Analysis                     6        
     70 0.60500 II_ZC17000 robots       t     02/23/2017 11:29:45 all.q@Analysis                     6 

Inspecting one of the stuck jobs with qstat -j 56 shows the following error:

error reason    1:          can not find an unused add_grp_id
                1:          can not find an unused add_grp_id
                1:          can not find an unused add_grp_id
                1:          can not find an unused add_grp_id
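
To check every stuck job rather than a single one, the state column of the qstat listing can be filtered. The following is a minimal sketch, not part of the original post: the helper name t_state_jobs is made up for illustration, and it assumes job names contain no spaces, so that the state is always the fifth whitespace-separated field.

```shell
# Print the IDs of jobs whose state column is exactly "t".
# Reads `qstat -u '*'` style output on stdin. Header and separator
# lines are skipped because their first field is not a number.
t_state_jobs() {
    awk '$1 ~ /^[0-9]+$/ && $5 == "t" { print $1 }'
}

# Usage on a live cluster (requires qstat in PATH):
#   qstat -u '*' | t_state_jobs | while read -r id; do
#       qstat -j "$id" | grep 'add_grp_id'
#   done
```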

Solution

The problem is caused by the gid_range value in the SGE configuration being too small. For example:

[root@Analysis gridengine]# qconf -sconf | grep gid_range
gid_range                    21000

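This setting is supposed to be a range, but my colleague had mistakenly set it to a single number, so only one job could run normally. Changing the value to a range and then restarting sgemaster fixes it. (The man page excerpt further down states that a change to gid_range takes effect immediately, so the restart may not be strictly necessary.) After the change: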

[root@Analysis gridengine]# qconf -sconf | grep gid_range
gid_range                    20000-21000
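
The original post does not show how the value was edited. A minimal sketch, assuming the standard qconf workflow: qconf -mconf global opens the global configuration in an editor, and the small helper below (the name gid_range_value is made up for illustration) pulls out the current value so it can be checked afterwards.

```shell
# Extract the gid_range value from `qconf -sconf` style output on stdin.
gid_range_value() {
    awk '$1 == "gid_range" { print $2 }'
}

# Usage on a live cluster (requires qconf in PATH and admin rights):
#   qconf -mconf global            # edit gid_range in the editor that opens
#   qconf -sconf | gid_range_value # confirm the new value
```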

The relevant man page entry:

[root@Analysis ~]# man sge_conf
.........
gid_range
       The gid_range is a comma separated list of  range  expressions  of  the
       form  n-m (n as well as m are integer numbers greater than 99), where m
       is an abbreviation for m-m. These numbers are used in  sge_execd(8)  to
       identify processes belonging to the same job.

       Each sge_execd(8) may use a separate set up group ids for this purpose.
       All number in the group id range have to be unused supplementary  group
       ids on the system, where the sge_execd(8) is started.

       Changing gid_range will take immediate effect.  There is no default for
       gid_range. The administrator will have to assign a value for  gid_range
       during installation of Sun Grid Engine.

       The global configuration entry for this value may be overwritten by the
       execution host local configuration.
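
The format described in the excerpt above can be made concrete with a small sketch (not from the original post; both helper names are made up for illustration). The first helper counts how many group IDs a gid_range value provides, which shows why a bare 21000 offers only a single ID. The second lists groups from getent group style input whose GID falls inside a range, as a rough check of the "unused" requirement. It only sees groups defined in the group database, so treat it as a first pass rather than a guarantee.

```shell
# Count the group IDs provided by a gid_range value such as
# "20000-21000" or "20000-20009,30000". A bare number m counts as m-m.
gid_range_size() {
    printf '%s\n' "$1" | awk -F, '{
        total = 0
        for (i = 1; i <= NF; i++) {
            n = split($i, b, "-")
            lo = b[1]
            hi = (n == 2) ? b[2] : b[1]
            total += hi - lo + 1
        }
        print total
    }'
}

# Read `getent group` style lines (name:passwd:gid:members) on stdin and
# print name:gid for every group whose GID lies in [lo, hi].
gids_in_use() {
    awk -F: -v lo="$1" -v hi="$2" '$3 >= lo && $3 <= hi { print $1 ":" $3 }'
}

# Usage:
#   gid_range_size 21000          # the misconfigured value
#   gid_range_size 20000-21000    # the corrected value
#   getent group | gids_in_use 20000 21000
```

No output from the getent check means no defined group currently occupies an ID in the chosen range on that host.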

Reference: http://arc.liv.ac.uk/pipermail/gridengine-users/2005-September/007056.html

Reposted from: https://my.oschina.net/leopardlinux/blog/844500
