Rocks cluster notes: common problems when installing Rocks

1. Permanently disable the firewall: rocks run host   "chkconfig iptables off"
2. Adding environment variables: put system-wide variables in /etc/profile
          and variables for the current user in ~/.bashrc.
3. Setting the date and time:
date -s 20071215
date -s 15:35
then run clock -w to write the system time to the hardware clock.


4. When you ssh to another node you get:
Warning: untrusted X11 forwarding setup failed: xauth key data not generated
Warning: No xauth data; using fake authentication data for X11 forwarding.
Fix: edit /etc/ssh/ssh_config and add at the end:
ForwardX11Trusted yes
(do this on every node, and copy the frontend's key files over to the nodes:)
scp /root/.ssh/*  compute:/root/.ssh/
Then log out, run rocks sync config, and reinstall the nodes with:
  rocks run host '/boot/kickstart/cluster-kickstart'
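Adding the option by hand on every node is easy to get wrong. OpenSSH uses the first value it finds for each option, so a line appended at the end does not override an earlier `ForwardX11Trusted no`. A minimal Python sketch of an idempotent edit, assuming whitespace-separated `Key value` lines (the `Key=value` form is not handled); the function name `ensure_ssh_option` is hypothetical:

```python
def ensure_ssh_option(config_text: str, key: str, value: str) -> str:
    """Return config_text with `key value` set exactly once.

    Assumes whitespace-separated "Key value" lines. The first line that
    sets `key` (case-insensitive) is rewritten in place, keeping its
    indentation; later duplicates are dropped. Commented-out lines such as
    "#ForwardX11Trusted no" are left alone because their first field starts
    with "#". If the key is not set anywhere, the line is appended at the end.
    """
    wanted = f"{key} {value}"
    out = []
    found = False
    for line in config_text.splitlines():
        fields = line.split()
        if fields and fields[0].lower() == key.lower():
            if not found:
                indent = line[: len(line) - len(line.lstrip())]
                out.append(indent + wanted)
                found = True
            continue  # drop any further settings of the same key
        out.append(line)
    if not found:
        out.append(wanted)
    return "\n".join(out) + "\n"
```

Applied to the text of /etc/ssh/ssh_config, it replaces an existing setting in place and appends the option only when none is present, so running it twice changes nothing.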
5. How do I remove a compute node from the cluster?
On your frontend, execute:
# rocks remove host "[your compute node name]"
For example, if the compute node’s name is compute-0-1, you’d execute
# rocks remove host compute-0-1
# rocks sync config
The compute node has been removed from the cluster.
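When several nodes are retired at once, item 7 below still applies: the database changes only reach the cluster after rocks sync config. A minimal Python sketch that builds the command sequence without running anything; the helper name `remove_hosts_commands` is hypothetical:

```python
def remove_hosts_commands(hosts):
    """Build the frontend commands that remove several compute nodes.

    Each node gets its own `rocks remove host`, followed by a single
    `rocks sync config` so the database changes are pushed out once.
    Returns a list of argument lists; nothing is executed here.
    """
    if not hosts:
        return []
    commands = [["rocks", "remove", "host", host] for host in hosts]
    commands.append(["rocks", "sync", "config"])
    return commands
```

On the frontend, each argument list could be passed to subprocess.run(argv, check=True) in order.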
6. How do I export a new directory from the frontend to all the compute nodes that is accessible under /home?
Execute this procedure:
• Add the directory you want to export to the file /etc/exports.
For example, if you want to export the directory /export/disk1, add the following to /etc/exports:
• Restart NFS:
# /etc/rc.d/init.d/nfs restart
• Add an entry to /etc/auto.home.
For example, say you want /export/disk1 on the frontend machine (named frontend-0) to be mounted as
/home/scratch on each compute node.
Add the following entry to /etc/auto.home:
scratch frontend-0:/export/disk1
• Inform 411 of the change:
# make -C /var/411
Now when you login to any compute node and change your directory to /home/scratch, it will be automounted.
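The example line for /etc/exports is missing above. A minimal Python sketch that builds both configuration lines, assuming the cluster's private network is 10.0.0.0/255.0.0.0 and that read-write (rw) access is wanted; both are assumptions, so substitute your own network and NFS options. The function names are hypothetical:

```python
def nfs_export_line(path: str, network: str, netmask: str, options: str = "rw") -> str:
    """One /etc/exports line exporting `path` to a whole subnet.

    Note: no space between the network and "(options)".
    """
    return f"{path} {network}/{netmask}({options})"


def auto_home_entry(key: str, server: str, path: str) -> str:
    """One /etc/auto.home map entry: /home/<key> is mounted from server:path."""
    return f"{key} {server}:{path}"
```

In /etc/exports there must be no space between the client specification and the parenthesized options; with a space, the options would apply to all hosts instead of the subnet.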
7. Note: every time you run Rocks commands that change the database configuration (for example, removing a compute node), run rocks sync config afterwards.
8. Submitting VASP jobs
1) (Zhou Jian) Job script:
#$ -cwd
#$ -j y
#$ -S /bin/bash
mpirun -r ssh -f $TMPDIR/machines -n $NSLOTS /home/software/vasp/vasp
Entries which start with #$ will be treated as SGE options.
• -cwd  means to execute the job for the current working directory.
• -j y means to merge the standard error stream into the standard output stream instead of having two separate error and output streams.
• -S /bin/bash specifies the interpreting shell for this job to be the Bash shell.
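• -n $NSLOTS sets how many processor cores to use (SGE sets NSLOTS to the number of slots granted to the job); it is followed by the path to the program.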
To submit: qsub -pe mpich 4 followed by the script file name.
Alternatively, request the cores inside the script itself:
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -pe mpich 16
(optionally add: export PATH=$PATH:/path/to/program)
mpirun -r ssh -f $TMPDIR/machines -n $NSLOTS /home/software/vasp/vasp
(For the MPICH example program cpi, use this line instead: $MPI_DIR/bin/mpirun -np $NSLOTS -machinefile $TMP/machines  ./cpi)

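#$ -pe mpich 16   sets the job's parallel environment to mpich and requests 16 processor cores for the computation. The other lines are the same as in 1).
To submit: qsub (or ./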
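4) Run qstat to check the job status.
Note: in the state column, qw means the job is waiting in the queue and r means it is running. The slots column shows the number of slots (processor cores) allocated to the job.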
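The two scripts above differ only in the -pe line. When submitting many jobs with different core counts, a small generator keeps them consistent. A minimal Python sketch; the function name `sge_mpi_script` is hypothetical, and the mpirun line is copied from the scripts above (the `-r ssh -f` options are specific to the MPI installation these notes used):

```python
def sge_mpi_script(program: str, slots: int, pe: str = "mpich") -> str:
    """Build an SGE job script like the VASP examples above.

    Lines starting with "#$" are read by qsub as options. NSLOTS and
    TMPDIR are set by SGE at run time, so they stay literal shell
    variables in the generated text.
    """
    if slots < 1:
        raise ValueError("slots must be a positive integer")
    lines = [
        "#$ -cwd",           # run in the submission directory
        "#$ -j y",           # merge stderr into stdout
        "#$ -S /bin/bash",   # interpret the job with bash
        f"#$ -pe {pe} {slots}",
        f"mpirun -r ssh -f $TMPDIR/machines -n $NSLOTS {program}",
    ]
    return "\n".join(lines) + "\n"
```

Writing sge_mpi_script("/home/software/vasp/vasp", 16) to a file and running qsub on it is equivalent to the second script; because the -pe option is inside the script, no -pe is needed on the qsub command line.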
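9. Software installation
Rename a group: groupmod -n new_group_name old_group_name
Change a user's primary group: usermod -g group_name user_name
Rename a user: usermod -l new_user_name old_user_name
Change a user's login (home) directory: usermod -d home_directory user_name
Delete a user together with their home directory: userdel -r user_name
Create a group: groupadd cluster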
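10. Adding users
1) (when the cluster group does not exist)
adduser -g root mu
adduser -g root soft
passwd mu
rocks sync users
make -C /var/411/ force
rocks sync config
By default, creating user mu creates the directory /export/home/mu. This directory is shared with the compute nodes and appears as /home/mu on every node, including the frontend; software can be installed under /export/home/mu/soft/.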
2) As root, create the user soft: useradd soft
3) As root, delete its password: passwd -d soft
   chmod a+rwx /export/home/soft
   Synchronize the accounts: rocks sync users
   Publish the password information: make -C /var/411 force
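4) Use XFTP to copy the programs into soft's directory,
   or, as root, copy them into /export/home/soft/src,
then change the owner: chown -v soft:soft <file or directory> (add -R to change a whole directory tree)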
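5) Run commands on the nodes from the frontend:
rocks run host compute-0-0 command="hostname"   (runs hostname on compute-0-0)
rocks run host n "reboot"   (reboots node n)
rocks run host compute command="ls /tmp/"   (runs ls /tmp/ on all compute nodes)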
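Calling rocks run host from a script is simpler with an argument list than with nested shell quotes. A minimal Python sketch that only builds the list for subprocess.run (nothing is executed); the helper name `rocks_run_argv` is hypothetical, and the command= form is taken from the examples above:

```python
def rocks_run_argv(command: str, hosts=()):
    """Argument list for `rocks run host`, e.g. for subprocess.run.

    `hosts` may hold node names (compute-0-0) or an appliance name
    (compute). The command is passed as one command=... argument, so no
    local shell quoting is needed.
    """
    return ["rocks", "run", "host", *hosts, f"command={command}"]
```

The command string is still interpreted by the shell on the remote node; the list form only avoids an extra layer of quoting on the frontend.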

11. ERROR: unable to send message to qmaster using port 536 on host "cluster.local": got send error
Luca Clementi, Wed Sep 12 19:34:35 PDT 2012
On Wed, Sep 12, 2012 at 4:42 PM, 杨燚 wrote:
> I just delete some users and stop some services. and then the rocks sync
> config doesn’t work any more
> [root@cluster ~]# rocks sync config
> error: commlib error: got select error (Connection refused)
> ERROR: unable to send message to qmaster using port 536 on host
> "cluster.local": got send error

I would think it's an sge problem.
Can you restart it from the init script?
/etc/init.d/sgemaster.zhaoming start
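/etc/init.d/sgeexecd.sten start
(The suffixes zhaoming and sten are specific to the cluster these notes were written on; use the sgemaster.* and sgeexecd.* scripts found in /etc/init.d on your own machines.)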
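Rather than guess the script suffix, the SGE init scripts present on a machine can be listed. A minimal Python sketch; the function name `sge_init_scripts` is hypothetical, and it only assumes the scripts are named sgemaster.* and sgeexecd.* as in the reply above:

```python
import glob
import os


def sge_init_scripts(init_dir: str = "/etc/init.d"):
    """Return the sorted paths of sgemaster.* and sgeexecd.* in init_dir."""
    found = []
    for pattern in ("sgemaster.*", "sgeexecd.*"):
        found.extend(glob.glob(os.path.join(init_dir, pattern)))
    return sorted(found)
```

Each path it returns can then be run with the start argument, as in the reply above.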

12. Problems with X11 forwarding and qlogin
Anoop Rajendra, Fri Oct 9 17:19:37 PDT 2009
On your frontend, add the line
ForwardX11Trusted       yes
to your /etc/ssh/ssh_config
and let us know if that solves your problem.


13. Installing software on all nodes

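See the Rocks Cluster 6.2 manual, section 5.1, Adding Packages to Compute Nodes.
After installing software on the frontend with yum, the packages are left under /var/cache/yum (yum only keeps them when keepcache=1 is set in /etc/yum.conf). Copy all the packages from that directory to /export/rocks/install/contrib/6.2/arch/RPMS, then follow the steps below to install the frontend's software on all compute nodes.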
Put the package you want to add in: /export/rocks/install/contrib/6.2/arch/RPMS
Where arch is your architecture ("i386" or "x86_64").
Create a new XML configuration file that will extend the current compute.xml configuration file:
# cd /export/rocks/install/site-profiles/6.2/nodes
# cp skeleton.xml extend-compute.xml
If you use extend-compute.xml, your packages will be installed only on your compute nodes. If you
want your packages to be installed on all other appliances (e.g. login nodes, nas nodes, etc.) you should use
extend-base.xml instead of extend-compute.xml.
Inside extend-compute.xml, add the package name by changing the section from:
<package> <!-- insert your package name here --> </package>
to:
<package> your package </package>

For example:
<package>rsh-server</package>

It is important that you enter the base name of the package in extend-compute.xml and not the full name.
For example, if the package you are adding is named XFree86-100dpi-fonts-4.2.0-6.47.i386.rpm, input
XFree86-100dpi-fonts as the package name in extend-compute.xml.
If you have multiple packages you’d like to add, you’ll need a separate <package> tag for each. For example, to
add both the 100 and 75 dpi fonts, the following lines should be in extend-compute.xml:
<package>XFree86-100dpi-fonts</package>
<package>XFree86-75dpi-fonts</package>
Also, make sure that you remove any package lines which do not have a package in them. For example, the file
should NOT contain any lines such as:
<package> <!-- insert your package name here --> </package>
Now build a new Rocks distribution. This will bind the new package into a RedHat compatible distribution in the
directory /export/rocks/install/rocks-dist/....
# cd /export/rocks/install
# rocks create distro
Now, reinstall your compute nodes.
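Because extend-compute.xml needs base names rather than RPM file names, the <package> lines can be generated from the files copied into the RPMS directory. A minimal Python sketch, assuming every file follows the usual name-version-release.arch.rpm pattern (version and release contain no hyphens); the function names are hypothetical:

```python
import re

# name-version-release.arch.rpm; the name may contain hyphens,
# version and release may not.
_RPM_RE = re.compile(
    r"^(?P<name>.+)-(?P<version>[^-]+)-(?P<release>[^-]+)\.(?P<arch>[^.]+)\.rpm$"
)


def rpm_base_name(filename: str) -> str:
    """Strip version, release and arch from an RPM file name."""
    match = _RPM_RE.match(filename)
    if match is None:
        raise ValueError(f"not a name-version-release.arch.rpm file: {filename}")
    return match.group("name")


def package_lines(filenames):
    """One <package> tag per RPM file, for pasting into extend-compute.xml."""
    return "\n".join(f"<package>{rpm_base_name(f)}</package>" for f in filenames)
```

When a file name does not follow the pattern, rpm -qp --queryformat '%{NAME}\n' file.rpm reads the name from the package header and is the authoritative check.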

14. Reinstalling all compute nodes over the network
After your frontend completes its installation, the last step is to force a re-installation of all of your compute
nodes. The following will force a PXE (network install) reboot of all your compute nodes.
# ssh-agent $SHELL
# ssh-add
# rocks run host compute '/boot/kickstart/cluster-kickstart-pxe'

15. [Rocks-Discuss] installing rsh in rocks cluster 5.3
Go Yoshimura, Mon May 17 23:41:14 PDT 2010
Hi Leo!
- We are not sure about base-rsh.xml but you can create it by hand.
- Perhaps, /export/rocks/install/rocks-dist/x86_64/build/nodes/rsh.xml may be the answer(I'm not sure).
[root@panrocks53 nodes]# cat /export/rocks/install/rocks-dist/x86_64/build/nodes/rsh.xml  | grep package 
- We usually install rsh-server,telnet-server, vsftpd  with specifying like
<package>rsh-server </package>
<package>telnet-server </package>
<package>vsftpd </package>
  in a node file.
- About node file and graph file, is helpful.
- We pick up RPMs from CentOS5.4 iso file.
thank you
Leo P. wrote:
>Hi everyone,
>I am trying to install rsh in rocks cluster 5.3. I tried using the old way specified here 
>But i can find the base-rsh.xml and RPM in the repository. 
>So can anyone please tell me how i can install rsh in rocks cluster 5.3. 
>I need rsh to run an old software and can not use ssh instead :)

Go Yoshimura
Scalable Systems Co., Ltd.

16. About partitioning
Give /export all of the remaining disk space.
Under /share, create a link named apps that points to the apps directory under /export.

17. Building a frontend restore ISO and upgrading the nodes
See the Rocks Cluster 6.2 manual, section 3.4, Upgrade or Reconfigure Your Existing Frontend.
This procedure describes how to use a Restore Roll to upgrade or reconfigure your existing Rocks cluster.
Let’s create a Restore Roll for your frontend. This roll will contain site-specific info that will be used to quickly
reconfigure your frontend (see the section below for details).
# cd /export/site-roll/rocks/src/roll/restore
# make roll
The above command will output a roll ISO image that has the name of the form:
hostname-restore-date-0.arch.disk1.iso. For example, on the i386-based frontend with the FQDN of, the roll will be named like:
Burn your restore roll ISO image to a CD.
Reinstall the frontend by putting the Rocks Boot CD in the CD tray (generally, this is the Kernel/Boot Roll) and
reboot the frontend.
At the boot: prompt type:
At this point, the installation follows the same steps as a normal frontend installation (See the section: Install
Frontend) -- with two exceptions:
1. On the first user-input screen (the screen that asks for ’local’ and ’network’ rolls), be sure to supply the
Restore Roll that you just created.
2. You will be forced to manually partition your frontend’s root disk.
You must reformat your / partition, your /var partition and your /boot partition (if it exists).
Also, be sure to assign the mountpoint of /export to the partition that contains the users’ home areas.
Do NOT erase or format this partition, or you will lose the user home directories. Generally, this is the
largest partition on the first disk.
After your frontend completes its installation, the last step is to force a re-installation of all of your compute
nodes. The following will force a PXE (network install) reboot of all your compute nodes.
# ssh-agent $SHELL
# ssh-add
# rocks run host compute '/boot/kickstart/cluster-kickstart-pxe'





