References:
https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=667703
http://www.qingsword.com/qing/1086.html
Today I was using cman to manage a cluster, and the service failed to start with the following error:
[root@CentOS____102 ~]# service cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... I/O warning : failed to load external entity "/etc/cluster/cluster.conf"
Unable to get the configuration
I/O warning : failed to load external entity "/etc/cluster/cluster.conf"
corosync [MAIN ] Corosync Cluster Engine ('1.4.1'): started and ready to provide service.
corosync [MAIN ] Corosync built-in features: nss dbus rdma snmp
corosync [MAIN ] Unable to read config from /etc/cluster/cluster.conf
corosync [MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1695.
corosync died: Could not read cluster configuration Check cluster logs for details
[FAILED]
Stopping cluster:
Leaving fence domain... [ OK ]
Stopping gfs_controld... [ OK ]
Stopping dlm_controld... [ OK ]
Stopping fenced... [ OK ]
Stopping cman... [ OK ]
Unloading kernel modules... [ OK ]
Unmounting configfs... [ OK ]
[root@CentOS____102 ~]#
The problem is that corosync cannot read the configuration file /etc/cluster/cluster.conf. The file's attributes are as follows:
[root@CentOS____102 ~]# ls -l /etc/cluster/cluster.conf
-rw-r--r--. 1 root root 995 Oct 30 16:30 /etc/cluster/cluster.conf
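These permissions cause no problem on machine A (192.168.56.101); the failure occurs only on machine B (192.168.56.102).
On machine B, the file was first copied from machine A into /root with scp and then moved into /etc/cluster with mv. Because the file was created under /root on machine B, it was labeled admin_home_t there, and mv keeps that label when it moves the file, as shown below: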
[root@CentOS____102 ~]# ls -Z /etc/cluster/cluster.conf
-rw-r--r--. root root unconfined_u:object_r:admin_home_t:s0 /etc/cluster/cluster.conf
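Because of this label, SELinux blocks corosync from reading the file. admin_home_t marks the file as belonging to the root user's home directory. The file cannot be opened because cluster.conf carried its SELinux context over from the directory it was created in, and that context differs from the one expected under /etc/cluster. The SELinux log (/var/log/audit/audit.log) contains the following entry: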
type=AVC msg=audit(1383121897.046:28821): avc: denied { getattr } for pid=2645 comm="corosync" path="/etc/cluster/cluster.conf" dev=sda2 ino=1311842 scontext=unconfined_u:system_r:corosync_t:s0 tcontext=unconfined_u:object_r:admin_home_t:s0 tclass=file
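In this record, scontext describes the SELinux context of the corosync process. unconfined_u is the SELinux user, which here means the user is not confined. system_r is the role, the one used for system processes such as corosync, and corosync_t is its type (also called its domain); every process and file has a type. tcontext describes the SELinux context of cluster.conf: the first field again means an unconfined user, the second field, object_r, is the role used for objects such as files and directories, and the third field is the type. admin_home_t is the type for files in the root user's home directory, and as the denial shows, the corosync_t domain is not allowed to access it.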
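To make the field-by-field reading of the denial record concrete, here is a minimal shell sketch that pulls scontext and tcontext out of the record and splits each into user, role, type and level. The helper names avc_field and split_context are made up for this illustration and are not part of any SELinux tool; the parsing also assumes that field values contain no spaces, which holds for this record but not for every audit entry.

```shell
# avc_field and split_context are hypothetical helper names.

# Print the value of one key=value field from an AVC record on stdin.
# Assumes values contain no spaces, which holds for the record below.
avc_field() {
    tr ' ' '\n' | sed -n "s/^$1=//p" | tr -d '"'
}

# Print the four fields of a context of the form user:role:type:level.
# The body runs in a subshell so the variables stay local.
split_context() (
    ctx=$1
    user=${ctx%%:*}; rest=${ctx#*:}
    role=${rest%%:*}; rest=${rest#*:}
    type=${rest%%:*}; level=${rest#*:}
    printf 'user=%s role=%s type=%s level=%s\n' "$user" "$role" "$type" "$level"
)

# The denial record from audit.log, joined into a single line.
record='type=AVC msg=audit(1383121897.046:28821): avc: denied { getattr } for pid=2645 comm="corosync" path="/etc/cluster/cluster.conf" dev=sda2 ino=1311842 scontext=unconfined_u:system_r:corosync_t:s0 tcontext=unconfined_u:object_r:admin_home_t:s0 tclass=file'

# Show the process context and the file context side by side.
for key in scontext tcontext; do
    printf '%s: ' "$key"
    split_context "$(printf '%s\n' "$record" | avc_field "$key")"
done
```

The two printed lines differ in role and type: the process runs as system_r with type corosync_t, while the file is object_r with type admin_home_t, which is the mismatch behind the denial.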
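With the cause known, the fix is straightforward. The most obvious approach is to disable SELinux. If you would rather keep SELinux enabled, restore the file's default label with the following command: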
[root@CentOS____102 ~]# restorecon -v /etc/cluster/cluster.conf
restorecon reset /etc/cluster/cluster.conf context unconfined_u:object_r:admin_home_t:s0->unconfined_u:object_r:cluster_conf_t:s0
After the label is restored, ls -Z shows the same information as on machine A, and the cman service starts normally.
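The same check can be run before starting cman on a freshly copied node. The sketch below uses restorecon in dry-run mode (-n changes nothing, -v prints the change it would make) to test whether the file's label differs from the policy default. needs_relabel is a hypothetical helper name, and the sketch assumes restorecon prints nothing when the label is already correct; I have not checked that on every release.

```shell
# needs_relabel is a hypothetical helper name, not a standard command.
# It succeeds when restorecon reports that it would change the label of "$1".
needs_relabel() {
    # -n: dry run, nothing is modified; -v: print the change that would be made.
    [ -n "$(restorecon -n -v "$1" 2>/dev/null)" ]
}

conf=/etc/cluster/cluster.conf

# Only run the check where restorecon and the file actually exist.
if command -v restorecon >/dev/null 2>&1 && [ -e "$conf" ]; then
    if needs_relabel "$conf"; then
        echo "$conf: label differs from the policy default"
    else
        echo "$conf: label matches the policy default"
    fi
fi
```

If the first message appears, running restorecon -v on the file as shown above applies the default label.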