Modifying an HP MC/ServiceGuard Two-Node Cluster Configuration

connie1021

1. Error description

LVM: Performed a switch for Lun ID = 0 (pv = 0x0000000043802000), from raw device 0x1f050100 (with priority: 0, and current flags: 0x40) to raw device 0x1f070100 (with priority: 1, and current flags: 0x0).

LVM: VG 64 0x020000: PVLink 31 0x050100 Failed! The PV is still accessible.

LVM: Performed a switch for Lun ID = 0 (pv = 0x000000004380e000), from raw device 0x1f050200 (with priority: 0, and current flags: 0x40) to raw device 0x1f070200 (with priority: 1, and current flags: 0x0).

LVM: VG 64 0x030000: PVLink 31 0x050200 Failed! The PV is still accessible.

LVM: VG 64 0x040000: Lost quorum.

This may block configuration changes and I/Os. In order to reestablish quorum at least 1 of the following PVs (represented by current link) must become available:

<31 0x070300>

LVM: VG 64 0x040000: PVLink 31 0x070300 Failed! The PV is not accessible.

LVM: VG 64 0x020000: Lost quorum.

This may block configuration changes and I/Os. In order to reestablish quorum at least 1 of the following PVs (represented by current link) must become available:

<31 0x070100>

LVM: VG 64 0x020000: PVLink 31 0x070100 Failed! The PV is not accessible.

SCSI: Async write error -- dev: b 31 0x070100, errno: 126, resid: 8192,

blkno: 2566056, sectno: 5132112, offset: 2627641344, bcount: 8192.

SCSI: Async write error -- dev: b 31 0x070100, errno: 126, resid: 2048,

blkno: 2098216, sectno: 4196432, offset: 2148573184, bcount: 2048.

SCSI: Write error -- dev: b 31 0x070100, errno: 126, resid: 16384,

blkno: 1639328, sectno: 3278656, offset: 1678671872, bcount: 16384.

SCSI: Async write error -- dev: b 31 0x070100, errno: 126, resid: 4096,

blkno: 1598692, sectno: 3197384, offset: 1637060608, bcount: 4096.

SCSI: Write error -- dev: b 31 0x070100, errno: 126, resid: 2048,

blkno: 1175672, sectno: 2351344, offset: 1203888128, bcount: 2048.

SCSI: Write error -- dev: b 31 0x070100, errno: 126, resid: 4096,

blkno: 854688, sectno: 1709376, offset: 875200512, bcount: 4096.

SCSI: Write error -- dev: b 31 0x070100, errno: 126, resid: 3072,

blkno: 830056, sectno: 1660112, offset: 849977344, bcount: 3072.

SCSI: Async write error -- dev: b 31 0x070100, errno: 126, resid: 8192,

blkno: 499248, sectno: 998496, offset: 511229952, bcount: 8192.

SCSI: Async write error -- dev: b 31 0x070100, errno: 126, resid: 8192,

blkno: 499256, sectno: 998512, offset: 511238144, bcount: 8192.

SCSI: Async write error -- dev: b 31 0x070100, errno: 126, resid: 4096,

blkno: 293776, sectno: 587552, offset: 300826624, bcount: 4096.

SCSI: Async write error -- dev: b 31 0x070100, errno: 126, resid: 4096,

blkno: 293792, sectno: 587584, offset: 300843008, bcount: 4096.

LVM: VG 64 0x030000: Lost quorum.

This may block configuration changes and I/Os. In order to reestablish quorum at least 1 of the following PVs (represented by current link) must become available:

<31 0x070200>

LVM: VG 64 0x030000: PVLink 31 0x070200 Failed! The PV is not accessible.

LVM: Performed a switch for Lun ID = 0 (pv = 0x0000000048238000), from raw device 0x1f070300 (with priority: 1, and current flags: 0x80) to raw device 0x1f050300 (with priority: 0, and current flags: 0x80).

LVM: VG 64 0x040000: Reestablished quorum.

LVM: VG 64 0x040000: PVLink 31 0x050300 Recovered.

LVM: VG 64 0x040000: PVLink 31 0x070300 Recovered.

LVM: VG 64 0x030000: Reestablished quorum.

LVM: VG 64 0x030000: PVLink 31 0x050200 Recovered.

LVM: VG 64 0x030000: PVLink 31 0x070200 Recovered.

LVM: VG 64 0x020000: Reestablished quorum.

LVM: VG 64 0x020000: PVLink 31 0x050100 Recovered.

LVM: VG 64 0x020000: PVLink 31 0x070100 Recovered.
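The log names each path in two ways: as a raw device number (for example 0x1f050100) in the "Performed a switch" lines, and as a major/minor pair (31 0x050100) in the PVLink lines. A minimal sketch for converting one into the other follows. It assumes the layout that the log itself suggests, with the top 8 bits as the major number and the low 24 bits as the minor number; the function name decode_dev is made up for illustration.

```shell
#!/bin/sh
# Split a raw device number into "major 0xminor".
# Assumption: top 8 bits = major, low 24 bits = minor, as the log pairs suggest.
decode_dev() {
    major=$(( $1 >> 24 ))
    minor=$(( $1 & 0xffffff ))
    printf '%d 0x%06x\n' "$major" "$minor"
}

decode_dev 0x1f050100   # prints: 31 0x050100 (the link that failed first)
decode_dev 0x1f070100   # prints: 31 0x070100 (the alternate link)
```

Read this way, the first lines of the log say that VG 0x020000 switched from path 31 0x050100 to path 31 0x070100, and that the second path then failed as well, which is when quorum was lost.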

scp4[/var/opt/resmon/log]#tail -20000 /var/adm/syslog/syslog.log | grep -i war

Jun 2 08:17:03 scp4 cmcld: Warning: cmcld process was unable to run for the last 2 seconds,

Jun 2 17:49:03 scp4 cmcld: Warning: cmcld process was unable to run for the last 2 seconds,

Jun 2 21:41:02 scp4 cmcld: Warning: cmcld process was unable to run for the last 2 seconds,
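To see how often the warning occurs, the matching lines can be counted per day. The following is only a sketch: the function name count_cmcld_warnings is made up, and it assumes the syslog line format shown above (month and day in the first two fields).

```shell
#!/bin/sh
# Count "cmcld process was unable to run" warnings per day.
# Reads syslog lines on stdin and prints "<month> <day> <count>".
count_cmcld_warnings() {
    awk '/cmcld: Warning: cmcld process was unable to run/ {
             count[$1 " " $2]++
         }
         END {
             for (day in count) print day, count[day]
         }'
}

# Example use on the node (not run here):
#   tail -20000 /var/adm/syslog/syslog.log | count_cmcld_warnings
```

The order of the output lines is not defined when several days are present; pipe the result through sort if that matters.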

2. Cause analysis

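The two nodes of the cluster share a heartbeat timeout, set by the NODE_TIMEOUT parameter in the cluster configuration file cmcluster.asc. It is currently 2000000 microseconds, that is, 2 seconds. If the cmcld processes on the two nodes cannot communicate normally within 2 seconds, warnings like the ones above are written to syslog.log.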
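The current value can be read from the configuration file and converted to seconds. This is a sketch under one assumption: that the file stores the parameter as a keyword followed by its value on one line, separated by whitespace. The function name node_timeout_seconds is made up for illustration.

```shell
#!/bin/sh
# Print NODE_TIMEOUT in seconds.
# Assumption: the setting appears as "NODE_TIMEOUT <microseconds>" on one line.
# Commented-out lines are skipped because their first field is "#".
node_timeout_seconds() {
    awk '$1 == "NODE_TIMEOUT" { printf "%g\n", $2 / 1000000 }'
}

# Example use on the node (not run here):
#   node_timeout_seconds < /etc/cmcluster/cmcluster.asc
```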

3. Risk

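The risk of a NODE_TIMEOUT that is too small is that a cluster reformation becomes more likely, which means one machine may reboot. If the cmcld processes on the two nodes are found unable to communicate within a limited window (2 timeouts of NODE_TIMEOUT + HEARTBEAT_INTERVAL), the cluster re-forms: one machine panics and the other takes over all resources. So far the system has not actually re-formed, because the cmcld processes on the two nodes were still able to communicate within that window. Increasing NODE_TIMEOUT makes a reformation much less likely. We therefore recommend changing the value to 8 seconds (8000000 microseconds).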
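As a rough illustration of how the window grows with the new setting, the arithmetic can be sketched as below. Two things here are assumptions rather than statements from the text: the formula is read as 2 × NODE_TIMEOUT + HEARTBEAT_INTERVAL, and HEARTBEAT_INTERVAL is taken to be 1000000 microseconds, a value the article does not give. Check the real value in cmcluster.asc before relying on the numbers.

```shell
#!/bin/sh
# Window in microseconds, reading the formula as 2 * NODE_TIMEOUT + HEARTBEAT_INTERVAL.
# $1 = NODE_TIMEOUT, $2 = HEARTBEAT_INTERVAL (both in microseconds).
reformation_window_us() {
    echo $(( 2 * $1 + $2 ))
}

# HEARTBEAT_INTERVAL=1000000 is an assumed value, not taken from the article.
reformation_window_us 2000000 1000000   # current setting:     5000000 (5 s)
reformation_window_us 8000000 1000000   # recommended setting: 17000000 (17 s)
```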

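4. Implementation steps

Option 1: modify the configuration file, then restart the cluster

1. Edit the configuration file cmcluster.asc and change the value of NODE_TIMEOUT to 8000000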

#cd /etc/cmcluster

#vi cmcluster.asc

2. Check the configuration file

#cmcheckconf -C cmcluster.asc

3. Halt the cluster

#cmhaltcl -f -v

4. Apply the configuration file

#cmapplyconf -C cmcluster.asc

5. Start the cluster

#cmruncl -v
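Steps 2 to 5 can be chained so that the sequence stops as soon as one command fails, for example when cmcheckconf rejects the edited file. This is a sketch only: it uses the same four commands and options as above, while the run wrapper, the DRY_RUN switch and the function name apply_cluster_config are additions made for illustration.

```shell
#!/bin/sh
# Set DRY_RUN=1 to print each command instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run steps 2-5 in order and stop at the first failure.
apply_cluster_config() {
    conf=${1:-cmcluster.asc}
    run cmcheckconf -C "$conf" || return 1   # step 2: check the file
    run cmhaltcl -f -v         || return 1   # step 3: halt the cluster
    run cmapplyconf -C "$conf" || return 1   # step 4: apply the file
    run cmruncl -v                           # step 5: start the cluster
}

# Example use on the node (not run here):
#   cd /etc/cmcluster && DRY_RUN=1 apply_cluster_config cmcluster.asc
```

Running it once with DRY_RUN=1 shows the exact command sequence before anything is halted. Packages are down from step 3 until step 5 completes, so schedule a maintenance window.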

If conditions permit, the following ServiceGuard patch can also be installed

Download PHSS_32656 (15975K bytes)

Installation steps:

1. Back up the system before installing

2. Log in as root

3. Copy the patch to /tmp

4. cd /tmp

sh PHSS_34391

5. Run swinstall to install the patch:

swinstall -x autoreboot=true -x patch_match_target=true -s /tmp/PHSS_34391.depot

If the patch has dependent patches, download it in tar format instead, FTP the tar file to the server, and run:

# cd /tmp/patch

# tar xvf pathname.tar

# ./create_depot_hpux.11.31

# swinstall -s /tmp/patch/depot

Patching a production system carries risk, so we recommend applying the cluster configuration change first.

Reposted from the ITPUB blog: http://blog.itpub.net/9479798/viewspace-1050069/. If you reproduce this article, please credit the source; otherwise legal action may be taken.