1) Environment

OS: Red Hat Enterprise Linux 4.6 x86

Cluster: RHCS, 2 nodes

Multipathing software: EMC PowerPath 5.1 for Linux

Storage: EMC AX4-5 and EMC CX300

The AX4-5 presents one LUN to the hosts; the CX300 presents two LUNs.

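2) Problem description

After the LUN mappings were configured on the arrays, both nodes were rebooted one after the other. Both nodes detected the mapped LUNs. Running fdisk -l to check the device names under which the OS sees the LUNs showed that the two nodes did not agree. node1 saw emcpowera, emcpowerc and emcpowerd; node2 saw emcpowera, emcpowerb and emcpowerc. Judging by the LUN sizes, the devices correspond as follows:

node1 emcpowera = node2 emcpowera

node1 emcpowerc = node2 emcpowerb

node1 emcpowerd = node2 emcpowerc

Because the two nodes are to form a cluster, configuring shared storage in the cluster requires each LUN to have the same device name on both nodes.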

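3) Analysis and troubleshooting

The device names on node2 were as expected. node1 was the problem: for some unknown reason emcpowerb was missing.

On node1, run powermt display dev=all

emcpadm getfreepseudos -n 5 showed that emcpowerb on node1 was not in the list of free names.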
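A skipped name like this can also be spotted mechanically. The sketch below is illustrative only and was not part of the original troubleshooting. `find_skipped_pseudos` is a hypothetical helper: it reads a list of pseudo device names in the column layout of the `emcpadm getused` sample output quoted in the reference section, and prints every single-letter emcpowerX name that is missing below the highest name in use. It only handles emcpowera through emcpowerz.

```shell
# find_skipped_pseudos (hypothetical helper): read a listing of in-use pseudo
# device names on stdin, one name in the first column per line, and print each
# emcpowerX name that is skipped below the highest name in use.
# Header lines and anything that is not emcpower[a-z] are ignored.
find_skipped_pseudos() {
    awk -v alphabet="abcdefghijklmnopqrstuvwxyz" '
        $1 ~ /^emcpower[a-z]$/ {
            letter = substr($1, 9, 1)        # character after "emcpower"
            used[letter] = 1
            idx = index(alphabet, letter)
            if (idx > max) max = idx
        }
        END {
            for (i = 1; i <= max; i++) {
                l = substr(alphabet, i, 1)
                if (!(l in used)) print "emcpower" l
            }
        }
    '
}
```

Fed the three names that node1 reported (emcpowera, emcpowerc, emcpowerd), it prints emcpowerb. Pipe in whatever command lists the names in use on your PowerPath version.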

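With the business system about to go live, there was no time for deeper analysis. Two approaches came to mind: first, remove the paths detected on node1 and reboot to see whether that fixes it; second, manually rename the devices on node2 to match node1.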

Approach 1:

powermt remove dev=all // remove the currently detected paths

powermt config  // rediscover the paths

powermt display dev=all

reboot

The problem persisted.

Approach 2:

On node2:

emcpadm getfreepseudos // shows that emcpowerd is available

emcpadm renamepseudo -s emcpowerc -t emcpowerd

emcpadm renamepseudo -s emcpowerb -t emcpowerc

powermt save

reboot

After this, both nodes see emcpowera, emcpowerc and emcpowerd, and the problem is solved.
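The order of the two renames matters: emcpowerc has to move to emcpowerd first, so that the name emcpowerc is free when emcpowerb is renamed. As a rough sketch of that ordering (not something run during the incident), the hypothetical helper `plan_upward_renames` below reads source/target pairs and only prints the commands, highest source first. It assumes every rename moves a device to a later name. It also assumes the `renamepseudo` subcommand spelling; the reference section writes the same operation as `emcpadm rename`, so check which form your PowerPath version accepts.

```shell
# plan_upward_renames (hypothetical helper): read "source target" pairs on
# stdin and print one rename command per pair, ordered so that each target
# name is already free when it is needed.
# Assumption: every rename moves a device to a LATER name (b -> c, c -> d),
# so handling the sources in reverse alphabetical order is sufficient.
# Nothing is executed; the commands are only printed for review.
plan_upward_renames() {
    sort -r | while read -r src tgt; do
        [ -n "$src" ] && [ -n "$tgt" ] || continue
        printf 'emcpadm renamepseudo -s %s -t %s\n' "$src" "$tgt"
    done
}
```

For the two pairs used on node2 (emcpowerb to emcpowerc, emcpowerc to emcpowerd), it prints the emcpowerc rename first and the emcpowerb rename second, matching the order used above.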

 

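4) Conclusion

node1 had previously been used for testing, during which an emcpowerb device existed. After that device was removed, the PowerPath configuration database was not updated, so emcpowerb still appeared to be in use.

Afterwards I found related articles suggesting that the problem can also be tackled by forcibly deleting the PowerPath configuration files. The steps are as follows: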

Stop the PowerPath service

# /etc/init.d/PowerPath stop

Back up the current configuration files

# cp /etc/powermt.custom /etc/powermt.custom.old_config

# cp /etc/emcp_devicesDB.dat /etc/emcp_devicesDB.dat.old_config

# cp /etc/emcp_devicesDB.idx /etc/emcp_devicesDB.idx.old_config

Delete the PowerPath configuration files

# rm /etc/powermt.custom /etc/emcp_devicesDB.dat /etc/emcp_devicesDB.idx

Restart the PowerPath service

# /etc/init.d/PowerPath start

Save the PowerPath configuration

# powermt save
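Because the rm step is destructive, it may be worth confirming that the backups exist before deleting anything. The following is a minimal sketch, assuming the three file names above. `backup_powerpath_configs` is a hypothetical helper, and its optional directory argument exists only so that it can be tried against a scratch directory instead of /etc.

```shell
# backup_powerpath_configs [DIR] (hypothetical helper): copy each PowerPath
# config file found in DIR (default: /etc) to <name>.old_config.
# Returns non-zero as soon as a copy fails, so a caller can stop before rm.
backup_powerpath_configs() {
    dir=${1:-/etc}
    for name in powermt.custom emcp_devicesDB.dat emcp_devicesDB.idx; do
        if [ -f "$dir/$name" ]; then
            cp -p "$dir/$name" "$dir/$name.old_config" || return 1
            echo "backed up $dir/$name"
        else
            echo "skipped $dir/$name (not present)"
        fi
    done
}
```

A cautious run would stop PowerPath, call `backup_powerpath_configs`, and continue to the rm step only if it returned success.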

5) Reference

root cause 1

In some cases, during installation of PowerPath and device reconfiguration, a server may skip a few "emcpower" devices due to devices that were removed. PowerPath keeps track of devices and makes sure that the emcpower device names remain the same regardless of the underlying Linux /dev/sd# device.

Fix: steps for powerpath 4.x

1) Make sure all I/O is stopped and all of the file systems to the array are unmounted.

2) Stop PowerPath

# /etc/init.d/PowerPath stop

3) Make a backup copy of the current PowerPath custom file just in case

# cp /etc/powermt.custom /etc/powermt.custom.old_config

4) Make a backup copy of the current PowerPath config dat file...just in case

# cp /etc/emcp_devicesDB.dat /etc/emcp_devicesDB.dat.old_config

5) Make a backup copy of the current PowerPath config idx file...just in case

# cp /etc/emcp_devicesDB.idx /etc/emcp_devicesDB.idx.old_config

6) Remove the old config files

# rm /etc/powermt.custom /etc/emcp_devicesDB.dat /etc/emcp_devicesDB.idx

7) Remove the /etc/emc/archive directory.

# rm -r /etc/emc/archive

8) Start PowerPath

# /etc/init.d/PowerPath start

9) Save the new configuration

# powermt save

In some cases with PowerPath 4.x this process will clean up the PowerPath devices, but they still will not be discovered in Bus-Target-LUN order, so if you are trying to synchronize emcpower device numbers between two cluster nodes it may not work. In this case it is recommended that you present the devices to the node one at a time, in the order you want them to appear.

root cause 2

Devices were not added to the nodes in the same order

Fix: steps for powerpath 4.x

       Use the emcpadm command to change the emcpower pseudo devices to the desired names.

In order to "fix" the discrepancy between the two nodes the emcpadm command can be used.

1 Use the command below in order to determine the emcpower devices that are already in use

# emcpadm getused

PowerPath pseudo device names in use:

        Pseudo Device Name      Major# Minor#

                emcpowera         232      0

                emcpowerb         232     16

                emcpowerc         232     32

                emcpowerd         232     48

                emcpowere         232     64

                emcpowerg         232     96

2 Use the command below in order to determine the emcpower devices that are available

# emcpadm getfree -n 5 -b emcpowera

PowerPath pseudo device names not in use:

        Pseudo Device Name      Major# Minor#

                emcpowerf         232     80

                emcpowerh         232    112

                emcpoweri         232    128

                emcpowerj         232    144

                emcpowerk         232    160

3 Use the command below to rename a device

# emcpadm rename -s emcpowerg -t emcpowerf

4 The "emcpadm getused" command can now be used again to check the devices after the rename

# emcpadm getused

PowerPath pseudo device names in use:

        Pseudo Device Name      Major# Minor#

                emcpowera         232      0

                emcpowerb         232     16

                emcpowerc         232     32

                emcpowerd         232     48

                emcpowere         232     64

                emcpowerf         232     80

5 Note: to make sure that the actual volumes match between the two cluster nodes, run the "powermt display dev=all" command on each node in the cluster and compare the output.

# powermt display dev=all

Pseudo name=emcpowerf

CLARiiON ID=WRE00021500573 [Linux103]

Logical device ID=6006016022470A0084D8358B528BD911 [LUN 10]

state=alive; policy=CLAROpt; priority=0; queued-IOs=0

Owner: default=SP B, current=SP B

==============================================================================

---------------- Host ---------------   - Stor -   -- I/O Path -  -- Stats -

## HW Path                 I/O Paths    Interf.   Mode    State  Q-IOs errors

==============================================================================

  2 lpfc                      sdg        SP A0     active  alive      0      0

  3 lpfc                      sdm        SP B0     active  alive      0      0
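Comparing this output by eye across nodes gets tedious with more than a few LUNs. The sketch below is one possible way to automate it, assuming the output of "powermt display dev=all" from each node has been saved to a file and that it contains "Pseudo name=" and "Logical device ID=" lines as in the sample above. `pseudo_map` and `compare_pseudo_maps` are hypothetical helper names. LUNs visible on only one node are not reported.

```shell
# pseudo_map FILE (hypothetical helper): print "<logical-device-id> <pseudo-name>"
# for every device in a saved `powermt display dev=all` output, sorted by ID.
pseudo_map() {
    awk '
        /^[ \t]*Pseudo name=/       { sub(/^[ \t]*Pseudo name=/, "");       name = $1 }
        /^[ \t]*Logical device ID=/ { sub(/^[ \t]*Logical device ID=/, ""); print $1, name }
    ' "$1" | sort
}

# compare_pseudo_maps FILE1 FILE2 (hypothetical helper): print one line per
# LUN whose pseudo name differs between the two nodes.
# Returns 0 when all shared LUNs agree, 1 when at least one differs.
compare_pseudo_maps() {
    map1=$(mktemp) && map2=$(mktemp) || return 2
    pseudo_map "$1" > "$map1"
    pseudo_map "$2" > "$map2"
    # join matches rows on the logical device ID (first column)
    if join "$map1" "$map2" |
        awk '$2 != $3 { print $1 ": " $2 " vs " $3; bad = 1 } END { exit bad + 0 }'
    then status=0
    else status=1
    fi
    rm -f "$map1" "$map2"
    return $status
}
```

With node1.txt and node2.txt as the saved outputs (file names chosen here for illustration), `compare_pseudo_maps node1.txt node2.txt` lists each mismatched LUN as "ID: name-on-node1 vs name-on-node2".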