Environment:

iSCSI server: Windows Server 2008 R2 SP1 + Microsoft iSCSI Software Target 3.3

iSCSI client: RHEL 6.4 x86_64 + iscsi-initiator-utils-6.2.0.873-2.el6.x86_64


Reproducing the fault:

Originally a Windows 2008 machine acted as the iSCSI server and mapped several disks to the two RAC nodes:

[Screenshot: wKioL1cMvcGTk5Y8AABKsAHF1Pw856.png]

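I wanted to simulate replacing the ASM disks in a RAC environment. After the replacement with the new disks succeeded, I disabled the mappings of the old LUNs. Once the RAC nodes were rebooted they could no longer see any iSCSI disks, and none of the RAC servers could start:

[Screenshot: wKiom1cMvRny3_KoAABk0oAah5I330.png]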

Before the reboot:

[root@rac1 ~]# fdisk -l

Disk /dev/sda: 32.2 GB, 32212254720 bytes

255 heads, 63 sectors/track, 3916 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk identifier: 0x000874ca


   Device Boot      Start         End      Blocks   Id  System

/dev/sda1   *           1          26      204800   83  Linux

Partition 1 does not end on cylinder boundary.

/dev/sda2              26        3917    31251456   8e  Linux LVM


Disk /dev/mapper/vg00-lv_root: 23.4 GB, 23408410624 bytes

255 heads, 63 sectors/track, 2845 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk identifier: 0x00000000


Disk /dev/mapper/vg00-lv_swap: 8589 MB, 8589934592 bytes

255 heads, 63 sectors/track, 1044 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk identifier: 0x00000000


Disk /dev/sdi: 21.5 GB, 21474836480 bytes

64 heads, 32 sectors/track, 20480 cylinders

Units = cylinders of 2048 * 512 = 1048576 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk identifier: 0xee8ee349


   Device Boot      Start         End      Blocks   Id  System

/dev/sdi1               1       20480    20971504   83  Linux


Disk /dev/sdj: 21.5 GB, 21474836480 bytes

64 heads, 32 sectors/track, 20480 cylinders

Units = cylinders of 2048 * 512 = 1048576 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk identifier: 0x7785654b


   Device Boot      Start         End      Blocks   Id  System

/dev/sdj1               1       20480    20971504   83  Linux


Disk /dev/sdk: 32.2 GB, 32212254720 bytes

64 heads, 32 sectors/track, 30720 cylinders

Units = cylinders of 2048 * 512 = 1048576 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk identifier: 0xcae7e952


   Device Boot      Start         End      Blocks   Id  System

/dev/sdk1               1       30720    31457264   83  Linux


Disk /dev/sdl: 2147 MB, 2147483648 bytes

67 heads, 62 sectors/track, 1009 cylinders

Units = cylinders of 4154 * 512 = 2126848 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk identifier: 0x6e15849d


   Device Boot      Start         End      Blocks   Id  System

/dev/sdl1               1        1009     2095662   83  Linux


Disk /dev/sdm: 2147 MB, 2147483648 bytes

67 heads, 62 sectors/track, 1009 cylinders

Units = cylinders of 4154 * 512 = 2126848 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk identifier: 0xb137a7a4


   Device Boot      Start         End      Blocks   Id  System

/dev/sdm1               1        1009     2095662   83  Linux


Disk /dev/sdn: 2147 MB, 2147483648 bytes

67 heads, 62 sectors/track, 1009 cylinders

Units = cylinders of 4154 * 512 = 2126848 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk identifier: 0xb523ed12


   Device Boot      Start         End      Blocks   Id  System

/dev/sdn1               1        1009     2095662   83  Linux


Disk /dev/sdo: 2147 MB, 2147483648 bytes

67 heads, 62 sectors/track, 1009 cylinders

Units = cylinders of 4154 * 512 = 2126848 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk identifier: 0xcda3ffae


   Device Boot      Start         End      Blocks   Id  System

/dev/sdo1               1        1009     2095662   83  Linux


After the reboot:

[root@rac1 ~]# fdisk -l

Disk /dev/sda: 32.2 GB, 32212254720 bytes

255 heads, 63 sectors/track, 3916 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk identifier: 0x000874ca


   Device Boot      Start         End      Blocks   Id  System

/dev/sda1   *           1          26      204800   83  Linux

Partition 1 does not end on cylinder boundary.

/dev/sda2              26        3917    31251456   8e  Linux LVM


Disk /dev/mapper/vg00-lv_root: 23.4 GB, 23408410624 bytes

255 heads, 63 sectors/track, 2845 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk identifier: 0x00000000



Disk /dev/mapper/vg00-lv_swap: 8589 MB, 8589934592 bytes

255 heads, 63 sectors/track, 1044 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk identifier: 0x00000000


Logging in to the iSCSI target again does not bring the disks back either:

[root@rac1 ~]# iscsiadm -m discovery -t st -p 192.168.253.100

192.168.253.100:3260,1 iqn.1991-05.com.microsoft:dnsserver-rac1-target

[root@rac1 ~]# iscsiadm -m node -T iqn.1991-05.com.microsoft:dnsserver-rac1-target -l

[root@rac1 ~]# fdisk -l

Disk /dev/sda: 32.2 GB, 32212254720 bytes

255 heads, 63 sectors/track, 3916 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk identifier: 0x000874ca


   Device Boot      Start         End      Blocks   Id  System

/dev/sda1   *           1          26      204800   83  Linux

Partition 1 does not end on cylinder boundary.

/dev/sda2              26        3917    31251456   8e  Linux LVM


Disk /dev/mapper/vg00-lv_root: 23.4 GB, 23408410624 bytes

255 heads, 63 sectors/track, 2845 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk identifier: 0x00000000


Disk /dev/mapper/vg00-lv_swap: 8589 MB, 8589934592 bytes

255 heads, 63 sectors/track, 1044 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk identifier: 0x00000000


Yet the status shown on the server side looks normal:

[Screenshot: wKiom1cMvUWx7ZVeAABEFVRzMXY083.png]

[Screenshot: wKioL1cMvfqikI0eAACcDmRaQPc057.png]


The client logs the errors below, but they give no further detail to analyse:

[root@rac1 ~]# dmesg|tail

scsi33 : iSCSI Initiator over TCP/IP

scsi 33:0:0:0: Device offlined - not ready after error recovery

scsi34 : iSCSI Initiator over TCP/IP

scsi 34:0:0:0: Device offlined - not ready after error recovery
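
Each kernel message above carries the SCSI address in host:channel:target:lun form, and both offlined devices sit at LUN 0 (33:0:0:0 and 34:0:0:0). As a minimal sketch, the helper below pulls those addresses out of kernel log text. The function name offlined_scsi_addrs is made up for illustration and is not part of any tool.

```shell
# Hypothetical helper (name is an assumption): print the H:C:T:L address of
# every SCSI device that the kernel log reports as offlined.
# Reads kernel log text on stdin.
offlined_scsi_addrs() {
    sed -n 's/^.*scsi \([0-9]*:[0-9]*:[0-9]*:[0-9]*\): Device offlined.*$/\1/p'
}

# Usage on a live system:
#   dmesg | offlined_scsi_addrs
```

If every reported address ends in :0, the failure is at LUN 0 rather than at one of the data LUNs.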


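Test 1:

Disable all disks other than LUN 0 (testing showed that the disks are recognised after a reboot only when LUN 0 stays mapped):

[Screenshot: wKiom1cMvVyCYTO-AABn94kidtc740.png]

Reboot the client. Every disk except sdb now has a different device name: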

[root@rac2 ~]# fdisk -l |grep "Disk /dev/sd*"

Disk /dev/sda: 32.2 GB, 32212254720 bytes

Disk /dev/sdb: 2147 MB, 2147483648 bytes

Disk /dev/sdc: 21.5 GB, 21474836480 bytes

Disk /dev/sdd: 21.5 GB, 21474836480 bytes

Disk /dev/sde: 32.2 GB, 32212254720 bytes

Disk /dev/sdf: 2147 MB, 2147483648 bytes

Disk /dev/sdg: 2147 MB, 2147483648 bytes

Disk /dev/sdh: 2147 MB, 2147483648 bytes


Device name changes:

sdi --> sdc

sdj --> sdd

sdk --> sde

sdl --> sdf

sdm --> sdg

sdn --> sdh



The ASM disk status is also normal, which shows that the changed device names do not affect how ASMLib identifies the disks:

[root@rac2 ~]# oracleasm querydisk -d /dev/sdc1

Device "/dev/sdc1" is marked an ASM disk with the label "NEW_DATA01"

[root@rac2 ~]# oracleasm querydisk -d /dev/sdd1

Device "/dev/sdd1" is marked an ASM disk with the label "NEW_DATA02"

[root@rac2 ~]# oracleasm querydisk -d /dev/sde1

Device "/dev/sde1" is marked an ASM disk with the label "NEW_FRA01"

[root@rac2 ~]# oracleasm querydisk -d /dev/sdf1

Device "/dev/sdf1" is marked an ASM disk with the label "NEW_OCR01"

[root@rac2 ~]# oracleasm querydisk -d /dev/sdg1

Device "/dev/sdg1" is marked an ASM disk with the label "NEW_OCR02"

[root@rac2 ~]# oracleasm querydisk -d /dev/sdh1

Device "/dev/sdh1" is marked an ASM disk with the label "NEW_OCR03"
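
Because ASMLib identifies disks by label rather than by device name, it can help to list the label-to-device pairs after each reboot. The following is a minimal sketch that reformats the querydisk output shown above. The function name asm_label_map is an assumption, and the sed pattern assumes the exact message wording seen in this transcript.

```shell
# Hypothetical helper (name is an assumption): turn the output of
# `oracleasm querydisk -d <device>` into "LABEL DEVICE" pairs.
# Reads the command output on stdin; lines in any other format are ignored.
asm_label_map() {
    sed -n 's/^Device "\([^"]*\)" is marked an ASM disk with the label "\([^"]*\)".*$/\2 \1/p'
}

# Usage on a live system (device range taken from the listing above):
#   for p in /dev/sd[c-h]1; do oracleasm querydisk -d "$p"; done | asm_label_map
```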


Try wiping sdb:

dd if=/dev/zero of=/dev/sdb bs=1024k count=2048

After unmapping LUN 0 and rebooting, the problem is still there.


To get a clearer view of the disks mapped over iSCSI, install the lsscsi tool:

[root@rac1 ~]# yum install -y lsscsi

[root@rac1 ~]# lsscsi -t

[0:0:0:0]    disk    spi:0                           /dev/sda 

[4:0:0:0]    cd/dvd  sata:                           /dev/sr0 

[33:0:0:0]   disk    iqn.1991-05.com.microsoft:dnsserver-rac2-target,t,0x1  /dev/sdb 

[33:0:0:7]   disk    iqn.1991-05.com.microsoft:dnsserver-rac2-target,t,0x1  /dev/sdc 

[33:0:0:8]   disk    iqn.1991-05.com.microsoft:dnsserver-rac2-target,t,0x1  /dev/sdd 

[33:0:0:9]   disk    iqn.1991-05.com.microsoft:dnsserver-rac2-target,t,0x1  /dev/sde 

[33:0:0:10]  disk    iqn.1991-05.com.microsoft:dnsserver-rac2-target,t,0x1  /dev/sdf 

[33:0:0:11]  disk    iqn.1991-05.com.microsoft:dnsserver-rac2-target,t,0x1  /dev/sdg 

[33:0:0:12]  disk    iqn.1991-05.com.microsoft:dnsserver-rac2-target,t,0x1  /dev/sdh 
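
The listing shows LUN 0 followed by LUNs 7 to 12 on the same host:channel:target. A minimal sketch for spotting an already-scanned iSCSI target that has no LUN 0 entry is given below. The function name iscsi_targets_missing_lun0 is hypothetical, and the sketch assumes that the transport column starts with "iqn.", as it does in this output.

```shell
# Hypothetical helper (name is an assumption): read `lsscsi -t` output on
# stdin and print the host:channel:target prefix of every iSCSI target
# that has no LUN 0 entry. Only rows whose third column starts with "iqn."
# are treated as iSCSI.
iscsi_targets_missing_lun0() {
    awk '
        $3 ~ /^iqn\./ {
            addr = $1
            gsub(/\[|\]/, "", addr)        # strip the surrounding brackets
            n = split(addr, f, ":")
            if (n != 4) next               # not an H:C:T:L address
            hct = f[1] ":" f[2] ":" f[3]
            seen[hct] = 1
            if (f[4] == "0") has0[hct] = 1
        }
        END {
            for (h in seen) if (!(h in has0)) print h
        }
    '
}

# Usage on a live system:
#   lsscsi -t | iscsi_targets_missing_lun0
```

The helper can only report targets that the kernel has already scanned. It prints nothing when no disks were detected at all.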

Unmap LUN 0:

sdb no longer appears in fdisk -l

All the disk paths mapped over iSCSI still appear in lsscsi -t

After logging out of the target with iscsiadm:

[root@rac1 ~]# iscsiadm -m node -T iqn.1991-05.com.microsoft:dnsserver-rac1-target -u

none of the mapped devices appear in lsscsi -t any more:

[root@rac1 ~]# lsscsi -t

[0:0:0:0]    disk    spi:0                           /dev/sda 

[4:0:0:0]    cd/dvd  sata:                           /dev/sr0 

Restore the LUN 0 mapping.

After logging in to the target again with iscsiadm, the disks are back to normal:

[root@rac1 ~]# iscsiadm -m node -T iqn.1991-05.com.microsoft:dnsserver-rac1-target -l

[root@rac2 ~]# lsscsi -t

[0:0:0:0]    disk    spi:0                           /dev/sda 

[4:0:0:0]    cd/dvd  sata:                           /dev/sr0 

[34:0:0:0]   disk    iqn.1991-05.com.microsoft:dnsserver-rac2-target,t,0x1  /dev/sdb 

[34:0:0:7]   disk    iqn.1991-05.com.microsoft:dnsserver-rac2-target,t,0x1  /dev/sdc 

[34:0:0:8]   disk    iqn.1991-05.com.microsoft:dnsserver-rac2-target,t,0x1  /dev/sdd 

[34:0:0:9]   disk    iqn.1991-05.com.microsoft:dnsserver-rac2-target,t,0x1  /dev/sde 

[34:0:0:10]  disk    iqn.1991-05.com.microsoft:dnsserver-rac2-target,t,0x1  /dev/sdf 

[34:0:0:11]  disk    iqn.1991-05.com.microsoft:dnsserver-rac2-target,t,0x1  /dev/sdg 

[34:0:0:12]  disk    iqn.1991-05.com.microsoft:dnsserver-rac2-target,t,0x1  /dev/sdh 
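
The logout and login pair used in this test can be wrapped into one step. The sketch below uses only the two iscsiadm invocations already shown. The function name iscsi_relogin and the DRY_RUN switch are assumptions added for illustration.

```shell
# Hypothetical helper (name and DRY_RUN switch are assumptions): log out of
# an iSCSI target and log back in, which makes the initiator rescan its LUNs.
# With DRY_RUN=1 the iscsiadm commands are printed instead of executed.
iscsi_relogin() {
    local target=$1
    local run=""
    if [ "${DRY_RUN:-0}" = "1" ]; then
        run="echo"
    fi
    # Log in again only if the logout succeeded.
    $run iscsiadm -m node -T "$target" -u &&
        $run iscsiadm -m node -T "$target" -l
}

# Usage:
#   DRY_RUN=1 iscsi_relogin iqn.1991-05.com.microsoft:dnsserver-rac1-target
```

Logging out removes the block devices, so this should only be run while ASM and the cluster stack are not using the disks.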


Test 1 conclusion: the cause cannot be determined yet. Either Linux treats sdb in some special way, or the problem lies in the Windows iSCSI target.


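Test 2:

Use Linux iSCSI target utils as the server and check whether the fault reproduces, to find out whether the cause is the Linux client or the Windows iSCSI target server.

For the setup steps, see my other note "iSCSI服务器以及客户端安装配置" (iSCSI server and client installation and configuration): http://childres.blog.51cto.com/11420270/1763069

After starting the tgtd service, list the current LUNs. By default, iSCSI target utils assigns LUN 0 to the controller: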

[root@target ~]# tgtadm --mode target --op show

Target 1: iqn.2016-04.target:vdisk

    System information:

        Driver: iscsi

        State: ready

    I_T nexus information:

    LUN information:

        LUN: 0

            Type: controller

            SCSI ID: IET     00010000

            SCSI SN: beaf10

            Size: 0 MB, Block size: 1

            Online: Yes

            Removable media: No

            Prevent removal: No

            Readonly: No

            Backing store type: null

            Backing store path: None

            Backing store flags: 

        LUN: 1

            Type: disk

            SCSI ID: IET     00010001

            SCSI SN: beaf11

            Size: 21475 MB, Block size: 512

            Online: Yes

            Removable media: No

            Prevent removal: No

            Readonly: No

            Backing store type: rdwr

            Backing store path: /iscsi_data/data1.img

            Backing store flags: 

        LUN: 2

            Type: disk

            SCSI ID: IET     00010002

            SCSI SN: beaf12

            Size: 10737 MB, Block size: 512

            Online: Yes

            Removable media: No

            Prevent removal: No

            Readonly: No

            Backing store type: rdwr

            Backing store path: /iscsi_data/fra1.img

            Backing store flags: 

        LUN: 3

            Type: disk

            SCSI ID: IET     00010003

            SCSI SN: beaf13

            Size: 1074 MB, Block size: 512

            Online: Yes

            Removable media: No

            Prevent removal: No

            Readonly: No

            Backing store type: rdwr

            Backing store path: /iscsi_data/ocr1.img

            Backing store flags: 

        LUN: 4

            Type: disk

            SCSI ID: IET     00010004

            SCSI SN: beaf14

            Size: 1074 MB, Block size: 512

            Online: Yes

            Removable media: No

            Prevent removal: No

            Readonly: No

            Backing store type: rdwr

            Backing store path: /iscsi_data/ocr2.img

            Backing store flags: 

        LUN: 5

            Type: disk

            SCSI ID: IET     00010005

            SCSI SN: beaf15

            Size: 1074 MB, Block size: 512

            Online: Yes

            Removable media: No

            Prevent removal: No

            Readonly: No

            Backing store type: rdwr

            Backing store path: /iscsi_data/ocr3.img

            Backing store flags: 

    Account information:

    ACL information:

        ALL
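
The full listing is long, and the detail that matters here is the Type field of each LUN. A minimal sketch that condenses the output to one "LUN type" pair per line is shown below. The function name tgt_lun_types is hypothetical, and the parsing assumes the "LUN:" and "Type:" field layout seen in this transcript.

```shell
# Hypothetical helper (name is an assumption): condense the output of
# `tgtadm --mode target --op show` to "LUN TYPE" pairs, one per line.
# Reads the command output on stdin.
tgt_lun_types() {
    awk '
        $1 == "LUN:"  { lun = $2 }          # remember the current LUN number
        $1 == "Type:" { print lun, $2 }     # emit it with its device type
    '
}

# Usage on the target server:
#   tgtadm --mode target --op show | tgt_lun_types
```

For the target above, the first pair would be "0 controller", followed by "disk" entries for LUNs 1 to 5.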


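After logging in from the client, the entry at 0:0:0 ([34:0:0:0] below) has no disk device behind it, so the LUN 0 mapping cannot be removed either: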

[root@rac1 ~]# lsscsi -t

[0:0:0:0]    disk    spi:0                           /dev/sda 

[4:0:0:0]    cd/dvd  sata:                           /dev/sr0 

[34:0:0:0]   storage iqn.2016-04.target:vdisk,t,0x1  -       

[34:0:0:1]   disk    iqn.2016-04.target:vdisk,t,0x1  /dev/sdb 

[34:0:0:2]   disk    iqn.2016-04.target:vdisk,t,0x1  /dev/sdc 

[34:0:0:3]   disk    iqn.2016-04.target:vdisk,t,0x1  /dev/sdd 

[34:0:0:4]   disk    iqn.2016-04.target:vdisk,t,0x1  /dev/sde 

[34:0:0:5]   disk    iqn.2016-04.target:vdisk,t,0x1  /dev/sdf


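Test 2 conclusion: with Microsoft iSCSI Software Target, LUN 0 carries the iSCSI controller information. Unmapping LUN 0 therefore affects the other LUNs as well, and the client can no longer recognise them properly.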


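P.S. Everything above was reproduced on test virtual machines. I do not know whether iSCSI storage running Windows Storage Server (for example the HP StoreEasy series) has the same problem. I have no such device at the moment, so later I may download Windows Storage Server 2012, install it in a virtual machine, and test it.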