Terminal 1
[16:26:24 root@localhost modprobe.d]# dd if=/dev/zero of=/dev/sddlmaa1
233408833+0 records in
233408833+0 records out
119505322496 bytes (120 GB) copied, 701.094 s, 170 MB/s
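As a quick sanity check, the figures in the dd summary above agree with each other. The sketch below assumes dd ran with its default 512-byte block size (the command shows no bs= option) and that dd reports decimal megabytes; the variable names are illustrative only.

```python
# Figures copied from the dd summary above.
records = 233408833
total_bytes = 119505322496
elapsed_s = 701.094

# Assumption: no bs= was given, so each record should be dd's default 512 bytes.
block_size = total_bytes // records

# Assumption: dd reports decimal megabytes (1 MB = 10**6 bytes).
throughput_mb_s = total_bytes / elapsed_s / 1e6

print(block_size)                 # bytes per record
print(round(throughput_mb_s, 1))  # average write rate in MB/s
```

The average rate worked out this way is the baseline that the later iostat samples can be compared against.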
Terminal 2
[16:31:49 root@localhost bin]# ./dlnkmgr view -path
Paths:000002 OnlinePaths:000002
PathStatus IO-Count IO-Errors
Online 235218946 0
PathID PathName DskName iLU ChaPort Status Type IO-Count IO-Errors DNum HDevName
000000 0007.0000.0000000000000000.0000 HITACHI .DF600F .85017915 0217 0A Online Own 75625355 0 0 sddlmaa
000001 0008.0000.0000000000000000.0000 HITACHI .DF600F .85017915 0217 0A Online Own 159593591 0 0 sddlmaa
KAPL01001-I The HDLM command completed normally. Operation name = view, completion time = 2013/08/05 16:32:32
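To see how the cumulative I/O count splits across the two HBAs, the path rows can be parsed with a few lines of Python. This is a minimal sketch, not part of HDLM: the helper name `path_shares` is hypothetical, and the column layout (IO-Count as the fourth field from the end) is assumed from the output shown above.

```python
def path_shares(view_output):
    """Return {PathID: (HBA prefix, IO-Count, share of total)} from `dlnkmgr view -path` text."""
    paths = {}
    for line in view_output.splitlines():
        fields = line.split()
        # Path rows start with a 6-digit PathID; the last four fields are
        # IO-Count, IO-Errors, DNum, HDevName (layout assumed from the sample).
        if len(fields) >= 8 and fields[0].isdigit() and len(fields[0]) == 6:
            hba = ".".join(fields[1].split(".")[:2])  # e.g. "0007.0000"
            paths[fields[0]] = (hba, int(fields[-4]))
    total = sum(count for _, count in paths.values())
    if total == 0:
        return {}
    return {pid: (hba, count, count / total) for pid, (hba, count) in paths.items()}


sample = """\
Paths:000002 OnlinePaths:000002
PathStatus IO-Count IO-Errors
Online 235218946 0
000000 0007.0000.0000000000000000.0000 HITACHI .DF600F .85017915 0217 0A Online Own 75625355 0 0 sddlmaa
000001 0008.0000.0000000000000000.0000 HITACHI .DF600F .85017915 0217 0A Online Own 159593591 0 0 sddlmaa
"""

shares = path_shares(sample)
for pid, (hba, count, share) in shares.items():
    print(pid, hba, count, f"{share:.1%}")
```

The two per-path counts add up to the total on the PathStatus line. The counters are cumulative, so an uneven split here does not by itself show that the current load is unbalanced.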
Terminal 3 (iostat output)
Linux 2.6.32-220.el6.x86_64 (localhost.localdomain) 08/05/2013 _x86_64_ (8 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
0.09 0.01 2.26 0.42 0.00 97.22
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 1.55 67.96 8.32 476638 58346
sdb 194.79 1.24 10897.81 8728 76431871
sdc 233.01 1.06 22918.89 7416 160741878
avg-cpu: %user %nice %system %iowait %steal %idle
0.87 0.00 22.01 4.24 0.00 72.88
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.20 0.00 2.40 0 24
sdb 1020.40 0.00 206470.20 0 2064702
sdc 966.10 0.00 138000.00 0 1380000
avg-cpu: %user %nice %system %iowait %steal %idle
0.90 0.00 21.15 4.78 0.00 73.17
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.00 0.00 0.00 0 0
sdb 978.30 0.00 149055.20 0 1490552
sdc 1011.50 0.00 189052.40 0 1890524
avg-cpu: %user %nice %system %iowait %steal %idle
0.90 0.00 21.14 4.08 0.00 73.88
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 2.20 156.00 1.60 1560 16
sdb 1033.80 0.00 152442.40 0 1524424
sdc 1070.40 0.00 190097.60 0 1900976
avg-cpu: %user %nice %system %iowait %steal %idle
0.84 0.00 22.66 3.86 0.00 72.65
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 1.30 17.58 4.00 176 40
sdb 233.97 0.00 46331.27 0 463776
sdc 680.22 0.00 296334.77 0 2966311
Terminal 2:
Take one HBA (Fibre Channel card) offline
[16:32:32 root@localhost bin]# ./dlnkmgr offline -hba 0007.0000
KAPL01055-I All the paths which pass the specified HBA will be changed to the Offline(C) status. Is this OK? [y/n]:y
KAPL01056-I If you are sure that there would be no problem when all the paths which pass the specified HBA are placed in the Offline(C) status, enter y. Otherwise, enter n. [y/n]:y
KAPL01061-I 1 path(s) were successfully placed Offline(C); 0 path(s) were not. Operation name = offline
[16:33:10 root@localhost bin]# ./dlnkmgr view -path
Paths:000002 OnlinePaths:000001
PathStatus IO-Count IO-Errors
Reduced 252808586 0
PathID PathName DskName iLU ChaPort Status Type IO-Count IO-Errors DNum HDevName
000000 0007.0000.0000000000000000.0000 HITACHI .DF600F .85017915 0217 0A Offline(C) Own 81976455 0 0 sddlmaa
000001 0008.0000.0000000000000000.0000 HITACHI .DF600F .85017915 0217 0A Online Own 170832131 0 0 sddlmaa
KAPL01001-I The HDLM command completed normally. Operation name = view, completion time = 2013/08/05 16:33:24
Terminal 3:
Check iostat: sdb traffic drops to 0, and the sdc Blk_wrtn value (3421184) is nearly double what it was before.
avg-cpu: %user %nice %system %iowait %steal %idle
0.81 0.00 23.19 2.30 0.00 73.70
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.20 0.00 2.40 0 24
sdb 0.00 0.00 0.00 0 0
sdc 334.00 0.00 342118.40 0 3421184
avg-cpu: %user %nice %system %iowait %steal %idle
0.84 0.00 23.48 2.26 0.00 73.42
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.20 0.00 2.40 0 24
sdb 0.00 0.00 0.00 0 0
sdc 335.60 0.00 343552.00 0 3435520
avg-cpu: %user %nice %system %iowait %steal %idle
0.83 0.00 23.18 2.39 0.00 73.60
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.60 0.00 4.80 0 48
sdb 0.00 0.00 0.00 0 0
sdc 335.70 0.00 343859.20 0 3438592
avg-cpu: %user %nice %system %iowait %steal %idle
0.84 0.00 23.33 2.47 0.00 73.36
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 334.60 0.00 342630.40 0 3426304
avg-cpu: %user %nice %system %iowait %steal %idle
0.86 0.00 23.03 2.32 0.00 73.80
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 334.80 0.00 342835.20 0 3428352
avg-cpu: %user %nice %system %iowait %steal %idle
0.80 0.00 22.68 3.51 0.00 73.01
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdc 335.10 0.00 343040.00 0 3430400
avg-cpu: %user %nice %system %iowait %steal %idle
0.87 0.00 22.52 3.20 0.00 73.41
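Converting the iostat figures to MB/s makes the comparison easier: the single remaining path carries roughly the same total that the two paths carried together. The sketch below assumes iostat's "blocks" are 512-byte sectors, which is what the sysstat documentation states for 2.4 and later kernels; the helper name is illustrative.

```python
SECTOR_BYTES = 512  # assumption: iostat blocks are 512-byte sectors


def blocks_to_mb_s(blk_per_s):
    """Convert an iostat Blk_wrtn/s value to decimal MB/s."""
    return blk_per_s * SECTOR_BYTES / 1e6


# One sample with both paths online (sdb + sdc), taken from the first iostat run.
before = blocks_to_mb_s(206470.20 + 138000.00)
# One sample with HBA 0007.0000 offline (sdc only).
after = blocks_to_mb_s(342118.40)

print(round(before, 1), round(after, 1))
```

Both values land close to the average rate dd reported, which suggests the manual offline did not reduce overall write throughput in this test.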
Terminal 2:
Bring the offline HBA back online
[16:33:24 root@localhost bin]# ./dlnkmgr online -hba 0007.0000
KAPL01057-I All the paths which pass the specified HBA will be changed to the Online status. Is this OK? [y/n]:y
KAPL01061-I 1 path(s) were successfully placed Online; 0 path(s) were not. Operation name = online
[16:34:20 root@localhost bin]# ./dlnkmgr view -path
Paths:000002 OnlinePaths:000002
PathStatus IO-Count IO-Errors
Online 272845735 0
PathID PathName DskName iLU ChaPort Status Type IO-Count IO-Errors DNum HDevName
000000 0007.0000.0000000000000000.0000 HITACHI .DF600F .85017915 0217 0A Online Own 82274955 0 0 sddlmaa
000001 0008.0000.0000000000000000.0000 HITACHI .DF600F .85017915 0217 0A Online Own 190570780 0 0 sddlmaa
KAPL01001-I The HDLM command completed normally. Operation name = view, completion time = 2013/08/05 16:34:22
Terminal 3:
Look at the I/O again: the load is spread across sdb and sdc once more
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 3.40 239.20 7.20 2392 72
sdb 801.90 0.00 118380.00 0 1183800
sdc 922.40 0.00 224718.50 0 2247185
avg-cpu: %user %nice %system %iowait %steal %idle
0.81 0.00 21.67 4.00 0.00 73.52
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.00 0.00 0.00 0 0
sdb 1127.90 0.00 170607.40 0 1706074
sdc 1105.10 0.00 147145.60 0 1471456
avg-cpu: %user %nice %system %iowait %steal %idle
0.86 0.00 22.77 2.70 0.00 73.68
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.00 0.00 0.00 0 0
sdb 1952.10 0.00 176125.40 0 1761254
sdc 1992.30 0.00 184086.40 0 1840864
avg-cpu: %user %nice %system %iowait %steal %idle
0.73 0.00 23.05 3.05 0.00 73.17
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.60 0.00 4.80 0 48
sdb 2100.40 0.00 174668.40 0 1746684
sdc 2152.80 0.00 176666.80 0 1766668
avg-cpu: %user %nice %system %iowait %steal %idle
0.88 0.00 22.60 3.07 0.00 73.45
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 3.40 48.00 11.20 480 112
sdb 1108.10 0.00 155167.60 0 1551676
sdc 1196.50 0.00 188496.00 0 1884960
avg-cpu: %user %nice %system %iowait %steal %idle
0.84 0.00 23.66 2.62 0.00 72.88
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.00 0.00 0.00 0 0
sdb 1174.20 0.00 185929.40 0 1859294
sdc 1074.30 0.00 155898.00 0 1558980
avg-cpu: %user %nice %system %iowait %steal %idle
0.88 0.00 23.17 2.48 0.00 73.47
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.20 0.00 2.40 0 24
sdb 1189.70 0.00 185251.80 0 1852518
sdc 1100.80 0.00 157490.00 0 1574900
avg-cpu: %user %nice %system %iowait %steal %idle
0.88 0.00 23.83 2.45 0.00 72.84
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 1.00 0.00 205.60 0 2056
sdb 1249.40 0.00 187183.10 0 1871831
sdc 1113.00 0.00 155370.00 0 1553700
avg-cpu: %user %nice %system %iowait %steal %idle
0.78 0.00 22.82 3.03 0.00 73.38
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 1.10 0.00 10.40 0 104
sdb 1541.30 0.00 176036.70 0 1760367
sdc 1441.80 0.00 151576.90 0 1515769
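A manual switchover does not affect I/O.
However, if the fibre cable is physically pulled, reads and writes are still affected until the path check completes.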
Terminal 1:
[16:39:45 root@localhost modprobe.d]# dd if=/dev/zero of=/dev/sddlmaa1
Terminal 2:
[16:48:05 root@localhost ~]# iostat 10 50 >iostat.log
Pull out one fibre cable
Terminal 3:
[16:49:38 root@localhost bin]# ./dlnkmgr view -path
Paths:000002 OnlinePaths:000002
PathStatus IO-Count IO-Errors
Online 387300351 0
PathID PathName DskName iLU ChaPort Status Type IO-Count IO-Errors DNum HDevName
000000 0007.0000.0000000000000000.0000 HITACHI .DF600F .85017915 0217 0A Online Own 137667240 0 0 sddlmaa
000001 0008.0000.0000000000000000.0000 HITACHI .DF600F .85017915 0217 0A Online Own 249633111 0 0 sddlmaa
KAPL01001-I The HDLM command completed normally. Operation name = view, completion time = 2013/08/05 16:49:38
[16:49:38 root@localhost bin]# ./dlnkmgr view -path
Paths:000002 OnlinePaths:000001
PathStatus IO-Count IO-Errors
Reduced 387337873 22185
PathID PathName DskName iLU ChaPort Status Type IO-Count IO-Errors DNum HDevName
000000 0007.0000.0000000000000000.0000 HITACHI .DF600F .85017915 0217 0A Online Own 137704762 0 0 sddlmaa
000001 0008.0000.0000000000000000.0000 HITACHI .DF600F .85017915 0217 0A Offline(E) Own 249633111 22185 0 sddlmaa
KAPL01001-I The HDLM command completed normally. Operation name = view, completion time = 2013/08/05 16:49:39
[16:49:39 root@localhost bin]# ./dlnkmgr view -path
Paths:000002 OnlinePaths:000001
PathStatus IO-Count IO-Errors
Reduced 387450196 24029
PathID PathName DskName iLU ChaPort Status Type IO-Count IO-Errors DNum HDevName
000000 0007.0000.0000000000000000.0000 HITACHI .DF600F .85017915 0217 0A Online Own 137817085 0 0 sddlmaa
000001 0008.0000.0000000000000000.0000 HITACHI .DF600F .85017915 0217 0A Offline(E) Own 249633111 24029 0 sddlmaa
KAPL01001-I The HDLM command completed normally. Operation name = view, completion time = 2013/08/05 16:49:40
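After a short while, the multipath software detects that one path has gone Offline(E).
Check iostat: I/O throughput drops to 0 for roughly 40-50 s. Only after the software has confirmed that one path is healthy does I/O resume.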
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 1.00 0.00 8.80 0 88
sdb 836.10 0.00 114120.00 0 1141200
sdc 850.80 0.00 160197.20 0 1601972
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 0.07 25.38 0.00 74.55
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.50 0.80 6.40 8 64
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 0.09 24.59 0.00 75.32
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.30 0.00 3.20 0 32
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
0.01 0.00 0.11 25.46 0.00 74.41
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 10.90 525.60 4.00 5256 40
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
0.16 0.00 0.25 24.48 0.00 75.11
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 1.30 51.20 11.20 512 112
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
0.93 0.00 16.15 11.85 0.00 71.07
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 5.90 163.20 210.40 1632 2104
sdb 19102.90 0.00 125551.50 0 1255515
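The length of the stall can be estimated from the iostat log by counting consecutive intervals with zero write throughput. This is a minimal sketch: the function name is hypothetical, the sample values are the combined sdb + sdc Blk_wrtn/s figures from the six intervals above, and the 10-second interval comes from the `iostat 10 50` command.

```python
def longest_stall_seconds(write_rates, interval_s):
    """Length of the longest run of zero-throughput samples, in seconds."""
    longest = current = 0
    for rate in write_rates:
        current = current + 1 if rate == 0 else 0
        longest = max(longest, current)
    return longest * interval_s


# Combined sdb + sdc Blk_wrtn/s per interval (sdc is absent from the last sample).
samples = [114120.00 + 160197.20, 0.0, 0.0, 0.0, 0.0, 125551.50]

stall = longest_stall_seconds(samples, interval_s=10)
print(stall)
```

With 10-second sampling this gives only a lower bound: four fully idle intervals mean at least 40 s without I/O, and the intervals on either side may have been partly idle as well.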
Plug the fibre cable back in
[16:50:51 root@localhost bin]# ./dlnkmgr view -path
Paths:000002 OnlinePaths:000001
PathStatus IO-Count IO-Errors
Reduced 410617324 24029
PathID PathName DskName iLU ChaPort Status Type IO-Count IO-Errors DNum HDevName
000000 0007.0000.0000000000000000.0000 HITACHI .DF600F .85017915 0217 0A Online Own 160984213 0 0 sddlmaa
000001 0008.0000.0000000000000000.0000 HITACHI .DF600F .85017915 0217 0A Offline(E) Own 249633111 24029 0 sddlmaa
KAPL01001-I The HDLM command completed normally. Operation name = view, completion time = 2013/08/05 16:50:52
[16:50:52 root@localhost bin]# ./dlnkmgr view -path
Paths:000002 OnlinePaths:000002
PathStatus IO-Count IO-Errors
Online 415619590 24029
PathID PathName DskName iLU ChaPort Status Type IO-Count IO-Errors DNum HDevName
000000 0007.0000.0000000000000000.0000 HITACHI .DF600F .85017915 0217 0A Online Own 164203113 0 0 sddlmaa
000001 0008.0000.0000000000000000.0000 HITACHI .DF600F .85017915 0217 0A Online Own 251416477 24029 0 sddlmaa
KAPL01001-I The HDLM command completed normally. Operation name = view, completion time = 2013/08/05 16:51:07
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 1.10 0.00 56.80 0 568
sdb 3234.50 0.00 381491.50 0 3814915
avg-cpu: %user %nice %system %iowait %steal %idle
0.87 0.00 21.18 10.45 0.00 67.50
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 7.60 426.40 16.00 4264 160
sdb 335.70 0.00 343756.80 0 3437568
sdd 1.10 8.80 0.00 88 0
avg-cpu: %user %nice %system %iowait %steal %idle
0.83 0.00 22.01 14.26 0.00 62.90
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 6.30 249.60 208.80 2496 2088
sdb 335.10 0.00 343142.40 0 3431424
sdd 11.70 95.70 0.00 957 0
avg-cpu: %user %nice %system %iowait %steal %idle
0.80 0.00 22.89 7.45 0.00 68.87
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.60 10.40 5.60 104 56
sdb 336.00 0.00 344064.00 0 3440640
sdd 12.20 99.70 0.00 997 0
avg-cpu: %user %nice %system %iowait %steal %idle
0.87 0.00 23.14 2.71 0.00 73.29
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 2.90 219.20 8.80 2192 88
sdb 335.20 0.00 343347.20 0 3433472
sdd 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
0.84 0.00 21.66 4.12 0.00 73.38
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.70 0.00 9.60 0 96
sdb 976.40 0.00 188648.40 0 1886484
sdd 993.40 0.00 153716.60 0 1537166
avg-cpu: %user %nice %system %iowait %steal %idle
0.78 0.00 22.12 3.73 0.00 73.37
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.30 0.00 4.00 0 40
sdb 1269.00 0.00 163058.60 0 1630586
sdd 1428.80 0.00 180634.50 0 1806345
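From the output above, reconnecting the cable did not disrupt I/O, and I/O returned to a load-balanced state.
The test shows that failover takes time. For demanding applications, such as a heavily loaded database, this is risky. It also contradicts the intuitive assumption about dual-HBA redundancy, namely that if one path fails, the healthy path simply keeps working without interruption.
HDS's technical explanation: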
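HDLM's default load-balancing method is round-robin (RR). For example, suppose the host issues I/Os A, B, C, D, E, F, G, H, ... in order. When they are split across two paths, path 1 carries A, C, E, G, ... and path 2 carries B, D, F, H, .... After receiving the data from both paths, the storage controller reassembles it as ABCDEFGH and writes it to disk in order. Each HBA port has its own I/O queue with an adjustable queue depth, so the host's I/Os are distributed to the queues of the two HBA ports in advance. If path 1 suddenly fails, the host holds all I/O, merges the ACEG entries that were waiting on path 1 with the BDFH entries on path 2 back into the order ABCDEFGH, re-queues them on path 2, and sends them to the storage through path 2.
So the period with no I/O is the time the host spends reordering the pending I/Os that were queued on the HBAs.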
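The behaviour described in this explanation can be illustrated with a toy model. The sketch below is not HDLM's actual implementation; it only mirrors the ABCDEFGH example, and the function names `round_robin` and `fail_over` as well as the use of sequence numbers are assumptions made for illustration.

```python
from collections import deque


def round_robin(ios, n_paths=2):
    """Distribute I/Os across per-path queues in round-robin order."""
    queues = [deque() for _ in range(n_paths)]
    for seq, io in enumerate(ios):
        # Keep the issue sequence number so the order can be restored later.
        queues[seq % n_paths].append((seq, io))
    return queues


def fail_over(queues):
    """Merge all pending I/Os back into issue order for the surviving path."""
    pending = sorted(entry for q in queues for entry in q)
    return [io for _, io in pending]


queues = round_robin("ABCDEFGH")
path1 = [io for _, io in queues[0]]
path2 = [io for _, io in queues[1]]
print(path1, path2)

# Path 1 fails: everything still queued is re-sorted and sent via path 2.
requeued = fail_over(queues)
print("".join(requeued))
```

In this model the stall corresponds to the time spent in `fail_over`. In the test above, the pause of about 40 s also includes the time HDLM needed to detect the failed path, so the re-sorting step alone may not account for all of it.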
Source: ITPUB blog, link: http://blog.itpub.net/29033984/viewspace-767948/. If you repost this article, please credit the source; otherwise legal liability will be pursued.