https://access.redhat.com/solutions/432153
SOLUTION UNVERIFIED - Updated February 27, 2015 at 15:08
Environment
- Red Hat Enterprise Linux
Issue
Errors seen in the system log:
Jul 2 09:56:06 host1 kernel: Buffer I/O error on device dm-9, logical block 668626947
Jul 2 09:56:06 host1 kernel: lost page write due to I/O error on dm-9
Jul 2 09:56:06 host1 kernel: Buffer I/O error on device dm-9, logical block 668626948
Jul 2 09:56:07 host1 kernel: lost page write due to I/O error on dm-9
Jul 2 09:56:07 host1 kernel: Buffer I/O error on device dm-9, logical block 668626947
Jul 2 09:56:07 host1 kernel: lost page write due to I/O error on dm-9
The multipath device listing shows that the queue_if_no_path feature is enabled:
mpatho (3600508b400105c430002000000170000) dm-9 HP,HSV210
size=500G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 0:0:4:1 sdp 8:240 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 1:0:3:1 sdaj 66:48 active ready running
Resolution
Remove the 'no_path_retry' option or change its value to 'queue'.
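A minimal sketch of the second option, assuming the device section from the configuration file shown later in this article is the one that applies to the affected LUN. Only the no_path_retry line differs from that configuration; every other value is copied from it unchanged.

```
devices {
    device {
        vendor "IBM"
        product "2145"
        path_grouping_policy group_by_prio
        getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
        features "1 queue_if_no_path"
        prio alua
        path_checker tur
        failback immediate
        # Changed from "5": queue I/O instead of failing it after 5 retries.
        no_path_retry queue
        rr_min_io 1
        dev_loss_tmo 120
    }
}
```

multipathd has to re-read /etc/multipath.conf before the change takes effect, for example through a reload of the multipathd service.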
Root Cause
The filesystem failures occur because the 'no_path_retry' option has a value other than queue. The no_path_retry setting accepts three kinds of argument: fail (or 0) fails I/O immediately, queue behaves like the queue_if_no_path feature, and a number N (in this case 5) retries N times before failing. This setting overrides the previously specified queue_if_no_path feature setting.
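As an illustration of the three forms described above (a sketch, not taken from the affected system), the option could be written in /etc/multipath.conf in any one of the following ways. The timing noted for the numeric form is an estimate: it assumes each retry corresponds to one polling_interval, which this system's configuration sets to 30 seconds, so queueing would last roughly 5 x 30 = 150 seconds before I/O is failed.

```
# Only one of these lines would be set at a time.

# Fail I/O immediately once all paths are lost (equivalent to 0).
no_path_retry fail

# Queue I/O until a path returns, like features "1 queue_if_no_path".
no_path_retry queue

# Queue for 5 retries, then fail I/O (the value on the affected system).
no_path_retry 5
```

With the numeric form, any outage of all paths that outlasts the retry window results in I/O errors being returned to the filesystem, which matches the "Buffer I/O error" and "lost page write" messages in the system log.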
Diagnostic Steps
Checking the multipath configuration file shows that the no_path_retry option uses a numerical argument:
defaults {
    user_friendly_names yes
    polling_interval 30
}
devices {
    device {
        vendor "IBM"
        product "2145"
        path_grouping_policy group_by_prio
        getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
        features "1 queue_if_no_path"
        prio alua
        path_checker tur
        failback immediate
        no_path_retry "5"
        rr_min_io 1
        dev_loss_tmo 120
    }
}