1. Confirm the alerts on the failed host and identify the faulty disk
ERPES01@ /> errpt -dH |more
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
80D3764C 0301180416 U H LVDD PV NO LONGER RELOCATING NEW BAD BLOCKS
E86653C3 0301180416 P H LVDD I/O ERROR DETECTED BY LVM
747725D9 0301180416 P H hdisk0 DISK OPERATION ERROR
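The listing above can also be filtered mechanically. This is a minimal sketch, assuming the summary column layout shown above (class in the fourth field, resource name in the fifth). The function name failed_hdisks is hypothetical and not part of AIX.

```shell
# Hypothetical helper: read `errpt` summary output on stdin and print each
# hdisk that has at least one hardware-class (C = H) entry, once per disk.
# Assumes the column order IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION.
failed_hdisks() {
    awk '$4 == "H" && $5 ~ /^hdisk[0-9]+$/ && !seen[$5]++ { print $5 }'
}
```

On the affected host it could be fed with: errpt -dH | failed_hdisks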
Check which PV each LV resides on
ERPES01@ /> lspv -l hdisk36
hdisk36:
LV NAME LPs PPs DISTRIBUTION MOUNT POINT
hd3 40 40 00..40..00..00..00 /tmp
hd9var 40 40 00..40..00..00..00 /var
hd10opt 40 40 40..00..00..00..00 /opt
hd1 40 40 40..00..00..00..00 /home
livedump 2 2 02..00..00..00..00 /var/adm/ras/livedump
hd11admin 40 40 15..25..00..00..00 /admin
hd5 1 1 01..00..00..00..00 N/A
hd8 1 1 00..00..01..00..00 N/A
hd6 4 4 00..04..00..00..00 N/A
hd2 144 144 00..00..00..57..87 /usr
hd4 160 160 00..00..108..52..00 /
ERPES01@ />
ERPES01@ /> lspv -l hdisk0
hdisk0:
LV NAME LPs PPs DISTRIBUTION MOUNT POINT
hd3 40 40 39..00..01..00..00 /tmp
hd9var 40 40 37..00..03..00..00 /var
hd10opt 40 40 00..00..03..00..37 /opt
hd1 40 40 24..00..01..00..15 /home
livedump 2 2 00..02..00..00..00 /var/adm/ras/livedump
hd11admin 40 40 00..00..01..00..39 /admin
lg_dumplv 16 16 00..16..00..00..00 N/A
hd5 1 1 01..00..00..00..00 N/A
hd8 1 1 00..00..01..00..00 N/A
hd6 4 4 00..04..00..00..00 N/A
hd2 144 144 09..10..16..109..00 /usr
hd4 160 160 00..77..83..00..00 /
ERPES01@ />
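2. Check that every LV on the faulty disk hdisk0 is mirrored. The only exception is lg_dumplv, which must be migrated to hdisk1.
To tell whether an LV is mirrored, look at its PVs count: a value of 2 means the LV is spread across two PVs, that is, two hdisks.
The LV's PPs count should also be twice its LPs count.
The boot LV holds the boot image and its state is closed. An LV state of syncd means synchronization has finished; stale means the copies still need to be synchronized.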
ERPES01@ />migratepv -l lg_dumplv hdisk0 hdisk1
ERPES01@ />lsvg -l rootvg
rootvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
hd5 boot 1 2 2 closed/syncd N/A
hd6 paging 100 200 2 open/syncd N/A
hd8 jfs2log 1 2 2 open/syncd N/A
hd4 jfs2 20 40 2 open/syncd /
hd2 jfs2 50 100 2 open/syncd /usr
hd9var jfs2 108 216 2 open/syncd /var
hd3 jfs2 50 100 2 open/syncd /tmp
hd1 jfs2 1 2 2 open/syncd /home
hd10opt jfs2 30 60 2 open/syncd /opt
hd11admin jfs2 1 2 2 open/syncd /admin
lg_dumplv sysdump 106 106 1 open/syncd N/A //not mirrored, so the PVs count is 1
livedump jfs2 1 2 2 open/syncd /var/adm/ras/livedump
fslv00 jfs2 20 40 2 open/syncd /export/spot
es1oracle jfs2 20 40 2 open/syncd /oracle
es1usrsap jfs2 40 80 2 open/syncd /usr/sap
sapbackup jfs2 8 16 2 open/syncd /sapbackup
ERPES01@ />
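The two mirror checks described in this step (PVs equal to 2, PPs equal to twice LPs) can be applied to the listing automatically. The following is only a sketch under the assumption that the columns appear in the order shown above; the function name unmirrored_lvs is hypothetical.

```shell
# Hypothetical helper: read `lsvg -l <vg>` output on stdin and print every LV
# that is not fully mirrored, i.e. PVs < 2 or PPs != 2 * LPs.
# Assumes the column order LV NAME, TYPE, LPs, PPs, PVs, LV STATE, MOUNT POINT.
# Header lines are skipped because their third field is not a number.
unmirrored_lvs() {
    awk '$3 ~ /^[0-9]+$/ && ($5 + 0 < 2 || $4 + 0 != 2 * $3) { print $1 }'
}
```

On a live system this could be run as: lsvg -l rootvg | unmirrored_lvs
Any LV it prints has to be migrated or otherwise handled before the mirror on the failed disk is removed.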
3. Clear the boot record on hdisk0
ERPES01@ />chpv -c hdisk0
4. Remove the rootvg mirror copy on hdisk0
ERPES01@ />unmirrorvg rootvg hdisk0
After the mirror is removed, every LV has a PVs count of 1
ERPES01@ />lsvg -l rootvg
rootvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
hd5 boot 1 1 1 closed/syncd N/A
hd6 paging 100 100 1 open/syncd N/A
hd8 jfs2log 1 1 1 open/syncd N/A
hd4 jfs2 20 20 1 open/syncd /
hd2 jfs2 50 50 1 open/syncd /usr
hd9var jfs2 108 108 1 open/syncd /var
hd3 jfs2 50 50 1 open/syncd /tmp
hd1 jfs2 1 1 1 open/syncd /home
hd10opt jfs2 30 30 1 open/syncd /opt
hd11admin jfs2 1 1 1 open/syncd /admin
lg_dumplv sysdump 106 106 1 open/syncd N/A
livedump jfs2 1 1 1 open/syncd /var/adm/ras/livedump
fslv00 jfs2 20 20 1 open/syncd /export/spot
es1oracle jfs2 20 20 1 open/syncd /oracle
es1usrsap jfs2 40 40 1 open/syncd /usr/sap
sapbackup jfs2 8 8 1 open/syncd /sapbackup
ERPES01@ />
5. Remove hdisk0 from the rootvg volume group
ERPES01@ />reducevg rootvg hdisk0
ERPES01@ />lspv |more
hdisk0 none None
hdisk1 00f7c45cd89e7d3d rootvg active
hdisk2 none VeritasVolumes
6. Veritas Storage Foundation is deployed on this server. List the local disk devices in Veritas to find the device that corresponds to hdisk0.
ERPES01@ />vxdisk list|pg
DEVICE TYPE DISK GROUP STATUS
disk_0 auto:LVM - - LVM
disk_1 auto:LVM - - LVM
storwizev70000_000000 auto:cdsdisk san_vc0_0 ep01dg online shared
storwizev70000_00000a auto:cdsdisk san_vc0_10 ep01dg online shared
storwizev70000_00000b auto:cdsdisk san_vc0_11 ep01dg online shared
storwizev70000_00000c auto:cdsdisk san_vc0_12 ep01dg online shared
Check how disk_1 in Veritas maps to hdisk0 in the operating system
ERPES01@ />vxdisk list disk_1
Device: disk_1
devicetag: disk_1
type: auto
info: format=LVM
flags: LVM error private autoconfig
pubpaths: block=/dev/vx/dmp/disk_1 char=/dev/vx/rdmp/disk_1
guid: -
udid: IBM%5FHUC106030CSS600%5FDISKS%5F5000CCA025387860
site: -
errno: Disk is not useable, bad format
Multipathing information:
numpaths: 1
hdisk0 state=enabled //corresponds to hdisk0
ERPES01@ />
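The mapping read off by eye above can be extracted with a small filter. This is a sketch only, assuming the "Multipathing information" section lists one path per line in the form "name state=...", as in the output above; the function name dmp_paths is hypothetical.

```shell
# Hypothetical helper: read `vxdisk list <device>` output on stdin and print
# the OS device names listed under "Multipathing information".
dmp_paths() {
    awk '/Multipathing information:/ { in_paths = 1; next }
         in_paths && $2 ~ /^state=/  { print $1 }'
}
```

For example: vxdisk list disk_1 | dmp_paths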
7. Remove hdisk0: remove disk_1 at the Veritas level, then delete hdisk0 from the ODM at the operating system level
ERPES01@ />vxdisk rm disk_1
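If the Veritas device is not removed first, rmdev reports an error.
Note that the sample below was run with the incomplete device name hdisk, which is why it shows 0514-519 (device not found):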
ERPES01@ />rmdev -dl hdisk
rmdev: 0514-519 The following device was not found in the customized
device configuration database:
name = 'hdisk'
ERPES01@ />
ERPES01@ />rmdev -Rdl hdisk0
8. Use diag to locate the physical disk
#smitty diag>
#Task Selection>
#Hot Plug Task>
#SCSI and SCSI RAID Hot Plug Manager>
#Identify a Device Attached to a SCSI Hot Swap Enclosure Device>
#hdiskX
The LED on the hdiskX drive now lights up red
9. Use diag to prepare for the replacement and check that the disk is in a replaceable state
ERPES01@ />diag
->Task Selection (Diagnostics, Advanced Diagnostics, Service Aids, etc.)
->Hot Plug Task
->SCSI and SCSI RAID Hot Plug Manager
->Replace/Remove a Device Attached to an SCSI Hot Swap Enclosure Device
Make selection, use Enter to continue.
U78A0.001.DNWKP5C-
ses0 P2-Y2
ses1 P2-Y1
slot 1 P2-D3 hdisk0
slot 2 P2-D4 hdisk1
U78A0.001.DNWKP5C-
ses2 P2-Y1
slot 2 P2-D2 cd0
U78A0.001.DNWKP5C-
ses0 P2-Y2
ses0 P2-Y2
ses1 P2-Y1
ses1 P2-Y1
slot 1 P2-D3 hdisk0 //check whether it is shown as replaceable
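10. If diag shows the disk as ready for replacement, swap the disk
Then rescan with cfgmgr and check the result with lspv
ERPES01@ />cfgmgr -l sas0 //scan the parent device of the disk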
ERPES01@ />lspv
hdisk0 00f7c45c94883387 None
hdisk1 00f7c45cd89e7d3d rootvg active
hdisk2 none VeritasVolumes
hdisk3 none VeritasVolumes
11. Add hdisk0 to the rootvg volume group
ERPES01@ />extendvg rootvg hdisk0
12. Mirror the LVs. The uppercase -S flag runs the mirror synchronization in the background; use iostat to watch the I/O on hdisk0 and hdisk1.
ERPES01@ />mirrorvg -S rootvg hdisk0
0516-1804 chvg: The quorum change takes effect immediately.
0516-1126 mirrorvg: rootvg successfully mirrored, user should perform
bosboot of system to initialize boot records. Then, user must modify
bootlist to include: hdisk0 hdisk1.
ERPES01@ />lsvg -l rootvg
rootvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
hd5 boot 1 2 2 closed/syncd N/A
hd6 paging 100 200 2 open/stale N/A
hd8 jfs2log 1 2 2 open/stale N/A
hd4 jfs2 20 40 2 open/stale /
hd2 jfs2 50 100 2 open/stale /usr
hd9var jfs2 108 216 2 open/stale /var
hd3 jfs2 50 100 2 open/stale /tmp
hd1 jfs2 1 2 2 open/stale /home
hd10opt jfs2 30 60 2 open/stale /opt
hd11admin jfs2 1 2 2 open/stale /admin
lg_dumplv sysdump 106 106 1 open/syncd N/A
livedump jfs2 1 2 2 open/stale /var/adm/ras/livedump
fslv00 jfs2 20 40 2 open/stale /export/spot
es1oracle jfs2 20 40 2 open/stale /oracle
es1usrsap jfs2 40 80 2 open/stale /usr/sap
sapbackup jfs2 8 16 2 open/stale /sapbackup
ERPES01@ />iostat 2 3 //watch the I/O
System configuration: lcpu=20 drives=159 paths=2 vdisks=0
tty: tin tout avg-cpu: % user % sys % idle % iowait
0.0 30.2 3.1 2.7 89.7 4.5
Disks: % tm_act Kbps tps Kb_read Kb_wrtn
hdisk1 27.0 73523.1 71.8 148480 0
hdisk0 69.5 73523.1 71.8 0 148480
Also check that quorum has been disabled on rootvg
ERPES01@ />lsvg rootvg
VOLUME GROUP: rootvg VG IDENTIFIER: 00f7c45c00004c0000000139336e70e3
VG STATE: active PP SIZE: 512 megabyte(s)
VG PERMISSION: read/write TOTAL PPs: 1116 (571392 megabytes) //verify the PP size and count
MAX LVs: 256 FREE PPs: 108 (55296 megabytes)
LVs: 16 USED PPs: 1008 (516096 megabytes)
OPEN LVs: 15 QUORUM: 1 (Disabled) //quorum is disabled
TOTAL PVs: 2 VG DESCRIPTORS: 3
STALE PVs: 0 STALE PPs: 0
ACTIVE PVs: 2 AUTO ON: yes
MAX PPs per VG: 32512
MAX PPs per PV: 1016 MAX PVs: 32
LTG size (Dynamic): 1024 kilobyte(s) AUTO SYNC: no
HOT SPARE: no BB POLICY: relocatable
PV RESTRICTION: none
ERPES01@ />
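The quorum check can be scripted as a yes/no test. This is a minimal sketch, assuming the QUORUM field is printed with a "(Disabled)" suffix as in the output above; the function name quorum_disabled is hypothetical.

```shell
# Hypothetical helper: read `lsvg <vg>` output on stdin and exit 0 only if the
# QUORUM field is marked "(Disabled)", otherwise exit 1.
quorum_disabled() {
    awk '/QUORUM:/ && /\(Disabled\)/ { found = 1 } END { exit !found }'
}
```

For example: lsvg rootvg | quorum_disabled && echo "quorum is off"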
Finally, once every LV STATE has changed from stale to syncd, synchronization is complete
ERPES01@ />lsvg -l rootvg
rootvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
hd5 boot 1 2 2 closed/syncd N/A
hd6 paging 100 200 2 open/syncd N/A
hd8 jfs2log 1 2 2 open/syncd N/A
hd4 jfs2 20 40 2 open/syncd /
hd2 jfs2 50 100 2 open/syncd /usr
hd9var jfs2 108 216 2 open/syncd /var
hd3 jfs2 50 100 2 open/syncd /tmp
hd1 jfs2 1 2 2 open/syncd /home
hd10opt jfs2 30 60 2 open/syncd /opt
hd11admin jfs2 1 2 2 open/syncd /admin
lg_dumplv sysdump 106 106 1 open/syncd N/A
livedump jfs2 1 2 2 open/syncd /var/adm/ras/livedump
fslv00 jfs2 20 40 2 open/syncd /export/spot
es1oracle jfs2 20 40 2 open/syncd /oracle
es1usrsap jfs2 40 80 2 open/syncd /usr/sap
sapbackup jfs2 8 16 2 open/syncd /sapbackup
ERPES01@ />
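Since the background synchronization can take a while, the stale check can be repeated by a script instead of by hand. The sketch below assumes the LV STATE column is the sixth field, as in the listings above; the function name stale_lvs and the 60-second interval are illustrative choices.

```shell
# Hypothetical helper: read `lsvg -l <vg>` output on stdin and print the name
# of every LV whose state still ends in /stale.
stale_lvs() {
    awk '$6 ~ /\/stale$/ { print $1 }'
}

# Example on a live system (not run here): wait until nothing is stale.
#   while [ -n "$(lsvg -l rootvg | stale_lvs)" ]; do sleep 60; done
```

An empty result means every LV has reached syncd.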
13. Update the boot records
Create the boot image on hdisk0 and refresh it on hdisk1
ERPES01@ />bosboot -ad hdisk0
ERPES01@ />bosboot -ad hdisk1
Set the boot order. If there is a cd0, put it last
ERPES01@ />lsdev -C |grep cd
cd0 Available 01-08-00 SATA DVD-RAM Drive
ERPES01@ />bootlist -m normal hdisk0 hdisk1 cd0
Display the boot list
ERPES01@ />bootlist -m normal -o
hdisk0 blv=hd5 pathid=0
hdisk1 blv=hd5 pathid=0
cd0
ERPES01@ />
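To confirm that both mirror disks are bootable entries, the boot list output can be checked by script. This is a sketch, assuming each bootable disk is listed with a blv= field as in the output above; the function name bootlist_has is hypothetical.

```shell
# Hypothetical helper: read `bootlist -m normal -o` output on stdin and succeed
# only if every disk named as an argument is listed with a boot LV (blv=...).
# Note: POSIX sh has no local variables, so `list` and `disk` are global.
bootlist_has() {
    list=$(cat)
    for disk in "$@"; do
        printf '%s\n' "$list" |
            awk -v d="$disk" '$1 == d && $2 ~ /^blv=/ { ok = 1 } END { exit !ok }' ||
            return 1
    done
}
```

For example: bootlist -m normal -o | bootlist_has hdisk0 hdisk1 || echo "boot list incomplete"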
14. Other points to note
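Watch LV STATE: once every stale has changed to syncd, synchronization is complete. PPs equal to twice LPs means the LV is mirrored.
lg_dumplv is the system dump LV and is normally not mirrored; mirrorvg mirrors the dump LV only when the paging space LV and the dump LV are the same LV.
#Clearing the PVID is optional, but doing so is safer
chdev -l hdisk0 -a pv=clear
chdev -l hdisk0 -a pv=yes
#After the VG is mirrored, quorum is disabled automatically, so that the VG can still vary on automatically if one of the two disks fails
lsvg rootvg
If the device cannot be removed, check what is using it with fuser /dev/hdisk0 and ps -ef |grep hdisk0
If the failed disk happens to be the one holding lg_dumplv, which is not mirrored, the replacement takes three extra steps.
Point sysdump at another location first, delete the sysdump LV, then after the disk is replaced recreate the sysdump LV and point sysdump back to it.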
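The three extra steps could look roughly like the sequence below. This is a hedged sketch, not a tested procedure: it only prints the commands instead of running them, it assumes /dev/sysdumpnull as the temporary dump target, and the function name dump_relocation_plan and its arguments (LV name, LP count, target PV) are hypothetical. Review the output against the actual system before running anything.

```shell
# Hypothetical helper: print (do not execute) a command sequence for handling
# an unmirrored dump LV that lives on the failed disk.
# Arguments: <dump LV name> <number of LPs> <target PV>
dump_relocation_plan() {
    lv=$1 lps=$2 pv=$3
    printf '%s\n' \
        "sysdumpdev -P -p /dev/sysdumpnull" \
        "rmlv -f $lv" \
        "# ... replace the disk and rebuild the mirror ..." \
        "mklv -t sysdump -y $lv rootvg $lps $pv" \
        "sysdumpdev -P -p /dev/$lv"
}
```

For example, dump_relocation_plan lg_dumplv 16 hdisk1 prints the plan for a 16-LP dump LV recreated on hdisk1; the current setting can be checked with sysdumpdev -l.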