This document is intended to assist with identifying and resolving system panics related to Solaris

张小秋's blog

This document is intended to assist with identifying and resolving system panics related to Solaris Volume Manager (SVM).

Example panics:

panic[cpu0]/thread=180e000: vfs_mountroot: cannot mount root
panic[cpu3]/thread=2a100627ca0: md: Panic due to lack of DiskSuite state
panic[cpu0]/thread=fec1be20: mod_hold_stub: Couldn't load stub module misc/strplumb
panic[cpu12]/thread=30088990060: BAD TRAP: type=31 rp=2a100b11360 addr=30875d6eb78 mmu_fsr=0
panic[cpu1]/thread=2a10010bd40: kstat_q_exit: qlen == 0

TROUBLESHOOTING STEPS

1) System panics at boot and SVM is suspected to cause or contribute to the issue.

2) Panics after issuing a command or during day to day operation.

3) How to determine if SVM is involved in the panic.

4) How to get further assistance.

1) System panics at boot and SVM is suspected to cause or contribute to the issue.


The most common SVM related boot panic is from vfs_mountroot.


panic[cpu0]/thread=180e000: vfs_mountroot: cannot mount root

This could be due to a large number of reasons at the SVM, driver or file system level.  It is often advantageous to narrow down where the problem lies by removing SVM from the picture.

The following documents describe how to boot from the disk slices instead of the SVM mirrors.  This procedure either allows the OS to boot with SVM removed or eliminates SVM as a potential cause during troubleshooting.

For SPARC:
Solaris Volume Manager (SVM): How to Recover from Boot Problems (Doc ID 1005712.1)

For X86:
Solaris 10: x86 (not SPARC) How to unencapsulate SVM root disk under grub (Doc ID 1019804.1)



If removing SVM has already been tried and the system panics again with vfs_mountroot after root is put back under SVM control, one of the following documents may apply.


Solaris Volume Manager, SVM (root) mirrored system can not boot Resolution Path (Doc ID 1489871.1)

Applying 127127-11 to primary domain on ldoms system causes guest domains with SVM mirrors to panic (Doc ID 1019595.1)

Solaris Volume Manager (SVM) server panics continuously with 'vfs_mountroot: cannot remount root' due to missing replica/metadb information (Doc ID 1500501.1)

Solaris Volume Manager: 'boot -a' with Default Root Device Will Not Boot Properly if Root Disk is Mirrored With SVM / SDS (Doc ID 1001569.1)

Solaris Volume Manager (SVM) State Databases May Disappear After Reboot on Systems Equipped With MAT3073N, MAT3147N, MAT3300N, ST373207LC or ST314670LC Drives (Doc ID 1000157.1)



If the system panics on boot without the vfs_mountroot panic string, check the following documents for SVM boot panics that do not have the vfs_mountroot signature.


Solaris Volume Manager, SVM, System Hangs When Root Is Encapsulated With SVM And Logging Enabled And >50% Of State Database Replicas Metadb Are Lost (Doc ID 1472328.1)

panic[cpu3]/thread=2a100627ca0: md: Panic due to lack of DiskSuite state

Booting Off of a Single Disk From a Mirrored Root Pair May Fail With a Panic (Doc ID 1000168.1)

panic[cpu0]/thread=fec1be20: mod_hold_stub: Couldn't load stub module misc/strplumb

Solaris Volume Manager (SVM) can panic system when attempting to access a damaged one-sided mirror (Doc ID 1490045.1)

panic[cpu12]/thread=30088990060: BAD TRAP: type=31 rp=2a100b11360 addr=30875d6eb78 mmu_fsr=0





2) Panics after issuing a command or during day to day operation.


If the system panicked after removing a state database replica with 'metadb -d':

Solaris Volume Manager (SVM) server panics continuously with 'vfs_mountroot: cannot remount root' due to missing replica/metadb information (Doc ID 1500501.1)


If the system panicked after deleting a soft partition:

Solaris Volume Manager (SVM): Host may panic with BAD TRAP type 31 when deleting a soft partition with metaclear (Doc ID 1377086.1)


If the system panics during operation, after removing a disk, or after issuing a 'metadb -d' command, and the panic string is "md: Panic due to lack of DiskSuite state database replicas. Fewer than 50% of the total were available, so panic to ensure data integrity.":

Panic due to lack of Solstice DiskSuite state database replicas (Doc ID 1005440.1)
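
The threshold quoted in this panic string can be checked with simple arithmetic. The following Python sketch is based only on the wording of the panic message ("Fewer than 50% of the total were available"); the function name and its arguments are hypothetical and are not part of any SVM tool.

```python
def replicas_below_half(available: int, total: int) -> bool:
    """Return True when fewer than 50% of the state database replicas are available.

    This mirrors the wording of the panic string only. It is an illustrative
    helper, not an SVM interface.
    """
    if total <= 0 or not 0 <= available <= total:
        raise ValueError("expected 0 <= available <= total and total > 0")
    # Integer form of available / total < 0.5, which avoids rounding problems.
    return 2 * available < total
```

For example, with 3 replicas configured and 1 still available the function returns True, while exactly half (2 of 4) returns False.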



If there is a BAD TRAP panic with vfs_unmountall in the panic stack:

System panic in vfs_unmountall( ) due to corruption with Solaris Volume Manager (Doc ID 1470043.1)


Outside of Solaris Cluster, the libmeta patch may cause errors, but with Solaris Cluster it can trigger a panic.

Solaris 10 libmeta Patches may Cause Solaris Volume Manager Failures or Failfast Panics (Doc ID 1346818.1)


There are a couple of instances where SVM mirroring can contribute to "freeing free" inode/frag/block panics.


1) Running fsck against a mirror device without running metasync against the mirror first.  This is mainly seen when booting to single user to try and recover a root file system.

2) Altering a sub-mirror in any way while more than one sub-mirror is attached to a mirror.

For example, suppose the mirror d10 has one sub-mirror on c0t0d0s0 and another on c1t0d0s0.  If an administrator were to mount c1t0d0s0 and make changes, or run fsck against c1t0d0s0, and then mount d10, the file system would likely report freeing free errors and possibly cause a panic.


In general, freeing free inode/frag/block panics are not an SVM issue. Further information on freeing free inode/frag/block panics can be found in the following documents.

Troubleshooting the Cause of Solaris File System Corruption and Preventing Future Corruption (Doc ID 1009218.1)

How to Fix Solaris Panics Caused by "freeing free" or "ufs_putapage: bn == UFS_HOLE" (Doc ID 1017680.1)


A panic with 'kstat_q_exit: qlen == 0' can be SVM related, but other issues share this panic signature.

To determine whether this panic is BugID 15780923, look at the stack in /var/adm/messages.  If it is, the function md:md_kstat_waitq_to_runq will be in the stack.

May 30 04:45:01 sun01 ^Mpanic[cpu1]/thread=2a10010bd40:
May 30 04:45:01 sun01 unix: [ID 213328 kern.notice] kstat_q_exit: qlen == 0
May 30 04:45:01 sun01 unix: [ID 100000 kern.notice]
May 30 04:45:01 sun01 genunix: [ID 723222 kern.notice] 000002a10010b770 SUNW,UltraSPARC-IIIi:kstat_q_panic+8 (30005431f00, 0, ffffffffffffffff, 550000000c, 0, 13913b8)
May 30 04:45:01 sun01 genunix: [ID 179002 kern.notice] %l0-3: 000003000008ab28 000003000008ab20 00000300054133ea 000000000142e968
May 30 04:45:01 sun01 %l4-7: 0000000000000006 00000000fe945fd0 0000000000000000 0000000001438800
May 30 04:45:02 sun01 genunix: [ID 723222 kern.notice] 000002a10010b820 md:md_kstat_waitq_to_runq+20 (3000008aaf8, ffffffffffffffff, 2c, 0, 100c6a4, 0)
May 30 04:45:02 sun01 genunix: [ID 179002 kern.notice] %l0-3: 0000000000000004 000003000008ab20 00000300004c7b08 000003000008ab28
May 30 04:45:02 sun01 %l4-7: 00000300004c7cb0 00000300004c7cd8 00000300004c7d80 000003000545f268
May 30 04:45:02 sun01 genunix: [ID 723222 kern.notice] 000002a10010b8d0 md_stripe:md_stripe_strategy+24c (3, 3000035be18, 0, 0, 1, 3000545f2c0)
This issue is fixed in:
Solaris 11: 11.1 SRU 2.5.
Solaris 10 SPARC: 150311-02 SunOS 5.10: md patch
Solaris 10 x86: 148076-11 SunOS 5.10_x86: md patch
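
The check described above can be sketched as a small Python scan of the messages file. This is a minimal sketch, assuming the log lines look like the excerpt shown; the function name is hypothetical, and a match only suggests that the panic fits the signature of BugID 15780923, it does not prove it.

```python
PANIC_STRING = "kstat_q_exit: qlen == 0"
SVM_FRAME = "md:md_kstat_waitq_to_runq"


def looks_like_md_kstat_panic(lines):
    """Return True if the panic string is followed by the md frame named above.

    `lines` is any iterable of log lines, for example the contents of
    /var/adm/messages. The order matters: the stack frame must appear
    after the panic string, as it does in the excerpt.
    """
    saw_panic = False
    for line in lines:
        if PANIC_STRING in line:
            saw_panic = True
        elif saw_panic and SVM_FRAME in line:
            return True
    return False
```

A typical use would be to open /var/adm/messages and pass the file object to looks_like_md_kstat_panic(). If it returns False, the panic string may belong to one of the other issues that share this signature.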




3) How to determine if SVM is involved in the panic.


Some panics are easy to tie to SVM.  For example, this panic string starts with the SVM module 'md' and then mentions state database replicas.

panic[cpu3]/thread=2a100627ca0: md: Panic due to lack of DiskSuite state
database replicas. Fewer than 50% of the total were available,
so panic to ensure data integrity.
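
As an illustration of reading the panic string this way, the Python sketch below splits the first line of a panic message into its CPU, thread and message parts and checks whether the message starts with the 'md' module name. The regular expression and function names are assumptions based on the example panics in this document, not an official format definition.

```python
import re

# Matches lines such as:
#   panic[cpu3]/thread=2a100627ca0: md: Panic due to lack of DiskSuite state
PANIC_HEADER = re.compile(
    r"panic\[cpu(?P<cpu>\d+)\]/thread=(?P<thread>[0-9a-fA-F]+):\s*(?P<message>.*)"
)


def parse_panic_header(line):
    """Return a dict with cpu, thread and message, or None if the line does not match."""
    match = PANIC_HEADER.search(line)
    if match is None:
        return None
    return {
        "cpu": int(match.group("cpu")),
        "thread": match.group("thread"),
        "message": match.group("message").strip(),
    }


def names_md_module(message):
    """Return True when the panic message begins with the SVM module name 'md'."""
    return message.startswith("md:")
```

A message that does not start with 'md:' is not necessarily unrelated to SVM, so this check is only a first pass before looking at the stack trace.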


Other panics are harder to connect with SVM.
In this example the panic string is a bad trap and it refers to the 'did' module from Solaris Cluster.  At first glance this would not appear to involve SVM, but the line after the panic string shows that the metaset command was involved in the panic.  The stack trace also contains several mentions of the md_mirror module.

panic[cpu2]/thread=3000ef04d60:
BAD TRAP: type=31 rp=2a102444b30 addr=4f0 mmu_fsr=0 occurred in module "did" due to a NULL pointer dereference

metaset:
trap type = 0x31
addr=0x4f0
pid=2510, pc=0x7845272c, sp=0x2a1024443d1, tstate=0x80001600,context=0x14c4
g1-g7: 1443800, 14d8800, 78001db0, 30000224eb8, 3000ef0f838, 0,3000ef04d60

000002a102444860 unix:die+80 (31, 2a102444b30, 4f0, 0, 2a1024458bc,14d8932)
%l0-3: 0000000000000000 0000000001413798 000002a102444b30 000002a102444a28 %l4-7: 0000000000000031 00000000ff210000 0000000000000000 00000000ffbff7e0
000002a102444940 unix:trap+874 (2a102444b30, 0, 10000, 10200, 0, 2a102445aec)
%l0-3: 0000000000000001 0000000000000000 00000300050154f0 0000000000000031 %l4-7: 0000000000000005 0000000000000001 0000000000000000 0000000000000000
000002a102444a80 unix:ktl0+48 (30004a6db90, 8, 3000410cdf0, 1,14d89ab, 0)
%l0-3: 0000000000000001 0000000000001400 0000000080001600 000000000102cb30 %l4-7: 00000300049c6ca0 00000300049c70a0 0000000000000000
000002a102444bd0 did:didprop_op+64 (1e, 0, 1, 9, 14d89b0,2a102444d6c)
%l0-3: 000000007845220c 0000000078001e38 000002a102444d68 0000030004a6db90 %l4-7: 00000000012fe5dc 0000000000000400 00000000000000f0 ffffffffffffffff
000002a102444c90 md_mirror:mirror_check_failfast+254 (2,ffffffff,1fff, 14bb390, 0, 0)
%l0-3: 0000000078452694 0000000000000001 000000000000012c 000000000000012c %l4-7: 00000300203a60a8 0000000000000000 0000030004fd78a8 00000000012fe5dc
...




In this example the panic string does not mention SVM, DiskSuite, md, meta or any other SVM term.  However, the metaclear command was involved in the panic and several SVM modules are mentioned in the stack trace: md, md_mirror, md_stripe and md_sp.


panic[cpu1]/thread=30001f735c0: BAD TRAP: type=31 rp=2a100bc92e0 addr=308000c8ff8 mmu_fsr=0

metaclear: trap type = 0x31
addr=0x308000c8ff8
pid=207, pc=0x10e32b4, sp=0x2a100bc8b81, tstate=0x880001602, context=0x1
g1-g7: 13983bc, 0, 20000, 5c8ff80, 88, 1, 30001f735c0

000002a100bc9000 unix:die+9c (31, 2a100bc92e0, 308000c8ff8, 0, 2a100bc90c0, e25d8017)
%l0-3: 00000000c0800000 0000000000000031 0000000001000000 0000000000002000
%l4-7: 0000000000100000 0000060019fd9860 0000000000000000 000000000109d000
000002a100bc90e0 unix:trap+9e0 (2a100bc92e0, 0, 1fff, 6, 308000c8000, 1)
%l0-3: 0000000000000000 0000060019fd9860 0000000000000031 0000000000001c00
%l4-7: 0000000000000000 0000000000000001 ffffffffffffe000 0000000000000006
000002a100bc9230 unix:ktl0+48 (0, 101, ffffffffffffffff, 60011e68e98, ffffffff, ffffffffffffffff)
%l0-3: 0000000000000003 0000000000001400 0000000880001602 000000000101bd10
%l4-7: 0000000005200170 0000000005200000 0000000000000000 000002a100bc92e0
000002a100bc9380 md:md_call_strategy+5c (60011e68db8, 18cb800, ffffffff, ffffffff, ffffffff, ffffffffffffffff)
%l0-3: 0000000000000000 000006001063cbf8 0000000001907400 0000000000000000
%l4-7: 0000000002000000 0000000000000000 00000300000c9000 00000007fffffff8
000002a100bc9430 md_stripe:md_stripe_strategy+2fc (6001200d0f0, 600107a6548, 60011e68d78, 6001063cb70, 809, 1)
%l0-3: 000000000190d890 0000000000000000 0000000000000200 0000000005ac01e0
%l4-7: 00000600107a65a0 0000000000000000 0000000000000001 0000060011e68db8
000002a100bc9510 md_mirror:mirror_write_strategy+83c (6001de38f40, 1, 0, 600107a3630, 6001200d0b8, 809)
%l0-3: 000006001200d0f0 000006001070b740 0000000000000000 0000000000000004
%l4-7: 00000600107a3598 000006001058e000 0000000000000000 000000000190dd78
000002a100bc95c0 md_sp:sp_update_watermarks+270 (6001281a190, 808, 6001de38f78, 6001de38f80, 6001de38f40, 96)
%l0-3: 0000000000000100 0000000000000200 000006001281a1a8 00000600181203e8
%l4-7: 0000000000000000 0000000000000008 000006001cb00340 0000000000000000
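
Looking for SVM modules in a stack trace, as done by eye in the examples above, can be sketched in Python. This is a minimal sketch, assuming SPARC-style frames of the form "address module:function+offset (args)" as in the traces shown; the module list contains only the SVM modules named in this document and is not exhaustive, and the function name is hypothetical.

```python
import re

# SVM modules named in this document. Other SVM modules may exist.
SVM_MODULES = ("md", "md_mirror", "md_stripe", "md_sp")

# A stack frame: 16 hex digits, then module:function+offset, then "(".
FRAME = re.compile(
    r"\b[0-9a-f]{16}\s+(?P<module>[\w,.-]+):(?P<func>\w+)\+[0-9a-f]+\s*\("
)


def svm_modules_in_stack(lines):
    """Return the SVM modules found in the stack frames, in order of first appearance."""
    found = []
    for line in lines:
        match = FRAME.search(line)
        if match is None:
            continue  # register dumps and other lines are skipped
        module = match.group("module")
        if module in SVM_MODULES and module not in found:
            found.append(module)
    return found
```

An empty result does not rule SVM out; it only means that none of the listed modules appear in the frames that were parsed.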



4) How to get further assistance.

If at this point you cannot identify the issue or need further support with an SVM panic, you can ask for help in the Oracle Solaris Volume Manager Community or contact Oracle Support for further assistance.

To ask questions in the Solaris Volume Manager Community, go to:

https://community.oracle.com/community/support/oracle_sun_technologies/oracle_solaris_volume_manager



If you have reviewed this document and wish to engage Oracle Support for assistance, please refer to the following document for directions on how to collect a system crash file for analysis.


How to Collect System Crash Dump Images on Solaris 8 and Newer (Doc ID 1004803.1)





To discuss this information further with Oracle experts and industry peers, we encourage you to review, join or start a discussion in the My Oracle Support Community - Oracle Solaris Volume Manager.