The Exadata diagnostic tool sundiag.sh

The sundiag.sh script is installed on every Exadata database server and storage server node (MOS note 761868.1).

Let's locate it and run it:

[root@erpdb01 ~]# find /opt -name sundiag.sh
/opt/oracle.SupportTools/sundiag.sh
[root@erpdb01 ~]#

[root@erpdb01 oracle.SupportTools]# sh sundiag.sh
Oracle Exadata Database Machine - Diagnostics Collection Tool
Gathering Linux information

Skipping ILOM collection. Use the ilom or snapshot options, or login to ILOM
over the network and run Snapshot separately if necessary.

/tmp/sundiag_erpdb01_1338NML05G_2015_12_21_16_03
Generating diagnostics tarball and removing temp directory

==============================================================================
Done. The report files are bzip2 compressed in /tmp/sundiag_erpdb01_1338NML05G_2015_12_21_16_03.tar.bz2
==============================================================================
[root@erpdb01 oracle.SupportTools]# pwd
/opt/oracle.SupportTools
[root@erpdb01 oracle.SupportTools]#
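
When sundiag.sh is driven from another script, the tarball path can be picked out of the closing "Done." line. The following is a minimal Python sketch that assumes the output format shown in the transcript above; the function name find_tarball_path is illustrative and not part of sundiag.sh.

```python
import re

# Matches the closing line of the transcript above; the exact wording
# is an assumption and may differ between sundiag.sh versions.
_TARBALL_RE = re.compile(r"bzip2 compressed in (\S+\.tar\.bz2)")


def find_tarball_path(output: str):
    """Return the tarball path named in sundiag.sh output, or None."""
    match = _TARBALL_RE.search(output)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = (
        "Done. The report files are bzip2 compressed in "
        "/tmp/sundiag_erpdb01_1338NML05G_2015_12_21_16_03.tar.bz2"
    )
    print(find_tarball_path(sample))
```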

The file /tmp/sundiag_erpdb01_1338NML05G_2015_12_21_16_03.tar.bz2 is normally only about 4 MB, and a little over 10 MB once extracted, so it is not large.

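Note: the bz2 archive contains many directories and files. To find a particular piece of information, search for the corresponding file name directly.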
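
Searching the archive by file name does not require extracting it first. Here is a small sketch using Python's standard tarfile and fnmatch modules; the helper name find_members is hypothetical, and the directory layout inside the tarball is not assumed.

```python
import fnmatch
import tarfile


def find_members(tarball_path: str, pattern: str):
    """Return archive member names whose base name matches a shell-style pattern."""
    with tarfile.open(tarball_path, "r:bz2") as archive:
        return [
            name
            for name in archive.getnames()
            # Compare only the last path component, so the pattern works
            # regardless of which subdirectory the file lives in.
            if fnmatch.fnmatch(name.rsplit("/", 1)[-1], pattern)
        ]
```

For example, find_members("/tmp/sundiag_erpdb01_1338NML05G_2015_12_21_16_03.tar.bz2", "megacli64*") would list every megacli64 output file in the archive from this article.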

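messages: a copy of the system's /var/log/messages file. That file is maintained by the syslog process and holds important information about operating system activity and health.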

Dec 19 19:25:06 erpdb01 last message repeated 4 times
Dec 19 19:26:10 erpdb01 last message repeated 4 times
Dec 19 19:27:14 erpdb01 last message repeated 4 times
Dec 19 19:28:18 erpdb01 last message repeated 4 times
Dec 19 19:29:22 erpdb01 last message repeated 4 times
Dec 19 19:30:26 erpdb01 last message repeated 4 times
Dec 19 19:31:30 erpdb01 last message repeated 4 times
Dec 19 19:32:34 erpdb01 last message repeated 4 times
Dec 19 19:33:38 erpdb01 last message repeated 4 times
Dec 19 19:34:08 erpdb01 last message repeated 2 times
Dec 19 19:53:36 erpdb01 auditd[9004]: Audit daemon rotating log files
Dec 20 00:24:56 erpdb01 ntpd[9339]: synchronized to LOCAL(0), stratum 10
Dec 20 00:50:44 erpdb01 ntpd[9339]: synchronized to 10.9.3.79, stratum 3
Dec 20 18:27:59 erpdb01 auditd[9004]: Audit daemon rotating log files
Dec 21 14:40:29 erpdb01 ntpd[9339]: synchronized to LOCAL(0), stratum 10
Dec 21 14:58:40 erpdb01 ntpd[9339]: synchronized to 10.9.3.79, stratum 3
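
In the excerpt above, ntpd switches between the local clock LOCAL(0) at stratum 10 and the server 10.9.3.79 at stratum 3. A sketch like the following can pull those transitions out of a long messages file. It assumes the classic syslog line format shown above; ntp_sync_events is an illustrative name.

```python
import re

# "Dec 20 00:24:56 erpdb01 ntpd[9339]: synchronized to LOCAL(0), stratum 10"
_NTP_RE = re.compile(
    r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (\S+) ntpd\[\d+\]: "
    r"synchronized to (\S+), stratum (\d+)$"
)


def ntp_sync_events(lines):
    """Return (timestamp, peer, stratum) for each ntpd synchronization message."""
    events = []
    for line in lines:
        match = _NTP_RE.match(line.strip())
        if match:
            timestamp, _host, peer, stratum = match.groups()
            events.append((timestamp, peer, int(stratum)))
    return events
```

Filtering the result for entries whose peer is LOCAL(0) shows when the node fell back to its own clock.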

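dmesg: this file is created by the dmesg command and contains kernel diagnostic messages from the kernel ring buffer. The ring buffer holds the messages the kernel logs about the system's hardware devices (disk drives, keyboards, and so on).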

Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.39-400.128.21.el5uek (mockbuild@ca-build56.us.oracle.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-55)) #1 SMP Thu Apr 2 15:13:06 PDT 2015
Command line: root=LABEL=DBSYS bootarea=dbsys bootfrom=BOOT ro loglevel=7 panic=60 debug console=ttyS0,115200n8 console=tty1 pci=noaer log_buf_len=1m nmi_watchdog=0 nomce transparent_hugepage=never audit=1 crashkernel=380M@128M numa=off processor.max_cstate=1 intel_idle.max_cstate=0
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009ac00 (usable)
 BIOS-e820: 000000000009ac00 - 00000000000a0000 (reserved)
 BIOS-e820: 0000000100000000 - 0000004080000000 (usable)
NX (Execute Disable) protection: active
SMBIOS 2.7 present.
DMI: Oracle Corporation SUN FIRE X4170 M3     /ASSY,MOTHERBOARD,1U   , BIOS 17100400 04/04/2014
e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
No AGP bridge found
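
The BIOS-e820 lines give the physical memory map as hexadecimal address ranges. As a rough illustration, the sketch below adds up the ranges marked usable. It assumes the line format shown above, and usable_ram_bytes is a hypothetical helper name; the excerpt in this article is abbreviated, so the total for it is not the machine's full RAM.

```python
import re

# " BIOS-e820: 0000000000000000 - 000000000009ac00 (usable)"
_E820_RE = re.compile(r"BIOS-e820: ([0-9a-f]{16}) - ([0-9a-f]{16}) \(([^)]+)\)")


def usable_ram_bytes(dmesg_text: str) -> int:
    """Sum the sizes of all e820 ranges that the BIOS marked as usable."""
    total = 0
    for start, end, kind in _E820_RE.findall(dmesg_text):
        if kind == "usable":
            total += int(end, 16) - int(start, 16)
    return total
```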

lspci: this file lists the PCI buses on the system and the devices attached to them.

00:00.0 Host bridge: Intel Corporation Xeon E5/Core i7 DMI2 (rev 07)
00:03.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express Root Port 3a in PCI Express Mode (rev 07)
00:03.2 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express Root Port 3c (rev 07)
00:04.7 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel 7 (rev 07)
00:05.0 System peripheral: Intel Corporation Xeon E5/Core i7 Address Map, VTd_Misc, System Management (rev 07)
00:05.2 System peripheral: Intel Corporation Xeon E5/Core i7 Control Status and Global Errors (rev 07)
00:05.4 PIC: Intel Corporation Xeon E5/Core i7 I/O APIC (rev 07)
00:11.0 PCI bridge: Intel Corporation C600/X79 series chipset PCI Express Virtual Root Port (rev 06)
00:1a.0 USB controller: Intel Corporation C600/X79 series chipset USB2 Enhanced Host Controller #2 (rev 06)
40:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)
50:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2108 [Liberator] (rev 05)
61:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 02)
62:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 21)
7f:0a.0 System peripheral: Intel Corporation Xeon E5/Core i7 Power Control Unit 0 (rev 07)
7f:0a.3 System peripheral: Intel Corporation Xeon E5/Core i7 Power Control Unit 3 (rev 07)
7f:0b.0 System peripheral: Intel Corporation Xeon E5/Core i7 Interrupt Control Registers (rev 07)
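
Each lspci line has the shape "slot class: description", so it splits cleanly into three fields. This is a minimal sketch that assumes the default lspci output format shown above; parse_lspci_line is an illustrative name.

```python
def parse_lspci_line(line: str):
    """Split one lspci line into (slot, device_class, description), or None."""
    slot, _, rest = line.strip().partition(" ")
    # Split on the first ": " only, since descriptions may contain colons.
    device_class, sep, description = rest.partition(": ")
    if not sep:
        return None
    return slot, device_class, description
```

Filtering on device_class == "InfiniBand" or "RAID bus controller" is a quick way to find the HCA and the RAID controller among the many chipset entries.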

lsscsi: this file lists all SCSI devices on the system.

[0:2:0:0]    disk    LSI      MR9261-8i        2.13  /dev/sda 

fdisk-l.out: lists the partitions of every disk on the system.

Disk /dev/sda: 896.9 GB, 896998047744 bytes
255 heads, 63 sectors/track, 109053 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          65      522081   83  Linux
/dev/sda2              66      109053   875446110   8e  Linux LVM
Disk /dev/dm-0: 32.2 GB, 32212254720 bytes
255 heads, 63 sectors/track, 3916 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/dm-1: 32.2 GB, 32212254720 bytes
255 heads, 63 sectors/track, 3916 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/dm-2: 25.7 GB, 25769803776 bytes
255 heads, 63 sectors/track, 3133 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/dm-3: 544.3 GB, 544387104768 bytes
255 heads, 63 sectors/track, 66184 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
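
The "Disk ...:" header lines carry the exact size of each device in bytes. A short sketch for collecting them, assuming the older fdisk output format shown above (newer fdisk versions print a different header); disk_sizes is a hypothetical name.

```python
import re

# "Disk /dev/sda: 896.9 GB, 896998047744 bytes"
_DISK_RE = re.compile(r"^Disk (\S+): .*, (\d+) bytes$")


def disk_sizes(fdisk_text: str):
    """Map each device named in 'fdisk -l' output to its size in bytes."""
    sizes = {}
    for line in fdisk_text.splitlines():
        match = _DISK_RE.match(line.strip())
        if match:
            sizes[match.group(1)] = int(match.group(2))
    return sizes
```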

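fdisk-l.err: the error messages that fdisk reported. The messages below are expected, because the dm-N devices are device-mapper (LVM) volumes and carry no partition table of their own.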

Disk /dev/dm-0 doesn't contain a valid partition table
Disk /dev/dm-1 doesn't contain a valid partition table
Disk /dev/dm-2 doesn't contain a valid partition table
Disk /dev/dm-3 doesn't contain a valid partition table

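service_-status-all.out: the result of the service status check.

Interestingly, this output happened to reveal directly why the cluster had once failed to start.

Pure coincidence!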

acpid (pid 9225) is running...
anacron is stopped
auditd (pid  9004) is running...
exadata_mon_hw_asr.pl (pid 9372) is running...
crond (pid  9508) is running...
2015-12-21 16:03:56 +0800  [INFO] New logging session is started
2015-12-21 16:03:56 +0800  [INFO] Command line: /etc/init.d/exachkcfg status
2015-12-21 16:03:56 +0800  The exachkcfg was already run
sudo exist
Usage: /opt/OracleHomes/agent_home/core/12.1.0.3.0/install/unix/scripts/agentstup {start|stop}
Mon Dec 21 16:03:57 CST 2015
 --- Please choice Database env ---
                                   
 1: EBS Database
 2: Peoplesoft Database
                                  
 ---------------------------------
Please Input 1 or 2:
!!! You do not set the ORACLE env. !!!
sudo exist
Usage: /u01/em12c/core/12.1.0.3.0/install/unix/scripts/agentstup {start|stop}
hald is stopped
Usage: /etc/init.d/init.tfa {stop|start|shutdown|restart}
Firewall is stopped.
ipmi_msghandler module loaded.
ipmi_si module loaded.
ipmi_devintf module loaded.
/dev/ipmi0 exists.
Firewall is stopped.
irqbalance (pid 9083) is running...
iscsid is stopped
iscsid is stopped
Kdump is operational
lsidiagd is stopped
lsi_mrdsnmpagent (pid 9284 9268) is running...
mcstransd is stopped
mdadm is stopped
mdmpd is stopped
dbus-daemon is stopped
ERROR: mlx4_vnic module is not loaded
multipathd is stopped
usage: /etc/init.d/netbackup { start | stop | start_msg | stop_msg }
netconsole module not loaded
netplugd is stopped
Configured devices:
lo bondeth0 bondib0 eth0 eth1 eth2 eth3 eth4 eth5 ib0 ib1
Currently active devices:
lo eth0 eth4 eth5 ib0 ib1 bondeth0 bondib0
rpc.mountd is stopped
nfsd is stopped
rpc.rquotad is stopped
rpc.statd is stopped
nscd is stopped
ntpd (pid  9339) is running...
Low level hardware support loaded:
mlx4_ib mlx4_core 


Upper layer protocol modules:
rds_rdma rds ib_ipoib 


User space access modules:
rdma_ucm ib_ucm ib_uverbs ib_umad 


Connection management modules:
rdma_cm ib_cm iw_cm 


Configured IPoIB interfaces: none
Currently active IPoIB interfaces: ib0 ib1 bondib0 
portmap is stopped
Process accounting is disabled.
rdisc is stopped
rngd is stopped
rpc.idmapd is stopped
rsyslogd is stopped
saslauthd is stopped
sendmail is stopped
smartd is stopped
snmpd (pid  9250) is running...
snmptrapd is stopped
openssh-daemon (pid  10147) is running...
syslogd (pid  9066) is running...
klogd (pid  9069) is running...
Xvnc is stopped
Symantec Backup Exec Remote Agent for Linux/Unix Servers
Usage: VRTSralus.init { start | stop | restart }
Symantec Private Branch Exchange is running
2015-12-21 16:11:23 +0800 [INFO] No eth interfaces need this work around.
xinetd is stopped
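
Because this file mixes standard status lines with free-form output from custom init scripts, it helps to reduce it to a simple service-to-state map. The following sketch only recognizes the two standard line shapes seen above ("name (pid N) is running..." and "name is stopped") and ignores everything else; service_states is an illustrative name.

```python
import re

_RUNNING_RE = re.compile(r"^(\S+) \(pid\s+([\d ]+)\) is running\.\.\.$")
_STOPPED_RE = re.compile(r"^(\S+) is stopped$")


def service_states(lines):
    """Return {service: 'running' | 'stopped'} for lines in the standard format."""
    states = {}
    for line in lines:
        line = line.strip()
        match = _RUNNING_RE.match(line)
        if match:
            states[match.group(1)] = "running"
            continue
        match = _STOPPED_RE.match(line)
        if match:
            states[match.group(1)] = "stopped"
    return states
```

Lines such as "ERROR: mlx4_vnic module is not loaded" do not fit either pattern, so they still need to be read by eye.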

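megacli64*: there are quite a few of these, because the megacli64 command is run with a variety of options to query the MegaRAID controller and collect configuration and status information for the controller and its disks.

Let's look at a few of them: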

megacli64-AdpAllInfo.out

Adapter #0
==============================================================================
                    Versions
                ================
Product Name    : LSI MegaRAID SAS 9261-8i
Serial No       : SV31502020
FW Package Build: 12.12.0-0178
                    Mfg. Data
                ================
Mfg. Date       : 04/06/13
Rework Date     : 00/00/00
Revision No     : 27B
Battery FRU     : N/A


                Image Versions in Flash:
                ================
FW Version         : 2.130.373-2809
BIOS Version       : 3.29.00_4.14.05.00_0x05270000
Preboot CLI Version: 04.04-020:#%00009
WebBIOS Version    : 6.0-51-e_47-Rel
NVDATA Version     : 2.09.03-0047
Boot Block Version : 2.02.00.00-0000
BOOT Version       : 09.250.01.219

megacli64-BbuCmd.out

BBU status for Adapter: 0


BatteryType: iBBU08
Voltage: 3851 mV
Current: 0 mA
Temperature: 27 C
Battery State: Optimal
Design Mode  : 48+ Hrs retention with a non-transparent learn cycle and moderate service life.


BBU Firmware Status:


  Charging Status              : None
  Voltage                                 : OK
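
Most megacli64 output is made up of "Key : Value" lines, which makes simple health checks easy to script. The sketch below assumes that layout; parse_key_values and battery_is_optimal are hypothetical helper names, not part of MegaCLI.

```python
def parse_key_values(text: str):
    """Collect 'Key : Value' lines into a dict (a later key overrides an earlier one)."""
    pairs = {}
    for line in text.splitlines():
        # Split on the first colon only; values may contain further colons.
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if sep and key and value:
            pairs[key] = value
    return pairs


def battery_is_optimal(bbu_text: str) -> bool:
    """True when the BBU output reports 'Battery State: Optimal'."""
    return parse_key_values(bbu_text).get("Battery State") == "Optimal"
```

Note that a key can appear more than once in the same file (Voltage does in the excerpt above), so a flat dict keeps only the last value.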

megacli64-CfgDsply.out:

==============================================================================
Adapter: 0
Product Name: LSI MegaRAID SAS 9261-8i
Memory: 512MB
BBU: Present
Serial No: SV31502020
==============================================================================
Number of DISK GROUPS: 1


DISK GROUP: 0
Number of Spans: 1
SPAN: 0
Span Reference: 0x00
Number of PDs: 4
Number of VDs: 1
Number of dedicated Hotspares: 0
Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :DBSYS
RAID Level          : Primary-5, Secondary-0, RAID Level Qualifier-3
Size                : 835.394 GB
Sector Size         : 512
Is VD emulated      : No
Parity Size         : 278.464 GB
State               : Optimal
Strip Size          : 1.0 MB
Number Of Drives    : 4
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disabled
Encryption Type     : None
Is VD Cached: No
Physical Disk Information:
Physical Disk: 0
Enclosure Device ID: 252
Slot Number: 0
Drive's position: DiskGroup: 0, Span: 0, Arm: 0
Enclosure position: N/A
Device Id: 11
WWN: 5000CCA022688FB3
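
The numbers in this excerpt can be cross-checked with a little arithmetic. The virtual drive is RAID 5 across 4 drives, and RAID 5 spends one drive's worth of capacity on parity, so the parity size should be about Size / (Number Of Drives - 1). The sketch below assumes that the Size field reports usable capacity; raid5_parity_gb is an illustrative name.

```python
def raid5_parity_gb(usable_size_gb: float, drives: int) -> float:
    """Expected parity capacity of a RAID 5 set, given its usable size."""
    if drives < 3:
        raise ValueError("RAID 5 needs at least three drives")
    # Usable data spans (drives - 1) drives; parity takes one more drive's worth.
    return usable_size_gb / (drives - 1)


if __name__ == "__main__":
    print(raid5_parity_gb(835.394, 4))
```

With the values above (835.394 GB, 4 drives) the result is roughly 278.46 GB, which agrees with the reported Parity Size of 278.464 GB.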

The remaining files include:

megacli64-FwTermLog.out

megacli64-GetEvents-all.out

megacli64-LdInfo.out

megacli64-LdPdInfo.out

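In addition to the diagnostic files above, on a storage server sundiag also collects the node's configuration information, alerts, and other special log files that do not exist on database servers.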






