Running Health Checks on Exadata with EXAchk


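Preface

Exachk is the health-check tool for Exadata. Oracle publishes best practices and recommended configuration values for Exadata machines. By running exachk regularly to collect system information from the machine and comparing it against those best practices and recommended values, we can spot potential problems early and eliminate them, which helps keep the Exadata system running stably. The generated report also gives the machine an overall score, which gives decision makers an intuitive basis for judgment.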


1. Example run:


[root@dm01dbadm01 ~]# exachk -v
EXACHK  VERSION: 20.2.3_20201012

[root@dm01dbadm01 ~]# tfactl version -all
TFA Version : 202300
TFA Build ID : 20201012184854
TFA Build Label : TFA_AHF202_GENERIC_201012.1654

EXACHK  VERSION: 20.2.3_20201012


AHF VERSION: 20.2.3

[root@dm01dbadm01 ~]# which exachk
/usr/bin/exachk
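
The version string printed above, 20.2.3_20201012, appears to carry the release date after the underscore, and exachk warns when a release is older than 180 days. The following is a minimal sketch of that age check, not exachk's own logic: the function names are hypothetical, the YYYYMMDD reading of the suffix is an assumption (it matches the "released on 12-Oct-2020" message), and the run date is an assumption taken from reading the 082721 in the report directory name as MMDDYY.

```python
from datetime import date, datetime


def exachk_release_date(version: str) -> date:
    """Extract the release date from a string such as '20.2.3_20201012'.

    Assumption: the part after the underscore is the release date as YYYYMMDD.
    """
    _, _, stamp = version.partition("_")
    return datetime.strptime(stamp, "%Y%m%d").date()


def is_stale(version: str, today: date, max_age_days: int = 180) -> bool:
    """Return True when the release is more than max_age_days old on `today`."""
    return (today - exachk_release_date(version)).days > max_age_days


if __name__ == "__main__":
    version = "20.2.3_20201012"
    run_day = date(2021, 8, 27)  # assumed from the report name suffix 082721
    print(exachk_release_date(version), is_stale(version, run_day))
```

A check like this could run before a scheduled health check so that a newer AHF package is staged first on hosts that cannot reach updates.oracle.com.
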
[root@dm01dbadm01 ~]# exachk
This version of exachk was released on 12-Oct-2020 and it is older than 180 days. No new version of exachk is available in RAT_UPGRADE_LOC. It is highly recommended that you download the latest version of exachk from my oracle support to ensure the highest level of accuracy of the data contained within the report.



updates.oracle.com is not reachable. Please establish connectivity to updates.oracle.com to download latest version of AHF from MOS patch 30166242 and try again
Running older version...

 
root@dm01dbadm02's password: 
root@dm01dbadm02's password: 

Searching for running databases . . . . .

.  .  .  .  .  .  .  .  
List of running databases registered in OCR

1. ***
2. ****
3. *****
4. ******
5. All of above
6. None of above

Select databases from list for checking best practices. For multiple databases, select 5 for All or comma separated number like 1,2 etc [1-6][5]. 1

Reading storage servers password from wallet.

Wallet does not have password for all cells




root user equivalence is not setup between dm01dbadm01 and storage server dm01cel01-priv2 (192.168.***.***).

1. Enter 1 if you will enter root password for each storage server when prompted.

2. Enter 2 to exit and configure root user equivalence manually and re-run exachk.

3. Enter 3 to skip checking best practices on storage server.
                            
Indicate your selection from one of the above options for storage server[1-3][1]:-  1


Is root password same on all storage server?[y/n][y] y


Enter root password for storage server :-  
Verifying root password ...
.  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  
Reading Infiniband switches password from wallet.
.  .  
Wallet does not have password for all switches




9 of the included audit checks require root privileged data collection on infiniband switch

1. Enter 1 if you will enter root password for each infiniband switch when prompted

2. Enter 2 to exit and to arrange for root access and run the exachk later.

3. Enter 3 to skip checking best practices on infiniband switch
                
Indicate your selection from one of the above options for infiniband switch[1-3][1]:-  3



Either Cluster Verification Utility pack (cvupack) does not exist at /opt/oracle.ahf/common/cvu or it is an old or invalid cvupack

Checking Cluster Verification Utility (CVU) version at CRS Home - /u01/app/19.0.0.0/grid

This version of Cluster Verification Utility (CVU) was released on 01-Oct-2020 and it is older than 180 days. It is highly recommended that you download the latest version of CVU from MOS patch 30839369 to ensure the highest level of accuracy of the data contained within the report

updates.oracle.com is not reachable. Please establish connectivity to updates.oracle.com to download Cluster Verification Utility (CVU) pack or download the latest version of CVU from MOS patch 30839369 and copy in /opt/oracle.ahf/common/cvu directory.
Running older version of Cluster Verification Utility (CVU) from CRS Home - /u01/app/19.0.0.0/grid


Do you want to store storage servers password in wallet permanently?[y/n][n] y
Storing storage servers password in wallet...


Starting to run exachk in background on dm01dbadm02 using socket
root@dm01dbadm02's password: 
root@dm01dbadm02's password: 
This version of exachk was released on 12-Oct-2020 and it is older than 180 days. No new version of exachk is available in RAT_UPGRADE_LOC. It is highly recommended that you download the latest version of exachk from my oracle support to ensure the highest level of accuracy of the data contained within the report.



updates.oracle.com is not reachable. Please establish connectivity to updates.oracle.com to download latest version of AHF from MOS patch 30166242 and try again
Running older version...

 
.  .  .  .
.  .  

Checking Status of Oracle Software Stack - Clusterware, ASM, RDBMS on dm01dbadm01

.  .  . . . .  
.  .  . . . .  .  .  .  .  .  .  .  
-------------------------------------------------------------------------------------------------------
                                                 Oracle Stack Status                          
-------------------------------------------------------------------------------------------------------
  Host Name       CRS Installed  RDBMS Installed    CRS UP    ASM UP  RDBMS UP    DB Instance Name
-------------------------------------------------------------------------------------------------------
dm01dbadm01                Yes          Yes          Yes      Yes      Yes                 dw1
-------------------------------------------------------------------------------------------------------
. 

. 

.  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  

. 
. 
. 
. 
. 



*** Checking Best Practice Recommendations ( Pass / Warning / Fail ) ***


.  

Collections and audit checks log file is 
/u01/app/grid/oracle.ahf/data/dm01dbadm01/exachk/user_root/output/exachk_dm01dbadm01_dw_082721_115456/log/exachk.log

============================================================
            Node name - dm01dbadm01
============================================================



 Collecting - ASM Disk Group for Infrastructure Software and Configuration
 Collecting - ASM Diskgroup Attributes
 Collecting - ASM diskgroup usable free space
 Collecting - ASM initialization parameters
 Collecting - Database Parameters for dw database
 Collecting - Database Undocumented Parameters for dw database
 Collecting - RDBMS Feature Usage for dw database
 Collecting - CPU Information
 Collecting - Clusterware and RDBMS software version
 Collecting - Compute node PCI bus slot speed for infiniband HCAs
 Collecting - Kernel parameters
 Collecting - Maximum number of semaphore sets on system
 Collecting - Maximum number of semaphores on system
 Collecting - OS Packages
 Collecting - Patches for Grid Infrastructure
 Collecting - Patches for RDBMS Home
 Collecting - RDBMS patch inventory
 Collecting - number of semaphore operations per semop system call
 Collecting - CRS user limits configuration
 Collecting - CRS user time zone check
 Collecting - Check alerthistory for non-test open stateless alerts [Database Server]
 Collecting - Check alerthistory for stateful alerts not cleared [Database Server]
 Collecting - Clusterware patch inventory
 Collecting - Discover switch type(spine or leaf)
 Collecting - Exadata Critical Issue DB09
 Collecting - Exadata Critical Issue EX30
 Collecting - Exadata Critical Issue EX56
 Collecting - Exadata Critical Issue EX57
 Collecting - Exadata Critical Issue EX58
 Collecting - Exadata critical issue EX48
 Collecting - Exadata critical issue EX50
 Collecting - Exadata critical issue EX55
 Collecting - Exadata software version on database server
 Collecting - Exadata system model number
 Collecting - Exadata version on database server
 Collecting - HCA firmware version on database server
 Collecting - HCA transfer rate on database server
 Collecting - Infrastructure Software and Configuration for compute
 Collecting - MaxStartups setting in sshd_config
 Collecting - OFED Software version on database server
 Collecting - Obtain hardware information
 Collecting - Operating system and Kernel version on database server
 Collecting - Oracle monitoring agent and/or OS settings on ADR diagnostic directories
 Collecting - Raid controller bus link speed
 Collecting - Review Non-Exadata components in use on the InfiniBand fabric
 Collecting - System Event Log
 Collecting - Validate key sysctl.conf parameters on database servers
 Collecting - Verify Data Network is Separate from Management Network
 Collecting - Verify Database Server Disk Controller Configuration
 Collecting - Verify Database Server Physical Drive Configuration
 Collecting - Verify Database Server Virtual Drive Configuration
 Collecting - Verify Disk Cache Policy on database server
 Collecting - Verify Hardware and Firmware on Database and Storage Servers (CheckHWnFWProfile) [Database Server]
 Collecting - Verify ILOM Power Up Configuration for HOST_AUTO_POWER_ON
 Collecting - Verify ILOM Power Up Configuration for HOST_LAST_POWER_STATE
 Collecting - Verify IP routing configuration on database servers
 Collecting - Verify InfiniBand Address Resolution Protocol (ARP) Configuration on Database Servers
 Collecting - Verify Master (Rack) Serial Number is Set [Database Server]
 Collecting - Verify Quorum disks configuration
 Collecting - Verify RAID Controller Battery Temperature [Database Server]
 Collecting - Verify RAID disk controller CacheVault capacitor condition [Database Server]
 Collecting - Verify active kernel version matches expected version for installed Exadata Image
 Collecting - Verify available ksplice fixes are installed [Database Server]
 Collecting - Verify basic Logical Volume(LVM) system devices configuration
 Collecting - Verify database server InfiniBand network MTU size
 Collecting - Verify database server disk controllers use writeback cache
 Collecting - Verify database server file systems have Maximum mount count = -1
 Collecting - Verify imageinfo on database server
 Collecting - Verify imageinfo on database server to compare systemwide
 Collecting - Verify installed rpm(s) kernel type match the active kernel version
 Collecting - Verify key InfiniBand fabric error counters are not present
 Collecting - Verify no database server kernel out of memory errors
 Collecting - Verify proper ACFS drivers are installed for Spectre v2 mitigation
 Collecting - Verify service exachkcfg autostart status on database server
 Collecting - Verify the InfiniBand Fabric Topology (verify-topology)
 Collecting - Verify the Master Subnet Manager is running on an InfiniBand switch
 Collecting - Verify the Name Service Cache Daemon (NSCD) configuration
 Collecting - Verify the Subnet Manager is properly disabled [Database Server]
 Collecting - Verify the currently active image status [Database Server]
 Collecting - Verify the ib_sdp module is not loaded into the kernel
 Collecting - Verify the storage servers in use configuration matches across the cluster
 Collecting - collect time server data [Database Server]
 Collecting - root time zone check
 Collecting - verify asr exadata configuration check via ASREXACHECK on database server

Starting to run root privileged commands in background on storage server dm01cel01 (192.168.***.***)






Starting to run root privileged commands in background on storage server dm01cel02 (192.168.***.***)






Starting to run root privileged commands in background on storage server dm01cel03 (192.168.***.***)






Collections from storage server:
------------------------------------------------------------
 Collecting - Exadata Critical Issue EX22
 Collecting - Exadata Critical Issue EX10
 Collecting - Exadata Critical Issue EX11
 Collecting - Exadata Critical Issue EX28
 Collecting - Exadata Critical Issue EX45
 Collecting - Exadata Critical Issue EX31
 Collecting - Exadata Critical Issue EX54
 Collecting - Exadata Critical Issue EX57
 Collecting - Exadata Critical Issue EX58
 Collecting - Exadata critical issue EX16
 Collecting - Exadata critical issue EX37
 Collecting - Exadata critical issue EX14
 Collecting - Exadata critical issue EX48
 Collecting - Exadata critical issue EX47
 Collecting - Exadata software version on storage server
 Collecting - Exadata software version on storage servers
 Collecting - Exadata critical issue EX51
 Collecting - Infrastructure Software and Configuration for storage
 Collecting - Exadata storage server system model number
 Collecting - Verify Exadata Smart Flash Cache is created
 Collecting - Verify Hardware and Firmware on Database and Storage Servers (CheckHWnFWProfile) [Storage Server]
 Collecting - RAID controller version on storage servers
 Collecting - Verify Disk Cache Policy on storage servers
 Collecting - Verify ILOM Power Up Configuration for HOST_LAST_POWER_STATE on storage servers
 Collecting - Verify ILOM Power Up Configuration for HOST_AUTO_POWER_ON on storage servers
 Collecting - Verify OSSCONF/cellinit.ora consistency across storage servers
 Collecting - Verify Master (Rack) Serial Number is Set [Storage Server]
 Collecting - Verify RAID Controller Battery Temperature [Storage Server]
 Collecting - Verify RAID disk controller CacheVault capacitor condition [Storage Server]
 Collecting - Verify active system values match those defined in configuration file cell.conf  [Storage Server]
 Collecting - Verify Storage Server user CELLDIAG exists
 Collecting - Verify available ksplice fixes are installed [Storage Server]
 Collecting - Verify data (non-system) disks on Exadata Storage Servers have no partitions
 Collecting - Verify imageinfo on storage server to compare systemwide
 Collecting - Verify imageinfo on storage server
 Collecting - Verify service exachkcfg autostart status on storage server
 Collecting - Verify release tracking bug on storage servers
 Collecting - Verify the Subnet Manager is properly disabled [Storage Server]
 Collecting - Verify there are no files present that impact normal firmware update procedures [Storage Server]
 Collecting - collect time server data [Storage Server]
 Collecting - Check alerthistory for non-test open stateless alerts [Storage Server]
 Collecting - collect storage server flashcachemode data
 Collecting - verify asr exadata configuration check via ASREXACHECK on storage servers
 Collecting - Check alerthistory for stateful alerts not cleared [Storage Server]
 Collecting - OFED Software version on storage server
 Collecting - Determine storage server type(All Flash/High Capacity)
 Collecting - Exadata Celldisk predictive failures
 Collecting - Exadata storage server root filesystem free space
 Collecting - HCA firmware version on storage server
 Collecting - Verify Datafiles are Placed on Diskgroups consisting of griddisks with correct attributes
 Collecting - Verify Exadata Smart Flash Cache is actually in use
 Collecting - Verify Ethernet Cable Connection Quality on storage servers
 Collecting - Verify ExaWatcher is executing [Storage Server]
 Collecting - Operating system and Kernel version on storage server
 Collecting - Storage server make and model
 Collecting - Verify Data Network is Separate from Management Network on storage server
 Collecting - Storage server flash cache mode
 Collecting - Verify Exadata Smart Flash Cache status is normal
 Collecting - Verify Exadata Smart Flash Log is Created
 Collecting - Verify InfiniBand Cable Connection Quality on storage servers
 Collecting - Verify average ping times to DNS nameserver [Storage Server]
 Collecting - Verify griddisk count matches across all storage servers where a given prefix name exists
 Collecting - Verify storage server metric CD_IO_ST_RQ
 Collecting - Verify the percent of available celldisk space used by the griddisks
 Collecting - Verify there are no griddisks configured on flash memory devices
 Collecting - Verify the currently active image status [Storage Server]


Data collections completed. Checking best practices on dm01dbadm01.
------------------------------------------------------------



 FAIL =>     One or more storage servers have stateful alerts that have not been cleared.
 FAIL =>     One or more database servers have stateful alerts that have not been cleared
 CRITICAL => Active system values should match those defined in configuration file "cell.conf"
 INFO =>     Oracle GoldenGate failure prevention best practices
 INFO =>     One or more non-default AWR baselines should be created for dw
 WARNING =>  One or more open PDBs do not have non-default services defined for dw
 FAIL =>     One or more of SYSTEM, SYSAUX, USERS, TEMP tablespaces are not of type bigfile for dw
 INFO =>     Please refer to data and guidance provided for database parameter processes for dw
 WARNING =>  filesystemio_options is not set to recommended value on dw1 instance
 WARNING =>  Key InfiniBand fabric error counters should not be present
 FAIL =>     One or more log archive destination and alternate log archive destination settings are not as recommended for dw
 FAIL =>     Table AUD$[FGA_LOG$] should use Automatic Segment Space Management for dw
 FAIL =>     Database parameter DB_LOST_WRITE_PROTECT is not set to recommended value on dw1 instance
 FAIL =>     Database parameter OS_AUTHENT_PREFIX is not set to recommended value on dw1 instance
 CRITICAL => Database parameter USE_LARGE_PAGES is not set to recommended value on dw1 instance
 CRITICAL => Database parameter CLUSTER_INTERCONNECTS is not set to the recommended value for dw
 CRITICAL => One or more Ethernet network cables are not connected.
 WARNING =>  Database parameter DB_BLOCK_CHECKING on primary is not set to the recommended value. for dw
 FAIL =>     Flashback on primary is not configured for dw
 INFO =>     Operational Best Practices
 INFO =>     Database Consolidation Best Practices
 INFO =>     Computer failure prevention best practices
 INFO =>     Data corruption prevention best practices
 INFO =>     Logical corruption prevention best practices
 INFO =>     Database/Cluster/Site failure prevention best practices
 INFO =>     Client failover operational best practices
 WARNING =>  fast_start_mttr_target should be greater than or equal to 300 on dw1 instance
 FAIL =>     Database control files are not configured as recommended for dw
 INFO =>     Database failure prevention best practices
 WARNING =>  Database Archivelog Mode should be set to ARCHIVELOG for dw
 FAIL =>     Primary database is not protected with Data Guard (standby database) for real-time data protection and availability for dw
 INFO =>     Storage failures prevention best practices
 INFO =>     Software maintenance best practices
 INFO =>     Oracle recovery manager(rman) best practices
 WARNING =>  RMAN controlfile autobackup should be set to ON for dw
 INFO =>     Exadata Critical Issues (Doc ID 1270094.1):- DB1-DB4,DB6,DB9-DB47, EX1-EX63 and IB1-IB3,IB5-IB8
Collecting patch inventory on CRS_HOME /u01/app/19.0.0.0/grid
Collecting patch inventory on ASM_HOME /u01/app/19.0.0.0/grid
Collecting patch inventory on ORACLE_HOME /u01/app/oracle/product/19.0.0.0/dbhome_1



Copying results from dm01dbadm02 and generating report. This might take a while. Be patient.


============================================================
            Node name - dm01dbadm02
============================================================



 Collecting - ASM Disk Group for Infrastructure Software and Configuration
 Collecting - ASM Diskgroup Attributes
 Collecting - ASM diskgroup usable free space
 Collecting - ASM initialization parameters
 Collecting - Database Parameters for dw database
 Collecting - Database Undocumented Parameters for dw database
 Collecting - RDBMS Feature Usage for dw database
 Collecting - CPU Information
 Collecting - Clusterware and RDBMS software version
 Collecting - Compute node PCI bus slot speed for infiniband HCAs
 Collecting - Kernel parameters
 Collecting - Maximum number of semaphore sets on system
 Collecting - Maximum number of semaphores on system
 Collecting - OS Packages
 Collecting - Patches for Grid Infrastructure
 Collecting - Patches for RDBMS Home
 Collecting - RDBMS patch inventory
 Collecting - number of semaphore operations per semop system call


Data collections completed. Checking best practices on dm01dbadm02.
------------------------------------------------------------



 FAIL =>     One or more database servers have stateful alerts that have not been cleared
 INFO =>     Oracle GoldenGate failure prevention best practices
 WARNING =>  One or more open PDBs do not have non-default services defined for dw
 FAIL =>     One or more of SYSTEM, SYSAUX, USERS, TEMP tablespaces are not of type bigfile for dw
 INFO =>     Please refer to data and guidance provided for database parameter processes for dw
 WARNING =>  filesystemio_options is not set to recommended value on dw2 instance
 FAIL =>     One or more log archive destination and alternate log archive destination settings are not as recommended for dw
 FAIL =>     Database parameter DB_LOST_WRITE_PROTECT is not set to recommended value on dw2 instance
 FAIL =>     Database parameter OS_AUTHENT_PREFIX is not set to recommended value on dw2 instance
 CRITICAL => Database parameter USE_LARGE_PAGES is not set to recommended value on dw2 instance
 CRITICAL => Database parameter CLUSTER_INTERCONNECTS is not set to the recommended value for dw
 CRITICAL => One or more Ethernet network cables are not connected.
 WARNING =>  Database parameter DB_BLOCK_CHECKING on primary is not set to the recommended value. for dw
 WARNING =>  fast_start_mttr_target should be greater than or equal to 300 on dw2 instance
 FAIL =>     Database control files are not configured as recommended for dw
Collecting patch inventory on CRS_HOME /u01/app/19.0.0.0/grid
Collecting patch inventory on ASM_HOME /u01/app/19.0.0.0/grid
Collecting patch inventory on ORACLE_HOME /u01/app/oracle/product/19.0.0.0/dbhome_1




------------------------------------------------------------
                      CLUSTERWIDE CHECKS
------------------------------------------------------------

 FAIL =>     Time services are not properly configured
------------------------------------------------------------
Detailed report (html) -  /u01/app/grid/oracle.ahf/data/dm01dbadm01/exachk/user_root/output/exachk_dm01dbadm01_dw_082721_115456/exachk_dm01dbadm01_dw_082721_115456.html
root@dm01dbadm02's password: 
root@dm01dbadm02's password: 





UPLOAD [if required] - /u01/app/grid/oracle.ahf/data/dm01dbadm01/exachk/user_root/output/exachk_dm01dbadm01_dw_082721_115456.zip




[root@dm01dbadm01 ~]# cd /u01/app/grid/oracle.ahf/data/dm01dbadm01/exachk/user_root/output
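
The console output above mixes collection progress with the actual findings, which are the lines of the form "SEVERITY => message". As a rough sketch for getting a quick per-severity count from a saved copy of that output, the snippet below parses those lines with a regular expression. The helper names and the embedded sample are illustrative only; the HTML report remains the authoritative result.

```python
import re
from collections import Counter

# Matches console lines such as
# " FAIL =>     Flashback on primary is not configured for dw"
FINDING = re.compile(r"^\s*(CRITICAL|FAIL|WARNING|INFO)\s*=>\s*(.+?)\s*$")


def parse_findings(text: str) -> list:
    """Return (severity, message) pairs for every finding line in text."""
    findings = []
    for line in text.splitlines():
        match = FINDING.match(line)
        if match:
            findings.append((match.group(1), match.group(2)))
    return findings


def tally(findings) -> Counter:
    """Count findings per severity level."""
    return Counter(severity for severity, _ in findings)


# A few lines copied from the run above, plus one non-finding line.
SAMPLE = """\
 FAIL =>     One or more storage servers have stateful alerts that have not been cleared.
 CRITICAL => Active system values should match those defined in configuration file "cell.conf"
 INFO =>     Oracle GoldenGate failure prevention best practices
 WARNING =>  filesystemio_options is not set to recommended value on dw1 instance
 FAIL =>     Flashback on primary is not configured for dw
Collecting patch inventory on CRS_HOME /u01/app/19.0.0.0/grid
"""

if __name__ == "__main__":
    counts = tally(parse_findings(SAMPLE))
    for severity in ("CRITICAL", "FAIL", "WARNING", "INFO"):
        print(f"{severity:8} {counts[severity]}")
```

Sorting the parsed pairs by severity gives a short worklist: the CRITICAL and FAIL items first, then the WARNING items.
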

2. Exported report sample:

[Figure: screenshots of the exported exachk report]
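
To export a report, the zip archive listed at the end of the run has to be copied off the database server. The sketch below picks the most recently modified archive in the output directory shown in the transcript. The function name is hypothetical, and the exachk_*.zip pattern is an assumption based on the single file name in that run.

```python
from pathlib import Path
from typing import Optional


def latest_exachk_zip(output_dir: Path) -> Optional[Path]:
    """Return the most recently modified exachk_*.zip in output_dir, or None."""
    if not output_dir.is_dir():
        return None
    archives = list(output_dir.glob("exachk_*.zip"))
    if not archives:
        return None
    # Newest by modification time; the timestamp in the file name is not parsed.
    return max(archives, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    # Output directory taken from the transcript of the example run.
    out = Path("/u01/app/grid/oracle.ahf/data/dm01dbadm01/exachk/user_root/output")
    print(latest_exachk_zip(out))
```

The returned path can then be handed to whatever copy tool is in use (scp, for instance) to pull the archive to a workstation and open the HTML report in a browser.
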
