Preface
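Exachk is the health-check tool for Exadata. Oracle publishes best practices and recommended configuration values for Exadata machines. Running exachk periodically collects system information from the machine and compares it against those best practices and recommended values, so potential problems are found early. Fixing them before they cause trouble helps keep the Exadata system running stably. The generated report also gives the machine an overall score, which gives decision makers an at-a-glance view of its health.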
1. Example run:
[root@dm01dbadm01 ~]# exachk -v
EXACHK VERSION: 20.2.3_20201012
[root@dm01dbadm01 ~]#
[root@dm01dbadm01 ~]# tfactl version -all
TFA Version : 202300
TFA Build ID : 20201012184854
TFA Build Label : TFA_AHF202_GENERIC_201012.1654
EXACHK VERSION: 20.2.3_20201012
AHF VERSION: 20.2.3
[root@dm01dbadm01 ~]#
[root@dm01dbadm01 ~]# which exachk
/usr/bin/exachk
[root@dm01dbadm01 ~]#
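When exachk starts, it warns if its own build is more than 180 days old. The age can be estimated ahead of a run from the version string printed by exachk -v. The following is a minimal sketch, not part of exachk itself: the function names exachk_build_date and days_between are made up for illustration, it assumes GNU date, and it assumes the trailing digits of the version string are the build date in YYYYMMDD form (which matches the 12-Oct-2020 release date reported in this run).

```shell
# Sketch: estimate how old an exachk build is from its version string.
# Assumption: the version looks like "<version>_<YYYYMMDD>", as in
# "EXACHK VERSION: 20.2.3_20201012". Requires GNU date for the -d option.

# Print the 8-digit build date embedded at the end of a version line.
exachk_build_date() {
    printf '%s\n' "$1" | sed -n 's/.*_\([0-9]\{8\}\)$/\1/p'
}

# Print the number of whole days between two YYYYMMDD dates.
days_between() {
    from_epoch=$(date -u -d "$1" +%s)
    to_epoch=$(date -u -d "$2" +%s)
    echo $(( (to_epoch - from_epoch) / 86400 ))
}

# Version line copied from the exachk -v output in this transcript.
version_line="EXACHK VERSION: 20.2.3_20201012"
build=$(exachk_build_date "$version_line")

# Reference date: 27-Aug-2021, inferred from the report name in this run.
# In day-to-day use this would be today's date: $(date -u +%Y%m%d)
age=$(days_between "$build" "20210827")

if [ "$age" -gt 180 ]; then
    echo "exachk build $build is $age days old (threshold: 180)"
fi
```

For this run the sketch reports an age of 319 days, which is consistent with the "older than 180 days" message that exachk prints.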
[root@dm01dbadm01 ~]# exachk
This version of exachk was released on 12-Oct-2020 and it is older than 180 days. No new version of exachk is available in RAT_UPGRADE_LOC. It is highly recommended that you download the latest version of exachk from my oracle support to ensure the highest level of accuracy of the data contained within the report.
updates.oracle.com is not reachable. Please establish connectivity to updates.oracle.com to download latest version of AHF from MOS patch 30166242 and try again
Running older version...
root@dm01dbadm02's password:
root@dm01dbadm02's password:
Searching for running databases . . . . .
. . . . . . . .
List of running databases registered in OCR
1. ***
2. ****
3. *****
4. ******
5. All of above
6. None of above
Select databases from list for checking best practices. For multiple databases, select 5 for All or comma separated number like 1,2 etc [1-6][5]. 1
Reading storage servers password from wallet.
Wallet does not have password for all cells
root user equivalence is not setup between dm01dbadm01 and storage server dm01cel01-priv2 (192.168.***.***).
1. Enter 1 if you will enter root password for each storage server when prompted.
2. Enter 2 to exit and configure root user equivalence manually and re-run exachk.
3. Enter 3 to skip checking best practices on storage server.
Indicate your selection from one of the above options for storage server[1-3][1]:- 1
Is root password same on all storage server?[y/n][y] y
Enter root password for storage server :-
Verifying root password ...
. . . . . . . . . . . . . . . . . . . . . . .
Reading Infiniband switches password from wallet.
. .
Wallet does not have password for all switches
9 of the included audit checks require root privileged data collection on infiniband switch
1. Enter 1 if you will enter root password for each infiniband switch when prompted
2. Enter 2 to exit and to arrange for root access and run the exachk later.
3. Enter 3 to skip checking best practices on infiniband switch
Indicate your selection from one of the above options for infiniband switch[1-3][1]:- 3
Either Cluster Verification Utility pack (cvupack) does not exist at /opt/oracle.ahf/common/cvu or it is an old or invalid cvupack
Checking Cluster Verification Utility (CVU) version at CRS Home - /u01/app/19.0.0.0/grid
This version of Cluster Verification Utility (CVU) was released on 01-Oct-2020 and it is older than 180 days. It is highly recommended that you download the latest version of CVU from MOS patch 30839369 to ensure the highest level of accuracy of the data contained within the report
updates.oracle.com is not reachable. Please establish connectivity to updates.oracle.com to download Cluster Verification Utility (CVU) pack or download the latest version of CVU from MOS patch 30839369 and copy in /opt/oracle.ahf/common/cvu directory.
Running older version of Cluster Verification Utility (CVU) from CRS Home - /u01/app/19.0.0.0/grid
Do you want to store storage servers password in wallet permanently?[y/n][n] y
Storing storage servers password in wallet...
Starting to run exachk in background on dm01dbadm02 using socket
root@dm01dbadm02's password:
root@dm01dbadm02's password:
This version of exachk was released on 12-Oct-2020 and it is older than 180 days. No new version of exachk is available in RAT_UPGRADE_LOC. It is highly recommended that you download the latest version of exachk from my oracle support to ensure the highest level of accuracy of the data contained within the report.
updates.oracle.com is not reachable. Please establish connectivity to updates.oracle.com to download latest version of AHF from MOS patch 30166242 and try again
Running older version...
. . . .
. .
Checking Status of Oracle Software Stack - Clusterware, ASM, RDBMS on dm01dbadm01
. . . . . .
. . . . . . . . . . . . .
-------------------------------------------------------------------------------------------------------
Oracle Stack Status
-------------------------------------------------------------------------------------------------------
Host Name CRS Installed RDBMS Installed CRS UP ASM UP RDBMS UP DB Instance Name
-------------------------------------------------------------------------------------------------------
dm01dbadm01 Yes Yes Yes Yes Yes dw1
-------------------------------------------------------------------------------------------------------
.
.
. . . . . . . . . . . . . . . . . . . . . . .
.
.
.
.
.
*** Checking Best Practice Recommendations ( Pass / Warning / Fail ) ***
.
Collections and audit checks log file is
/u01/app/grid/oracle.ahf/data/dm01dbadm01/exachk/user_root/output/exachk_dm01dbadm01_dw_082721_115456/log/exachk.log
============================================================
Node name - dm01dbadm01
============================================================
Collecting - ASM Disk Group for Infrastructure Software and Configuration
Collecting - ASM Diskgroup Attributes
Collecting - ASM diskgroup usable free space
Collecting - ASM initialization parameters
Collecting - Database Parameters for dw database
Collecting - Database Undocumented Parameters for dw database
Collecting - RDBMS Feature Usage for dw database
Collecting - CPU Information
Collecting - Clusterware and RDBMS software version
Collecting - Compute node PCI bus slot speed for infiniband HCAs
Collecting - Kernel parameters
Collecting - Maximum number of semaphore sets on system
Collecting - Maximum number of semaphores on system
Collecting - OS Packages
Collecting - Patches for Grid Infrastructure
Collecting - Patches for RDBMS Home
Collecting - RDBMS patch inventory
Collecting - number of semaphore operations per semop system call
Collecting - CRS user limits configuration
Collecting - CRS user time zone check
Collecting - Check alerthistory for non-test open stateless alerts [Database Server]
Collecting - Check alerthistory for stateful alerts not cleared [Database Server]
Collecting - Clusterware patch inventory
Collecting - Discover switch type(spine or leaf)
Collecting - Exadata Critical Issue DB09
Collecting - Exadata Critical Issue EX30
Collecting - Exadata Critical Issue EX56
Collecting - Exadata Critical Issue EX57
Collecting - Exadata Critical Issue EX58
Collecting - Exadata critical issue EX48
Collecting - Exadata critical issue EX50
Collecting - Exadata critical issue EX55
Collecting - Exadata software version on database server
Collecting - Exadata system model number
Collecting - Exadata version on database server
Collecting - HCA firmware version on database server
Collecting - HCA transfer rate on database server
Collecting - Infrastructure Software and Configuration for compute
Collecting - MaxStartups setting in sshd_config
Collecting - OFED Software version on database server
Collecting - Obtain hardware information
Collecting - Operating system and Kernel version on database server
Collecting - Oracle monitoring agent and/or OS settings on ADR diagnostic directories
Collecting - Raid controller bus link speed
Collecting - Review Non-Exadata components in use on the InfiniBand fabric
Collecting - System Event Log
Collecting - Validate key sysctl.conf parameters on database servers
Collecting - Verify Data Network is Separate from Management Network
Collecting - Verify Database Server Disk Controller Configuration
Collecting - Verify Database Server Physical Drive Configuration
Collecting - Verify Database Server Virtual Drive Configuration
Collecting - Verify Disk Cache Policy on database server
Collecting - Verify Hardware and Firmware on Database and Storage Servers (CheckHWnFWProfile) [Database Server]
Collecting - Verify ILOM Power Up Configuration for HOST_AUTO_POWER_ON
Collecting - Verify ILOM Power Up Configuration for HOST_LAST_POWER_STATE
Collecting - Verify IP routing configuration on database servers
Collecting - Verify InfiniBand Address Resolution Protocol (ARP) Configuration on Database Servers
Collecting - Verify Master (Rack) Serial Number is Set [Database Server]
Collecting - Verify Quorum disks configuration
Collecting - Verify RAID Controller Battery Temperature [Database Server]
Collecting - Verify RAID disk controller CacheVault capacitor condition [Database Server]
Collecting - Verify active kernel version matches expected version for installed Exadata Image
Collecting - Verify available ksplice fixes are installed [Database Server]
Collecting - Verify basic Logical Volume(LVM) system devices configuration
Collecting - Verify database server InfiniBand network MTU size
Collecting - Verify database server disk controllers use writeback cache
Collecting - Verify database server file systems have Maximum mount count = -1
Collecting - Verify imageinfo on database server
Collecting - Verify imageinfo on database server to compare systemwide
Collecting - Verify installed rpm(s) kernel type match the active kernel version
Collecting - Verify key InfiniBand fabric error counters are not present
Collecting - Verify no database server kernel out of memory errors
Collecting - Verify proper ACFS drivers are installed for Spectre v2 mitigation
Collecting - Verify service exachkcfg autostart status on database server
Collecting - Verify the InfiniBand Fabric Topology (verify-topology)
Collecting - Verify the Master Subnet Manager is running on an InfiniBand switch
Collecting - Verify the Name Service Cache Daemon (NSCD) configuration
Collecting - Verify the Subnet Manager is properly disabled [Database Server]
Collecting - Verify the currently active image status [Database Server]
Collecting - Verify the ib_sdp module is not loaded into the kernel
Collecting - Verify the storage servers in use configuration matches across the cluster
Collecting - collect time server data [Database Server]
Collecting - root time zone check
Collecting - verify asr exadata configuration check via ASREXACHECK on database server
Starting to run root privileged commands in background on storage server dm01cel01 (192.168.***.***)
Starting to run root privileged commands in background on storage server dm01cel02 (192.168.***.***)
Starting to run root privileged commands in background on storage server dm01cel03 (192.168.***.***)
Collections from storage server:
------------------------------------------------------------
Collecting - Exadata Critical Issue EX22
Collecting - Exadata Critical Issue EX10
Collecting - Exadata Critical Issue EX11
Collecting - Exadata Critical Issue EX28
Collecting - Exadata Critical Issue EX45
Collecting - Exadata Critical Issue EX31
Collecting - Exadata Critical Issue EX54
Collecting - Exadata Critical Issue EX57
Collecting - Exadata Critical Issue EX58
Collecting - Exadata critical issue EX16
Collecting - Exadata critical issue EX37
Collecting - Exadata critical issue EX14
Collecting - Exadata critical issue EX48
Collecting - Exadata critical issue EX47
Collecting - Exadata software version on storage server
Collecting - Exadata software version on storage servers
Collecting - Exadata critical issue EX51
Collecting - Infrastructure Software and Configuration for storage
Collecting - Exadata storage server system model number
Collecting - Verify Exadata Smart Flash Cache is created
Collecting - Verify Hardware and Firmware on Database and Storage Servers (CheckHWnFWProfile) [Storage Server]
Collecting - RAID controller version on storage servers
Collecting - Verify Disk Cache Policy on storage servers
Collecting - Verify ILOM Power Up Configuration for HOST_LAST_POWER_STATE on storage servers
Collecting - Verify ILOM Power Up Configuration for HOST_AUTO_POWER_ON on storage servers
Collecting - Verify OSSCONF/cellinit.ora consistency across storage servers
Collecting - Verify Master (Rack) Serial Number is Set [Storage Server]
Collecting - Verify RAID Controller Battery Temperature [Storage Server]
Collecting - Verify RAID disk controller CacheVault capacitor condition [Storage Server]
Collecting - Verify active system values match those defined in configuration file cell.conf [Storage Server]
Collecting - Verify Storage Server user CELLDIAG exists
Collecting - Verify available ksplice fixes are installed [Storage Server]
Collecting - Verify data (non-system) disks on Exadata Storage Servers have no partitions
Collecting - Verify imageinfo on storage server to compare systemwide
Collecting - Verify imageinfo on storage server
Collecting - Verify service exachkcfg autostart status on storage server
Collecting - Verify release tracking bug on storage servers
Collecting - Verify the Subnet Manager is properly disabled [Storage Server]
Collecting - Verify there are no files present that impact normal firmware update procedures [Storage Server]
Collecting - collect time server data [Storage Server]
Collecting - Check alerthistory for non-test open stateless alerts [Storage Server]
Collecting - collect storage server flashcachemode data
Collecting - verify asr exadata configuration check via ASREXACHECK on storage servers
Collecting - Check alerthistory for stateful alerts not cleared [Storage Server]
Collecting - OFED Software version on storage server
Collecting - Determine storage server type(All Flash/High Capacity)
Collecting - Exadata Celldisk predictive failures
Collecting - Exadata storage server root filesystem free space
Collecting - HCA firmware version on storage server
Collecting - Verify Datafiles are Placed on Diskgroups consisting of griddisks with correct attributes
Collecting - Verify Exadata Smart Flash Cache is actually in use
Collecting - Verify Ethernet Cable Connection Quality on storage servers
Collecting - Verify ExaWatcher is executing [Storage Server]
Collecting - Operating system and Kernel version on storage server
Collecting - Storage server make and model
Collecting - Verify Data Network is Separate from Management Network on storage server
Collecting - Storage server flash cache mode
Collecting - Verify Exadata Smart Flash Cache status is normal
Collecting - Verify Exadata Smart Flash Log is Created
Collecting - Verify InfiniBand Cable Connection Quality on storage servers
Collecting - Verify average ping times to DNS nameserver [Storage Server]
Collecting - Verify griddisk count matches across all storage servers where a given prefix name exists
Collecting - Verify storage server metric CD_IO_ST_RQ
Collecting - Verify the percent of available celldisk space used by the griddisks
Collecting - Verify there are no griddisks configured on flash memory devices
Collecting - Verify the currently active image status [Storage Server]
Data collections completed. Checking best practices on dm01dbadm01.
------------------------------------------------------------
FAIL => One or more storage servers have stateful alerts that have not been cleared.
FAIL => One or more database servers have stateful alerts that have not been cleared
CRITICAL => Active system values should match those defined in configuration file "cell.conf"
INFO => Oracle GoldenGate failure prevention best practices
INFO => One or more non-default AWR baselines should be created for dw
WARNING => One or more open PDBs do not have non-default services defined for dw
FAIL => One or more of SYSTEM, SYSAUX, USERS, TEMP tablespaces are not of type bigfile for dw
INFO => Please refer to data and guidance provided for database parameter processes for dw
WARNING => filesystemio_options is not set to recommended value on dw1 instance
WARNING => Key InfiniBand fabric error counters should not be present
FAIL => One or more log archive destination and alternate log archive destination settings are not as recommended for dw
FAIL => Table AUD$[FGA_LOG$] should use Automatic Segment Space Management for dw
FAIL => Database parameter DB_LOST_WRITE_PROTECT is not set to recommended value on dw1 instance
FAIL => Database parameter OS_AUTHENT_PREFIX is not set to recommended value on dw1 instance
CRITICAL => Database parameter USE_LARGE_PAGES is not set to recommended value on dw1 instance
CRITICAL => Database parameter CLUSTER_INTERCONNECTS is not set to the recommended value for dw
CRITICAL => One or more Ethernet network cables are not connected.
WARNING => Database parameter DB_BLOCK_CHECKING on primary is not set to the recommended value. for dw
FAIL => Flashback on primary is not configured for dw
INFO => Operational Best Practices
INFO => Database Consolidation Best Practices
INFO => Computer failure prevention best practices
INFO => Data corruption prevention best practices
INFO => Logical corruption prevention best practices
INFO => Database/Cluster/Site failure prevention best practices
INFO => Client failover operational best practices
WARNING => fast_start_mttr_target should be greater than or equal to 300 on dw1 instance
FAIL => Database control files are not configured as recommended for dw
INFO => Database failure prevention best practices
WARNING => Database Archivelog Mode should be set to ARCHIVELOG for dw
FAIL => Primary database is not protected with Data Guard (standby database) for real-time data protection and availability for dw
INFO => Storage failures prevention best practices
INFO => Software maintenance best practices
INFO => Oracle recovery manager(rman) best practices
WARNING => RMAN controlfile autobackup should be set to ON for dw
INFO => Exadata Critical Issues (Doc ID 1270094.1):- DB1-DB4,DB6,DB9-DB47, EX1-EX63 and IB1-IB3,IB5-IB8
Collecting patch inventory on CRS_HOME /u01/app/19.0.0.0/grid
Collecting patch inventory on ASM_HOME /u01/app/19.0.0.0/grid
Collecting patch inventory on ORACLE_HOME /u01/app/oracle/product/19.0.0.0/dbhome_1
Copying results from dm01dbadm02 and generating report. This might take a while. Be patient.
============================================================
Node name - dm01dbadm02
============================================================
Collecting - ASM Disk Group for Infrastructure Software and Configuration
Collecting - ASM Diskgroup Attributes
Collecting - ASM diskgroup usable free space
Collecting - ASM initialization parameters
Collecting - Database Parameters for dw database
Collecting - Database Undocumented Parameters for dw database
Collecting - RDBMS Feature Usage for dw database
Collecting - CPU Information
Collecting - Clusterware and RDBMS software version
Collecting - Compute node PCI bus slot speed for infiniband HCAs
Collecting - Kernel parameters
Collecting - Maximum number of semaphore sets on system
Collecting - Maximum number of semaphores on system
Collecting - OS Packages
Collecting - Patches for Grid Infrastructure
Collecting - Patches for RDBMS Home
Collecting - RDBMS patch inventory
Collecting - number of semaphore operations per semop system call
Data collections completed. Checking best practices on dm01dbadm02.
------------------------------------------------------------
FAIL => One or more database servers have stateful alerts that have not been cleared
INFO => Oracle GoldenGate failure prevention best practices
WARNING => One or more open PDBs do not have non-default services defined for dw
FAIL => One or more of SYSTEM, SYSAUX, USERS, TEMP tablespaces are not of type bigfile for dw
INFO => Please refer to data and guidance provided for database parameter processes for dw
WARNING => filesystemio_options is not set to recommended value on dw2 instance
FAIL => One or more log archive destination and alternate log archive destination settings are not as recommended for dw
FAIL => Database parameter DB_LOST_WRITE_PROTECT is not set to recommended value on dw2 instance
FAIL => Database parameter OS_AUTHENT_PREFIX is not set to recommended value on dw2 instance
CRITICAL => Database parameter USE_LARGE_PAGES is not set to recommended value on dw2 instance
CRITICAL => Database parameter CLUSTER_INTERCONNECTS is not set to the recommended value for dw
CRITICAL => One or more Ethernet network cables are not connected.
WARNING => Database parameter DB_BLOCK_CHECKING on primary is not set to the recommended value. for dw
WARNING => fast_start_mttr_target should be greater than or equal to 300 on dw2 instance
FAIL => Database control files are not configured as recommended for dw
Collecting patch inventory on CRS_HOME /u01/app/19.0.0.0/grid
Collecting patch inventory on ASM_HOME /u01/app/19.0.0.0/grid
Collecting patch inventory on ORACLE_HOME /u01/app/oracle/product/19.0.0.0/dbhome_1
------------------------------------------------------------
CLUSTERWIDE CHECKS
------------------------------------------------------------
FAIL => Time services are not properly configured
------------------------------------------------------------
Detailed report (html) - /u01/app/grid/oracle.ahf/data/dm01dbadm01/exachk/user_root/output/exachk_dm01dbadm01_dw_082721_115456/exachk_dm01dbadm01_dw_082721_115456.html
root@dm01dbadm02's password:
root@dm01dbadm02's password:
UPLOAD [if required] - /u01/app/grid/oracle.ahf/data/dm01dbadm01/exachk/user_root/output/exachk_dm01dbadm01_dw_082721_115456.zip
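The console output of a run like this one contains dozens of findings, so a quick tally by severity can help decide what to look at first. The following is a minimal sketch, assuming the console output was saved to a file (for example by piping the run through tee) and that finding lines start with the severity followed by " => " as they do in this transcript. The function name summarize_findings and the sample file are hypothetical; the HTML report remains the authoritative source.

```shell
# Sketch: tally exachk console findings by severity.
# Assumption: each finding line starts with CRITICAL, FAIL, WARNING or INFO
# followed by " => ", as in the console output of this run.

summarize_findings() {
    # $1: path to a file holding the saved console output
    grep -E '^(CRITICAL|FAIL|WARNING|INFO) => ' "$1" |
        awk '{ count[$1]++ }
             END {
                 # Print in a fixed order, most severe first; 0 when absent.
                 n = split("CRITICAL FAIL WARNING INFO", order, " ")
                 for (i = 1; i <= n; i++)
                     printf "%-8s %d\n", order[i], count[order[i]] + 0
             }'
}

# Demo on a few lines copied from the run in this transcript.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Collecting - Kernel parameters
FAIL => One or more storage servers have stateful alerts that have not been cleared.
CRITICAL => Database parameter USE_LARGE_PAGES is not set to recommended value on dw1 instance
WARNING => RMAN controlfile autobackup should be set to ON for dw
FAIL => Flashback on primary is not configured for dw
INFO => Operational Best Practices
EOF
summarize_findings "$sample"
rm -f "$sample"
```

Lines that are not findings, such as the "Collecting - ..." progress messages, are ignored by the grep filter.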
[root@dm01dbadm01 ~]# cd /u01/app/grid/oracle.ahf/data/dm01dbadm01/exachk/user_root/output
[root@dm01dbadm01 output]#
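Each run leaves a collection directory and a zip file in this output directory, so after several runs it helps to pick out the newest one before exporting it. The following is a minimal sketch with hypothetical helper names (latest_collection, collection_timestamp). It assumes collection names follow the pattern exachk_<host>_<db>_<MMDDYY>_<HHMMSS> seen in this run; the MMDDYY reading of the date part is an inference from the release-date warnings in the transcript, not something stated by the tool.

```shell
# Sketch: find the newest exachk collection and decode its timestamp.
# Assumption: names look like exachk_<host>_<db>_<MMDDYY>_<HHMMSS>.zip,
# as produced by the run in this transcript.

# Print the most recently modified exachk_*.zip in the given directory.
latest_collection() {
    # $1: exachk output directory
    ls -1t "$1"/exachk_*.zip 2>/dev/null | head -n 1
}

# Convert the MMDDYY_HHMMSS suffix of a collection name
# into "20YY-MM-DD HH:MM:SS".
collection_timestamp() {
    basename "$1" .zip |
        sed -n 's/.*_\([0-9][0-9]\)\([0-9][0-9]\)\([0-9][0-9]\)_\([0-9][0-9]\)\([0-9][0-9]\)\([0-9][0-9]\)$/20\3-\1-\2 \4:\5:\6/p'
}

# Demo with the zip path reported by this run.
zip_path=/u01/app/grid/oracle.ahf/data/dm01dbadm01/exachk/user_root/output/exachk_dm01dbadm01_dw_082721_115456.zip
collection_timestamp "$zip_path"
```

On the database server itself, latest_collection could be pointed at the output directory shown above, and the resulting zip copied off the machine for viewing in a browser.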
2. Sample export: