0. First, an overview of some Oracle 10g RAC services and concepts
Cluster Synchronization Services (CSS):
Manages the cluster configuration by controlling which nodes are members of the
cluster and by notifying members when a node joins or leaves the cluster. If
you are using third-party clusterware, then the css process interfaces with your
clusterware to manage node membership information.
Cluster Ready Services (CRS):
The primary program for managing high availability operations within a cluster.
Anything that the crs process manages is known as a cluster resource which could
be a database, an instance, a service, a Listener, a virtual IP (VIP) address, an
application process, and so on. The crs process manages cluster resources based on
the resource's configuration information that is stored in the OCR. This includes
start, stop, monitor and failover operations. The crs process generates events when
a resource status changes. When you have installed Oracle RAC, crs monitors the Oracle
instance, Listener, and so on, and automatically restarts these components when a failure
occurs. By default, the crs process makes five attempts to restart a resource and then
does not make further restart attempts if the resource does not restart.
Event Management (EVM):
A background process that publishes events that crs creates.
Oracle Notification Service (ONS):
A publish and subscribe service for communicating Fast Application Notification
(FAN) events.
RACG:
Extends clusterware to support Oracle-specific requirements and complex resources.
Runs server callout scripts when FAN events occur.
Process Monitor Daemon (OPROCD):
This process is locked in memory to monitor the cluster and provide I/O fencing.
OPROCD performs its check, stops running, and if the wake up is beyond the expected
time, then OPROCD resets the processor and reboots the node. An OPROCD failure results
in Oracle Clusterware restarting the node. OPROCD uses the hangcheck timer on Linux
platforms.
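The wake-up check described above can be sketched as a single comparison. This is only an illustration of the idea, not OPROCD's actual implementation: the function name and the interval and margin values in the usage lines are hypothetical.

```shell
# Sketch of the "did I wake up too late?" decision described for OPROCD.
# All names and numbers here are hypothetical; they are not Oracle defaults.

# oprocd_decision EXPECTED_MS ACTUAL_MS MARGIN_MS
# Prints "reboot" when the wake-up happened later than expected + margin,
# otherwise prints "ok".
oprocd_decision() {
    expected_ms=$1
    actual_ms=$2
    margin_ms=$3
    if [ "$actual_ms" -gt $(( expected_ms + margin_ms )) ]; then
        echo reboot
    else
        echo ok
    fi
}

oprocd_decision 1000 1200 500   # woke up 200 ms late, inside the margin
oprocd_decision 1000 1600 500   # woke up 600 ms late, outside the margin
```

A long delay between the expected and the actual wake-up suggests the node was hung, which is why the node is fenced by a reboot instead of being allowed to keep writing to shared storage.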
Voting Disk:
Manages cluster membership by way of a health check and arbitrates cluster
ownership among the instances in case of network failures. Oracle RAC uses the
voting disk to determine which instances are members of a cluster. The voting disk
must reside on shared disk. For high availability, Oracle recommends that you have
multiple voting disks. The Oracle Clusterware enables multiple voting disks but you
must have an odd number of voting disks, such as three, five, and so on. If you define
a single voting disk, then you should use external mirroring to provide redundancy.
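The recommendation for an odd number of voting disks follows from simple majority arithmetic. The sketch below assumes the usual rule that a node must be able to access strictly more than half of the configured voting disks; the function names are made up for illustration.

```shell
# Sketch: majority arithmetic behind the "odd number of voting disks" advice.
# Assumption: a node must still reach strictly more than half of the voting disks.

# Minimum number of voting disks a node must be able to access.
votedisk_quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Number of voting disks that can fail before the quorum is lost.
votedisk_tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 4 5; do
    echo "disks=$n quorum=$(votedisk_quorum "$n") tolerated_failures=$(votedisk_tolerated_failures "$n")"
done
```

Under this rule, four disks tolerate no more failures than three, so the extra even-numbered disk adds cost without adding availability.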
Oracle Cluster Registry (OCR):
Maintains cluster configuration information as well as configuration information
about any cluster database within the cluster. The OCR also manages information about
processes that Oracle Clusterware controls. The OCR stores configuration information
in a series of key-value pairs within a directory tree structure. The OCR must reside
on shared disk that is accessible by all of the nodes in your cluster. The Oracle
Clusterware can multiplex the OCR and Oracle recommends that you use this feature
to ensure cluster high availability. You can replace a failed OCR online, and you can
update the OCR through supported APIs such as Enterprise Manager, the Server Control
Utility (SRVCTL), or the Database Configuration Assistant (DBCA).
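Main CRS services
Main CRS processes:
(1) crsd - handles HA operations and manages CRS resources such as the listener, VIP, ONS, and GSD; started and run by the root user
(2) ocssd - manages node membership and is used for inter-node communication; runs as the oracle user
(3) oprocd - process monitor for the cluster; runs only when no vendor clusterware is in use
(4) evmd - event monitoring daemon; runs as the oracle user
Related log locations: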
$ORA_CRS_HOME/log/nodename/crsd
$ORA_CRS_HOME/crs/init
$ORA_CRS_HOME/css/log
$ORA_CRS_HOME/css/init
$ORA_CRS_HOME/evm/log
$ORA_CRS_HOME/evm/init
$ORA_CRS_HOME/srvm/log
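As a small convenience, the per-node layout $ORA_CRS_HOME/log/nodename/component can be wrapped in a helper. This is a minimal sketch: the function name is made up, /u01/crs is a placeholder Clusterware home, and the component names are the subdirectories listed later in this post (admin, client, crsd, cssd, evmd, racg).

```shell
# Sketch: build the log directory for one Clusterware component on one node.
# crs_log_dir CRS_HOME NODENAME COMPONENT
crs_log_dir() {
    crs_home=$1
    node=$2
    component=$3
    case "$component" in
        admin|client|crsd|cssd|evmd|racg)
            echo "$crs_home/log/$node/$component"
            ;;
        *)
            echo "unknown component: $component" >&2
            return 1
            ;;
    esac
}

# /u01/crs is a placeholder used when ORA_CRS_HOME is not set.
crs_log_dir "${ORA_CRS_HOME:-/u01/crs}" mxrac01 cssd
```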
1. In this environment, ORACLE_BASE=/u01/product and ORACLE_HOME=/u01/product/oracle
mxrac05$ls
adump bdump cdump dpdump hdump pfile udump
A. adump holds audit files with the .aud suffix, which record SYS user logins.
Audit file /u01/product/admin/mxdell/adump/ora_24065.aud
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /u01/product/oracle
System name: Linux
Node name: mxrac05
Release: 2.6.18-128.el5
Version: #1 SMP Wed Dec 17 11:41:38 EST 2008
Machine: x86_64
Instance name: mxdell5
Redo thread mounted by this instance: 5
Oracle process number: 54
Unix process pid: 24065, image: oracle@mxrac05
Mon Sep 27 14:09:34 2010
LENGTH : '153'
ACTION :[7] 'CONNECT'
DATABASE USER:[3] 'SYS'
PRIVILEGE :[6] 'SYSDBA'
CLIENT USER:[12] 'harrison.han'
CLIENT TERMINAL:[11] 'MXWS-004570'
STATUS:[1] '0'
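To pull individual fields out of an audit record like the one above, a small sed filter is enough. This is a sketch that assumes each field sits on its own line in the form NAME:[length] 'value', as in the sample; the function name is made up.

```shell
# Sketch: extract one field from an audit record read on stdin.
# aud_field "FIELD NAME"
aud_field() {
    # Match NAME, optional spaces, :[n], optional spaces, then the quoted value.
    sed -n "s/^$1 *:\[[0-9]*\] *'\([^']*\)'.*/\1/p"
}

# Sample lines copied from the audit file shown above.
sample_record="ACTION :[7] 'CONNECT'
DATABASE USER:[3] 'SYS'
PRIVILEGE :[6] 'SYSDBA'
CLIENT USER:[12] 'harrison.han'
CLIENT TERMINAL:[11] 'MXWS-004570'
STATUS:[1] '0'"

printf '%s\n' "$sample_record" | aud_field "DATABASE USER"
printf '%s\n' "$sample_record" | aud_field "CLIENT USER"
```

Against a real file the same filter would be fed with something like aud_field "CLIENT USER" < ora_24065.aud.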
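B. bdump holds the trace files of all background processes and the alert log of each
instance. For example, alert_mxdell1.log is the alert log of instance mxdell1 on RAC
node 1, and it sits next to the .trc files of that instance's background processes.
There are also directories such as cdmp_20101005101745 that contain .trw files, which
are another kind of trace file. When such files appear, a matching entry can usually
be found in the alert log, for example: Tue Jul 13 22:01:16 2010 Trace dumping is
performing id=[cdmp_20100713220116]. For these errors a timestamped dump directory
bdump/cdmp_timestamp is created, where timestamp is the time the error occurred.
Dumps like this are usually caused by bugs.
The directory cdmp_timestamp contains in-memory traces of Oracle RAC instance
failure information.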
Diagnosability Daemon (DIAG)
The Diagnosability Daemon captures diagnostic information related to process and
instance failures. This information can be used by Oracle Worldwide Support to help
analyze and resolve problems with your database and instances.
The DIAG process writes its diagnostic information to files in a subdirectory of
the directory specified by the initialization parameter BACKGROUND_DUMP_DEST. The
subdirectories are named cdmp_timestamp, where timestamp identifies when the
subdirectory and its trace information were written.
Example:
mxrac01$ls -alhrt
total 4.1M
drwxr-xr-x 9 oracle dba 4.0K Mar 2 2010 ..
drwxr-x--- 2 oracle dba 24K Sep 17 10:59 cdmp_20100917105859
drwxr-x--- 2 oracle dba 24K Oct 1 22:14 cdmp_20101001221411
drwxr-x--- 2 oracle dba 24K Oct 5 10:17 cdmp_20101005101745
-rw-rw---- 1 oracle dba 1.1K Nov 12 23:00 mxdell1_m001_2876.trc
-rw-rw---- 1 oracle dba 1.1K Nov 13 19:00 mxdell1_m001_8809.trc
-rw-rw---- 1 oracle dba 1.1K Nov 13 21:00 mxdell1_m001_28585.trc
-rw-rw---- 1 oracle dba 964 Nov 14 17:00 mxdell1_m001_15713.trc
-rw-rw---- 1 oracle dba 773 Nov 14 17:10 mxdell1_q002_19652.trc
-rw-rw---- 1 oracle dba 976 Nov 14 17:12 mxdell1_arc0_13125.trc
-rw-rw---- 1 oracle dba 68K Nov 14 17:29 mxdell1_diag_12986.trc
-rw-rw---- 1 oracle dba 984 Nov 14 18:00 mxdell1_m001_7499.trc
-rw-rw---- 1 oracle dba 747K Nov 14 18:51 mxdell1_arc1_13127.trc
-rw-rw---- 1 oracle dba 1.1K Nov 14 19:00 mxdell1_m001_1103.trc
-rw-rw---- 1 oracle dba 432K Nov 15 21:41 mxdell1_lmd0_12992.trc
-rw-rw---- 1 oracle dba 1.1K Nov 15 22:00 mxdell1_m001_2387.trc
-rw-rw---- 1 oracle dba 240K Nov 15 22:01 mxdell1_lms3_13006.trc
-rw-rw---- 1 oracle dba 291K Nov 15 22:12 mxdell1_lms4_13011.trc
-rw-rw---- 1 oracle dba 212K Nov 15 22:26 mxdell1_lms1_12998.trc
drwxr-xr-x 5 oracle dba 12K Nov 15 22:33 .
-rw-r----- 1 oracle dba 1.1M Nov 15 22:43 alert_mxdell1.log
-rw-rw---- 1 oracle dba 1.9K Nov 15 22:43 mxdell1_lgwr_13027.trc
-rw-rw---- 1 oracle dba 229K Nov 15 22:46 mxdell1_lms0_12994.trc
-rw-rw---- 1 oracle dba 239K Nov 15 22:47 mxdell1_lms5_13015.trc
-rw-rw---- 1 oracle dba 253K Nov 15 22:47 mxdell1_lms2_13002.trc
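The cdmp_ directory names in the listing above encode the dump time as YYYYMMDDHHMMSS. A minimal sketch for turning such a name into a readable timestamp, which makes it easier to find the matching alert log entry (the function name is made up):

```shell
# Sketch: convert a cdmp_YYYYMMDDHHMMSS directory name to "YYYY-MM-DD HH:MM:SS".
# Prints nothing if the name does not have that shape.
cdmp_to_time() {
    printf '%s\n' "$1" | sed -n \
        's/^cdmp_\([0-9]\{4\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)$/\1-\2-\3 \4:\5:\6/p'
}

cdmp_to_time cdmp_20101005101745   # prints: 2010-10-05 10:17:45
```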
Example of the .trw files inside a cdmp (core dump) directory:
mxrac01$ls -alhrt
-rw-rw---- 1 oracle dba 34K Sep 17 10:59 mxdell1_smon_13801.trw
-rw-rw---- 1 oracle dba 36K Sep 17 10:59 mxdell1_reco_13803.trw
-rw-rw---- 1 oracle dba 36K Sep 17 10:59 mxdell1_qmnc_14006.trw
-rw-rw---- 1 oracle dba 32K Sep 17 10:59 mxdell1_pz99_14013.trw
-rw-rw---- 1 oracle dba 36K Sep 17 10:59 mxdell1_psp0_13754.trw
-rw-rw---- 1 oracle dba 30K Sep 17 10:59 mxdell1_pmon_13748.trw
-rw-rw---- 1 oracle dba 34K Sep 17 10:59 mxdell1_ora_994.trw
-rw-rw---- 1 oracle dba 34K Sep 17 10:59 mxdell1_ora_9903.trw
-rw-rw---- 1 oracle dba 34K Sep 17 10:59 mxdell1_ora_9720.trw
-rw-rw---- 1 oracle dba 34K Sep 17 10:59 mxdell1_ora_9552.trw
-rw-rw---- 1 oracle dba 34K Sep 17 10:59 mxdell1_ora_9546.trw
-rw-rw---- 1 oracle dba 34K Sep 17 10:59 mxdell1_ora_9541.trw
-rw-rw---- 1 oracle dba 34K Sep 17 10:59 mxdell1_ora_9436.trw
-rw-rw---- 1 oracle dba 32K Sep 17 10:59 mxdell1_ora_9420.trw
-rw-rw---- 1 oracle dba 34K Sep 17 10:59 mxdell1_ora_9200.trw
-rw-rw---- 1 oracle dba 34K Sep 17 10:59 mxdell1_ora_10287.trw
-rw-rw---- 1 oracle dba 34K Sep 17 10:59 mxdell1_ora_10090.trw
-rw-rw---- 1 oracle dba 34K Sep 17 10:59 mxdell1_ora_10068.trw
-rw-rw---- 1 oracle dba 38K Sep 17 10:59 mxdell1_mmon_13807.trw
-rw-rw---- 1 oracle dba 36K Sep 17 10:59 mxdell1_mmnl_13809.trw
-rw-rw---- 1 oracle dba 36K Sep 17 10:59 mxdell1_mman_13788.trw
-rw-rw---- 1 oracle dba 30K Sep 17 10:59 mxdell1_lms5_13784.trw
-rw-rw---- 1 oracle dba 30K Sep 17 10:59 mxdell1_lms4_13780.trw
-rw-rw---- 1 oracle dba 30K Sep 17 10:59 mxdell1_lms3_13776.trw
-rw-rw---- 1 oracle dba 30K Sep 17 10:59 mxdell1_lms2_13772.trw
-rw-rw---- 1 oracle dba 32K Sep 17 10:59 mxdell1_lms1_13766.trw
-rw-rw---- 1 oracle dba 30K Sep 17 10:59 mxdell1_lms0_13760.trw
-rw-rw---- 1 oracle dba 30K Sep 17 10:59 mxdell1_lmon_13756.trw
-rw-rw---- 1 oracle dba 30K Sep 17 10:59 mxdell1_lmd0_13758.trw
-rw-rw---- 1 oracle dba 36K Sep 17 10:59 mxdell1_lgwr_13797.trw
-rw-rw---- 1 oracle dba 36K Sep 17 10:59 mxdell1_lck0_13823.trw
-rw-rw---- 1 oracle dba 10K Sep 17 10:59 mxdell1_j005_4826.trw
-rw-rw---- 1 oracle dba 30K Sep 17 10:59 mxdell1_j002_7331.trw
-rw-rw---- 1 oracle dba 30K Sep 17 10:59 mxdell1_j001_10548.trw
-rw-rw---- 1 oracle dba 32K Sep 17 10:59 mxdell1_j000_10521.trw
-rw-rw---- 1 oracle dba 6.0K Sep 17 10:59 mxdell1_diag_13750.trw
-rw-rw---- 1 oracle dba 36K Sep 17 10:59 mxdell1_dbw2_13795.trw
-rw-rw---- 1 oracle dba 36K Sep 17 10:59 mxdell1_dbw1_13793.trw
-rw-rw---- 1 oracle dba 36K Sep 17 10:59 mxdell1_dbw0_13790.trw
-rw-rw---- 1 oracle dba 38K Sep 17 10:59 mxdell1_ckpt_13799.trw
-rw-rw---- 1 oracle dba 36K Sep 17 10:59 mxdell1_cjq0_13805.trw
-rw-rw---- 1 oracle dba 36K Sep 17 10:59 mxdell1_arc1_13937.trw
-rw-rw---- 1 oracle dba 34K Sep 17 10:59 mxdell1_arc0_13935.trw
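The trace files in these listings follow the pattern instance_process_pid.trc (or .trw). A short sketch that splits a file name into those three parts, assuming the process name itself contains no underscore (the function name is made up):

```shell
# Sketch: split a trace file name into "instance process pid".
# Prints nothing if the name does not match instance_process_pid.trc/.trw.
parse_trace_name() {
    printf '%s\n' "$1" | sed -n \
        's/^\(.*\)_\([^_]*\)_\([0-9][0-9]*\)\.tr[cw]$/\1 \2 \3/p'
}

parse_trace_name mxdell1_lms5_13784.trw   # prints: mxdell1 lms5 13784
parse_trace_name mxdell1_ora_994.trw      # prints: mxdell1 ora 994
```

Names with ora as the process part belong to foreground server processes; the others (lms, lmd, lgwr, and so on) are background processes.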
mxrac01$crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora.mxdell.db application ONLINE ONLINE mxrac01
ora....l1.inst application ONLINE ONLINE mxrac01
ora....l3.inst application ONLINE ONLINE mxrac03
ora....l4.inst application ONLINE ONLINE mxrac04
ora....l5.inst application ONLINE ONLINE mxrac05
ora....01.lsnr application ONLINE ONLINE mxrac01
ora....c01.gsd application ONLINE ONLINE mxrac01
ora....c01.ons application ONLINE ONLINE mxrac01
ora....c01.vip application ONLINE ONLINE mxrac01
ora....03.lsnr application ONLINE ONLINE mxrac03
ora....c03.gsd application ONLINE ONLINE mxrac03
ora....c03.ons application ONLINE ONLINE mxrac03
ora....c03.vip application ONLINE ONLINE mxrac03
ora....04.lsnr application ONLINE ONLINE mxrac04
ora....c04.gsd application ONLINE ONLINE mxrac04
ora....c04.ons application ONLINE ONLINE mxrac04
ora....c04.vip application ONLINE ONLINE mxrac04
ora....05.lsnr application ONLINE ONLINE mxrac05
ora....c05.gsd application ONLINE ONLINE mxrac05
ora....c05.ons application ONLINE OFFLINE
ora....c05.vip application ONLINE ONLINE mxrac05
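In the output above, ora....c05.ons has Target ONLINE but State OFFLINE. With many resources such mismatches are easy to miss, so a small awk filter can list them. This is a sketch that assumes the five-column layout of crs_stat -t shown here; the function name is made up.

```shell
# Sketch: list resources from `crs_stat -t` output whose State differs from Target.
# Reads the crs_stat -t output on stdin.
crs_not_at_target() {
    awk '$1 ~ /^ora\./ && $3 != $4 { print $1 " target=" $3 " state=" $4 }'
}

# Sample lines copied from the output above.
sample_output="Name Type Target State Host
------------------------------------------------------------
ora....c05.gsd application ONLINE ONLINE mxrac05
ora....c05.ons application ONLINE OFFLINE
ora....c05.vip application ONLINE ONLINE mxrac05"

printf '%s\n' "$sample_output" | crs_not_at_target
```

On a cluster node the same filter would be used as crs_stat -t | crs_not_at_target.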
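C. cdump contains many directories whose names start with core_. A core file is a
memory image of a process, and users generally do not need to read these files.
The number after core_ is the process ID.
cdump stores the core information produced on Oracle internal errors; a corresponding
file also appears in bdump or udump.
This information is useful to Oracle Support. Set the core_dump_dest parameter to
change the location.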
mxrac01$ls -alhrt
total 60K
drwxr-x--- 2 oracle dba 4.0K Dec 7 2009 core_2662
drwxr-x--- 2 oracle dba 4.0K Dec 16 2009 core_20943
drwxr-x--- 2 oracle dba 4.0K Dec 21 2009 core_27896
drwxr-x--- 2 oracle dba 4.0K Dec 21 2009 core_23068
drwxr-x--- 2 oracle dba 4.0K Dec 21 2009 core_21673
drwxr-x--- 2 oracle dba 4.0K Dec 21 2009 core_2039
drwxr-x--- 2 oracle dba 4.0K Dec 21 2009 core_11681
drwxr-x--- 2 oracle dba 4.0K Jan 21 2010 core_18290
drwxr-x--- 2 oracle dba 4.0K Jan 22 2010 core_4613
drwxr-x--- 2 oracle dba 4.0K Jan 22 2010 core_18850
drwxr-x--- 2 oracle dba 4.0K Jan 22 2010 core_5644
drwxr-x--- 2 oracle dba 4.0K Feb 16 2010 core_15445
drwxr-xr-x 9 oracle dba 4.0K Mar 2 2010 ..
drwxr-x--- 2 oracle dba 4.0K Aug 8 16:52 core_31833
drwxr-xr-x 15 oracle dba 4.0K Aug 8 16:52 .
mxrac01$
The files inside those directories look like this:
mxrac01$ls -alhrt
total 14M
-rw------- 1 oracle dba 16M Dec 21 2009 core.23068
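Opening this file shows that it is binary.
D. dpdump is the default Data Pump directory (DATA_PUMP_DIR); it holds Data Pump export/import dump files and their logs.
E. hdump rarely receives any files; it is meant for Oracle High Availability log files.
F. udump holds trace files from foreground (user) sessions, for example the session trace file produced after enabling SQL trace.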
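2. CRS service logs (mxrac01 is the hostname of node 1).
Logs under the CRS directory:
admin => some summary information
alertmxrac01.log => summary information about CRS state changes on the node; for details, check the CSS log
client => logs from CRS initialization and from OCR client applications, including CLSCFG, CSS, OCRCHECK, OCRCONFIG, OCRDUMP, and OIFCFG
crsd => crsd logs; once CSS has come up in fatal mode, crsd starts and then starts the related resources
cssd => cssd logs; node stop, start, reconfiguration, and so on. Every problem is recorded here, so this is the most important log
evmd => evmd logs
racg => ONS and VIP logs
When a problem occurs, usually look at ocssd.log first, then check the crsd log around the same timestamps as needed; all resource-related messages are in crsd.log.
If the logs do not show the key information, raise the log level of the relevant module (default log levels differ between versions):
crsctl debug log css CSSD:5
crsctl debug log crs CRSD:3
and so on.
The modules available for each component can be listed with crsctl lsmodules crs
Example: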
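The "ocssd.log first, then crsd.log around the same time" workflow can be sketched as a helper that prints the lines from both logs containing a given timestamp prefix. The function name is made up, and the sketch assumes that the log lines contain a timestamp in the form YYYY-MM-DD HH:MM:SS, which may differ between versions.

```shell
# Sketch: show ocssd.log and crsd.log lines that contain a timestamp prefix.
# crs_grep_time LOG_ROOT STAMP
#   LOG_ROOT is the per-node log directory, e.g. $ORA_CRS_HOME/log/mxrac01
#   STAMP    is a literal prefix such as "2010-11-15 22:4" (assumed format)
crs_grep_time() {
    log_root=$1
    stamp=$2
    for f in "$log_root/cssd/ocssd.log" "$log_root/crsd/crsd.log"; do
        # Skip logs that do not exist; prefix each hit with its file name.
        [ -f "$f" ] && awk -v f="$f" -v s="$stamp" \
            'index($0, s) { print f ": " $0 }' "$f"
    done
    return 0
}

# Example call on a cluster node:
# crs_grep_time "$ORA_CRS_HOME/log/mxrac01" "2010-11-15 22:4"
```

Matching on a literal prefix rather than a regular expression keeps the helper predictable; shortening the prefix (for example to the hour) widens the time window.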
mxrac01$ls -alh
total 200K
drwxr-xr-x 45 root dba 4.0K Nov 18 2009 .
drwxrwxr-x 6 oracle dba 4.0K Nov 19 2009 ..
drwxr-xr-x 2 root dba 4.0K Feb 24 2010 bin
drwxrwxr-x 4 oracle dba 4.0K Nov 18 2009 cdata
drwxrwxr-x 5 oracle dba 4.0K Apr 2 2010 cfgtoollogs
...
drwxr-xr-x 4 oracle dba 4.0K Nov 18 2009 log
drwxrwx--- 10 oracle dba 4.0K Nov 18 2009 network
drwxrwx--- 5 oracle dba 4.0K Nov 18 2009 nls
....
drwxrwx--- 4 oracle dba 4.0K Nov 18 2009 xdk
mxrac01$ls
admin alertmxrac01.log client crsd cssd evmd racg
mxrac01$ls
crsd.log
mxrac01$ls
cssdOUT.log mxrac01.pid oclsmon oclsomon ocssd.l05 ocssd.log ocssd.trc
mxrac01$ls
evmd.log evmdOUT.log
mxrac01$ls -alh
total 104K
drwxrwxr-t 5 oracle dba 4.0K Nov 15 01:37 .
drwxr-xr-t 8 root dba 4.0K Jan 24 2010 ..
-rw-r--r-- 1 oracle dba 494 Dec 13 2009 evtf.log
-rw-r--r-- 1 oracle dba 2.1K Dec 13 2009 ora.mxdell.db.log
-rw-r--r-- 1 oracle dba 56K Nov 7 02:21 ora.mxrac01.ons.log
-rw-r--r-- 1 root root 4.0K Jun 20 19:04 ora.mxrac01.vip.log
-rw-r--r-- 1 root root 2.4K Jun 20 19:04 ora.mxrac03.vip.log
-rw-r--r-- 1 root root 1.5K Apr 2 2010 ora.mxrac04.vip.log
-rw-r--r-- 1 root root 247 Apr 2 2010 ora.mxrac05.vip.log
drwxrwxrwt 2 oracle dba 4.0K Nov 18 2009 racgeut
drwxrwxrwt 2 oracle dba 4.0K Nov 18 2009 racgevtf
drwxrwxrwt 2 oracle dba 4.0K Nov 18 2009 racgmain
RACG:
mxrac01$ls
admin alertmxrac01.log client crsd cssd evmd racg
In RAC, the CRS log directory has a subdirectory named racg, which holds logs for ONS, VIP, and GSD.
mxrac01$ls
evtf.log ora.mxrac01.ons.log ora.mxrac03.vip.log ora.mxrac05.vip.log racgevtf
ora.mxdell.db.log ora.mxrac01.vip.log ora.mxrac04.vip.log racgeut racgmain
Explanation from the Oracle documentation:
RACG:
Extends clusterware to support Oracle-specific requirements and complex resources.
Runs server callout scripts when FAN events occur.
From the ITPUB blog, link: http://blog.itpub.net/35489/viewspace-678252/. If you repost this article, please credit the source; otherwise legal action may be taken.