记一次oracle宕机事件


这周又出了一次生产事件,发生在另一个运营了近四年的云环境中,受影响客户较多,好在影响时间不长,较快的恢复了生产。具体的排查过程就不说了,在这里先做一下排查记录。

服务器message日志

根据此日志文件可以查看oracle是因为什么宕掉的,这次事件通过此文件发现,是因为free swap 为0KB 引发了系统主动kill了oracle进程

[root@57373ded4b19 log]# more /var/log/messages 

应用日志

app日志

查询DB宕机时间段的应用状态

web-apache日志

统计应用的交易请求量

oracle日志

alert*.log 是oracle的警告日志文件,能够看出来出问题的时候oracle在做什么,是因为什么引发的问题产生

trace 日志

[oracle@57373ded4b19 trace]$ sqlplus / as sysdba

SQL>  show parameter dump

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
background_core_dump                 string      partial
background_dump_dest                 string      /home/oracle/app/oracle/diag/rdbms/helowin/helowin/trace
core_dump_dest                       string      /home/oracle/app/oracle/diag/rdbms/helowin/helowin/cdump
max_dump_file_size                   string      unlimited
shadow_core_dump                     string      partial
user_dump_dest                       string      /home/oracle/app/oracle/diag/rdbms/helowin/helowin/trace
SQL> select * from v$diag_info;

   INST_ID NAME                                                             VALUE
---------- ---------------------------------------------------------------- -------------------------------------------------------------------------------------
         1 Diag Enabled                                                     TRUE
         1 ADR Base                                                         /home/oracle/app/oracle
         1 ADR Home                                                         /home/oracle/app/oracle/diag/rdbms/helowin/helowin
         1 Diag Trace                                                       /home/oracle/app/oracle/diag/rdbms/helowin/helowin/trace
         1 Diag Alert                                                       /home/oracle/app/oracle/diag/rdbms/helowin/helowin/alert
         1 Diag Incident                                                    /home/oracle/app/oracle/diag/rdbms/helowin/helowin/incident
         1 Diag Cdump                                                       /home/oracle/app/oracle/diag/rdbms/helowin/helowin/cdump
         1 Health Monitor                                                   /home/oracle/app/oracle/diag/rdbms/helowin/helowin/hm
         1 Default Trace File                                               /home/oracle/app/oracle/diag/rdbms/helowin/helowin/trace/helowin_ora_25878.trc
         1 Active Problem Count                                             1
         1 Active Incident Count                                            26

11 rows selected.

SQL>
[oracle@57373ded4b19 alert]$ cd /home/oracle/app/oracle/diag/rdbms/helowin/helowin/alert
[oracle@57373ded4b19 alert]$ ls
log.xml
[oracle@57373ded4b19 alert]$ cd /home/oracle/app/oracle/diag/rdbms/helowin/helowin/trace
[oracle@57373ded4b19 trace]$ ls alert_helowin.log 
alert_helowin.log
[oracle@57373ded4b19 trace]$

ASM日志

[oracle@57373ded4b19 trace]$ sqlplus / as sysasm


SQL> show parameter dump

NAME     TYPE VALUE
------------------------------------ ----------- ------------------------------
background_core_dump     string partial
background_dump_dest     string /home/oracle/app/grid/diag/asm/+asm/+A
SM1/trace
core_dump_dest     string /home/oracle/app/grid/diag/asm/+asm/+A
SM1/cdump
max_dump_file_size     string unlimited
shadow_core_dump     string partial
user_dump_dest     string /home/oracle/app/grid/diag/asm/+asm/+ASM1/trace
SQL> select * from v$diag_info;

   INST_ID NAME    VALUE
---------- ---------------------------------------------------------------- -------------------------------------------------------------------------------------
1 Diag Enabled    TRUE
1 ADR Base    /home/oracle/app/grid
1 ADR Home    /home/oracle/app/grid/diag/asm/+asm/+ASM1
1 Diag Trace    /home/oracle/app/grid/diag/asm/+asm/+ASM1/trace
1 Diag Alert    /home/oracle/app/grid/diag/asm/+asm/+ASM1/alert
1 Diag Incident    /home/oracle/app/grid/diag/asm/+asm/+ASM1/incident
1 Diag Cdump    /home/oracle/app/grid/diag/asm/+asm/+ASM1/cdump
1 Health Monitor    /home/oracle/app/grid/diag/asm/+asm/+ASM1/hm
1 Default Trace File    /home/oracle/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_11192.trc
1 Active Problem Count    0
1 Active Incident Count    0

11 rows selected.

[oracle@57373ded4b19 trace]$ cd /home/oracle/app/grid/diag/asm/+asm/+ASM1/trace
[oracle@57373ded4b19 trace]$ more alert_+ASM1.log

oracle 导出AWR

sqlplus / as sysdba

SQL> @?/rdbms/admin/awrrpt.sql

然后根据提示输入:

  • 导出文件类型

‘html’ HTML format (default)
‘text’ Text format
‘active-html’ Includes Performance Hub active report

  • 导出的AWR报告天数
  • 根据提示输入开始和结束时间点的 Snap Id
  • 输入导出的文件名称
    即可导出AWR 报告。

每一次生产问题的排查与解决都是从成堆的日志文件中的不知道多少行的多少字符中筛选那么一点信息,去比对定位。
*哎,天天脑壳疼 *

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值