[Together with Dameng] Common Ways to Generate a Database Core Dump and How to Use dmrdc

I. Introduction

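A DM instance failure means that the database process dmserver is behaving abnormally: it has terminated unexpectedly, or the process still exists but does not respond or cannot be logged in to. Problems like these are all serious failures, and we generally need to collect as much of the relevant information as possible for failure analysis. Here we divide the failures into two categories (the process has crashed, or it is still running but abnormal) and describe what useful information to collect in each scenario.
First, a brief explanation of the tools and terms we may use: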

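  1. core file: the complete memory image of a process, saved by the operating system when the program terminates abnormally.
  2. gdb: a tool for debugging executables or core files.
  3. Stack: the running state of a program during execution, including the chain of function calls at run time and the data associated with them.
  4. dmrdc: a small tool shipped with the DM database for simple analysis of core files. Given a core file as its input parameter, dmrdc reads out the SQL statements of all sessions that were active at the time of the failure.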

Prerequisite

The server's core file generation rules must be configured in advance; see "Configuring core generation rules".
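
As a quick sanity check of this prerequisite, the sketch below interprets the value printed by ulimit -c and shows the kernel's core naming rule when it is readable. The function name core_limit_status is made up for illustration and is not part of DM; /proc/sys/kernel/core_pattern is the standard Linux location for the naming rule.

```shell
#!/bin/sh
# core_limit_status is a hypothetical helper, not part of DM.
# It interprets the value printed by `ulimit -c`.
core_limit_status() {
    case "$1" in
        unlimited)   echo "enabled (no size limit)" ;;
        0)           echo "disabled" ;;
        ''|*[!0-9]*) echo "unknown" ;;
        *)           echo "enabled (limited to $1 blocks)" ;;
    esac
}

# Report the limit of the current shell; dmserver inherits the limit
# of the shell or service that starts it.
echo "core size limit: $(core_limit_status "$(ulimit -c)")"

# On Linux, the naming rule for core files can be read here.
if [ -r /proc/sys/kernel/core_pattern ]; then
    echo "core_pattern: $(cat /proc/sys/kernel/core_pattern)"
fi
```

Run it as the user that starts dmserver (dmdba in this article), since the limit is per shell session.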

II. Crashing DM to Generate a Core

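Run ulimit -c unlimited in the shell that starts dmserver to remove the limit on core file size (ulimit is a shell resource limit, not an environment variable). When the program crashes, a core file is then generated in the process's current working directory, unless the system's core generation rules direct it elsewhere.

  1. Execute the following SQL statements to create a test table and insert enough data that the statement runs for a long time.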
CREATE TABLE TESTCOER(ID INT);
INSERT INTO TESTCOER SELECT LEVEL FROM DUAL CONNECT BY LEVEL <10000500;
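  2. Use the following commands to find the PID of the database process, then send it signal 11 (SIGSEGV) with kill -11 to force a crash. The core file is then generated in the configured directory.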

[root@VM-24-17-centos bin]# ps -ef|grep dms
dmdba    2297221       1  8 20:16 pts/0    00:00:04 /opt/dmdbms/bin/dmserver path=/opt/dmdata/DAMENG/dm.ini -noconsole
root     2297413 2289093  0 20:17 pts/0    00:00:00 grep --color=auto dms
[root@VM-24-17-centos bin]# kill -11 2297221

// Find the generated core file
[root@VM-24-17-centos bin]# ll -lht core.*
-rw------- 1 dmdba dinstall 4033536000 Apr 18 20:17 core.2297221
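
The PID lookup above can also be scripted. The following is a minimal sketch, assuming the usual ps -ef column layout in which the command is the 8th field; dmserver_pids is a hypothetical helper name, not a DM tool.

```shell
#!/bin/sh
# dmserver_pids: hypothetical helper, not part of DM.
# Reads `ps -ef` output on stdin and prints the PID (2nd column) of every
# process whose command (8th column) is dmserver. Matching on the command
# column excludes the "grep ... dms" line automatically.
dmserver_pids() {
    awk '$8 ~ /(^|\/)dmserver$/ { print $2 }'
}

# Typical use, left commented out because it really crashes the instance:
# pid=$(ps -ef | dmserver_pids | head -n 1)
# kill -11 "$pid"      # signal 11 is SIGSEGV
```

Only run the commented-out part on a test instance.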

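III. Analyzing an Existing Core File with GDB, Plus dmrdc Parsing

When the database terminates abnormally and produces a core file, analyze the core file with GDB to determine what caused the crash.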

// 1. Find the generated core file
[dmdba@VM-0-17-centos bin]$ ll -lht core.*
-rw------- 1 dmdba dinstall 1.4G Mar 24 15:49 core.12593
// 2. Load the core file in gdb
[dmdba@VM-0-17-centos bin]$ gdb ./dmserver core.12593 
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"

// 3. Set the name of the file that will store the stacks
(gdb) set logging file core_12593.txt
(gdb) set logging on
Copying output to core_12593.txt.

// 4. Record the stacks of all threads at the time of the crash
(gdb)  thread apply all bt
Thread 67 (Thread 0x7f2e3e816700 (LWP 18366)):

// 5. Stop writing to the file
(gdb) set logging off
Done logging to core_12593.txt.

// 6. Record the stack of the crashed (current) thread
(gdb) bt
#0  0x000000000007875e in ?? ()
#1  0x00007f2ea574c9c9 in CRYPTO_THREAD_run_once () from ./libcrypto.so
#2  0x00007f2ea5712401 in RAND_get_rand_method () from ./libcrypto.so
#3  0x00007f2ea571268d in RAND_add () from ./libcrypto.so
#4  0x0000000000879e0d in dm_dh_gen_respective_key ()
#5  0x000000000150dd72 in ntsk_process_startup ()
#6  0x000000000150fcde in ntsk_process_cop ()
#7  0x0000000001415c80 in uthr_db_main_for_sess ()
#8  0x00007f2ea7f9cea5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f2ea74b7b0d in clone () from /lib64/libc.so.6

// 7. Record the ID of the crashed thread with info threads
// Note: the row marked with * is the current thread, and the number after LWP is the thread ID
(gdb)  info threads
Id   Target Id                           Frame 
* 1    Thread 0x7f7daae63740 (LWP 2297221) 0x000000000155a117 in assert_fun ()
2    Thread 0x7f7d27b4f700 (LWP 2297225) 0x00007f7daa8292fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
3    Thread 0x7f7d27237700 (LWP 2297228) 0x00007f7daa8292fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
4    Thread 0x7f7d27338700 (LWP 2297227) 0x00007f7daa8292fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
5    Thread 0x7f7c4bfff700 (LWP 2297309) 0x00007f7da9c2ca41 in poll () from /lib64/libc.so.6
6    Thread 0x7f7d27136700 (LWP 2297229) 0x00007f7daa8292fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
7    Thread 0x7f7d25b6c700 (LWP 2297248) 0x00007f7daa8292fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
8    Thread 0x7f7d26429700 (LWP 2297242) 0x00007f7daa8296e8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
9    Thread 0x7f7c4b7fe700 (LWP 2297310) 0x00007f7da9c2ca41 in poll () from /lib64/libc.so.6
10   Thread 0x7f7d26f34700 (LWP 2297231) 0x00007f7daa8292fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
11   Thread 0x7f7d2596a700 (LWP 2297250) 0x00007f7daa8292fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
12   Thread 0x7f7c601af700 (LWP 2297325) 0x00007f7daa82cfb0 in nanosleep () from /lib64/libpthread.so.0
13   Thread 0x7f7c485f7700 (LWP 2297326) 0x00007f7da9c2f29f in select () from /lib64/libc.so.6
14   Thread 0x7f7d26d32700 (LWP 2297233) 0x00007f7daa8292fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
15   Thread 0x7f7d25edd700 (LWP 2297255) 0x00007f7daa8292fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
16   Thread 0x7f7d2652a700 (LWP 2297241) 0x00007f7daa8292fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
17   Thread 0x7f7c3fff7700 (LWP 2297380) 0x00007f7daa82c8e4 in read () from /lib64/libpthread.so.0
18   Thread 0x7f7d26c31700 (LWP 2297234) 0x00007f7daa8292fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
19   Thread 0x7f7c610b7700 (LWP 2297269) 0x00007f7daa8296e8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
20   Thread 0x7f7c3e8f5700 (LWP 2297381) 0x00007f7daa8292fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
21   Thread 0x7f7d26b30700 (LWP 2297235) 0x00007f7daa8292fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
22   Thread 0x7f7d26a2f700 (LWP 2297236) 0x00007f7daa8292fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
23   Thread 0x7f7d2692e700 (LWP 2297237) 0x00007f7daa8292fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
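
If the info threads output has been saved to a file (for example with set logging, as in step 3), the LWP of the current thread can be picked out mechanically. This is only a sketch: current_thread_lwp is a hypothetical helper, and it assumes the row layout shown above, where the current thread's row starts with "*".

```shell
#!/bin/sh
# current_thread_lwp: hypothetical helper, not a gdb command.
# Reads saved `info threads` output on stdin and prints the LWP number
# of the row that gdb marked with "*" (the current thread).
current_thread_lwp() {
    sed -n 's/^\* *[0-9][0-9]* .*(LWP \([0-9][0-9]*\)).*/\1/p'
}

# Example, assuming the output was logged to info_threads.txt
# (a file name chosen here for illustration):
# current_thread_lwp < info_threads.txt
```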


// 8. Use the dmrdc tool to extract the SQL statements
[root@VM-24-17-centos bin]# ./dmrdc sfile=core.12593
dmrdc V8
Analysing: 0/4033536000
Analysing: 31457268/4033536000
Analysing: 62914536/4033536000
-- middle part omitted
Analysing: 3995073036/4033536000
Analysing: 4026530304/4033536000
(elapsed-time message, mis-encoded in the terminal output)   5.898 s

// 9. The generated file is core_tmp.12593
[root@VM-24-17-centos bin]# ll -lht core.*
-rw------- 1 dmdba dinstall 4033536000 Apr 18 20:17 core.12593
-rw-r--r-- 1 root  root            282 Apr 18 20:19 core_tmp.12593

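// 10. View the extracted statements
// Each line of the dmrdc result has the form !#%&*^$@[thread ID]:SQL statement, where the thread ID corresponds to an LWP number shown by info threads. Find the entry for the thread of interest, here !#%&*^$@[2297381], to get the SQL statement it was executing
// Then analyze that statement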
[root@VM-24-17-centos bin]# cat core_tmp.12593
!#%&*^$@[2297381]:INSERT INTO TESTCOER SELECT LEVEL FROM DUAL CONNECT BY LEVEL <10000500;
!#%&*^$@[2297380]:SELECT EP1.EP_PORT FROM V$DCR_EP EP1, V$DCR_EP EP2, V$INSTANCE INST WHERE EP1.SHM_SIZE > 0 AND EP1.EP_SEQNO=INST.DSC_SEQNO AND EP2.EP_NAME=INST.NAME;
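
Looking up the statement for one thread can be automated. The sketch below assumes the line format seen in the output above (a fixed prefix, the thread ID in square brackets, a colon, then the SQL text); sql_for_thread is a hypothetical helper name and not part of dmrdc.

```shell
#!/bin/sh
# sql_for_thread: hypothetical helper, not part of dmrdc.
# $1 = thread ID (the LWP number from gdb), $2 = dmrdc result file.
# Prints the SQL text of every line whose bracketed thread ID equals $1.
sql_for_thread() {
    awk -v tid="$1" '{
        s = index($0, "[")       # position of the opening bracket
        e = index($0, "]:")      # position of the closing bracket and colon
        if (s > 0 && e > s && substr($0, s + 1, e - s - 1) == tid)
            print substr($0, e + 2)
    }' "$2"
}

# Example with the file from this article:
# sql_for_thread 2297381 core_tmp.12593
```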

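IV. Generating a Core from a Running DM

When the system behaves abnormally but the dmserver service has not gone down and no core has been generated automatically, a core file must be generated manually.
In a cluster environment, first stop the dmmonitor process and then the dmwatcher process, so that pausing dmserver under gdb is not treated as an instance failure.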

[dmdba@VM-0-17-centos bin]$ ps -ef|grep dms
dmdba     2426134     1  0 Apr13 ?       00:06:28 /opt/dmdbms/bin/dmserver /opt/dmdata/DAMENG/dm.ini -noconsole mount
dmdba    26034 22154  0 21:48 pts/1    00:00:00 grep --color=auto dms
 ## 1. Start gdb on the dmserver binary
[dmdba@VM-0-17-centos bin]$ gdb dmserver 
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /opt/dmdbms/bin/dmserver...(no debugging symbols found)...done.
## 2. Attach to the process by its PID
(gdb) attach 2426134

## 3. Generate the core file manually
(gdb) generate-core-file
warning: target file /proc/2426134/cmdline contained unexpected null characters
Saved corefile core.2426134

## 4. detach: leave the process
(gdb) detach
Detaching from program: /opt/dmdbms/bin/dmserver, process 2426134
[Inferior 1 (process 2426134) detached]

## 5. quit: exit gdb
(gdb) quit


## 6. After the analysis is complete, start dmwatcher first, then start dmmonitor
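
The attach, generate-core-file, detach and quit steps can also be run in one non-interactive call, which keeps the time dmserver spends paused short. This is a sketch rather than a DM-provided procedure: save_core and the GDB override variable are names invented here, while -batch, -p and -ex are standard gdb options.

```shell
#!/bin/sh
# save_core: hypothetical wrapper, not a DM or gdb tool.
# $1 = PID of the running dmserver, $2 = path of the core file to write.
# In batch mode gdb runs the given command, then exits and detaches
# from the attached process.
# Set GDB=echo to preview the command line without touching the process.
save_core() {
    "${GDB:-gdb}" -batch -p "$1" -ex "generate-core-file $2"
}

# Example with the PID from this article:
# save_core 2426134 core.2426134
```

Writing a core of a large instance still takes time and disk space comparable to the process's memory size, so check free space first.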

V. Generating Thread Stacks from a Running DM

[dmdba@VM-0-17-centos bin]$ ps -ef|grep dms
dmdba     2426134     1  0 Apr13 ?       00:06:28 /opt/dmdbms/bin/dmserver /opt/dmdata/DAMENG/dm.ini -noconsole mount
dmdba    26034 22154  0 21:48 pts/1    00:00:00 grep --color=auto dms
 ## 1. Start gdb on the dmserver binary
[dmdba@VM-0-17-centos bin]$ gdb dmserver 
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /opt/dmdbms/bin/dmserver...(no debugging symbols found)...done.
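## 2. Attach to the process by its PID
(gdb) attach 2426134
## 3. Set the name of the file that will store the stacks
(gdb) set logging file core_2426134.txt
(gdb) set logging on
Copying output to core_2426134.txt.

## 4. Record the stacks of all current threads; keep pressing Enter until nothing more is printed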
(gdb)  thread apply all bt
Thread 67 (Thread 0x7f2e3e816700 (LWP 18366)):

## 5. Stop writing to the file
(gdb) set logging off
Done logging to core_2426134.txt.


## 6. detach: leave the process
(gdb) detach
Detaching from program: /opt/dmdbms/bin/dmserver, process 2426134
[Inferior 1 (process 2426134) detached]

## 7. quit: exit gdb
(gdb) quit
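
A stack log with dozens of threads is easier to read after a first pass that groups threads by their innermost frame. The sketch below does that for a saved thread apply all bt log; top_frame_summary is a hypothetical helper, and it assumes frame lines of the form shown in this article ("#0  0x... in function ()").

```shell
#!/bin/sh
# top_frame_summary: hypothetical helper, not part of gdb or DM.
# Reads `thread apply all bt` output on stdin and prints
# "<thread count> <function in frame #0>", most common first.
top_frame_summary() {
    awk '
        $1 == "#0" {
            name = $2                     # form: "#0  func (...) at file"
            for (i = 2; i < NF; i++)      # form: "#0  0x... in func (...)"
                if ($i == "in") { name = $(i + 1); break }
            count[name]++
        }
        END { for (n in count) print count[n], n }
    ' | sort -k1,1nr
}

# Example with the log file written in step 3:
# top_frame_summary < core_2426134.txt
```

Threads waiting in pthread_cond_wait or poll are usually idle, so the less common top frames are a reasonable place to start reading.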



VI. Printing the Stack of a Specific DM Thread

1. Find the PID of the dmserver process
[root@VM-24-16-centos ~]# ps -ef|grep dmdba
dmdba     6994     1  0 May10 ?       00:09:47 /opt/dmdbms/bin/dmserver path=/opt/dmdata/DAMENG/dm.ini -noconsole
dmdba     9516     1  0 May10 ?       00:00:07 /opt/dmdbms/bin/dmap
root     16010 15962  0 22:28 pts/0    00:00:00 grep --color=auto dmdba

2. Use top -H -p <PID> to see the CPU usage of each thread; inside top, Shift+H toggles the per-thread display
[root@VM-24-16-centos ~]# top -Hp 6994

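3. Use pstack to print the thread stacks. In the output, the stack labelled LWP 6997 (the thread ID found with top) belongs to the thread using the most CPU, and it can be analyzed to find the cause.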
[root@VM-24-16-centos ~]#  pstack 6997
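
When the output covers many threads, the stack of the one thread found with top can be filtered out. This sketch assumes the gdb-style thread headers used throughout this article ("Thread N (Thread 0x... (LWP id)):"); stack_for_lwp is a hypothetical helper name.

```shell
#!/bin/sh
# stack_for_lwp: hypothetical helper, not part of pstack or gdb.
# $1 = thread ID (LWP) as shown by top -H.
# Reads pstack or `thread apply all bt` output on stdin and prints only
# the header and frames of the thread with that LWP.
stack_for_lwp() {
    awk -v tid="$1" '
        /^Thread [0-9]+ / { show = (index($0, "(LWP " tid ")") > 0) }
        show { print }
    '
}

# Example with the IDs from this article:
# pstack 6994 | stack_for_lwp 6997
```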

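Related Commands

  • bt: show the stack of the current thread
  • thread apply all bt: print the detailed stacks of all threads. This is commonly used to check whether the stacks contain your own classes or .so libraries, and the output is usually written to a file such as thread_info.txt
  • Show the stacks of specific threads: thread apply thread1 thread2… bt
  • Switch threads: thread N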

This article was submitted to the [Together with Dameng] essay event on the Dameng online service platform.
