Postgresql 8.3.3
今天应用反映数据库很慢,有些SQL执行一天都没执行完,
检查数据库,发现大量会话在更新同一张表 tbl_tmp, 产生大量行锁申请等侍。
--等于等侍状态的SQL (被堵住的SQL)
anpstat=# select datname,procpid,query_start,current_query,waiting,client_addr from pg_stat_activity where waiting='t';
datname | procpid | query_start | current_query | waiting | client_addr
----------+---------+-------------------------------+-----------------------------------------------------------------------------------------------------------------------------------+---------+-------------
sanpstat | 14044 | 2010-10-07 16:46:40.386904+08 | update tbl_tmp set responsed=true , feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | t | 127.0.0.1
sanpstat | 27839 | 2010-10-07 16:49:18.022499+08 | update tbl_tmp set responsed=true , feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | t | 127.0.0.1
sanpstat | 3539 | 2010-10-07 20:28:13.212075+08 | update tbl_tmp set feedback =$1 where seq=$2 | t | 127.0.0.1
sanpstat | 3894 | 2010-10-07 20:53:43.466517+08 | update tbl_tmp set responsed=true , feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | t | 127.0.0.1
sanpstat | 10130 | 2010-10-08 09:37:51.253871+08 | update tbl_tmp set responsed=true , feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | t | 127.0.0.1
sanpstat | 9083 | 2010-10-08 08:08:59.221976+08 | update tbl_tmp set responsed=true , feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | t | 127.0.0.1
sanpstat | 10038 | 2010-10-08 09:25:55.434459+08 | update tbl_tmp set responsed=true , feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | t | 127.0.0.1
sanpstat | 10241 | 2010-10-08 09:51:13.990492+08 | update tbl_tmp set responsed=true , feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | t | 127.0.0.1 | t | 127.0.0.1
sanpstat | 11147 | 2010-10-08 11:11:21.574665+08 | update tbl_tmp set responsed=true , feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | t | 127.0.0.1
sanpstat | 11168 | 2010-10-08 11:17:17.158246+08 | update tbl_tmp set responsed=true , feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | t | 127.0.0.1
sanpstat | 11926 | 2010-10-08 11:54:10.704641+08 | update tbl_tmp set responsed=true , feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | t | 127.0.0.1
sanpstat | 11959 | 2010-10-08 11:56:07.021072+08 | update tbl_tmp set responsed=true , feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | t | 127.0.0.1
--正在执行的 update 语句(堵住了其它SQL)
sanpstat=# select datname,procpid,query_start,current_query,waiting,client_addr from pg_stat_activity where waiting='f'
sanpstat-# and current_query like '%update tbl_tmp %';
datname | procpid | query_start | current_query | waiting | client_addr
----------+---------+-------------------------------+-----------------------------------------------------------------------------------------------------------------------------------+---------+---------------
sanpstat | 2012 | 2010-10-07 16:52:55.8228+08 | update tbl_tmp set responsed=true , feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | f | 127.0.0.1
sanpstat | 14157 | 2010-10-07 16:41:37.308062+08 | update tbl_tmp set responsed=true , feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | f | 127.0.0.1
sanpstat | 10177 | 2010-10-08 09:43:43.297872+08 | update tbl_tmp set responsed=true , feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | f | 127.0.0.1
sanpstat | 2043 | 2010-10-07 19:25:46.931806+08 | update tbl_tmp set responsed=true , feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | f | 127.0.0.1
sanpstat | 3298 | 2010-10-07 19:37:53.307125+08 | update tbl_tmp set responsed=true , feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | f | 127.0.0.1
sanpstat | 3322 | 2010-10-07 19:40:39.373079+08 | update tbl_tmp set responsed=true , feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | f | 127.0.0.1
sanpstat | 4114 | 2010-10-08 09:25:18.701269+08 | update tbl_tmp set responsed=true , feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | f | 127.0.0.1
sanpstat | 9082 | 2010-10-08 09:18:16.153882+08 | update tbl_tmp set show=1 , gmt_modified=now(), modifier=$1 where seq=$2 | f | 127.0.0.1
sanpstat | 10042 | 2010-10-08 09:32:55.260732+08 | update tbl_tmp set responsed=true , feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | f | 127.0.0.1
sanpstat | 10278 | 2010-10-08 11:08:55.334562+08 | update tbl_tmp set responsed=true , feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | f | 127.0.0.1
sanpstat | 11810 | 2010-10-08 11:46:24.147652+08 | update tbl_tmp set responsed=true , feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 | f | 127.0.0.1
sanpstat | 11964 | 2010-10-08 11:58:32.831916+08 | update tbl_tmp set responsed=true , feedback =$1, gmt_modified=now(), modifier=$2 where seq=$3 |
--用pg_cancel_backend()杀进程
sanpstat=# select pg_cancel_backend(2012);
pg_cancel_backend
-------------------
t
(1 row)
用 pg_cancel_backend()杀进程,虽然显示为 't' ,但进程还在,文档上解释 Cancel a backend's current query,
猜想pg_cancel_backend()只能 kill select 语句,而不能kill update语句,后来实验也证实了这一点,有兴趣的朋友
可以自己做实验去验证下。
--kill -9 杀进程
kill -9 2012
kill -9 14157
kill -9 10177
kill -9 2043
kill -9 3298
kill -9 3322
kill -9 4114
kill -9 9082
kill -9 10042
kill -9 10278
kill -9 11810
kill -9 11964
这里将正在执行的update tbl_tmp 表的会话kill 掉.
--数据库无法连接
[enterprisedb@sanp-rich-db1 ~]$ psql -h 192.168.3.27 -p 1921 -d postgres -U rmt_db_bak
psql: FATAL: the database system is in recovery mode
进程kill -9 以后,发现数据库无法连接,遭了,还好是个测试库。
--数据库处于恢复模式
-bash-3.2$ ps -ef | grep post
postgres 4627 1 0 Apr12 ? 00:02:24 /home/postgres/bin/postgres -D /mnt/data
postgres 4629 4627 0 Apr12 ? 00:00:04 postgres: logger process
root 14195 14168 0 15:37 pts/7 00:00:00 su - postgres
postgres 14196 14195 0 15:37 pts/7 00:00:00 -bash
postgres 14332 4627 44 15:41 ? 00:00:18 postgres: startup process recovering 00000001000000DA0000006E
postgres 21495 14196 0 15:41 pts/7 00:00:00 ps -ef
postgres 21496 14196 0 15:41 pts/7 00:00:00 grep post
发现Postgresql 处于 recovery 状态,奇怪,Postgresql 恢复过程中是不允许连接的,在这点上,Oracle似乎要友好些。
--然后重启数据库后,恢复正常。
结论: 1 postgresql 8.3.3的版本 只提供系统函数 pg_cancel_backend(pid int) 杀进程,但是,这个函数只能 kill Select
查询,而updae,delete DML不生效,感兴趣的朋友可以自己做实验验证下;
2 postgresql 8.4 以后版本提供 函数 pg_terminate_backend(pid int), 这个函数功能比前者强大,可以kill
各种DML(SELECT,UPDATE,DELETE,DROP)操作;
3 对于Postgresql ,最好不要用kill -9 去杀用户进程,因为如果是一个很大的UPDATE,kill后,Postgresql
需要很长的时间做Recovery,而在这个RECOVERY过程中,数据库是不可用的,在这点上,Oracle 要友好些,
在数据库恢复的过程中数据库依然可用,不太需要因Kill -9 用户进程而造成数据库不可用的情况。