下午时接到个反馈,说是数据库挂了。然后立即远程上去看,首先就看windows事件日志,发现里面的postgresql报错全是关于postmaster.pid已存在的。故得知原因是非正常停止数据库导致postmaster.pid存在,相当于lock住了,再启动服务就不成功。故选择删除postmaster.pid,重启服务器成功。
此次原因很简单,大概5分钟左右就搞定。
接近5点又接到个反馈,说另一个地方的数据库挂了,而且预先就得知postmaster.pid删除再重启无效,心凉了一半。然后远程上去看,windows事件日志是赤裸裸的“timeout server start”,这么严重,服务都起不来了,给个超时的原因,真是晕。然后看一下pg本身的log,就是"database interrupted when recovery",感觉就是在恢复时被中断了啥的,也瞧不出什么原因。
幸亏今天翻了反馈记录,知道以前有同事是用pg_resetxlog处理的。马上搬救兵,具体过程如下:
1.在bin目录下有个pg_controldata.exe,在cmd中输入命令:
pg_controldata.exe "C:\PostgreSQL\8.3\data"
得到lasted XID和lasted OID
pg_control version number: 833
Catalog version number: 200711281
Database system identifier: 6092902914614812411
Database cluster state: in production
pg_control last modified: 2015/1/14 16:10:55
Latest checkpoint location: 0/5AC46260
Prior checkpoint location: 0/5AC46218
Latest checkpoint's REDO location: 0/5AC46260
Latest checkpoint's TimeLineID: 1
Latest checkpoint's NextXID: 0/2163
Latest checkpoint's NextOID: 45386
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Time of latest checkpoint: 2015/1/14 16:10:55
Minimum recovery ending location: 0/0
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Date/time type storage: floating-point numbers
Maximum length of locale name: 128
LC_COLLATE: Chinese (Simplified)_People's Republic of China.936
LC_CTYPE: Chinese (Simplified)_People's Republic of China.936
2.然后再使用pg_resetxlog.exe,命令如下:
pg_resetxlog.exe -o 45386 -x 2163 -f "C:\PostgreSQL\8.3\data"
在运行这个时,pid文件应该删掉,并且postgres所有相关进程都kill掉。
3.运行完上述东西后,再重启数据库服务应该就好了。