背景描述:数据库运行中,docker服务异常宕机,如何快速恢复oracle数据库服务
场景一:docker服务能够重新拉起,但是容器无法拉起
场景二:docker服务无法拉起,
一、准备工作
禁止 Docker 的自动拉起
修改为no
vi /usr/lib/systemd/system/docker.service
Restart=no
重启生效
[root@e3qiutldor05 system]# systemctl daemon-reload
[root@e3qiutldor05 system]# systemctl restart docker.service
通过存储过程模拟生产场景
1、建表
CREATE TABLE test(
id NUMBER,
name VARCHAR2(255)
);
2、对test表插入三百万次,每插入一次就提交一次
CREATE OR REPLACE PROCEDURE insert_data_multiple_times
IS
BEGIN
FOR i IN 1..3000000 LOOP
INSERT INTO test(id, name) VALUES(i, 'Test' || i);
COMMIT;
END LOOP;
END;
/
执行
BEGIN
insert_data_multiple_times;
END;
/
3、模拟宕机
在数据写入过程中,直接kill docker服务
[root@e3qiutldor05 _data]# ps -ef |grep docker
root 119205 1 0 15:40 ? 00:00:01 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
root 125981 119205 0 15:55 ? 00:00:00 /usr/bin/docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 1521 -container-ip 172.17.0.2 -container-port 1521
root 125986 119205 0 15:55 ? 00:00:00 /usr/bin/docker-proxy -proto tcp -host-ip :: -host-port 1521 -container-ip 172.17.0.2 -container-port 1521
root 126381 122460 0 15:55 pts/1 00:00:00 grep --color=auto docker
[root@e3qiutldor05 _data]# kill -9 125981
[root@e3qiutldor05 _data]# kill -9 125986
[root@e3qiutldor05 _data]# kill -9 119205
[root@e3qiutldor05 _data]# ps -ef |grep docker
root 127502 122460 0 15:57 pts/1 00:00:00 grep --color=auto docker
docker宕机
[root@e3qiutldor05 volumes]# systemctl stop docker
Warning: Stopping docker.service, but it can still be activated by:
docker.socket
[root@e3qiutldor05 volumes]# systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
Active: inactive (dead) since Thu 2024-02-22 16:03:06 CST; 2s ago
Docs: https://docs.docker.com
Process: 129914 ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock (code=exited, status=0/SUCCESS)
Main PID: 129914 (code=exited, status=0/SUCCESS)
二、开始恢复
场景一:docker服务可以拉起,但是数据库容器无法拉起
恢复方式:通过镜像新建容器,并将原数据文件目录映射到新的容器
原数据库容器状态为exited
[root@e3qiutldor05 helowin]# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7fa87488e3fb harbor.supcon.com/database/oracle_11g "/bin/sh -c '/home/o…" About an hour ago Exited (137) 12 minutes ago 0.0.0.0:1521->1521/tcp, :::1521->1521/tcp oracle11g
1、确认宕机数据库的数据文件所在宿主机的路径
docker inspect oracle11g
"Source": "/var/lib/docker/volumes/031b54580261fe56faf694dc862d9614ad63d4d8b820537795aa81a2eb0a2834/_data",
"Destination": "/home/oracle/app/oracle/oradata",
2、利用镜像,新建一台容器
确认新容器数据库的数据文件路径
docker inspect harbor.supcon.com/database/oracle_11g
Volumes": {
"/home/oracle/app/oracle/oradata": {}
利用原有镜像,创建容器,将老容器的路径映射到新容器里的地址
docker run -d -p 1522:1521 --name oracle2 -v /var/lib/docker/volumes/813255bf2505a3ea227dfad957cc965da51b2a48d44f11896b5bae88c9613f4c/_data:/home/oracle/app/oracle/oradata harbor.supcon.com/database/oracle_11g
3、拉起数据库
新容器数据库启动失败,两个控制文件的version不一致
SQL> alter database mount;
alter database mount
*
ERROR at line 1:
ORA-00214: control file '/home/oracle/app/oracle/oradata/helowin/control01.ctl'
version 896 inconsistent with file
'/home/oracle/app/oracle/flash_recovery_area/helowin/control02.ctl' version 841
修改参数,只指定一个控制文件
SQL> alter system set control_files='/home/oracle/app/oracle/oradata/helowin/control01.ctl' scope=spfile;
数据库重启成功,完成数据验证,未提交的数据将会丢失
SQL> startup
ORACLE instance started.
Total System Global Area 1603411968 bytes
Fixed Size 2213776 bytes
Variable Size 402655344 bytes
Database Buffers 1191182336 bytes
Redo Buffers 7360512 bytes
Database mounted.
Database opened.
SQL> select count(*) from test;
COUNT(*)
----------
951317
4、数据比对
新容器
SQL> select count(*) from test;
COUNT(*)
----------
951317
原生产容器
SQL> select count(*) from test;
COUNT(*)
----------
951317
数据一致,但是由于是异常宕机,未提交的数据新老容器都是会丢失的
三、开始恢复
场景二:docker服务无法启动
恢复方式:异机恢复
进入到宿主机目录下,将数据文件压缩拷贝,并传输到恢复机器
tar -cf LHR11G.tar LHR11G/
创建新的参数文件
vi initlhr11g.ora
*.audit_file_dest='/test/adump'
*.audit_trail='db'
*.compatible='11.2.0.4.0'
*.control_files='/test/LHR11G/control01.ctl'
*.db_block_size=8192
*.db_domain=''
*.db_name='lhr11g'
*.diagnostic_dest='/test/adump'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=orclXDB)'
*.open_cursors=300
*.pga_aggregate_target=183500800
*.processes=150
*.remote_login_passwordfile='EXCLUSIVE'
*.sga_target=550502400
*.undo_tablespace='UNDOTBS1'
启动数据库到nomount状态
[oracle@qiuyang1 test]$ export ORACLE_SID=lhr11g
[oracle@qiuyang1 test]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Mon Feb 26 10:28:12 2024
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup nomount pfile='/oracle/app/product/11.2.0.4/db_1/dbs/initlhr11g.ora';
ORACLE instance started.
Total System Global Area 551165952 bytes
Fixed Size 2255112 bytes
Variable Size 218105592 bytes
Database Buffers 322961408 bytes
Redo Buffers 7843840 bytes
SQL> alter database mount;
通过拼接语句生成修改控制文件中数据文件路径的sql
SQL> SELECT 'ALTER DATABASE RENAME FILE ''' || name || ''' TO ''/test/LHR11G/' || SUBSTR(name, INSTR(name, '/', -1) + 1) || '''' || ';' AS command FROM v$datafile;
COMMAND
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
ALTER DATABASE RENAME FILE '/u01/app/oracle/oradata/LHR11G/system01.dbf' TO '/test/LHR11G/system01.dbf';
ALTER DATABASE RENAME FILE '/u01/app/oracle/oradata/LHR11G/sysaux01.dbf' TO '/test/LHR11G/sysaux01.dbf';
ALTER DATABASE RENAME FILE '/u01/app/oracle/oradata/LHR11G/undotbs01.dbf' TO '/test/LHR11G/undotbs01.dbf';
ALTER DATABASE RENAME FILE '/u01/app/oracle/oradata/LHR11G/users01.dbf' TO '/test/LHR11G/users01.dbf';
ALTER DATABASE RENAME FILE '/u01/app/oracle/oradata/LHR11G/example01.dbf' TO '/test/LHR11G/example01.dbf';
通过拼接语句生成修改控制文件中redolog路径的sql
SQL> SELECT 'ALTER DATABASE RENAME FILE ''' || member || ''' TO ''/test/LHR11G/' || SUBSTR(member, INSTR(member, '/', -1) + 1) || '''' || ';' AS command FROM v$logfile;
COMMAND
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
ALTER DATABASE RENAME FILE '/u01/app/oracle/oradata/LHR11G/redo03.log' TO '/test/LHR11G/redo03.log';
ALTER DATABASE RENAME FILE '/u01/app/oracle/oradata/LHR11G/redo02.log' TO '/test/LHR11G/redo02.log';
ALTER DATABASE RENAME FILE '/u01/app/oracle/oradata/LHR11G/redo01.log' TO '/test/LHR11G/redo01.log';
打开数据库
SQL> alter database open;
Database altered.
比对数据
老库
SYS@LHR11G> select count(*) from test;
COUNT(*)
----------
1139633
异地恢复库:
SQL> select count(*) from test;
COUNT(*)
----------
1139633