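During XTTS migration testing, several expdp export jobs hung: they never returned anything, and their log files stayed empty.
Environment:
AIX 6.1 + Oracle 10.2.0.4
Symptom:
The expdp jobs produced no output at all, either on screen or in their logs. Checking their status in dba_datapump_jobs: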
SQL>
set lines 300
col OWNER_NAME for a10
col OPERATION for a15
col JOB_MODE for a20
col STATE for a15
select * from dba_datapump_jobs;
OWNER_NAME JOB_NAME                       OPERATION       JOB_MODE             STATE               DEGREE ATTACHED_SESSIONS DATAPUMP_SESSIONS
---------- ------------------------------ --------------- -------------------- --------------- ---------- ----------------- -----------------
SYS        SYS_EXPORT_TRANSPORTABLE_01    EXPORT          TRANSPORTABLE        DEFINING                 1                 0                 1
SYS        SYS_EXPORT_TRANSPORTABLE_02    EXPORT          TRANSPORTABLE        DEFINING                 1                 1                 2
SYS        SYS_EXPORT_TRANSPORTABLE_03    EXPORT          TRANSPORTABLE        DEFINING                 1                 1                 2
SYS        SYS_EXPORT_SCHEMA_01           EXPORT          SCHEMA               DEFINING                 1                 1                 2
SYS        SYS_EXPORT_TRANSPORTABLE_04    EXPORT          TRANSPORTABLE        DEFINING                 1                 1                 2
SYS        SYS_EXPORT_SCHEMA_02           EXPORT          SCHEMA               DEFINING                 1                 1                 2
6 rows selected.
Every expdp export job is stuck with STATE = DEFINING.
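To spot this condition without eyeballing the full table, a small sketch like the one below can list the jobs stuck in a given state. It assumes sqlplus is on PATH and that OS authentication as SYSDBA works for the oracle user (the same '/ as sysdba' login used in this post); the function names list_jobs and jobs_in_state are hypothetical, and only the parsing part can run without a database.

```shell
#!/bin/sh
# Hypothetical helpers for spotting Data Pump jobs stuck in one state.

# list_jobs: print one "OWNER|JOB_NAME|STATE" line per row of dba_datapump_jobs.
# Assumes OS authentication as SYSDBA works for the current user.
list_jobs() {
    sqlplus -s '/ as sysdba' <<'EOF'
set heading off feedback off pagesize 0
select owner_name || '|' || job_name || '|' || state from dba_datapump_jobs;
exit
EOF
}

# jobs_in_state STATE: read "OWNER|JOB_NAME|STATE" lines on stdin and print
# OWNER.JOB_NAME for every job whose state (trailing blanks trimmed) equals STATE.
jobs_in_state() {
    awk -F'|' -v want="$1" '
        { state = $3; sub(/[ \t]+$/, "", state) }
        state == want { print $1 "." $2 }'
}
```

For example, list_jobs | jobs_in_state DEFINING prints the stuck jobs, and list_jobs | jobs_in_state 'NOT RUNNING' lists the ones whose master tables are candidates for cleanup. The pipe delimiter is used because the state NOT RUNNING contains a space, which would break whitespace-based field splitting.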
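- 1. A quick fix: clearing the abnormal jobs
- 2. Getting to the root cause: searching MOS
1. A quick fix: clearing the abnormal jobs
First, I forcibly killed all expdp processes running in the background: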
ps -ef|grep expdp|grep -v grep|awk '{print $2}'|xargs kill -9
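The grep pipeline above matches any command line that merely contains the string expdp, including the wrapper script (sh expdp_xtts.sh) and possibly unrelated processes on a shared host. A slightly tighter sketch, assuming ps -eo pid,args is available (as it is on AIX and Linux), selects only processes whose program name is expdp and defaults to a dry run; expdp_pids and kill_pids are hypothetical names.

```shell
#!/bin/sh
# expdp_pids: read "pid args" lines (from ps -eo pid,args) on stdin and print
# the PID of every process whose program is expdp, with or without a full path.
expdp_pids() {
    awk '$2 == "expdp" || $2 ~ /\/expdp$/ { print $1 }'
}

# kill_pids PID...: send kill -9 to each PID given as an argument.
# Dry run by default; set DRY_RUN=0 to actually send the signal.
kill_pids() {
    for pid in "$@"; do
        if [ "${DRY_RUN:-1}" = 0 ]; then
            kill -9 "$pid"
        else
            echo "would run: kill -9 $pid"
        fi
    done
}
```

Usage, for example: kill_pids $(ps -eo pid,args | expdp_pids) to preview, then DRY_RUN=0 kill_pids $(ps -eo pid,args | expdp_pids) to kill. The loop simply does nothing when no PID is found, whereas GNU xargs without -r still runs kill once with no arguments.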
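Next, I tried dropping the jobs' master tables (strictly speaking, they should only be dropped once the jobs are in the NOT RUNNING state):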
select 'drop table '||OWNER_NAME||'.'||JOB_NAME||' purge;' from dba_datapump_jobs where STATE='NOT RUNNING';
drop table sys.SYS_EXPORT_TRANSPORTABLE_01 purge;
..
This had no effect; the query result stayed the same.
I even tried a normal SHUTDOWN IMMEDIATE, which hung as well. The alert log showed active calls:
Thu Nov 1 15:14:24 2018
Active call for process 4522064 user 'oracle' program 'oracle@localhost (DM00)'
Active call for process 4456536 user 'oracle' program 'oracle@localhost (DM01)'
Active call for process 10027180 user 'oracle' program 'oracle@localhost (DM02)'
Active call for process 7340140 user 'oracle' program 'oracle@localhost (DM03)'
Active call for process 6291888 user 'oracle' program 'oracle@localhost (DM04)'
Active call for process 8126596 user 'oracle' program 'oracle@localhost (DM05)'
SHUTDOWN: waiting for active calls to complete.
The process IDs in these messages all belong to ora_dm processes:
$ ps -ef|grep ora_dm
oracle 4456536 1 0 17:00:09 - 0:00 ora_dm01_xxxxdb
oracle 4522064 1 0 16:50:57 - 0:00 ora_dm00_xxxxdb
oracle 7340140 1 0 14:06:07 - 0:00 ora_dm03_xxxxdb
oracle 8126596 1 0 14:35:03 - 0:00 ora_dm05_xxxxdb
oracle 10027180 1 0 13:55:08 - 0:00 ora_dm02_xxxxdb
oracle 6291888 1 0 14:31:17 - 0:00 ora_dm04_xxxxdb
oracle 7340432 8388786 0 15:22:59 pts/4 0:00 grep ora_dm
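Matching the alert-log PIDs against ps by eye works for six processes; the sketch below automates it. It assumes the alert log lines have exactly the "Active call for process <pid> ..." form shown above and that ps -eo pid,args is available; alert_pids and match_pids are hypothetical names.

```shell
#!/bin/sh
# alert_pids: read alert-log text on stdin and print the OS PID from each
# "Active call for process <pid> ..." line written while SHUTDOWN is waiting.
alert_pids() {
    sed -n 's/^Active call for process \([0-9][0-9]*\) .*/\1/p'
}

# match_pids "PID PID ...": read "pid args" lines (e.g. from ps -eo pid,args)
# on stdin and keep only the lines whose PID is in the space-separated list.
match_pids() {
    awk -v list="$1" '
        BEGIN { n = split(list, p, " "); for (i = 1; i <= n; i++) want[p[i]] = 1 }
        ($1 in want)'
}
```

For example: pids=$(tail -n 200 alert_xxxxdb.log | alert_pids | tr '\n' ' ') followed by ps -eo pid,args | match_pids "$pids". The alert log file name and location are assumptions; use the path of your instance's background dump destination.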
These are in fact the Data Pump master (DMnn) processes behind the expdp jobs, so I killed them forcibly:
ps -ef|grep ora_dm|grep -v grep|awk '{print $2}'|xargs kill -9
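On a host running more than one instance, grep ora_dm also matches the Data Pump processes of every other database. A sketch that restricts the match to a single SID (xxxxdb in this post), assuming ps -eo pid,args is available and using a dry run until DRY_RUN=0 is set; dm_pids and kill_dm are hypothetical names.

```shell
#!/bin/sh
# dm_pids SID: read "pid args" lines (from ps -eo pid,args) on stdin and print
# the PID of each process named ora_dmNN_<SID>, for the given SID only.
# The SID is compared as a plain string, so regex characters in it are harmless.
dm_pids() {
    awk -v sid="$1" '
        match($2, /^ora_dm[0-9]+_/) && substr($2, RLENGTH + 1) == sid { print $1 }'
}

# kill_dm SID: kill -9 the ora_dm processes of SID.
# Dry run by default; set DRY_RUN=0 to actually send the signal.
kill_dm() {
    for pid in $(ps -eo pid,args | dm_pids "$1"); do
        if [ "${DRY_RUN:-1}" = 0 ]; then
            kill -9 "$pid"
        else
            echo "would run: kill -9 $pid"
        fi
    done
}
```

For example, kill_dm xxxxdb previews the kills and DRY_RUN=0 kill_dm xxxxdb performs them. Comparing the remainder of the process name with the SID, rather than grepping for a substring, keeps ora_dm00_otherdb and the grep process itself out of the result.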
Once these processes were gone, the database shut down successfully:
Thu Nov 1 15:24:37 2018
All dispatchers and shared servers shutdown
Thu Nov 1 15:24:37 2018
ALTER DATABASE CLOSE NORMAL
After restarting the database, the same query confirms that the jobs have been cleaned up:
SQL>
set lines 300
col OWNER_NAME for a10
col OPERATION for a15
col JOB_MODE for a20
col STATE for a15
select * from dba_datapump_jobs;
no rows selected
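Summary:
Data Pump jobs are tied to the ora_dm processes. If a Data Pump job fails but does not exit, these processes must be killed as well; once they are gone, the job state changes to NOT RUNNING. Shutting down the database is not required; it was done here only to show that a normal shutdown is blocked in this situation. This also explains why the master tables should only be cleaned up once the jobs are in the NOT RUNNING state.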
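2. Getting to the root cause: searching MOS
The steps above only cleaned up the abnormal Data Pump jobs; they did not fix the underlying problem. Running the export script in the background again (nohup sh expdp_xtts.sh &) reproduced the fault: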
$ ps -ef|grep expdp
oracle 6684914 8061208 0 15:30:07 pts/2 0:00 grep expdp
oracle 7143482 8061208 0 15:30:03 pts/2 0:00 sh expdp_xtts.sh
oracle 6685096 7143482 0 15:30:03 pts/2 0:00 expdp '/ as sysdba' parfile=expdp_xtts.par
$ ps -ef|grep ora_dm
oracle 7602308 8061208 0 15:30:10 pts/2 0:00 grep ora_dm
oracle 3997964 1 1 15:30:05 - 0:00 ora_dm00_xxxxdb
$
Querying dba_datapump_jobs at this point shows that STATE is still stuck at DEFINING:
OWNER_NAME JOB_NAME                       OPERATION       JOB_MODE                       STATE                              DEGREE ATTACHED_SESSIONS DATAPUMP_SESSIONS
---------- ------------------------------ --------------- ------------------------------ ------------------------------ ---------- ----------------- -----------------
SYS        SYS_EXPORT_TRANSPORTABLE_01    EXPORT          TRANSPORTABLE                  DEFINING                                1                 1                 2
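The other export jobs behave the same way, so they are not repeated here.
To simplify testing, I wrote a minimal single-table expdp export, which shows the same symptom: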
expdp '/ as sysdba' directory=XTTS tables=query.test dumpfile=query_test.dmp logfile=query_test.log
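Since the visible symptom is a log file that stays empty, a small watchdog can flag the hang instead of waiting indefinitely. This is a sketch under assumptions: the log path must be the OS path behind the XTTS directory object (Data Pump writes logfile= there, not to the current directory), the 5-second poll interval and 60-second default timeout are arbitrary, and wait_for_log is a hypothetical name.

```shell
#!/bin/sh
# wait_for_log LOGPATH [TIMEOUT]: poll every 5 seconds until LOGPATH exists
# and is non-empty. Returns 0 as soon as output appears, or 1 once TIMEOUT
# seconds (default 60) have passed with the file still missing or empty.
wait_for_log() {
    logpath=$1
    timeout=${2:-60}
    waited=0
    while [ ! -s "$logpath" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 5
        waited=$((waited + 5))
    done
    return 0
}
```

For example, start the test export in the background as above, then run wait_for_log /path/to/xtts_dir/query_test.log 120 || echo "no log output yet; check dba_datapump_jobs", where the path is a placeholder for the XTTS directory.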
Based on the symptom, I searched MOS for the keywords "expdp state DEFINING" and found a matching document:
...