Oracle案例06—— OGG-01098 Could not flush "./dirdat/e1000004383" (error 28, No space left on device)...

 一、前言

  自从换了新环境,各种问题层出不穷,如果不是之前积累的经验丰富,估计都歇菜了,看来作为数据库全栈工程师(oracle/mysql/sqlserver/sap hana/pg/mongodb/redis)还是有好处的(新环境需要完善的地方很多很多啊...),O(∩_∩)O哈哈~。今天同事找我说有个报表库4.5号数据就没有了,问我是不是ogg数据同步有问题。我一脸懵逼,首先4.5号出现的问题,现在都快1个月了才发现说明监控机制不完善,其次业务部门这反映也太滞后了也,出问题就先解决问题呗。

二、问题排查

登录Ogg源库查看相关进程:

[oracle@dg dirprm]$ ../ggsci

GGSCI (dg) 1> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     RUNNING     DPRPT01     00:00:00      666:57:30   
EXTRACT     ABENDED     EXTRPT01    00:00:00      666:57:38   
GGSCI (dg) 2> info EXTRPT01

EXTRACT    EXTRPT01  Last Started 2017-10-27 18:12   Status ABENDED
Checkpoint Lag       00:00:00 (updated 666:58:56 ago)
Log Read Checkpoint  Oracle Redo Logs
                     2018-04-05 23:00:24  Seqno 282927, RBA 832138240
                     SCN 12.2143954677 (53683562229)

通过上面检查,发现源端的数据抽取进程已经挂起27天左右了,也就是发生在4.5号的23:00,那么具体是什么原因导致的这个问题呢?需要通过检查ogg错误日志

[oracle@dg ogg]$ cd dirrpt/

[oracle@dg dirrpt]$ vi DPRPT010.rpt

2018-04-05 23:00:36  ERROR   OGG-01098  Could not flush "./dirdat/e1000004383" (error 28, No space left on device).
Failed to save data to 'dirdmp/gglog-EXTRPT01.dmp', error 28 - No space left on device

发现具体导致进程挂起的原因是因为磁盘空间不足,导致数据抽取无法写入trail文件,检查磁盘空间发现磁盘目前是充足的,那么尝试重启ext进程

GGSCI (dg) 1>start EXTRPT01
Sending START request to MANAGER ...
EXTRACT EXTRPT01 starting

GGSCI (dg) 3> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     RUNNING     DPRPT01     00:00:00      667:04:13   
EXTRACT     RUNNING     EXTRPT01    00:00:00      667:04:21   

隔1分钟后

GGSCI (dg) 4> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     RUNNING     DPRPT01     00:00:00      667:05:10   
EXTRACT     ABENDED     EXTRPT01    00:00:00      667:05:17   

启动失败,再次查看错误日志DPRPT01.rpt,错误信息如下:

2018-05-03 11:36:52  ERROR   OGG-00446  Could not find archived log for sequence 282927 thread 1 under default destinations SQL <SELECT  name    FROM v$archived_log   WHERE sequence# = :ora_seq_no AND         thread# = :ora_thread AND         resetlogs_id = :ora_resetlog_id AND         archived = 'YES' AND         deleted = 'NO' >, error retrieving redo file name for sequence 282927, archived = 1, use_alternate = 0Not able to establish initial position for sequence 282927, rba 790067728.

2018-05-03 11:36:52  ERROR   OGG-01668  PROCESS ABENDING.

这里可以看到抽取进程读取源库的归档日志的时候无法找到相应归档日志(估计已被清理)

col name for a55;
set line 200;
set pagesize 20000;

select sequence#,name,COMPLETION_TIME,STATUS from v$archived_log  where sequence#>=282926  and rownum<=30;

经过确认,发现确实自282927以后到5.2日前的归档日志已经被删除,至此可以确认问题导致的原因和现状情况。

错误原因是:磁盘空间不足导致抽取进程挂起,然后隔了好久没有恢复OGG数据同步,数据源的归档日志又被清理掉,所以无法通过启动抽取进程完成恢复OGG数据同步。

三、解决方案

1、可以通过从备份恢复归档日志,完成ogg数据同步直接。(鉴于每天归档日志大约80G左右,一个月的归档数据恢复较大量以及数据同步依旧需要较大数据量,所以不采用此方式)

2、通过重新部署OGG主从同步进程,完成OGG数据同步,经过检查发现需要同步的表有11张,数据量最大的也就6千万左右数据,同步速度比较快。

select count(1) from   testuser.t_t1; --      1163
select count(1) from   testuser.t_t2;  --   3794574
select count(1) from   testuser.t_t3; --14461070
select count(1) from   testuser.t_t4; --    135962
select count(1) from   testuser.t_t5; --3331344
select count(1) from   testuser.t_t6;  --   5961455
select count(1) from   testuser.t_t7;  --131280
select count(1) from   testuser.t_t8; --7459898
select count(1) from   testuser.t_t9  --8698
select count(1) from   testuser.t_t10;  --62504749
select count(1) from   testuser.t_t11; --11581710

3、完成数据同步恢复后,需要完善数据库状态监控,包含不仅限于DG主备装填、OGG进程状态、实例状态等

4、重新部署过程略。

1、停止源库和目标库ogg进程
1)源库

GGSCI (source_userdg) 7> stop extrpt01
EXTRACT EXTRPT01 is already stopped.


GGSCI (source_userdg) 9> stop DPRPT01 

Sending STOP request to EXTRACT DPRPT01 ...
STOP request pending end-of-transaction (4 records so far)..

GGSCI (source_userdg) 24> kill extract DPRPT01 

Sending KILL request to MANAGER ...
Killed process (19665) for EXTRACT DPRPT01

2)目标库

GGSCI (source_user_report) 2> stop REPRPT01
REPLICAT REPRPT01 is already stopped.

2、删除原配置文件
1)源库
GGSCI (source_userdg) 26> delete EXTRPT01

2018-05-07 10:16:39  WARNING OGG-01753  Cannot unregister EXTRACT EXTRPT01 from database because no database login was provided. You can manually unregister this group later with the UNREGISTER EXTRACT command with LOGRETENTION. Issue DBLOGIN first.
Deleted EXTRACT EXTRPT01.


GGSCI (source_userdg) 27> delete DPRPT01
Deleted EXTRACT DPRPT01.

2)目标库
GGSCI (source_user_report) 5> delete REPRPT01
ERROR: Could not delete DB checkpoint for REPLICAT REPRPT01 (Database login required to delete database checkpoint)

GGSCI (source_user_report) 6> dblogin useridalias gguser 
Successfully logged into database.

GGSCI (source_user_report as gguser@source_dbname) 7> delete REPRPT01
Deleted REPLICAT REPRPT01.

3、清理日志和历史文件
1)源库
[oracle@source_userdg ogg]$ rm -rf dirdat/*
[oracle@source_userdg ogg]$ rm -rf dirrpt/EXT*
[oracle@source_userdg ogg]$ rm -rf dirrpt/PUMP*
[oracle@source_userdg ogg]$ rm -rf dirrpt/ext01*
[oracle@source_userdg ogg]$ rm -rf dirrpt/DPR*  
[oracle@source_userdg ogg]$ rm -rf dirrpt/extrpt01.dsc 
[oracle@source_userdg ogg]$ rm -rf dirrpt/MGR{0,1,2}*
2)目标库
[oracle@source_user_report ogg]$ rm -rf dirdat/r1000004383
[oracle@source_user_report ogg]$ rm -rf  dirrpt/REPRPT01*
[oracle@source_user_report ogg]$ rm -rf dirrpt/rep*
[oracle@source_user_report ogg]$ rm -rf dirrpt/MGR0.rpt 
3)目标库表数据清理

alter session set current_schema=gguser;

truncate table xxx; 

4、重新配置

1)源库
extract进程配置
GGSCI (source_userdg as gguser@source_user) 4> add extract extrpt01,tranlog,begin now
EXTRACT added.

GGSCI (source_userdg as gguser@source_user) 5> add exttrail ./dirdat/ex,extract extrpt01 MEGABYTES 100
EXTTRAIL added.

pump进程配置
GGSCI (source_userdg as gguser@source_user) 6> add extract DPRPT01,exttrailsource ./dirdat/ex
EXTRACT added.

GGSCI (source_userdg as gguser@source_user) 7> add rmttrail  ./dirdat/rt,extract DPRPT01
RMTTRAIL added.

2)目标库

GGSCI (source_user_report) 1> dblogin useridalias gguser
Successfully logged into database.
GGSCI (source_user_report as gguser@source_dbname) 3> add checkpointtable gguser.checkpointtab

Successfully created checkpoint table gguser.checkpointtab.

GGSCI (source_user_report as gguser@source_dbname) 4> add replicat REPRPT01,exttrail   ./dirdat/rt,checkpointtable gguser.checkpointtab
REPLICAT added.

3)启动源库进程
GGSCI (source_userdg) 6> start EXTRPT01 

Sending START request to MANAGER ...
EXTRACT EXTRPT01 starting


GGSCI (source_userdg) 7> start DPRPT01

Sending START request to MANAGER ...
EXTRACT DPRPT01 starting

GGSCI (source_userdg) 8> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     RUNNING     DPRPT01     00:00:00      00:00:47    
EXTRACT     RUNNING     EXTRPT01    00:00:53      00:00:03   

4)源库数据导出

select * from dba_directories; 
DATA_PUMP_DIR
/U01/oracle/admin/source_user/dpdump/

col  CURRENT_SCN  format 999999999999999999999999;
set line 200;
set pagesize 20000;

select current_scn from v$database;
55011413592
##select dbms_flashback.get_system_change_number current_scn from dual;
##select timestamp_to_scn(to_timestamp('2018-05-06 08:05:14')) from dual;
##select to_char(scn_to_timestamp(55011413592),'yyyy-mm-dd hh24:mi:ss') from dual;

expdp \'/ as sysdba\' directory=DATA_PUMP_DIR file=20180507.dmp tables=source_user.t1,source_user.t2,source_user.t3 grants=n statistics=none triggers=n compress=n content=data_only FLASHBACK_SCN=55011413592 log=expdp.log

5)目标库数据导入
select * from dba_directories; 
DATA_PUMP_DIR    
/U01/oracle/admin/source_dbname/dpdump/
impdp \'/ as sysdba\' DIRECTORY=DATA_PUMP_DIR  DUMPFILE=20180507.dmp    REMAP_SCHEMA=source_user:gguser content=data_only LOGFILE=impdp.log               

6)启动目标库进程
start REPRPT01,aftercsn 55011413592

GGSCI (source_user_report) 17> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING                                           
REPLICAT    RUNNING     REPRPT01    02:43:08      00:00:04 

5、数据校验

 

转载于:https://www.cnblogs.com/rangle/p/8985543.html

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值