Real-Time Data Warehouse in Practice (2): Syncing from OGG to Kafka

Reference: https://blog.csdn.net/qq_31312071/article/details/116056488

OGG (Oracle GoldenGate) is Oracle's replication tool. This article shows how to configure OGG to stream incremental changes from an Oracle database into Kafka in real time, with the messages delivered in JSON format.


Role     Hostname    Component and version   IP
Source   db-oracle   Oracle 11g              192.168.10.101
Target   linux121    kafka_2.12-1.0.2        192.168.10.121

1. Download locations

Both the source and target packages can be downloaded from the address below (search by keyword):
https://edelivery.oracle.com/osdc/faces/SoftwareDelivery

Latest target-side release:

https://www.oracle.com/middleware/technologies/goldengate-downloads.html

Source package name: fbo_ggs_Linux_x64_shiphome

Target package name: OGG_BigData_Linux_x64_12.3.2.1.1.tar

2. Installing OGG on the source

## Create the directory on both source and target
mkdir /opt/ogg

## Source side
Silent-install OGG:
[oracle@db-oracle ogg]$ cd /opt/ogg/fbo_ggs_Linux_x64_shiphome/Disk1/response/
[oracle@db-oracle response]$ ll
total 8
-rw-r--r--. 1 oracle oinstall 4340 Jul  9 16:14 oggcore.rsp
[oracle@db-oracle response]$ pwd
/opt/ogg/fbo_ggs_Linux_x64_shiphome/Disk1/response
[oracle@db-oracle response]$ vim oggcore.rsp
### Key entries to edit:
INSTALL_OPTION=ORA12c --install option; choose ORA11g for Oracle 11g
SOFTWARE_LOCATION=/home/oracle/ogg/ggs12_2 --OGG install directory; must be an empty directory
START_MANAGER=false --whether to start the mgr manager process automatically

## Run the silent install
[oracle@db-oracle Disk1]$ ./runInstaller  -silent -responseFile /opt/ogg/fbo_ggs_Linux_x64_shiphome/Disk1/response/oggcore.rsp
Add environment variables

For simplicity I configured these in /etc/profile; in production it is better to put them in oracle's own environment file, /home/oracle/.bash_profile. To be safe I also duplicated OGG_HOME and the related variables in /etc/profile, though I am not sure that is strictly required.

vi ~/.bash_profile

#oracle
ORACLE_BASE=/data/oracle
ORACLE_HOME=$ORACLE_BASE/product/11.2.0
ORACLE_SID=orcl
PATH=$PATH:$ORACLE_HOME/bin
export ORACLE_BASE ORACLE_HOME ORACLE_SID PATH

#ogg
export OGG_HOME=/opt/ogg
export LD_LIBRARY_PATH=$ORACLE_HOME/lib:/usr/lib
export PATH=$OGG_HOME:$PATH

source /etc/profile

### Test the command
ggsci
Enable archive log mode in Oracle
su - oracle
sqlplus / as sysdba

### Run the following to check whether archive log mode is currently on
SQL> archive log list 
Database log mode          No Archive Mode
Automatic archival         Disabled
Archive destination        USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence     12
Current log sequence           14

If Automatic archival shows Disabled, enable it with:

conn / as sysdba (connect to the database as DBA)
shutdown immediate (shut the database down immediately)
startup mount (start the instance and mount the database without opening it)
alter database archivelog; (switch the database to archive log mode)
alter database open; (open the database)
alter system archive log start; (enable automatic archiving)

Run the check again:

SQL> archive log list 

Database log mode          Archive Mode
Automatic archival         Enabled
Archive destination        USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence     12
Next log sequence to archive   14
Current log sequence           14
Enable supplemental logging in Oracle

OGG relies on supplemental logging to capture changes in real time, so the relevant logging must be enabled to guarantee that transaction contents are available. Check the current state with:

select force_logging, supplemental_log_data_min from v$database;

FORCE_ SUPPLEMENTAL_LOG
------ ----------------
NO     NO

## If either column is NO, run:
alter database force logging;
alter database add supplemental log data;
# check again
SQL> select force_logging, supplemental_log_data_min from v$database;

FORCE_ SUPPLEMENTAL_LOG
------ ----------------
YES    YES
Create the replication user in Oracle

First, as root, create the directory and grant ownership:

mkdir -p /data/oracle/tablespace
chown -R oracle:oinstall /data/oracle/tablespace

Then run the following SQL:


SQL> create tablespace oggtbs datafile '/data/oracle/tablespace/oggtbs01.dbf' size 2000M autoextend on;
Tablespace created.
SQL> create user ogg identified by 123456 default tablespace oggtbs;
SQL> grant dba to ogg;

3. OGG initialization

Note in particular that the OGG directory must be identical on the source and target.

ggsci
GGSCI (db-oracle) 1> create subdirs
 
Creating subdirectories under current directory /root
 
Parameter files                /opt/ogg/dirprm: created
Report files                   /opt/ogg/dirrpt: created
Checkpoint files               /opt/ogg/dirchk: created
Process status files           /opt/ogg/dirpcs: created
SQL script files               /opt/ogg/dirsql: created
Database definitions files     /opt/ogg/dirdef: created
Extract data files             /opt/ogg/dirdat: created
Temporary files                /opt/ogg/dirtmp: created
Stdout files                   /opt/ogg/dirout: created
 
GGSCI (db-oracle) 2> 
Create a test table in Oracle
-- test_ogg here and the ogg user above are two separate users
create user test_ogg identified by test_ogg default tablespace users;
grant dba to test_ogg;
conn test_ogg/test_ogg;
create table test_ogg(id int, name varchar(20), primary key(id));

4. OGG configuration

Target (Kafka) setup:
mkdir -p /opt/ogg
unzip oggbigdata12.3.zip
tar xvf OGG_BigData_Linux_x64_12.3.2.1.1.tar  -C /opt/ogg/

Configure environment variables

vim /etc/profile
export OGG_HOME=/opt/ogg
export LD_LIBRARY_PATH=$JAVA_HOME/jre/lib/amd64:$JAVA_HOME/jre/lib/amd64/server:$JAVA_HOME/jre/lib/amd64/libjsig.so:$JAVA_HOME/jre/lib/amd64/server/libjvm.so:$OGG_HOME/lib
export PATH=$OGG_HOME:$PATH
source /etc/profile

Test the ogg command here as well (open a new shell if the variables have not taken effect):

ggsci
## Initialize the directories
GGSCI (linux121) 1> create subdirs
Source-side OGG configuration

Configure the OGG global parameters

su oracle
cd /opt/ogg
ggsci
GGSCI (db-oracle) 1> dblogin userid ogg password 123456
Successfully logged into database.
 
GGSCI (db-oracle as ogg@orcl) 2> edit param ./globals
## then add the following (editing works the same as vim)
oggschema ogg

Configure the manager process (mgr)

GGSCI (db-oracle as ogg@orcl) 3> edit param mgr
PORT 7809
DYNAMICPORTLIST 7810-7909
AUTORESTART EXTRACT *,RETRIES 5,WAITMINUTES 3
PURGEOLDEXTRACTS ./dirdat/*,usecheckpoints, minkeepdays 3

Notes:

  1. PORT is mgr's listening port;
  2. DYNAMICPORTLIST is the dynamic port list: if the specified mgr port is unavailable, a port is chosen from this list (at most 256 ports may be specified);
  3. AUTORESTART restarts all EXTRACT processes, at most 5 times, 3 minutes apart;
  4. PURGEOLDEXTRACTS controls periodic cleanup of trail files.

Register the table for replication

GGSCI (db-oracle) 2> dblogin userid ogg password 123456
Successfully logged into database.

GGSCI (db-oracle as ogg@orcl) 3> add trandata ogg.test_ogg

2021-07-09 23:50:02  INFO    OGG-15132  Logging of supplemental redo data enabled for table OGG.TEST_OGG.

2021-07-09 23:50:02  INFO    OGG-15133  TRANDATA for scheduling columns has been added on table OGG.TEST_OGG.

2021-07-09 23:50:02  INFO    OGG-15135  TRANDATA for instantiation CSN has been added on table OGG.TEST_OGG.

2021-07-09 23:50:02  WARNING OGG-01988  Could not find schematrandata function in source database: failed to find function.

GGSCI (db-oracle as ogg@orcl) 4> info trandata ogg.test_ogg

2021-07-09 23:51:23  WARNING OGG-01988  Could not find schematrandata function in source database: failed to find function.

Logging of supplemental redo log data is enabled for table OGG.TEST_OGG.

Columns supplementally logged for table OGG.TEST_OGG: "ID".

Prepared CSN for table OGG.TEST_OGG: 1213506

Configure the extract process

GGSCI (db-oracle as ogg@orcl) 6> edit param extkafka

Config 1 (some installs cannot connect to the database using the SID form; if this errors, use Config 2)

extract extkafka
dynamicresolution
SETENV (ORACLE_SID = "orcl")
SETENV (NLS_LANG = "american_america.AL32UTF8")
userid ogg,password 123456
exttrail /opt/ogg/dirdat/to
table test_ogg.test_ogg;
table test_ogg.ydjt_*;

Config 2

extract extkafka
dynamicresolution
SETENV (NLS_LANG = "AMERICAN_AMERICA.AL32UTF8")
userid ogg@orcl,password 123456
exttrail /opt/ogg/dirdat/to
table ogg.test_ogg;

Notes: the first line names the extract process; dynamicresolution enables dynamic name resolution; the SETENV lines set environment variables, here the Oracle SID and the character set; userid ogg, password 123456 is the account OGG uses to connect to Oracle, i.e. the replication account created earlier; exttrail defines where trail files are stored and their name prefix. Note the trail name can only be two characters; OGG pads the rest itself. table names the table to replicate; * wildcards are supported, and each entry must end with a semicolon.

Add the extract process:

GGSCI (db-oracle as ogg@orcl) 6> add extract extkafka,tranlog,begin now
EXTRACT added.

Define the trail file and bind it to the extract process:

GGSCI (db-oracle as ogg@orcl) 7> add exttrail /opt/ogg/dirdat/to,extract extkafka
EXTTRAIL added.

Configure the pump process

A pump is technically also an extract; its only job is to ship trail files to the target. Configuration is similar to the extract process; it is simply called a pump process logically.

edit param pukafka
## add the following in the editor
extract pukafka
passthru
dynamicresolution
userid ogg,password 123456
rmthost 192.168.10.121 mgrport 7809
rmttrail /opt/ogg/dirdat/to
table test_ogg.test_ogg;
table test_ogg.ydjt_*;

Notes:

  1. The first line names the extract process;
  2. passthru stops OGG from interacting with Oracle; since the pump is used only for logical transport here, this is disabled;
  3. dynamicresolution enables dynamic name resolution;
  4. userid ogg, password 123456 is the account OGG uses to connect to Oracle;
  5. rmthost and mgrport are the address and listening port of the mgr service on the target (Kafka) side;
  6. rmttrail is the location and name of the trail file on the target.

Bind the local trail file as the pump's source and the remote trail file as its output. Run these on the source:

[oracle@db-oracle ogg]$ ggsci

GGSCI (db-oracle) 1> add extract pukafka,exttrailsource /opt/ogg/dirdat/to
EXTRACT added.

GGSCI (db-oracle) 2>  add rmttrail /opt/ogg/dirdat/to,extract pukafka
RMTTRAIL added.

Configure the definitions file

Replication between Oracle and heterogeneous targets such as MySQL or a Hadoop stack (HDFS, Hive, Kafka, and so on) is heterogeneous data transfer, so a mapping between table definitions must be declared. In the OGG command line on the source, run:

GGSCI (db-oracle) 3> edit param test_ogg

defsfile /opt/ogg/dirdef/test_ogg.test_ogg
userid ogg,password 123456
table test_ogg.test_ogg;

Run in the OGG home directory (as the oracle user):

./defgen paramfile dirprm/test_ogg.prm

***********************************************************************
        Oracle GoldenGate Table Definition Generator for Oracle
 Version 11.2.1.0.3 14400833 OGGCORE_11.2.1.0.3_PLATFORMS_120823.1258
   Linux, x64, 64bit (optimized), Oracle 11g on Aug 23 2012 16:58:29
 
Copyright (C) 1995, 2012, Oracle and/or its affiliates. All rights reserved.


                    Starting at 2018-05-23 05:03:04
***********************************************************************

Operating System Version:
Linux
Version #1 SMP Wed Apr 12 15:04:24 UTC 2017, Release 3.10.0-514.16.1.el7.x86_64
Node: ambari.master.com
Machine: x86_64
                         soft limit   hard limit
Address Space Size   :    unlimited    unlimited
Heap Size            :    unlimited    unlimited
File Size            :    unlimited    unlimited
CPU Time             :    unlimited    unlimited

Process id: 13126

***********************************************************************
**            Running with the following parameters                  **
***********************************************************************
defsfile /opt/ogg/dirdef/test_ogg.test_ogg
userid ogg,password ***
table test_ogg.test_ogg;
Retrieving definition for TEST_OGG.TEST_OGG



Definitions generated for 1 table in /opt/ogg/dirdef/test_ogg.test_ogg

Copy the generated /opt/ogg/dirdef/test_ogg.test_ogg into the dirdef directory under the target's OGG home:

scp -r /opt/ogg/dirdef/test_ogg.test_ogg root@linux121:/opt/ogg/dirdef/ 

5. Target-side OGG configuration

5.1 Start the Kafka service
[root@linux121 ~]# kafka-server-start.sh -daemon /opt/lagou/servers/kafka_2.12-1.0.2/config/server.properties
[root@linux121 ~]# ps aux|grep kafka
5.2 Configure the manager process (mgr)
GGSCI (linux121) 1>  edit param mgr

PORT 7809
DYNAMICPORTLIST 7810-7909
AUTORESTART EXTRACT *,RETRIES 5,WAITMINUTES 3
PURGEOLDEXTRACTS ./dirdat/*,usecheckpoints, minkeepdays 3
5.3 Configure the checkpoint table

A checkpoint records a replay offset so that replication is traceable; simply declare the checkpoint table in the global configuration.

edit  param  ./GLOBALS
CHECKPOINTTABLE test_ogg.checkpoint
5.4 Configure the replicat process
GGSCI (linux121) 4> edit param rekafka

REPLICAT rekafka
sourcedefs /opt/ogg/dirdef/test_ogg.test_ogg
TARGETDB LIBFILE libggjava.so SET property=dirprm/kafka.props
REPORTCOUNT EVERY 1 MINUTES, RATE
GROUPTRANSOPS 10000
MAP test_ogg.test_ogg, TARGET test_ogg.test_ogg;
MAP test_ogg.ydjt_*, TARGET test_ogg.ydjt_*;

Notes: REPLICAT rekafka names the replicat process; sourcedefs points to the table-definition file generated on the source in the previous step; TARGETDB LIBFILE loads the Kafka adapter library together with its configuration file, which lives at dirprm/kafka.props under the OGG home; REPORTCOUNT sets how often replication reports are generated; GROUPTRANSOPS is the number of operations grouped per applied transaction, reducing I/O; MAP defines the source-to-target table mapping.

5.5 Configure kafka.props
cd /opt/ogg/dirprm/
vim kafka.props
gg.handlerlist=kafkahandler //handler type
gg.handler.kafkahandler.type=kafka
gg.handler.kafkahandler.KafkaProducerConfigFile=custom_kafka_producer.properties //Kafka producer config file
gg.handler.kafkahandler.topicMappingTemplate=test_ogg //Kafka topic name; no need to create it by hand
gg.handler.kafkahandler.format=json //output format; json, xml, etc. are supported
gg.handler.kafkahandler.mode=op //OGG for Big Data transfer mode: op sends one message per SQL operation, tx one per transaction
gg.classpath=dirprm/:/opt/lagou/servers/kafka_2.12-1.0.2/libs/*:/opt/ogg/:/opt/ogg/lib/*
vim custom_kafka_producer.properties
bootstrap.servers=192.168.10.121:9092 //Kafka broker address
acks=1
compression.type=gzip //compression type
reconnect.backoff.ms=1000 //reconnect delay
value.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
key.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
batch.size=102400
linger.ms=10000

The trailing // comments must be removed before saving: OGG does not recognize comments in these files and will error if they are left in.
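Since OGG rejects those inline // annotations, one way to sanitize such a properties file before deploying it is a small helper like the sketch below (an assumption here: no legitimate value in the file contains `//`, which holds for the files above):

```python
import re

def strip_inline_comments(text: str) -> str:
    """Remove trailing //-style annotations, which OGG property
    files do not accept, and drop lines left empty as a result."""
    cleaned = []
    for line in text.splitlines():
        line = re.sub(r"\s*//.*$", "", line)  # cut the annotation and preceding spaces
        if line.strip():                       # skip lines that became empty
            cleaned.append(line)
    return "\n".join(cleaned) + "\n"

props = "gg.handlerlist=kafkahandler //handler type\ngg.handler.kafkahandler.type=kafka\n"
print(strip_inline_comments(props))
```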

5.6 Bind the trail file to the replicat process
GGSCI (linux121) 2> add replicat rekafka exttrail /opt/ogg/dirdat/to,checkpointtable test_ogg.checkpoint
REPLICAT added.

6. Testing

6.1 Start all processes

In the OGG command line on both source and target, start each process with start [process name].
Start them in this order: source mgr, then target mgr, then source extract, source pump, and finally target replicat.
All of these commands are run from the OGG command line: cd into the OGG directory and run ggsci to enter it.
On the source, in order:

start mgr
start extkafka
start pukafka

On the target:

start mgr
start rekafka

Check status with info all or info [process name]; the setup only counts as working once every process shows RUNNING.

Source:

GGSCI (db-oracle) 5> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     RUNNING     EXTKAFKA    04:50:21      00:00:03    
EXTRACT     RUNNING     PUKAFKA     00:00:00      00:00:03    

Target:

GGSCI (linux121) 3> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING                                           
REPLICAT    RUNNING     REKAFKA     00:00:00      00:00:01   
6.2 Troubleshooting

If any process is not RUNNING, inspect the logs to diagnose and fix the problem, using either of the following methods.

vim ggserr.log

Or, from the OGG command line, view the process report, taking the rekafka process as an example:

GGSCI (linux121) 2> view report rekafka

One of the problems I ran into:
Error message

SEVERE: Unable to set property on handler 'kafkahandler' (oracle.goldengate.handler.kafka.KafkaHandler). Failed to set property: TopicName:="test_ogg" (class: oracle.goldengate.handler.kafka.KafkaHandler).
oracle.goldengate.util.ConfigException: Failed to set property: TopicName:="test_ogg" (class: oracle.goldengate.handler.kafka.KafkaHandler).
at ......

The cause is that older tutorials set the topic name with the old property:

gg.handler.kafkahandler.topicName=test_ogg

In newer versions it is set like this:

gg.handler.kafkahandler.topicMappingTemplate=test_ogg

7. Full (initial) data load configuration

Reference: https://zhuanlan.zhihu.com/p/91583623

OGG must be initialized beforehand.
Initial data load

1. Configure the source-side initial load
GGSCI (db-oracle) 5> dblogin userid ogg
Password: 
Successfully logged into database.
-- configure the source-side initial-load extract
GGSCI (db-oracle as ogg@orcl) 6> add extract initkfk,sourceistable
-- configure its parameters
GGSCI (db-oracle as ogg@orcl) 7> edit params initkfk
EXTRACT initkfk
SETENV (NLS_LANG=AMERICAN_AMERICA.AL32UTF8)
USERID ogg,PASSWORD 123456
RMTHOST 192.168.10.121, MGRPORT 7809
RMTFILE ./dirdat/ekfk,maxfiles 999, megabytes 500
table ydjt.ydjt_ydtz_business_data;
GGSCI (db-oracle as ogg@orcl) 15>  edit param define_kfk
defsfile /opt/ogg/dirdef/define_kfk.txt
userid ogg,password 123456
table ydjt.ydjt_ydtz_business_data;
-- run defgen
[oracle@db-oracle ogg]$ ./defgen paramfile dirprm/define_kfk.prm
-- copy this file to the target's dirdef directory

scp /opt/ogg/dirdef/define_kfk.txt 192.168.10.121:/opt/ogg/dirdef/define_kfk.txt
2. Configure the target-side initial load
-- configure the target-side initial-load replicat

GGSCI (linux121) 3> ADD replicat initkfk,specialrun
GGSCI (linux121) 6> edit params initkfk
-- add:
SPECIALRUN
end runtime
setenv(NLS_LANG="AMERICAN_AMERICA.AL32UTF8")
targetdb libfile libggjava.so set property=./dirprm/kafka.props
SOURCEDEFS ./dirdef/define_kfk.txt
EXTFILE ./dirdat/ekfk000000
reportcount every 1 minutes, rate
grouptransops 10000
map ydjt.ydjt_ydtz_business_data,target ydjt.ydjt_ydtz_business_data;

Configure the Kafka-related parameters

[root@linux121 ogg]# vi /opt/ogg/dirprm/kafka.props

gg.handlerlist=kafkahandler
gg.handler.kafkahandler.type=kafka
gg.handler.kafkahandler.KafkaProducerConfigFile=custom_kafka_producer.properties
gg.handler.kafkahandler.topicMappingTemplate=test_ogg
gg.handler.kafkahandler.format=json
gg.handler.kafkahandler.mode=op
gg.classpath=dirprm/:/opt/lagou/servers/kafka_2.12-1.0.2/libs/*:/opt/ogg/:/opt/ogg/lib/*
Start everything
5. Start the full extract on the source
-- source

GGSCI (db-oracle as ogg@orcl) 1> start mgr
GGSCI (db-oracle as ogg@orcl) 2> start initkfk

6. Apply the full load on the target
-- target
GGSCI (linux121) 1> start mgr

[root@linux121 ogg]# pwd
/opt/ogg
[root@linux121 ogg]# ./replicat paramfile dirprm/initkfk.prm reportfile dirrpt/init01.rpt -p INITIALDATALOAD

Verify the data in Kafka

kafka-console-consumer.sh --bootstrap-server 192.168.10.121:9092 --topic test_ogg --from-beginning
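Each message on the topic is a JSON document describing one operation. The exact field set depends on the handler's format options, but with format=json and mode=op a record looks roughly like the sample below. This is a hypothetical record for illustration (the values are made up), parsed with plain Python:

```python
import json

# Hypothetical OGG-for-Big-Data JSON record for an INSERT on the test table;
# the real field set depends on the handler's format settings.
msg = '''{
  "table": "TEST_OGG.TEST_OGG",
  "op_type": "I",
  "op_ts": "2021-07-09 23:55:00.000000",
  "current_ts": "2021-07-09T23:55:01.000000",
  "pos": "00000000000000001000",
  "after": {"ID": 1, "NAME": "test"}
}'''

record = json.loads(msg)
# op_type is "I" for insert, "U" for update, "D" for delete;
# update/delete records also carry a "before" image when configured.
op = {"I": "insert", "U": "update", "D": "delete"}[record["op_type"]]
print(op, record["table"], record.get("after"))
```

A downstream consumer in the real-time warehouse would typically switch on op_type to decide how to apply each row.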