墨墨导读:本文来自墨天轮用户 肖杰 的投稿,介绍用OGG实现Oracle到Kafka到Greenplum的增量数据同步的全过程。
墨天轮主页:https://www.modb.pro/u/6722
背景
在大数据库时代,数据经常需要在不同的数据库之间流动、整合,并要求具有一定的实时性,传统的通过脚本定时,批量同步的方式根本无法满足需求。
本文基于Oracle OGG,Kafka消息队列实现Oracle到Greenplum之间的准实时同步(实测延时在ms级别)。
一、环境准备
– | 版本 | OGG版本 | IP |
---|---|---|---|
源端 | oracle 12.2.0.1 | 123012_fbo_ggs_Linux_x64_shiphome.zip | 192.168.11.151 |
目标端 | kafka 2.12 | OGG_BigData_Linux_x64_12.3.2.1.1.zip | 192.168.11.165/4/3 |
二、ORACLE,GREENPLUM,OGG安装
此过程所用方法比较通用,此处不再赘述。
三、源库配置
源库需要开启归档,配置ogg用户,表空间及相关参数等,此处不再详细赘述
四、源端OGG配置
1. 进程配置
mgr.prm
PORT 7809
extn.prm
extract extn
userid C##GGADMIN@ora12c,password GGadmin_123
DISCARDFILE ./dirrpt/extn.dsc, APPEND, MEGABYTES 1024
EXTTRAIL ./dirdat/na
GETUPDATEBEFORES
NOCOMPRESSUPDATES
NOCOMPRESSDELETES
DYNAMICRESOLUTION
--源库是一个CDB,本次测试抽取pdb_test下面的两张表
table pdb_test.test.ccps_traderecord;
table pdb_test.test.ccps_unnormal_traderecord;
dpen.prm
extract dpen
RMTHOST 192.168.11.165, MGRPORT 7809
RMTTRAIL ./dirdat/na
DYNAMICRESOLUTION
TABLE pdb_test.test.*;
2. 创建进程
add extract extn, TRANLOG, BEGIN NOW
add exttrail ./dirdat/na, EXTRACT extn, MEGABYTES 200
ADD EXTRACT dpen, EXTTRAILSOURCE ./dirdat/na
ADD RMTTRAIL ./dirdat/na, EXTRACT dpen, MEGABYTES 200
五、目标端OGG配置
1. 配置JDBC Hander参数
示例目录下面有配置模板,复制过来即可
cp /home/oracle/ogg/AdapterExamples/big-data/kafka/* /home/oracle/ogg/dirprm
custom_kafka_producer.properties:
bootstrap.servers=localhost:9092
acks=1
reconnect.backoff.ms=1000
value.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
key.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
# 100KB per partition
batch.size=16384
linger.ms=0
max.request.size = 5024000
send.buffer.bytes = 5024000
kafka.props
gg.handlerlist = kafkahandler
gg.handler.kafkahandler.type=kafka
gg.handler.kafkahandler.KafkaProducerConfigFile=custom_kafka_producer.properties
#The following resolves the topic name using the short table name
gg.handler.kafkahandler.topicMappingTemplate=oggtopic
#The following selects the message key using the concatenated primary keys
#gg.handler.kafkahandler.keyMappingTemplate=${primaryKeys}
gg.handler.kafkahandler.format=json
gg.handler.kafkahandler.SchemaTopicName=mySchemaTopic
gg.handler.kafkahandler.BlockingSend =false
gg.handler.kafkahandler.includeTokens=false
gg.handler.kafkahandler.mode=tx
goldengate.userexit.writers=javawriter
javawriter.stats.display=TRUE
javawriter.stats.full=TRUE
gg.log=log4j
gg.log.level=INFO
gg.report.time=30sec
#Sample gg.classpath for Apache Kafka
gg.classpath=dirprm/:/home/oracle/kafka/libs/*
#Sample gg.classpath for HDP
#gg.classpath=/etc/kafka/conf:/usr/hdp/current/kafka-broker/libs/*
javawriter.bootoptions=-Xmx512m -Xms512m -Djava.class.path=ggjava/ggjava.jar
2. 进程配置
mgr.prm
port 7809
repn.prm
replicat repn
getenv(JAVA_HOME)
getenv(CLASSPATH)
getenv(PATH)
getenv(LD_LIBRARY_PATH)
SETENV (NLS_LANG="AMERICAN_AMERICA.AL32UTF8")
TARGETDB LIBFILE libggjava.so SET property=./dirprm/kafka.props
SOURCEDEFS /home/oracle/ogg/dirdef/source.def
REPORTCOUNT EVERY 1 MINUTES, RATE
--GROUPTRANSOPS 10000
MAP pdb_test.test.ccps_traderecord, TARGET ccps.ccps_traderecord;
MAP pdb_test.test.ccps_unnormal_traderecord, TARGET ccps.ccps_unnormal_traderecord;
3. 创建进程
add replicat repn, exttrail ./dirdat/na
六、目标端kafka安装配置
1. 安装
解压即可(此版本kafka本身已经集成zookeeper,不需要单独安装。)
2. 配置
因为是测试环境,此处列出的都是一些关键性参数,至于其他性能相关参数根据自己需要调整即可。
consumer.properties
bootstrap.servers=localhost:9092
server.properties
listeners=PLAINTEXT://localhost:9092
zookeeper.connect=localhost:2181
zookeeper.properties
clientPort=2181
3. 启动zookeeper
[oracle@gpmaster kafka]$ bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
[oracle@gpmaster kafka]$ jps
18001 Jps
2505 QuorumPeerMain
4. 启动kafka