Solution comparison

| | Log parsing | SQL query |
|---|---|---|
| Full-database sync | Yes | No |
| Impact on the production DB | No | Yes |
| Sync constraints | None | Incremental sync only on a numeric timestamp or primary-key id (using a raw timestamp column requires modifying the source code) |
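The "numeric timestamp / primary-key id" constraint exists because the SQL-query approach works by remembering a high-water mark: each poll selects only rows beyond the last offset it persisted (flume-ng-sql-source keeps this offset in its status file). A minimal sketch of the idea, simulating the table and status file with plain text files (all file names here are illustrative stand-ins, not Flume artifacts):

```shell
#!/bin/sh
# Simulate incremental sync keyed on a numeric id: each poll reads only
# rows whose id exceeds the offset persisted by the previous poll.
set -e
work=$(mktemp -d)

# Fake source "table": one row per line, first field is a numeric id.
cat > "$work/table.txt" <<'EOF'
1,alice
2,bob
3,carol
EOF

echo 0 > "$work/status"   # last id already synced

poll() {
    last=$(cat "$work/status")
    # Equivalent of: SELECT * FROM t WHERE id > $last
    awk -F, -v last="$last" '$1 + 0 > last + 0' "$work/table.txt" > "$work/batch"
    cat "$work/batch"
    # Persist the new high-water mark so the next poll skips these rows.
    awk -F, -v last="$last" \
        '$1 + 0 > last + 0 { m = $1 } END { if (m == "") m = last; print m }' \
        "$work/table.txt" > "$work/status"
}

poll                              # first poll emits rows 1-3
echo '4,dave' >> "$work/table.txt"
poll                              # second poll emits only the new row 4
```

This also shows why the key must be monotonically increasing: rows updated in place behind the high-water mark are never picked up again.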
- Demos of the two solutions below
- Option 3: Flink CDC

Log parsing => data sync
- Oracle: OGG
- MySQL: binlog parsing

SQL query => data sync
- StreamSets
- Flume
Solution
- flume-ng-sql-source reads the data: DB, file, HDFS…
- Flume writes to DB / HDFS
- Flume writes to Kafka for later consumption
- Tried out in Docker
Implementation
- Install Flume [, Kafka, ZooKeeper]
- Download and build the flume-ng-sql-source jar
    - Prerequisites: Java and Maven
    - Enter the flume-ng-sql-source directory and run mvn package
- Copy flume-ng-sql-source-{version}.jar into Flume's plugin directory
    - Standalone Flume:

```shell
mkdir -p $FLUME_HOME/plugins.d/sql-source/lib
mkdir -p $FLUME_HOME/plugins.d/sql-source/libext
cp flume-ng-sql-source-0.8.jar $FLUME_HOME/plugins.d/sql-source/lib
```
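The layout above follows Flume's plugins.d convention: the plugin's own jar goes in lib/, and its third-party dependencies (e.g. the JDBC driver copied later) go in libext/. A throwaway sketch of the resulting tree, using empty placeholder jars instead of real builds:

```shell
#!/bin/sh
# Sketch of Flume's plugins.d layout: plugin jar in lib/,
# its dependencies (e.g. the JDBC driver) in libext/.
set -e
FLUME_HOME=$(mktemp -d)   # stand-in for the real $FLUME_HOME

mkdir -p "$FLUME_HOME/plugins.d/sql-source/lib"
mkdir -p "$FLUME_HOME/plugins.d/sql-source/libext"

# Placeholder jars; in a real install these come from `mvn package`
# and the MySQL Connector/J download, not from `touch`.
touch "$FLUME_HOME/plugins.d/sql-source/lib/flume-ng-sql-source-0.8.jar"
touch "$FLUME_HOME/plugins.d/sql-source/libext/mysql-connector-java-5.1.35-bin.jar"

ls -R "$FLUME_HOME/plugins.d"
```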
    - CDH: cp/scp the jar to every node
      cp flume-ng-sql-source-1.5.2.jar /opt/cloudera/parcels/CDH-{version}-1.cdh{version}/lib/flume-ng/lib
- Copy the DB driver
    - Where to find the driver:
        - CDH: under /usr/share/java on the Hive Server node
        - Other applications: /data/cloudera/parcels/{application}/lib/{application}/jars
    - Standalone Flume: cp mysql-connector-java-5.1.35-bin.jar $FLUME_HOME/plugins.d/sql-source/libext
    - CDH: cp /path/mysql-connector-java-5.1.44-bin.jar /opt/cloudera/parcels/CDH-{version}.cdh{version}/lib/flume-ng/lib
Configuration
- Configure Flume's conf file
    - Location
        - Standalone Flume: $FLUME_HOME
        - CDH: /opt/cloudera/parcels/CDH-{version}.cdh{version}/etc/flume-ng/conf.empty
    - conf files
        - MySQL, incremental by numeric id, sink to HDFS: flume_mysql_hdfs.conf
```properties
agentmysqlhdfsTest.channels = mysqlhdfschannelTest
agentmysqlhdfsTest.sources = mysqlhdfssourceTest
agentmysqlhdfsTest.sinks = mysqhdfslsinkTest

########### sql source #################
# For each source, the type is defined
agentmysqlhdfsTest.sources.mysqlhdfssourceTest.type = org.keedio.flume.source.SQLSource
agentmysqlhdfsTest.sources.mysqlhdfssourceTest.hibernate.connection.url = jdbc:mysql://{ip}:{port}/{db}

# Hibernate database connection properties
agentmysqlhdfsTest.sources.mysqlhdfssourceTest.hibernate.connection.user = {username}
agentmysqlhdfsTest.sources.mysqlhdfssourceTest.hibernate.connection.password = {pwd}
agentmysqlhdfsTest.sources.mysqlhdfssourceTest.hibernate.connection.autocommit = true
agentmysqlhdfsTest.sources.mysqlhdfssourceTest.hibernate.dialect = org.hibernate.dialect.MySQLDialect
agentmysqlhdfsTest.sources.mysqlhdfssourceTest.hibernate.connection.driver_class = com.mysql.jdbc.Driver

# Query delay: the query is sent every configured number of milliseconds
agentmysqlhdfsTest.sources.mysqlhdfssourceTest.run.query.delay=2000

# The status file saves the last row read (the incremental offset)
agentmysqlhdfsTest.sources.mysqlhdfssourceTest.status.file.path = /tmp/flume
agentmysqlhdfsTest.sources.mysqlhdfssourceTest.status.file.name = agentmysqlhdfsTest.sqlSource.status

# Custom query
agentmysqlhdfsTest.sources.mysqlhdfssourceTest.start.from = 0
agentmysqlhdfsTest.sources.mysqlhdfssourceTest.custom.query = select avgShowView, avgViewBox, boxInfo, cinemaId, cinemaName, viewInfo, date from maoyan_test WHERE id > $@$
agentmysqlhdfsTest.sources.mysqlhdfssourceTest.batch.size = 6000
agentmysqlhdfsTest.sources.mysqlhdfssourceTest.max.rows = 1000
# agentmysqlhdfsTest.sources.mysqlhdfssourceTest.hibernate.connection.provider_class = org.hibernate.connection.C3P0ConnectionProvider
# agentmysqlhdfsTest.sources.mysqlhdfssourceTest.hibernate.c3p0.min_size=1
# agentmysqlhdfsTest.sources.mysqlhdfssourceTest.hibernate.c3p0.max_size=10
##############################

agentmysqlhdfsTest.channels.mysqlhdfschannelTest.type = memory
agentmysqlhdfsTest.channels.mysqlhdfschannelTest.capacity = 10000
agentmysqlhdfsTest.channels.mysqlhdfschannelTest.transactionCapacity = 10000
agentmysqlhdfsTest.channels.mysqlhdfschannelTest.byteCapacityBufferPercentage = 20
agentmysqlhdfsTest.channels.mysqlhdfschannelTest.byteCapacity = 1600000

agentmysqlhdfsTest.sinks.mysqhdfslsinkTest.type = hdfs
agentmysqlhdfsTest.sinks.mysqhdfslsinkTest.hdfs.path = hdfs://nameservice1/test/xiw_su/maoyan
agentmysqlhdfsTest.sinks.mysqhdfslsinkTest.hdfs.fileType = DataStream
agentmysqlhdfsTest.sinks.mysqhdfslsinkTest.hdfs.writeFormat = Text
agentmysqlhdfsTest.sinks.mysqhdfslsinkTest.hdfs.rollSize = 268435456
agentmysqlhdfsTest.sinks.mysqhdfslsinkTest.hdfs.rollInterval = 0
agentmysqlhdfsTest.sinks.mysqhdfslsinkTest.hdfs.rollCount = 0

agentmysqlhdfsTest.sinks.mysqhdfslsinkTest.channel = mysqlhdfschannelTest
agentmysqlhdfsTest.sources.mysqlhdfssourceTest.channels = mysqlhdfschannelTest
```
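A common failure mode with hand-written Flume configs is a mismatch between declared component names and the per-component properties (note the sink name above, mysqhdfslsinkTest, with its transposed letters: it only works because it is misspelled consistently). A small sketch that cross-checks the channel wiring of a config file before starting the agent (the stripped-down config and file name here are illustrative):

```shell
#!/bin/sh
# Cross-check a Flume agent config: every channel referenced by a
# source or sink must be one the agent actually declares.
set -e
conf=$(mktemp)

# Stripped-down copy of the wiring lines from the config above.
cat > "$conf" <<'EOF'
agentmysqlhdfsTest.channels = mysqlhdfschannelTest
agentmysqlhdfsTest.sources = mysqlhdfssourceTest
agentmysqlhdfsTest.sinks = mysqhdfslsinkTest
agentmysqlhdfsTest.sinks.mysqhdfslsinkTest.channel = mysqlhdfschannelTest
agentmysqlhdfsTest.sources.mysqlhdfssourceTest.channels = mysqlhdfschannelTest
EOF

# Channels the agent declares (agent-level "<agent>.channels = ..." line).
declared=$(awk -F' *= *' '$1 ~ /^[^.]+\.channels$/ { print $2 }' "$conf")
# Channels that sources/sinks reference (component-level .channel/.channels).
used=$(awk -F' *= *' '$1 ~ /\..*\.(channel|channels)$/ { print $2 }' "$conf" | sort -u)

for ch in $used; do
    case " $declared " in
        *" $ch "*) echo "ok: channel $ch is declared" ;;
        *)         echo "MISSING: channel $ch is not declared" ;;
    esac
done
```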
        - Oracle, incremental by numeric id, sink to Kafka: flume_oracle_test.conf
```properties
agentoracleTest.channels = channelTest
agentoracleTest.sources = sourceTest
agentoracleTest.sinks = sinkTest

########### sql source #################
# For each source, the type is defined
agentoracleTest.sources.sourceTest.type = org.keedio.flume.source.SQLSource
agentoracleTest.sources.sourceTest.hibernate.connection.url = jdbc:oracle:thin:@{ip}:{port}/{db}

# Hibernate database connection properties
agentoracleTest.sources.sourceTest.hibernate.connection.user = {username}
agentoracleTest.sources.sourceTest.hibernate.connection.password = {pwd}
agentoracleTest.sources.sourceTest.hibernate.connection.autocommit = true
agentoracleTest.sources.sourceTest.hibernate.dialect = org.hibernate.dialect.Oracle10gDialect
agentoracleTest.sources.sourceTest.hibernate.connection.driver_class = oracle.jdbc.driver.OracleDriver
agentoracleTest.sources.sourceTest.run.query.delay=1
agentoracleTest.sources.sourceTest.status.file.path = /tmp/flume
agentoracleTest.sources.sourceTest.status.file.name = agentoracleTest.sqlSource.status

# Custom query
agentoracleTest.sources.sourceTest.start.from = 0
agentoracleTest.sources.sourceTest.custom.query = SELECT so_no, cust_no, acc_name, acc_no FROM tmp_xws_test_2 WHERE id > $@$
agentoracleTest.sources.sourceTest.batch.size = 6000
agentoracleTest.sources.sourceTest.max.rows = 1000
agentoracleTest.sources.sourceTest.hibernate.connection.provider_class = org.hibernate.connection.C3P0ConnectionProvider
agentoracleTest.sources.sourceTest.hibernate.c3p0.min_size=1
agentoracleTest.sources.sourceTest.hibernate.c3p0.max_size=10
##############################

agentoracleTest.channels.channelTest.type = memory
agentoracleTest.channels.channelTest.capacity = 10000
agentoracleTest.channels.channelTest.transactionCapacity = 10000
agentoracleTest.channels.channelTest.byteCapacityBufferPercentage = 20
agentoracleTest.channels.channelTest.byteCapacity = 1600000

agentoracleTest.sinks.sinkTest.type = org.apache.flume.sink.kafka.KafkaSink
agentoracleTest.sinks.sinkTest.topic = OracleTestTopic
agentoracleTest.sinks.sinkTest.brokerList = {ip}:9092
agentoracleTest.sinks.sinkTest.requiredAcks = 1
agentoracleTest.sinks.sinkTest.batchSize = 20

agentoracleTest.sinks.sinkTest.channel = channelTest
agentoracleTest.sources.sourceTest.channels = channelTest
```
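If the agent starts but nothing reaches Kafka, a quick grep-based checklist over the sink's properties can rule out a missing key before digging into logs. A sketch, using a stripped-down copy of the Kafka sink lines above (the checked key names are the ones this config relies on):

```shell
#!/bin/sh
# Checklist: verify the Kafka sink properties this config relies on
# are all present in the conf file.
set -e
conf=$(mktemp)
cat > "$conf" <<'EOF'
agentoracleTest.sinks.sinkTest.type = org.apache.flume.sink.kafka.KafkaSink
agentoracleTest.sinks.sinkTest.topic = OracleTestTopic
agentoracleTest.sinks.sinkTest.brokerList = {ip}:9092
agentoracleTest.sinks.sinkTest.requiredAcks = 1
agentoracleTest.sinks.sinkTest.channel = channelTest
EOF

missing=0
for key in type topic brokerList requiredAcks channel; do
    if grep -q "^agentoracleTest\.sinks\.sinkTest\.$key " "$conf"; then
        echo "ok: $key"
    else
        echo "missing: $key"
        missing=1
    fi
done
```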
Run
- To HDFS: flume-ng agent --conf conf --conf-file /{path}/flume_mysql_hdfs.conf --name agentmysqlhdfsTest -Dflume.root.logger=INFO,console
- To Kafka
    - flume-ng: flume-ng agent --conf conf --conf-file /{path}/flume_mysql_test.conf --name agentmysqlTest [-Dflume.root.logger=INFO,console]
    - Follow along on the Kafka side: kafka-console-consumer --zookeeper ip:2181 --topic MysqlTestTopic [--from-beginning]
Kafka basics
- List topics: kafka-topics --zookeeper ip:2181 --list
- Manually create a topic: kafka-topics --zookeeper ip:2181 --create --topic {topic_name} --partitions {partitions_number} --replication-factor {replication_number}
- Delete a topic: kafka-topics --delete --zookeeper ip:2181 --topic {topic_name}