Real-Time Data Synchronization

Approach Comparison

  • Log parsing: can sync the entire database; reads the database's change log, so it adds little query load to the production database
  • SQL query: polling queries add load to the production database; incremental sync is constrained to a numeric timestamp or primary-key id column (using a timestamp directly requires modifying the source code)
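
The numeric-column constraint above follows from the polling model: each cycle can only ask for "rows whose tracking column exceeds the last stored offset", which requires a monotonically increasing numeric value. A minimal sketch of one poll cycle, with fabricated data and paths:

```shell
#!/bin/sh
# Toy simulation of one incremental-sync poll by numeric id.
# The "table" and ids are fabricated; a real source issues
# SELECT ... WHERE id > <offset> against the database instead.
DATA=/tmp/incr_demo_rows.txt
printf '1 alice\n2 bob\n3 carol\n' > "$DATA"   # pretend table: id value

last_id=0                                      # offset persisted across polls
# one poll: fetch rows whose id exceeds the stored offset
batch=$(awk -v last="$last_id" '$1 > last' "$DATA")
echo "$batch"
# advance the offset to the max id just read
last_id=$(echo "$batch" | awk 'END { print $1 }')
echo "new offset: $last_id"
```

A non-numeric column (e.g. a formatted timestamp string) breaks this greater-than comparison, which is why syncing directly on a timestamp needs source-code changes.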

Log parsing => data sync

  • Oracle: OGG (Oracle GoldenGate)
  • MySQL: binlog parsing

SQL query => data sync

  • streamsets
  • flume
    • Approach

    • Implementation

      • Install Flume [, Kafka, ZooKeeper]
      • Download and build the flume-ng-sql-source jar
        • Prerequisites: Java and Maven
        • Enter the flume-ng-sql-source directory and run mvn package
      • Copy flume-ng-sql-source-{version}.jar into Flume's lib directory
        • flume
          • mkdir -p $FLUME_HOME/plugins.d/sql-source/lib
          • mkdir -p $FLUME_HOME/plugins.d/sql-source/libext
          • cp flume-ng-sql-source-0.8.jar $FLUME_HOME/plugins.d/sql-source/lib
        • cdh
          • cp/scp the jar to every node
            cp flume-ng-sql-source-1.5.2.jar /opt/cloudera/parcels/CDH-{version}-1.cdh{version}/lib/flume-ng/lib
      • Copy the DB driver
        • DB driver
          • Download it
          • cdh
            • Hive server: /usr/share/java
            • Other applications: /data/cloudera/parcels/{app}/lib/{app}/jars
        • flume: cp mysql-connector-java-5.1.35-bin.jar $FLUME_HOME/plugins.d/sql-source/libext
        • cdh: cp /path/mysql-connector-java-5.1.44-bin.jar /opt/cloudera/parcels/CDH-{version}.cdh{version}/lib/flume-ng/lib
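
The copy steps above can be collapsed into one script. A sketch for standalone Flume, using a scratch FLUME_HOME and stand-in jar files (the real jars come from mvn package and the driver download):

```shell
#!/bin/sh
# Sketch of the plugins.d layout for standalone Flume. FLUME_HOME and the
# jar versions are placeholders; empty stand-in jars are created so the
# script is runnable without Maven or a driver download.
FLUME_HOME=${FLUME_HOME:-/tmp/flume-demo}
PLUGIN_DIR="$FLUME_HOME/plugins.d/sql-source"
mkdir -p "$PLUGIN_DIR/lib" "$PLUGIN_DIR/libext"

touch /tmp/flume-ng-sql-source-1.5.2.jar /tmp/mysql-connector-java-5.1.44-bin.jar

cp /tmp/flume-ng-sql-source-1.5.2.jar "$PLUGIN_DIR/lib/"            # plugin jar -> lib
cp /tmp/mysql-connector-java-5.1.44-bin.jar "$PLUGIN_DIR/libext/"   # JDBC driver -> libext
find "$PLUGIN_DIR" -type f
```

For CDH, the same two jars instead go under .../lib/flume-ng/lib on every node, as shown above.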
    • Configure the Flume conf file

      • Location
        • Flume: $FLUME_HOME
        • CDH: /opt/cloudera/parcels/CDH-{version}.cdh{version}/etc/flume-ng/conf.empty
      • conf files
        • MySQL, incremental sync by numeric id to HDFS: flume_mysql_hdfs.conf
          agentmysqlhdfsTest.channels = mysqlhdfschannelTest
          agentmysqlhdfsTest.sources = mysqlhdfssourceTest
          agentmysqlhdfsTest.sinks = mysqhdfslsinkTest
          ###########sql source#################
          # For each Test of the sources, the type is defined
          agentmysqlhdfsTest.sources.mysqlhdfssourceTest.type = org.keedio.flume.source.SQLSource
          agentmysqlhdfsTest.sources.mysqlhdfssourceTest.hibernate.connection.url =  jdbc:mysql://{ip}:{port}/{db}
          # Hibernate Database connection properties
          agentmysqlhdfsTest.sources.mysqlhdfssourceTest.hibernate.connection.user = {username}
          agentmysqlhdfsTest.sources.mysqlhdfssourceTest.hibernate.connection.password = {pwd}
          agentmysqlhdfsTest.sources.mysqlhdfssourceTest.hibernate.connection.autocommit = true
          agentmysqlhdfsTest.sources.mysqlhdfssourceTest.hibernate.dialect = org.hibernate.dialect.MySQLDialect
          agentmysqlhdfsTest.sources.mysqlhdfssourceTest.hibernate.connection.driver_class = com.mysql.jdbc.Driver
          # Query delay: the query is re-sent every this many milliseconds
          agentmysqlhdfsTest.sources.mysqlhdfssourceTest.run.query.delay=2000
          # Status file saves the last read row (tracks the incremental offset)
          agentmysqlhdfsTest.sources.mysqlhdfssourceTest.status.file.path = /tmp/flume
          agentmysqlhdfsTest.sources.mysqlhdfssourceTest.status.file.name = agentmysqlhdfsTest.sqlSource.status
          # Custom query
          agentmysqlhdfsTest.sources.mysqlhdfssourceTest.start.from = 0
          agentmysqlhdfsTest.sources.mysqlhdfssourceTest.custom.query = select avgShowView, avgViewBox, boxInfo, cinemaId, cinemaName,viewInfo, date from maoyan_test WHERE id > $@$
          agentmysqlhdfsTest.sources.mysqlhdfssourceTest.batch.size = 6000
          agentmysqlhdfsTest.sources.mysqlhdfssourceTest.max.rows = 1000
          # agentmysqlhdfsTest.sources.mysqlhdfssourceTest.hibernate.connection.provider_class = org.hibernate.connection.C3P0ConnectionProvider
          # agentmysqlhdfsTest.sources.mysqlhdfssourceTest.hibernate.c3p0.min_size=1
          # agentmysqlhdfsTest.sources.mysqlhdfssourceTest.hibernate.c3p0.max_size=10
          ##############################
          agentmysqlhdfsTest.channels.mysqlhdfschannelTest.type = memory
          agentmysqlhdfsTest.channels.mysqlhdfschannelTest.capacity = 10000
          agentmysqlhdfsTest.channels.mysqlhdfschannelTest.transactionCapacity = 10000
          agentmysqlhdfsTest.channels.mysqlhdfschannelTest.byteCapacityBufferPercentage = 20
          agentmysqlhdfsTest.channels.mysqlhdfschannelTest.byteCapacity = 1600000
          
          agentmysqlhdfsTest.sinks.mysqhdfslsinkTest.type = hdfs
          agentmysqlhdfsTest.sinks.mysqhdfslsinkTest.hdfs.path = hdfs://nameservice1/test/xiw_su/maoyan
          agentmysqlhdfsTest.sinks.mysqhdfslsinkTest.hdfs.fileType = DataStream
          agentmysqlhdfsTest.sinks.mysqhdfslsinkTest.hdfs.writeFormat = Text
          agentmysqlhdfsTest.sinks.mysqhdfslsinkTest.hdfs.rollSize = 268435456
          agentmysqlhdfsTest.sinks.mysqhdfslsinkTest.hdfs.rollInterval = 0
          agentmysqlhdfsTest.sinks.mysqhdfslsinkTest.hdfs.rollCount = 0
          
          agentmysqlhdfsTest.sinks.mysqhdfslsinkTest.channel = mysqlhdfschannelTest
          agentmysqlhdfsTest.sources.mysqlhdfssourceTest.channels=mysqlhdfschannelTest
          
        • Oracle, incremental sync by numeric id to Kafka: flume_oracle_test.conf
          agentoracleTest.channels = channelTest
          agentoracleTest.sources = sourceTest
          agentoracleTest.sinks = sinkTest
          ###########sql source#################
          # For each Test of the sources, the type is defined
          agentoracleTest.sources.sourceTest.type = org.keedio.flume.source.SQLSource
          agentoracleTest.sources.sourceTest.hibernate.connection.url = jdbc:oracle:thin:@{ip}:{port}/{db}
          # Hibernate Database connection properties
          agentoracleTest.sources.sourceTest.hibernate.connection.user = {username}
          agentoracleTest.sources.sourceTest.hibernate.connection.password = {pwd}
          agentoracleTest.sources.sourceTest.hibernate.connection.autocommit = true
          agentoracleTest.sources.sourceTest.hibernate.dialect = org.hibernate.dialect.Oracle10gDialect
          agentoracleTest.sources.sourceTest.hibernate.connection.driver_class = oracle.jdbc.driver.OracleDriver
          agentoracleTest.sources.sourceTest.run.query.delay=1
          agentoracleTest.sources.sourceTest.status.file.path = /tmp/flume
          agentoracleTest.sources.sourceTest.status.file.name = agentoracleTest.sqlSource.status
          # Custom query
          agentoracleTest.sources.sourceTest.start.from = 0            
          agentoracleTest.sources.sourceTest.custom.query = SELECT so_no, cust_no, acc_name, acc_no FROM tmp_xws_test_2 WHERE id > $@$
          
          agentoracleTest.sources.sourceTest.batch.size = 6000
          agentoracleTest.sources.sourceTest.max.rows = 1000
          agentoracleTest.sources.sourceTest.hibernate.connection.provider_class = org.hibernate.connection.C3P0ConnectionProvider
          agentoracleTest.sources.sourceTest.hibernate.c3p0.min_size=1
          agentoracleTest.sources.sourceTest.hibernate.c3p0.max_size=10
          ##############################
          agentoracleTest.channels.channelTest.type = memory
          agentoracleTest.channels.channelTest.capacity = 10000
          agentoracleTest.channels.channelTest.transactionCapacity = 10000
          agentoracleTest.channels.channelTest.byteCapacityBufferPercentage = 20
          agentoracleTest.channels.channelTest.byteCapacity = 1600000
          
          agentoracleTest.sinks.sinkTest.type = org.apache.flume.sink.kafka.KafkaSink
          agentoracleTest.sinks.sinkTest.topic = OracleTestTopic
          agentoracleTest.sinks.sinkTest.brokerList = {ip}:9092
          agentoracleTest.sinks.sinkTest.requiredAcks = 1
          agentoracleTest.sinks.sinkTest.batchSize = 20
          agentoracleTest.sinks.sinkTest.channel = channelTest
          agentoracleTest.sources.sourceTest.channels=channelTest
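
Both conf files wire together start.from, the $@$ placeholder in custom.query, and the status file. A sketch of that offset round-trip (the real status file is managed by flume-ng-sql-source and its on-disk format differs; the paths and values here are illustrative):

```shell
#!/bin/sh
# Offset round-trip: stored offset -> substituted into the query ->
# new max id written back after the batch. All values are fabricated.
STATUS=/tmp/flume/demo.sqlSource.status
mkdir -p /tmp/flume
echo 0 > "$STATUS"                 # fresh run: behaves like start.from = 0

last=$(cat "$STATUS")
TEMPLATE='SELECT so_no FROM tmp_xws_test_2 WHERE id > $@$'
# the source replaces $@$ with the stored offset before each poll
query=$(printf '%s' "$TEMPLATE" | sed 's/\$@\$/'"$last"'/')
echo "$query"

echo 500 > "$STATUS"               # after a batch: persist the new max id
echo "next poll starts from id $(cat "$STATUS")"
```

Deleting the status file (with the agent stopped) makes the next run start over from start.from.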
          
    • Run

      • To HDFS: flume-ng agent --conf conf --conf-file /{path}/flume_mysql_hdfs.conf --name agentmysqlhdfsTest -Dflume.root.logger=INFO,console
      • To Kafka
        • Start the agent: flume-ng agent --conf conf --conf-file /{path}/flume_mysql_test.conf --name agentmysqlTest [-Dflume.root.logger=INFO,console]
        • Watch the topic: kafka-console-consumer --zookeeper ip:2181 --topic MysqlTestTopic [--from-beginning]
      • Basic Kafka operations
        • List topics: kafka-topics --zookeeper ip:2181 --list
        • Create a topic manually: kafka-topics --zookeeper ip:2181 --create --topic {topic_name} --partitions {partitions_number} --replication-factor {replication_number}
        • Delete a topic: kafka-topics --delete --zookeeper ip:2181 --topic {topic_name}
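
Before launching an agent, it is worth a quick mechanical check that the conf's source/sink/channel wiring is complete and the incremental query still carries its $@$ placeholder. A sketch (the sample conf and agent/component names are fabricated for the demo):

```shell
#!/bin/sh
# Minimal pre-launch conf sanity check; only the wiring keys used in the
# examples above are verified.
CONF=/tmp/check_demo.conf
cat > "$CONF" <<'EOF'
agentTest.sources = src
agentTest.sinks = snk
agentTest.channels = ch
agentTest.sources.src.custom.query = SELECT id FROM t WHERE id > $@$
agentTest.sinks.snk.channel = ch
agentTest.sources.src.channels = ch
EOF

ok=1
for key in sources sinks channels; do
  grep -q "^agentTest\.$key " "$CONF" || { echo "missing agentTest.$key"; ok=0; }
done
grep -q '\$@\$' "$CONF" || { echo "custom.query lacks \$@\$"; ok=0; }
grep -q '^agentTest\.sinks\.snk\.channel ' "$CONF" || { echo "sink not wired"; ok=0; }
[ "$ok" -eq 1 ] && echo "conf looks sane"
```

Point CONF at your real file and substitute your agent name to reuse the check.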