CC00012.bdpositions | Hadoop & Real-Time Data Warehouse V12 | Project v12 | ODS Layer Processing | JSON Log Data Format

This article describes how MySQL data is synchronized to Kafka in real time through Canal, and how the Kafka logs are then parsed and written into HBase. During this data flow, MySQL insert, update, and delete operations all leave historical records in Kafka, which can lead to data inconsistency. To solve this problem, HBase's multi-version feature is used to keep the data consistent. Problems that may arise during data synchronization, and their solutions, are also discussed.
1. JSON Log Data Format
### --- Sample SQL

~~~     # Insert data into MySQL
mysql> INSERT INTO `yanqi_trade_orders` VALUES ('2', '23a0b124546', '121', '2', '0.12', '6331.00', '2', '0', '370203', '0', '0', '0', '1', '2020-06-28 16:55:02', '2020-06-28 16:55:02', '2020-10-21 22:54:32');
### --- Message received from Canal in Kafka
~~~     # View the collected data under the Kafka consumer topic
[root@hadoop02 ~]# kafka-console-consumer.sh --zookeeper hadoop02:2181/myKafka --topic canal --from-beginning
~~~     # Data captured from the INSERT operations in MySQL
{"data":[{"orderId":"2","orderNo":"23a0b124546","userId":"121","status":"2","productMoney":"0.12","totalMoney":"6331.0","payMethod":"2","isPay":"0","areaId":"370203","tradeSrc":"0","tradeType":"0","isRefund":"0","dataFlag":"1","createTime":"2020-06-28 16:55:02","payTime":"2020-06-28 16:55:02","modifiedTime":"2020-10-21 22:54:32"}],"database":"dwshow","es":1638097808000,"id":3,"isDdl":false,"mysqlType":{"orderId":"bigint(11)","orderNo":"varchar(20)","userId":"bigint(11)","status":"tinyint(4)","productMoney":"decimal(11,2)","totalMoney":"decimal(11,2)","payMethod":"tinyint(4)","isPay":"tinyint(4)","areaId":"int(11)","tradeSrc":"tinyint(4)","tradeType":"int(11)","isRefund":"tinyint(4)","dataFlag":"tinyint(4)","createTime":"varchar(25)","payTime":"varchar(25)","modifiedTime":"timestamp"},"old":null,"pkNames":["orderId"],"sql":"","sqlType":{"orderId":-5,"orderNo":12,"userId":-5,"status":-6,"productMoney":3,"totalMoney":3,"payMethod":-6,"isPay":-6,"areaId":4,"tradeSrc":-6,"tradeType":4,"isRefund":-6,"dataFlag":-6,"createTime":12,"payTime":12,"modifiedTime":93},"table":"yanqi_trade_orders","ts":1638097808750,"type":"INSERT"}
{"data":[{"orderId":"3","orderNo":"23a0b124546","userId":"35","status":"2","productMoney":"0.12","totalMoney":"1987.50","payMethod":"4","isPay":"0","areaId":"370203","tradeSrc":"0","tradeType":"0","isRefund":"0","dataFlag":"1","createTime":"2020-06-28 12:07:01","payTime":"2020-06-28 12:07:01","modifiedTime":"2020-10-21 22:54:34"}],"database":"dwshow","es":1638098012000,"id":4,"isDdl":false,"mysqlType":{"orderId":"bigint(11)","orderNo":"varchar(20)","userId":"bigint(11)","status":"tinyint(4)","productMoney":"decimal(11,2)","totalMoney":"decimal(11,2)","payMethod":"tinyint(4)","isPay":"tinyint(4)","areaId":"int(11)","tradeSrc":"tinyint(4)","tradeType":"int(11)","isRefund":"tinyint(4)","dataFlag":"tinyint(4)","createTime":"varchar(25)","payTime":"varchar(25)","modifiedTime":"timestamp"},"old":null,"pkNames":["orderId"],"sql":"","sqlType":{"orderId":-5,"orderNo":12,"userId":-5,"status":-6,"productMoney":3,"totalMoney":3,"payMethod":-6,"isPay":-6,"areaId":4,"tradeSrc":-6,"tradeType":4,"isRefund":-6,"dataFlag":-6,"createTime":12,"payTime":12,"modifiedTime":93},"table":"yanqi_trade_orders","ts":1638098012708,"type":"INSERT"}
### --- Note:
~~~     Canal writes the change data of all tables in the MySQL database
~~~     into the same Kafka partition; the source table name can be read
~~~     from the table attribute of the JSON message.
~~~     Every insert, update, or delete on a table produces a corresponding
~~~     log entry, and each entry is written by Canal into the Kafka partition.
~~~     This creates a problem: if we insert a row into a table and then
~~~     update and delete it, three corresponding log entries are produced,
~~~     and all three are written to the Kafka partition. At that point the
~~~     row has already been deleted from the table, but not from Kafka,
~~~     so the data in Kafka and MySQL become inconsistent.
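
Because all tables share one partition, a downstream consumer has to dispatch each message by its table field before writing it anywhere. Below is a minimal sketch using the Kafka Java client and fastjson; the broker address hadoop02:9092 and the group id are assumptions for illustration:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;

public class CanalTopicDispatcher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "hadoop02:9092"); // assumed broker address
        props.put("group.id", "ods-sink");               // assumed consumer group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("canal"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> r : records) {
                    JSONObject root = JSON.parseObject(r.value());
                    // Route by source table; each table gets its own downstream handler
                    String table = root.getString("table");
                    String type  = root.getString("type"); // INSERT / UPDATE / DELETE
                    System.out.println("route " + table + " (" + type + ")");
                }
            }
        }
    }
}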

~~~     To solve this problem, we can parse the JSON logs in Kafka and write
~~~     the data into HBase, because HBase can keep multiple versions of the
~~~     same data (see the sketch below).
~~~     Region: province, city, district
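
Below is a minimal sketch of the HBase write, assuming a target table dwshow:yanqi_trade_orders with a column family f1 that was created to keep multiple versions (both names are assumptions). The ts field of the Canal message is used as the cell timestamp, so every change to a row is stored as its own version instead of overwriting the previous state:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseSinkSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "hadoop02"); // assumed ZooKeeper quorum
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("dwshow:yanqi_trade_orders"))) {
            long ts = 1638097808750L;              // the "ts" field of the Canal message
            Put put = new Put(Bytes.toBytes("2")); // row key = orderId (pkNames in the message)
            // An explicit timestamp keeps every change as a separate cell version,
            // so the insert/update/delete history survives in HBase
            put.addColumn(Bytes.toBytes("f1"), Bytes.toBytes("status"), ts, Bytes.toBytes("2"));
            put.addColumn(Bytes.toBytes("f1"), Bytes.toBytes("totalMoney"), ts, Bytes.toBytes("6331.0"));
            table.put(put);
        }
    }
}

Reading the row back with a Get configured to return multiple versions then exposes the full change history rather than only the latest state.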