1. Data preparation
Use canal to send the MySQL binlog data to Kafka.
2. Writing the program
1. Consume the binlog data from Kafka
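import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

// Kafka consumer settings; the broker address is a placeholder
val kafkaParams = Map[String, String](
  "bootstrap.servers" -> "xxx.xxx.xxx.xxx:9092",
  "auto.offset.reset" -> "latest",
  "key.deserializer" -> "org.apache.kafka.common.serialization.StringDeserializer",
  "value.deserializer" -> "org.apache.kafka.common.serialization.StringDeserializer",
  "group.id" -> "test"
)
val topics = Array("topic_xxx")
val conf = new SparkConf()
  .setAppName("Demo1")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
// Process the stream in 60-second micro-batches
val ssc = new StreamingContext(conf, Seconds(60))
// Direct stream of ConsumerRecord[String, String] from the subscribed topics
val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](topics, kafkaParams))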
2. Parse and filter the data
The binlog records in our Kafka topic have the following format:
V_BinlogName|#|mysql-bin.001205|*|V_StartPos|#|111404|*|V_EndPos|#|113489|
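One way to read a record of this shape is sketched below. It assumes, based only on the sample above, that `|*|` separates fields and `|#|` separates a field name from its value. The object name `BinlogRecordParser` and the method `parse` are hypothetical names chosen for illustration, and stripping one trailing `|` is an assumption made to handle the sample exactly as it is shown.

```scala
import java.util.regex.Pattern

// Hypothetical helper; the separators are inferred from the sample record only.
object BinlogRecordParser {
  private val FieldSep = Pattern.quote("|*|")    // between fields
  private val KeyValueSep = Pattern.quote("|#|") // between a field name and its value

  // Turns one record line into a map of field name -> value.
  // Pieces that do not contain a key/value separator are skipped.
  def parse(line: String): Map[String, String] =
    line
      .stripSuffix("|") // assumption: drop a single dangling pipe, as in the sample
      .split(FieldSep)
      .iterator
      .map(_.split(KeyValueSep, 2))
      .collect { case Array(key, value) => key -> value }
      .toMap
}
```

With the stream from step 1, each Kafka message value could then be mapped with something like `stream.map(record => BinlogRecordParser.parse(record.value()))`, after which filtering becomes an ordinary lookup on the resulting map.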