Capturing the MySQL binlog with Debezium, syncing it to Kafka, and doing real-time computation with Spark Streaming
Versions:
Kafka: kafka_2.11-2.1.1
ZooKeeper: zookeeper-3.4.10
Spark: spark-2.1.0-bin-hadoop2.7
Debezium: debezium-connector-mysql-0.9.5.Final-plugin.tar.gz
Debezium download:
https://repo1.maven.org/maven2/io/debezium/debezium-connector-mysql/0.9.5.Final/debezium-connector-mysql-0.9.5.Final-plugin.tar.gz
Debezium reference documentation:
https://debezium.io/docs/connectors/mysql/
Step 1: Install ZooKeeper
https://my.oschina.net/jerval/blog/3057528
Step 2: Install Kafka
https://my.oschina.net/jerval/blog/3057502
Step 3: Install the Debezium connector plugin
1. Create a plugin directory on the Kafka host
mkdir -p /usr/local/share/kafka/plugins
2. Extract Debezium into the plugin directory
tar -zxvf debezium-connector-mysql-0.9.5.Final-plugin.tar.gz -C /usr/local/share/kafka/plugins
3. Enable the binlog in the MySQL configuration
# vim /etc/my.cnf
[mysqld]
log-bin=mysql-bin # enable the binlog
binlog-format=ROW # Debezium requires ROW format
4. Create a binlog user and grant privileges
CREATE USER binlog IDENTIFIED BY 'Binlog-123';
-- Debezium also needs RELOAD and SHOW DATABASES for its initial snapshot
GRANT SELECT, RELOAD, SHOW DATABASES, LOCK TABLES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'binlog'@'%';
-- or, more permissively: GRANT ALL PRIVILEGES ON *.* TO 'binlog'@'%';
FLUSH PRIVILEGES;
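To confirm the binlog and account setup before starting the connector, a few checks can be run in a MySQL client (assuming the configuration and grants above were applied):

```sql
-- verify the binlog is on and in ROW format
SHOW VARIABLES LIKE 'log_bin';
SHOW VARIABLES LIKE 'binlog_format';
-- verify the privileges of the binlog user
SHOW GRANTS FOR 'binlog'@'%';
```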
5. Create mysql.properties under Kafka's config directory (the inventory database must already exist)
# mysql.properties
name=inventory-connector
connector.class=io.debezium.connector.mysql.MySqlConnector
database.hostname=10.17.81.212
database.port=3306
database.user=binlog
database.password=Binlog-123
database.server.id=184054
database.server.name=fullfillment
database.whitelist=inventory
#table.whitelist
database.history.kafka.bootstrap.servers=10.17.81.211:9092
database.history.kafka.topic=dbhistory.fullfillment
include.schema.changes=true
Parameter reference (Chinese): https://blog.csdn.net/lzufeng/article/details/81253170
Official docs: http://debezium.io/docs/connectors/mysql/#connector-properties
6. Edit config/connect-standalone.properties:
# Kafka broker list
bootstrap.servers=host:9092
# Debezium plugin path (must match the directory created above)
plugin.path=/usr/local/share/kafka/plugins
Step 4: Start everything
1. Start ZooKeeper (e.g. bin/zkServer.sh start in the ZooKeeper install)
2. Start Kafka (e.g. bin/kafka-server-start.sh -daemon config/server.properties)
3. Start Kafka Connect in standalone mode. Debezium then creates one topic per table in the monitored database; consuming the corresponding topic yields the parsed binlog events.
Common commands:
# start Kafka Connect
nohup bin/connect-standalone.sh config/connect-standalone.properties config/mysql.properties &
# list topics
bin/kafka-topics.sh --list --zookeeper host:2181
# consume the test topic
bin/kafka-console-consumer.sh --bootstrap-server host:9092 --topic test --from-beginning
# consume the dbhistory.fullfillment topic
bin/kafka-console-consumer.sh --bootstrap-server host:9092 --topic dbhistory.fullfillment --from-beginning
# consume the fullfillment topic
bin/kafka-console-consumer.sh --bootstrap-server host:9092 --topic fullfillment --from-beginning
# consume a per-table topic, named serverName.databaseName.tableName
bin/kafka-console-consumer.sh --bootstrap-server host:9092 --topic fullfillment.inventory.TableName1 --from-beginning
Result: insert rows into a table in the inventory database, and a Kafka console consumer on the table's topic prints the corresponding change events.
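For reference, the value of a Debezium 0.9 change event is roughly shaped as follows (abridged; the real schema block is much larger, and the field names under after depend on the table — here the ip/total columns of the result table used later are assumed):

```json
{
  "schema": { "comment": "field type metadata, omitted here" },
  "payload": {
    "before": null,
    "after": { "id": 1, "ip": "192.168.1.10", "total": "42" },
    "source": { "name": "fullfillment", "db": "inventory", "table": "result" },
    "op": "c",
    "ts_ms": 1563000000000
  }
}
```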
Integrating Kafka with Spark Streaming for real-time computation, persisting results to MySQL
(Most of the pitfalls along the way boil down to two kinds: version mismatches and missing jars.)
Main program:
import java.sql.{Connection, DriverManager, PreparedStatement}
import net.minidev.json.JSONObject
import net.minidev.json.parser.JSONParser
import org.apache.log4j.{Level, Logger}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}
/**
* Consumes Debezium change events from Kafka, extracts (ip, total) from payload.after,
* and writes the pairs to MySQL.
*/
object AppMain {
def main(args: Array[String]): Unit = {
// reduce log noise
Logger.getLogger("org.apache.spark").setLevel(Level.ERROR)
Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.OFF)
//System.setProperty("hadoop.home.dir", "D:\\env\\hadoop-2.7.3")
val sparkConf = new SparkConf().setAppName("KafkaReceiver")//.setMaster("local[2]")
val sparkStreaming = new StreamingContext(sparkConf,Seconds(3))
// topic name -> number of consumer threads used to read it
val topics = Map("fullfillment.inventory.result" -> 1)
// create the receiver-based Kafka input stream, specifying the ZooKeeper quorum and consumer group
val kafkaStream = KafkaUtils.createStream(sparkStreaming, "192.168.224.11:2181,192.168.224.12:2181,192.168.224.13:2181", "result", topics)
// print raw records for debugging
val lineDStream = kafkaStream.map(e => e.toString())
lineDStream.print()
// Kafka records are (key, value) pairs; the key is null here, so keep only the value
val logRDD = kafkaStream.map(_._2)
// parse each event and extract the fields under payload.after
val datas = logRDD.map(line => {
val jsonParser = new JSONParser()
val jsonObj: JSONObject = jsonParser.parse(line).asInstanceOf[JSONObject]
val payload = jsonObj.get("payload").toString
val payloadjsonObj: JSONObject = jsonParser.parse(payload).asInstanceOf[JSONObject]
val after = payloadjsonObj.get("after").toString
val aftersonObj: JSONObject = jsonParser.parse(after).asInstanceOf[JSONObject]
val ip = aftersonObj.get("ip").toString
val total = aftersonObj.get("total").toString
// emit (ip, total)
(ip,total)
})
// Persist the results to MySQL. The connection is created and closed inside
// foreachPartition so it lives on the executor, not the driver.
datas.foreachRDD(rdd => {
rdd.foreachPartition(partition => {
var conn: Connection = null
var ps: PreparedStatement = null
try {
Class.forName("com.mysql.jdbc.Driver")
conn = DriverManager.getConnection("jdbc:mysql://192.168.224.11:3306/inventory?useUnicode=true&characterEncoding=utf8", "root", "root")
ps = conn.prepareStatement("insert into result1 values(?,?)")
partition.foreach(s => {
ps.setString(1, s._1)
ps.setString(2, s._2)
ps.executeUpdate()
})
} catch {
case t: Throwable => t.printStackTrace()
} finally {
if (ps != null) ps.close()
if (conn != null) conn.close()
}
})
})
sparkStreaming.start()
sparkStreaming.awaitTermination()
}
}
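The nested payload.after extraction in the job above can be sanity-checked offline, without Kafka or Spark. A minimal sketch (Python standard library; the hand-made event uses the same ip/total fields as the result table, and the extract helper is illustrative, not part of the original code):

```python
import json

def extract(event: str):
    """Mirror the Scala job: parse a Debezium event string and pull ip/total from payload.after."""
    obj = json.loads(event)
    after = obj["payload"]["after"]
    return str(after["ip"]), str(after["total"])

# hand-made Debezium-style event for testing
sample = json.dumps({
    "payload": {
        "before": None,
        "after": {"id": 1, "ip": "192.168.1.10", "total": "42"},
        "op": "c",
    }
})

print(extract(sample))  # ('192.168.1.10', '42')
```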
POM file:
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<spark.version>2.2.1</spark.version>
<scala.version>2.11.1</scala.version>
</properties>
<dependencies>
<!-- https://mvnrepository.com/artifact/org.scala-lang/scala-library -->
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-compiler</artifactId>
<version>${scala.version}</version>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-reflect</artifactId>
<version>${scala.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.7.2</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>2.7.2</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-core</artifactId>
<version>2.7.2</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-jobclient</artifactId>
<version>2.8.0</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-common</artifactId>
<version>2.8.0</version>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.11</artifactId>
<version>0.11.0.0</version>
<exclusions>
<exclusion>
<groupId>org.apache.zookeeper</groupId>
<artifactId>zookeeper</artifactId>
</exclusion>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
</exclusions>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.kafka/kafka-clients -->
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>0.11.0.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.11</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>${spark.version}</version>
</dependency>
<!-- NOTE: 1.6.3 was built against Spark 1.x; this mismatch is the root cause of the
     Logging$class error described in the troubleshooting section. For Spark 2.x the
     matching artifact would be spark-streaming-kafka-0-8_2.11 at ${spark.version}. -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka_2.11</artifactId>
<version>1.6.3</version>
</dependency>
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>5.1.8</version>
</dependency>
<!-- https://mvnrepository.com/artifact/net.minidev/json-smart -->
<dependency>
<groupId>net.minidev</groupId>
<artifactId>json-smart</artifactId>
<version>2.3</version>
</dependency>
</dependencies>
Troubleshooting: errors encountered and their fixes
```
Exception in thread "streaming-start" java.lang.NoClassDefFoundError: org/apache/spark/internal/Logging$class
at org.apache.spark.streaming.kafka.KafkaReceiver.<init>(KafkaInputDStream.scala:76)
at org.apache.spark.streaming.kafka.KafkaInputDStream.getReceiver(KafkaInputDStream.scala:60)
at org.apache.spark.streaming.scheduler.ReceiverTracker.$anonfun$launchReceivers$1(ReceiverTracker.scala:438)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
at scala.collection.TraversableLike.map(TraversableLike.scala:238)
at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
at org.apache.spark.streaming.scheduler.ReceiverTracker.launchReceivers(ReceiverTracker.scala:437)
at org.apache.spark.streaming.scheduler.ReceiverTracker.start(ReceiverTracker.scala:159)
at org.apache.spark.streaming.scheduler.JobScheduler.start(JobScheduler.scala:101)
at org.apache.spark.streaming.StreamingContext.$anonfun$start$1(StreamingContext.scala:590)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.ThreadUtils$$anon$2.run(ThreadUtils.scala:219)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.internal.Logging$class
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 16 more
```
Fix: the logging class is missing at startup (spark-streaming-kafka_2.11 1.6.3 was compiled against Spark 1.x); adding spark-core_2.11-1.5.2.logging.jar on every node works around it.
Spark Streaming fails when reading from Kafka:
```
java.lang.NoClassDefFoundError: org/I0Itec/zkclient/IZkStateListener
at kafka.consumer.Consumer$.create(ConsumerConnector.scala:109)
at org.apache.spark.streaming.kafka.KafkaReceiver.onStart(KafkaInputDStream.scala:100)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:148)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:130)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:575)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:565)
at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1992)
at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1992)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.ClassNotFoundException: org.I0Itec.zkclient.IZkStateListener
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
... 14 more
```
Add the following dependency to fix it:
<!-- https://mvnrepository.com/artifact/com.101tec/zkclient -->
<dependency>
<groupId>com.101tec</groupId>
<artifactId>zkclient</artifactId>
<version>0.10</version>
</dependency>
Error: NoClassDefFoundError: com/yammer/metrics/Metrics
Add to the POM:
<dependency>
<groupId>com.yammer.metrics</groupId>
<artifactId>metrics-core</artifactId>
<version>2.2.0</version>
</dependency>
The Debezium snapshot fails with a privileges error:
```
org.apache.kafka.connect.errors.ConnectException: Access denied; you need (at least one of) the RELOAD privilege(s) for this operation Error code: 1227; SQLSTATE: 42000.
at io.debezium.connector.mysql.AbstractReader.wrap(AbstractReader.java:230)
at io.debezium.connector.mysql.AbstractReader.failed(AbstractReader.java:208)
at io.debezium.connector.mysql.SnapshotReader.execute(SnapshotReader.java:721)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.sql.SQLSyntaxErrorException: Access denied; you need (at least one of) the RELOAD privilege(s) for this operation
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:120)
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97)
at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122)
at com.mysql.cj.jdbc.StatementImpl.executeInternal(StatementImpl.java:782)
at com.mysql.cj.jdbc.StatementImpl.execute(StatementImpl.java:666)
at io.debezium.jdbc.JdbcConnection.executeWithoutCommitting(JdbcConnection.java:1121)
at io.debezium.connector.mysql.SnapshotReader.execute(SnapshotReader.java:361)
... 3 more
```
Fix: grant the RELOAD privilege to the binlog user in MySQL (e.g. GRANT ALL PRIVILEGES ON *.* TO 'binlog'@'%';).
For "The db history topic is missing", fix the database.history.kafka.topic setting in mysql.properties.