Kafka
Kafka is a high-throughput, distributed publish-subscribe messaging system.
Key concepts:
- Broker: a server in the Kafka cluster
- Topic: a category of messages published to the cluster
- Partition: a physical partition of a topic
- Producer: a client that publishes messages to the cluster
- Consumer: a client that consumes messages from the cluster
Kafka setup
Create a topic
$ bin/kafka-topics.sh --create --topic quickstart-events --bootstrap-server localhost:9092
Describe the topic
$ bin/kafka-topics.sh --describe --topic quickstart-events --bootstrap-server localhost:9092
Topic: quickstart-events TopicId: Pwr-0JOyQKOr9qBXlGUTuA PartitionCount: 1 ReplicationFactor: 1 Configs:
Topic: quickstart-events Partition: 0 Leader: 0 Replicas: 0 Isr: 0
List all topics
$ bin/kafka-topics.sh --bootstrap-server localhost:9092 --list
Write events into the topic
$ ./bin/kafka-console-producer.sh --topic quickstart-events --bootstrap-server localhost:9092
>This is my first event
>This is my second event
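The console producer above can also be reproduced programmatically with the Kafka Java client. A minimal sketch in Scala, assuming a broker at localhost:9092 and the quickstart-events topic from the steps above (the helper name buildProducerProps is ours):

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}

object QuickstartProducer {
  // Minimal producer configuration (helper name is ours, not a Kafka API)
  def buildProducerProps(bootstrap: String): Properties = {
    val props = new Properties()
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrap)
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringSerializer")
    props
  }

  def main(args: Array[String]): Unit = {
    val producer = new KafkaProducer[String, String](buildProducerProps("localhost:9092"))
    // Send the same two events the console producer wrote
    producer.send(new ProducerRecord("quickstart-events", "This is my first event"))
    producer.send(new ProducerRecord("quickstart-events", "This is my second event"))
    producer.close() // flushes pending records before exiting
  }
}
```

Running main requires a live broker; the configuration helper itself does not.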
Read the events
$ ./bin/kafka-console-consumer.sh --topic quickstart-events --from-beginning --bootstrap-server localhost:9092
This is my first event
This is my second event
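The console consumer likewise has a programmatic counterpart. A sketch using the Kafka Java client from Scala, assuming the same broker and topic; `auto.offset.reset = earliest` plays the role of `--from-beginning`, and the helper name buildConsumerProps is ours:

```scala
import java.time.Duration
import java.util.Properties
import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}
import scala.jdk.CollectionConverters._

object QuickstartConsumer {
  // Minimal consumer configuration (helper name is ours, not a Kafka API)
  def buildConsumerProps(bootstrap: String, groupId: String): Properties = {
    val props = new Properties()
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrap)
    props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId)
    // Start from the earliest offset, like --from-beginning
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringDeserializer")
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringDeserializer")
    props
  }

  def main(args: Array[String]): Unit = {
    val consumer =
      new KafkaConsumer[String, String](buildConsumerProps("localhost:9092", "quickstart-group"))
    consumer.subscribe(java.util.Collections.singletonList("quickstart-events"))
    // Poll once and print whatever events are available
    val records = consumer.poll(Duration.ofSeconds(5))
    records.asScala.foreach(r => println(r.value()))
    consumer.close()
  }
}
```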
Consuming Kafka data with Spark Streaming
package ac.sict.reid.leo.streaming

import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SparkStreaming04_Kafka {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("SparkStreaming")
    // Micro-batch interval of 3 seconds
    val ssc = new StreamingContext(conf, Seconds(3))

    // Kafka consumer configuration
    val para = Map[String, Object](
      ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> "localhost:9092",
      ConsumerConfig.GROUP_ID_CONFIG -> "quickstart-events",
      ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG ->
        "org.apache.kafka.common.serialization.StringDeserializer",
      ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG ->
        "org.apache.kafka.common.serialization.StringDeserializer"
    )

    // Create a direct stream subscribed to the quickstart-events topic
    val consumerRecord = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Set("quickstart-events"), para)
    )

    // Print the value of each consumed record per batch
    consumerRecord.map(_.value()).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
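The program above needs the spark-streaming-kafka-0-10 connector on the classpath. A sketch of the sbt dependencies, with illustrative version numbers that should be matched to your Spark and Scala versions:

```scala
// build.sbt (versions are illustrative; match them to your cluster)
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.3.0",
  "org.apache.spark" %% "spark-streaming" % "3.3.0",
  "org.apache.spark" %% "spark-streaming-kafka-0-10" % "3.3.0"
)
```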