Integrating Spark Streaming with Kafka

1. Direct DStream (No Receivers)

This new receiver-less “direct” approach was introduced in Spark 1.3 to ensure stronger end-to-end guarantees. Instead of using receivers to receive data, this approach periodically queries Kafka for the latest offsets in each topic+partition, and accordingly defines the offset ranges to process in each batch. When the jobs to process the data are launched, Kafka’s simple consumer API is used to read the defined ranges of offsets from Kafka (similar to reading files from a file system). Note that this feature was introduced in Spark 1.3 for the Scala and Java APIs, and in Spark 1.4 for the Python API.
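
As a minimal sketch of what this looks like in code (assuming the Kafka 0.8 direct integration from spark-streaming-kafka, plus the broker address hadoop000:9092 and topic kafka_streaming_topic used in the setup below; the object name DirectKafkaDemo is just a placeholder):

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

// Hypothetical demo object; broker and topic come from the setup in section 1.1.
object DirectKafkaDemo {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("DirectKafkaDemo")
    val ssc = new StreamingContext(sparkConf, Seconds(10))

    // No receiver: the driver periodically asks Kafka for the latest offset of
    // each topic+partition and turns each offset range into an RDD partition.
    val kafkaParams = Map[String, String]("metadata.broker.list" -> "hadoop000:9092")
    val topics = Set("kafka_streaming_topic")

    val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // messages is a DStream of (key, value) pairs; word-count the values per batch.
    messages.map(_._2)
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}

Because each Kafka partition maps directly to one RDD partition, read parallelism follows the topic's partition count without any receiver threads.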

1.1. Kafka

Start ZooKeeper:

./zkServer.sh start
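
You can confirm ZooKeeper is actually running before moving on:

./zkServer.sh status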

Start the Kafka broker:

./kafka-server-start.sh -daemon /home/hadoop/app/kafka_2.11-0.9.0.0/config/server.properties
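
Since -daemon starts the broker in the background, it is worth checking that the JVM process actually came up, for example with jps:

jps | grep Kafka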

Create a Kafka topic:

./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic kafka_streaming_topic
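
To verify that the partition and replication settings took effect, you can describe the topic:

./kafka-topics.sh --describe --zookeeper localhost:2181 --topic kafka_streaming_topic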

Start a console producer and a console consumer to test connectivity:

./kafka-console-producer.sh --broker-list hadoop000:9092 --topic kafka_streaming_topic
./kafka-console-consumer.sh --zookeeper hadoop000:2181 --topic kafka_streaming_topic
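
If lines typed into the producer terminal appear in the consumer terminal, the broker is reachable. To run the direct-stream job sketched in section 1 against this topic, one option is to pull in the Kafka integration via --packages (a sketch, assuming a Spark 1.6.x / Scala 2.11 build and an application jar named sparktrain.jar; adjust the artifact version to match your cluster):

spark-submit --master local[2] \
  --class DirectKafkaDemo \
  --packages org.apache.spark:spark-streaming-kafka_2.11:1.6.3 \
  sparktrain.jar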