Spark Streaming: dynamically detecting Kafka partitions
Kafka 0.8
With Kafka 0.8 you have to copy the DirectKafkaInputDStream class into your project and rewrite it, which is fairly tedious.
For the implementation and deployment, see the following links:
Code implementation
Deployment
Kafka 1.0
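Spark Streaming's integration with Kafka 1.0 natively supports dynamic detection of Kafka partitions, so no special handling is needed, as shown below.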
- Integrate with Kafka in direct mode
val kafkaDStream: InputDStream[ConsumerRecord[String, String]] = KafkaUtils.createDirectStream(
  // sc must be a StreamingContext here, not a SparkContext
  sc,
  // Location strategy: use this in most cases, it will consistently distribute partitions across all executors
  LocationStrategies.PreferConsistent,
  // Topics to subscribe to
  ConsumerStrategies.Subscribe[String, String](topics, kafkaParams)
)
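To watch the dynamic detection happen, the call above can be wrapped in a small job that prints how many Kafka partitions each batch read from. This is only a minimal sketch, assuming the spark-streaming-kafka-0-10 dependency is on the classpath. The broker address, group id, topic name, batch interval and the PartitionWatch object are all made-up placeholders, not part of the original setup. If partitions are added to the topic while the job runs, the printed count should grow. Do not expect this to be immediate, since the Kafka consumer first has to refresh its topic metadata.

```scala
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, HasOffsetRanges, KafkaUtils, LocationStrategies, OffsetRange}

// Hypothetical helper object, for illustration only.
object PartitionWatch {

  // Number of Kafka partitions that each topic contributed to one batch.
  def partitionsPerTopic(ranges: Array[OffsetRange]): Map[String, Int] =
    ranges.groupBy(_.topic).map { case (topic, rs) => topic -> rs.length }

  def main(args: Array[String]): Unit = {
    // Assumed values: local master, 5 second batches.
    val conf = new SparkConf().setAppName("partition-watch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Assumed values: topic name, broker address and group id are placeholders.
    val topics = Seq("demo-topic")
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "partition-watch-group",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val stream: InputDStream[ConsumerRecord[String, String]] =
      KafkaUtils.createDirectStream[String, String](
        ssc,
        LocationStrategies.PreferConsistent,
        ConsumerStrategies.Subscribe[String, String](topics, kafkaParams)
      )

    // Each RDD of a direct stream carries one OffsetRange per Kafka partition it read.
    stream.foreachRDD { rdd =>
      val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      println(partitionsPerTopic(ranges))
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The cast to HasOffsetRanges only works on the RDD handed to the first operation on the direct stream, which is why the count is taken inside foreachRDD before any other transformation.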
Source code of createDirectStream:
def createDirectStream[K, V](
    ssc: StreamingContext,
    locationStrategy: LocationStrategy,
    consumerStrategy: ConsumerStrategy[K, V]
  ): InputDStream[ConsumerRecord[K, V]] = {
  val ppc = new DefaultPerPartitionConfig(ssc.sparkContext.getConf)
  createDirectStream[K, V