import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](topic, kafkaParam))

stream.foreachRDD(rdd => {
  rdd.foreachPartition(partitionOfRecords => {
    // BUG: partitionOfRecords is an Iterator; calling .length traverses and exhausts it
    println("partition length: " + partitionOfRecords.length)
    partitionOfRecords.foreach(record => {
      // never reached: the iterator is already empty after .length
      val records = record.value().split("\t")
    })
  })
})
As the code shows, when Spark Streaming consumes Kafka, the argument passed to foreachPartition (partitionOfRecords) is an Iterator, not a collection. Printing its length forces a full traversal, which consumes the iterator, so the subsequent foreach finds no remaining records and processes nothing. This looks strange at first sight, but it is standard Scala Iterator semantics, and it is worth keeping in mind.
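A minimal sketch of one workaround, assuming the same stream setup as above and that each partition is small enough to buffer in memory: materialize the iterator into a List first, so that both the size check and the per-record loop operate on the buffered collection rather than on the one-shot iterator.

stream.foreachRDD(rdd => {
  rdd.foreachPartition(partitionOfRecords => {
    // toList drains the iterator exactly once and buffers the records in memory
    val buffered = partitionOfRecords.toList
    println("partition length: " + buffered.length)
    buffered.foreach(record => {
      val records = record.value().split("\t")
      // ... process records ...
    })
  })
})

If partitions can be large, buffering them is risky; in that case, skip the length call entirely and, if a count is needed, increment a counter inside the single foreach pass instead.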