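1. Consuming Kafka data with Spark
Spark can start consuming a topic from explicitly specified offsets. When offsets are specified, they take precedence over the "auto.offset.reset" -> "earliest" entry in the consumer parameters.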
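import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.common.TopicPartition
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.kafka010.{KafkaUtils, LocationStrategies}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

// ssc is assumed to be an existing StreamingContext
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "test",
  "auto.offset.reset" -> "earliest",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)
// Start consuming from the specified offset of each partition
val offsets = Map[TopicPartition, Long](
  new TopicPartition("topic-test", 0) -> 60L,
  new TopicPartition("topic-test", 1) -> 20L
)
val topics = Array("topic-test")
val dStream: InputDStream[ConsumerRecord[String, String]] = KafkaUtils.createDirectStream[String, String](
  ssc, LocationStrategies.PreferConsistent, Subscribe[String, String](topics, kafkaParams, offsets))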
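The precedence described above can be sketched as a small pure-Scala model. It is only an illustration under stated assumptions: StartOffsetModel, ResetPolicy, PartitionRange and startOffset are hypothetical names that do not exist in Spark or Kafka, and the model ignores details such as an explicit offset that falls outside the partition's valid range. It shows that an offset passed to Subscribe wins over an offset committed for the group, which in turn wins over auto.offset.reset.

```scala
// Hypothetical model of how the starting position of one partition is chosen.
// None of these names come from Spark or Kafka.
object StartOffsetModel {
  sealed trait ResetPolicy
  case object Earliest extends ResetPolicy
  case object Latest extends ResetPolicy

  /** First and next-to-be-written offsets currently available in a partition. */
  final case class PartitionRange(earliest: Long, latest: Long)

  def startOffset(
      explicit: Option[Long],  // offset passed to Subscribe, if any
      committed: Option[Long], // offset committed for the consumer group, if any
      policy: ResetPolicy,     // value of auto.offset.reset
      range: PartitionRange): Long =
    explicit
      .orElse(committed)
      .getOrElse(policy match {
        case Earliest => range.earliest
        case Latest   => range.latest
      })
}
```

In the example above, partition 0 of topic-test corresponds to explicit = Some(60L), so neither a committed offset nor "earliest" would be consulted for it.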
After the data has been consumed, the offsets are committed to Kafka's __consumer_offsets topic.
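import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges, OffsetRange}

dStream.foreachRDD(rdd => {
  // Read the offset ranges from the RDD produced directly by the Kafka stream
  val offsetRanges: Array[OffsetRange] = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  // Process the data and save the output (this println runs on the executors)
  rdd.foreach(l => println(l.value()))
  offsetRanges.foreach(o => println((o.topic, o.partition, o.fromOffset, o.untilOffset)))
  // Commit the offsets to Kafka
  dStream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
})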
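commitAsync does not wait for the commit to complete, so a failed commit goes unnoticed unless a callback is supplied. The following is a minimal sketch, assuming the spark-streaming-kafka-0-10 integration, whose CanCommitOffsets also accepts an OffsetCommitCallback as a second argument. LoggingCommitCallback and failureCount are hypothetical names introduced for illustration.

```scala
import java.util.{Map => JMap}
import java.util.concurrent.atomic.AtomicInteger
import org.apache.kafka.clients.consumer.{OffsetAndMetadata, OffsetCommitCallback}
import org.apache.kafka.common.TopicPartition

// Hypothetical callback that counts and logs failed offset commits.
class LoggingCommitCallback extends OffsetCommitCallback {
  private val failures = new AtomicInteger(0)

  /** Number of commits that have failed so far. */
  def failureCount: Int = failures.get()

  // Kafka passes a null exception when the commit succeeded.
  override def onComplete(
      offsets: JMap[TopicPartition, OffsetAndMetadata],
      exception: Exception): Unit =
    if (exception != null) {
      failures.incrementAndGet()
      System.err.println(s"Offset commit failed for $offsets: ${exception.getMessage}")
    }
}
```

Inside foreachRDD it would be passed as dStream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges, new LoggingCommitCallback), where offsetRanges is the Array[OffsetRange] read from the RDD. Because the commit happens after the output has been written, a failure in between causes the batch to be processed again, so the output should tolerate duplicates.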
2. Writing Spark data to Kafka
To send data to Kafka through a producer from within the executors, define a custom serializable wrapper class around the producer.
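import java.util.Properties
import java.util.concurrent.Future
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord, RecordMetadata}

class KafkaSink[K, V](prop: Properties) extends Serializable {
  // KafkaProducer is not serializable: create it lazily on first use in each JVM
  // and mark it @transient so that it is never part of the serialized wrapper
  @transient lazy val producer = new KafkaProducer[K, V](prop)
  def send(topic: String, key: K, value: V): Future[RecordMetadata] =
    producer.send(new ProducerRecord[K, V](topic, key, value))
  def send(topic: String, value: V): Future[RecordMetadata] =
    producer.send(new ProducerRecord[K, V](topic, value))
  def send(message: ProducerRecord[K, V]): Future[RecordMetadata] =
    producer.send(message)
}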
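Why a lazily created producer makes the wrapper serializable can be checked without Kafka. The sketch below is a stand-in under stated assumptions: FakeClient plays the role of the non-serializable KafkaProducer, and LazySink, LazySinkDemo and roundTrip are hypothetical names. It adds @transient to the lazy field, serializes the wrapper with plain Java serialization (similar to what Spark does by default when shipping objects to executors), and shows that the client is rebuilt on first access after deserialization.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for a non-serializable client such as KafkaProducer.
class FakeClient(val name: String) // deliberately NOT Serializable

// Only the constructor argument is serialized; the client itself is transient
// and is created again on first access in the receiving JVM.
class LazySink(name: String) extends Serializable {
  @transient lazy val client: FakeClient = new FakeClient(name)
}

object LazySinkDemo {
  /** Serializes an object to bytes and reads it back, as a transfer between JVMs would. */
  def roundTrip[T <: AnyRef](obj: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[T]
    finally in.close()
  }
}
```

Calling LazySinkDemo.roundTrip(new LazySink("demo")).client.name yields the original name, even though FakeClient itself could never be serialized.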
Broadcast the producer wrapper so that the executors can use it to write data to Kafka.
import org.apache.kafka.common.serialization.StringSerializer

// sc is the SparkContext; user_info is assumed to be an RDD[String]
val prop = new Properties()
prop.setProperty("bootstrap.servers", "localhost:9092")
prop.setProperty("key.serializer", classOf[StringSerializer].getName)
prop.setProperty("value.serializer", classOf[StringSerializer].getName)
val producer: KafkaSink[String, String] = new KafkaSink[String, String](prop)
val bc = sc.broadcast(producer)
user_info.foreach(line => {
  /*val cols: Array[String] = line.split("\\|")
  val values = cols(1) + "," + cols(2) + "," + cols(3)
  val message = new ProducerRecord[String,String]("topic-test",cols(0),values)*/
  val message = new ProducerRecord[String, String]("yjs-test", null, line)
  //println("key:" + message.key() + " value: " + message.value())
  bc.value.send(message)
})
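KafkaProducer.send is asynchronous and only places the record in a buffer, so records still in the buffer can be lost if an executor exits before they are sent. One common mitigation, sketched here under assumptions, is to build the producer through a factory function that also registers a JVM shutdown hook closing the producer, since close() waits for pending sends to complete. ClosingKafkaSink is a hypothetical variant of the KafkaSink class from this section, not a library class.

```scala
import java.util.Properties
import java.util.concurrent.Future
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord, RecordMetadata}

// Hypothetical variant of KafkaSink: the producer is built by a serializable
// factory function instead of directly from Properties.
class ClosingKafkaSink[K, V](createProducer: () => KafkaProducer[K, V]) extends Serializable {
  // Created on first use in each executor JVM, never serialized.
  @transient lazy val producer: KafkaProducer[K, V] = createProducer()

  def send(message: ProducerRecord[K, V]): Future[RecordMetadata] =
    producer.send(message)
}

object ClosingKafkaSink {
  def apply[K, V](prop: Properties): ClosingKafkaSink[K, V] = {
    val createProducer = () => {
      val producer = new KafkaProducer[K, V](prop)
      // Close the producer when the JVM shuts down so buffered records are sent.
      sys.addShutdownHook {
        producer.close()
      }
      producer
    }
    new ClosingKafkaSink[K, V](createProducer)
  }
}
```

It would be broadcast in the same way as before, for example sc.broadcast(ClosingKafkaSink[String, String](prop)), and used through bc.value.send(message).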