Why is Python not Scala?! Your homework, then, is to rewrite the code below in Python ;-)

As of Spark 2.1.1, out of these sources, Kafka, Kinesis and Flume are available in the Python API.
Basically, the process is:
Read the messages from the Kafka topics with the spark-streaming-kafka-0-10_2.11 library and KafkaUtils.createDirectStream, as described in the Spark Streaming + Kafka Integration Guide (Kafka broker version 0.10.0 or higher):

import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010._
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
val kafkaParams = Map[String, Object](
"bootstrap.servers" -> "localhost:9092,anotherhost:9092",
"key.deserializer" -> classOf[StringDeserializer],
"value.deserializer" -> classOf[StringDeserializer],
"group.id" -> "use_a_separate_group_id_for_each_stream",
"auto.offset.reset" -> "latest",
"enable.auto.commit" -> (false: java.lang.Boolean)
)
val topics = Array("topicA", "topicB")
val stream = KafkaUtils.createDirectStream[String, String](
streamingContext,
PreferConsistent,
Subscribe[String, String](topics, kafkaParams)
)
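
The snippet assumes a streamingContext already in scope; a minimal sketch of creating one (the app name, the local[*] master and the 5-second batch interval are arbitrary choices):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// build the StreamingContext that createDirectStream above relies on
val conf = new SparkConf().setAppName("KafkaToJson").setMaster("local[*]")
val streamingContext = new StreamingContext(conf, Seconds(5))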
Use the map operator to copy the ConsumerRecords over to their keys and values, so you don't run into serialization issues later (ConsumerRecord itself is not serializable):

stream.map(record => (record.key, record.value))

If you don't send keys, record.value alone is enough:

stream.map(record => record.value)
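
If you also need per-record metadata downstream, you can pull it out in the same map; a small sketch (withMeta is just an illustrative name; topic, partition and offset are standard ConsumerRecord accessors):

// keep the record metadata alongside the payload
val withMeta = stream.map(record =>
  (record.topic, record.partition, record.offset, record.value))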
Convert the string messages to JSON. Once you have the values, you can use the from_json function: from_json(e: Column, schema: StructType) Parses a column containing a JSON string into a StructType with the specified schema. Returns null, in the case of an unparseable string.
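
To see from_json in isolation, here is a minimal sketch against a static DataFrame (the one-field schema and the sample strings are assumptions; note how the unparseable row comes back as null, per the quote above):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{StringType, StructType}

val spark = SparkSession.builder.appName("FromJsonDemo").master("local[*]").getOrCreate()
import spark.implicits._

val jsonSchema = new StructType().add("id", StringType)
Seq("""{"id":"a"}""", "not json").toDF("value").
  withColumn("json", from_json('value, jsonSchema)).
  select("json.*").
  show(false)  // the "not json" row shows null for id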
The code would then look as follows:

// assumes a SparkSession `spark` with `import spark.implicits._` in scope
// (needed for toDF and the 'value column syntax), plus
// `import org.apache.spark.sql.functions.from_json`
...foreachRDD { rdd =>
  rdd.toDF.
    withColumn("json", from_json('value, jsonSchema)).
    select("json.*").show(false)
}
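
Putting the steps together, here is a self-contained sketch (the topic name, JSON schema fields, group id, batch interval and master URL are all illustrative assumptions):

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{DoubleType, StringType, StructType}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaJsonExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaJsonExample").setMaster("local[*]")
    val streamingContext = new StreamingContext(conf, Seconds(5))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "kafka-json-example",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      streamingContext,
      PreferConsistent,
      Subscribe[String, String](Array("topicA"), kafkaParams)
    )

    // illustrative schema for the JSON payload
    val jsonSchema = new StructType()
      .add("id", StringType)
      .add("amount", DoubleType)

    // extract the values, parse them as JSON, and show the result per batch
    stream.map(record => record.value).foreachRDD { rdd =>
      val spark = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate()
      import spark.implicits._
      rdd.toDF("value")
        .withColumn("json", from_json('value, jsonSchema))
        .select("json.*")
        .show(false)
    }

    streamingContext.start()
    streamingContext.awaitTermination()
  }
}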
Done!