Spark Streaming Stateful wordCount Example (Using updateStateByKey)

This example starts from a basic word count. What the state holds differs between use cases, so updateFunction has to be adapted to the requirements at hand.
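For instance, where the job below keeps a plain running count, a use case that only cares about recently active words could store (count, idle batches) as the state and evict a word once it has been absent for a few batches. The sketch below is a hypothetical variant, not part of the original job: the names IdleExpiringUpdate, updateWithExpiry and MaxIdleBatches are made up. It relies on two documented properties of updateStateByKey: the function is called for every existing key in every batch (with an empty newValues when the key got no new data), and a key is removed when the function returns None. It is plain Scala with no Spark dependency.

```scala
object IdleExpiringUpdate {
  // Hypothetical state: (running count, consecutive batches without new values)
  type WordState = (Int, Int)

  // Hypothetical threshold: evict a word after this many idle batches in a row
  val MaxIdleBatches = 3

  // Same shape as the function passed to updateStateByKey:
  // (values for the key in this batch, previous state) => new state (None removes the key)
  def updateWithExpiry(newValues: Seq[Int], state: Option[WordState]): Option[WordState] = {
    val (count, idle) = state.getOrElse((0, 0))
    if (newValues.nonEmpty) Some((count + newValues.sum, 0)) // seen again: add and reset idle counter
    else if (idle + 1 >= MaxIdleBatches) None                // idle too long: drop from the state
    else Some((count, idle + 1))                            // idle, but keep it for now
  }

  def main(args: Array[String]): Unit = {
    val s1 = updateWithExpiry(Seq(1, 1), None) // first batch with two occurrences
    val s2 = updateWithExpiry(Seq.empty, s1)   // one idle batch
    val s3 = updateWithExpiry(Seq.empty, s2)   // two idle batches
    val s4 = updateWithExpiry(Seq.empty, s3)   // third idle batch: evicted
    println(Seq(s1, s2, s3, s4).mkString(" -> "))
  }
}
```

In the streaming job it could be plugged in as `kvs.updateStateByKey[(Int, Int)](IdleExpiringUpdate.updateWithExpiry _)`; the trade-off is a slightly larger state per key in exchange for a state that does not grow forever.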

The data is received from the Kafka topic topicA. A producer picks one word at random from spark, hadoop, flink, hbase and kafka and sends it to topicA once per second.

The producer code is as follows:

/**
 * Copyright(C) 2018 Hangzhou xianghu.wang Technology Co., Ltd. All rights reserved.
 */
package com.ccclubs.kafka;

import com.ccclubs.uitl.KafkaUtil;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

/**
 * @author xianghu.wang
 * @date: 2018-12-18 15:22
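 * @des: sends one randomly chosen word to topicA every second
 */
public class KafkaWordProducer {
    private static final String TOPIC = "topicA";

    public static void main(String[] args) throws InterruptedException {
        // KafkaUtil is a project helper (not shown); it is assumed to return a producer with String serializers
        KafkaProducer<String, String> producer = KafkaUtil.getKafkaProucer();
        // Close the producer (flushing buffered records) when the JVM is stopped, e.g. with Ctrl+C
        Runtime.getRuntime().addShutdownHook(new Thread(producer::close));

        String[] sources = {"spark", "hadoop", "flink", "hbase", "kafka"};

        while (true) {
            // Pick one of the five words uniformly at random
            int wordIndex = (int) (Math.random() * sources.length);

            // Record without a key: only the topic and the word (value) are set
            ProducerRecord<String, String> record = new ProducerRecord<>(TOPIC, sources[wordIndex]);
            System.out.println(record.value());
            producer.send(record);
            Thread.sleep(1000);
        }
    }
}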
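KafkaUtil.getKafkaProucer() comes from the author's project and its source is not shown. Assuming it builds an ordinary KafkaProducer with String keys and values, it would need at least the three settings below. The object and method names are hypothetical, and the broker address zc01:9092 is borrowed from the consumer configuration further down; only java.util.Properties is used, so the sketch runs without kafka-clients on the classpath.

```scala
import java.util.Properties

object ProducerConfigSketch {
  private val StringSerializer = "org.apache.kafka.common.serialization.StringSerializer"

  // Hypothetical stand-in for the configuration KafkaUtil.getKafkaProucer() might use
  def producerProps(bootstrapServers: String): Properties = {
    val props = new Properties()
    props.setProperty("bootstrap.servers", bootstrapServers)
    // Keys and values are plain strings (the word is sent as the value)
    props.setProperty("key.serializer", StringSerializer)
    props.setProperty("value.serializer", StringSerializer)
    props
  }

  def main(args: Array[String]): Unit = {
    val props = producerProps("zc01:9092")
    println(props.getProperty("bootstrap.servers"))
  }
}
```

With kafka-clients on the classpath, these properties could be passed to `new KafkaProducer[String, String](props)`.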

The Spark Streaming job consumes the words, keeps a running count per word across batches, and prints the counts to the console.

package com.ccclubs.streaming

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
  * @author xianghu.wang
  * @date: 2018-12-18 15:47
  * @des: stateful word count
  */
object StreamingWordCount {
  def main(args: Array[String]): Unit = {
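    // Create the StreamingContext with a 1-second batch interval
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("SparkStreamingDemo")
    val ssc = new StreamingContext(sparkConf, Seconds(1))

    // Set the checkpoint directory; stateful operations such as updateStateByKey require it
    ssc.checkpoint("./checkpoint")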

    // Kafka consumer parameters
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "zc01:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "SparkStreamingDemo",
      "auto.offset.reset" -> "latest"
    )

    // Kafka topic(s) to subscribe to
    val topics = Array("topicA")

    // Create a direct DStream from Kafka
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](topics, kafkaParams)
    )

    // Each element of the stream is a ConsumerRecord[String, String] carrying topic, partition,
    // offset, key and value; the value is the word, which is paired with an initial count of 1
    val kvs = stream.map(record => (record.value, 1))
    val count = kvs.updateStateByKey[Int](updateFunction _)

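    // Print the first 10 (word, count) pairs of the state in each batch
    count.print()

    // Start the computation and block until it is stopped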
    ssc.start()
    ssc.awaitTermination()
  }

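  /**
    * Merges the current batch's values for a key into its running count.
    *
    * @param newValues the values for this key in the current batch; their type is the value type of the pairs (Int here)
    * @param oldCount the count accumulated so far, or None if the key has no state yet
    * @return the updated count; returning None would remove the key from the state
    */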
  def updateFunction(newValues: Seq[Int], oldCount: Option[Int]): Option[Int] = {
    val newCount = newValues.sum
    val previousCount = oldCount.getOrElse(0)
    Some(newCount + previousCount)
  }
}
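To make the behavior of updateStateByKey concrete without a Kafka broker or Spark, the sketch below imitates it on in-memory micro-batches. It is a simplified model, not Spark's implementation, and the names UpdateStateSimulation and step are hypothetical. For each batch it calls the update function for every key that is either in the batch or already in the state (with an empty Seq for keys that received nothing), and keeps only the keys for which the function returns Some. updateFunction has the same logic as in StreamingWordCount; the three batches are made-up input.

```scala
object UpdateStateSimulation {
  // Same logic as StreamingWordCount.updateFunction
  def updateFunction(newValues: Seq[Int], oldCount: Option[Int]): Option[Int] =
    Some(newValues.sum + oldCount.getOrElse(0))

  // Simplified in-memory model of one updateStateByKey step
  def step[K, V, S](state: Map[K, S], batch: Seq[(K, V)])
                   (update: (Seq[V], Option[S]) => Option[S]): Map[K, S] = {
    // Group this batch's values by key, like the shuffle before the update
    val grouped: Map[K, Seq[V]] = batch.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
    // Old keys are updated too, even if they got no new values in this batch
    val keys = state.keySet ++ grouped.keySet
    keys.toSeq.flatMap { k =>
      update(grouped.getOrElse(k, Seq.empty), state.get(k)).map(s => k -> s)
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    // Made-up micro-batches; each word is paired with 1, like stream.map(record => (record.value, 1))
    val batches = Seq(
      Seq("spark", "kafka", "spark"),
      Seq("flink"),
      Seq("spark")
    ).map(_.map(w => (w, 1)))

    var state = Map.empty[String, Int]
    for (batch <- batches) {
      state = step(state, batch)(updateFunction)
      println(state.toSeq.sortBy(_._1).mkString(", "))
    }
  }
}
```

Because updateFunction always returns Some, no word is ever dropped: after the third batch the printed state is (flink,1), (kafka,1), (spark,3), even though the third batch only contained spark.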

Run results: (console output screenshot not preserved)

Note: please credit the source when reposting.