Spark Streaming Programming (2): Word Count

12345677654321000000

Article link: https://blog.csdn.net/zhoudetiankong/article/details/77484165

Let's walk through the official example; most of the code is commented.

package com.lgh.sparkstreaming

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
  * Created by Administrator on 2017/8/22.
  */
object NetworkWordCount {
  def main(args: Array[String]): Unit = {
    if (args.length < 2) {
      System.err.println("Usage: NetworkWordCount <hostname> <port>")
      System.exit(1)
    }

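    // Create the SparkConf. The master is set to local[2] here because local mode
    // is convenient for testing. In local mode the thread count must be at least 2:
    // the socket receiver occupies one thread, so at least one more is needed
    // to process the received data; otherwise the job cannot run correctly.
    val sparkConf = new SparkConf().setAppName("NetworkWordCount").setMaster("local[2]")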

    // Create the StreamingContext with a batch interval of 10 seconds.
    val ssc = new StreamingContext(sparkConf, Seconds(10))

    // Use a socket as the data source: create a socket stream on the target
    // ip:port and count the words in an input stream of \n-delimited text
    // (e.g. generated by 'nc').
    // Note that the storage level used here has no replication, which is
    // acceptable only when running locally. In a distributed deployment,
    // replication is necessary for fault tolerance.
    val lines = ssc.socketTextStream(args(0), args(1).toInt, StorageLevel.MEMORY_AND_DISK_SER)

    // Transformations: split each line into words, then count each word within the batch.
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)

    // Output operation: a streaming job must have at least one output operation.
    wordCounts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
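
To make the transformation chain easier to follow, here is a minimal sketch that applies the same flatMap, map, and reduce-by-key steps to a single batch using plain Scala collections, with no Spark dependency. The object name `WordCountBatchSketch`, the helper `countWords`, and the sample input are hypothetical and exist only for illustration. In the streaming job above, `reduceByKey` is applied to each 10-second batch separately, so the printed counts are per batch rather than cumulative.

```scala
object WordCountBatchSketch {
  // Hypothetical helper (not part of Spark): counts the words in ONE batch of
  // lines, mirroring flatMap -> map -> reduceByKey from NetworkWordCount.
  def countWords(batch: Seq[String]): Map[String, Int] =
    batch
      .flatMap(_.split(" "))      // same split as in the streaming job
      .map(word => (word, 1))     // pair every word with an initial count of 1
      .groupBy(_._1)              // group the pairs by word (the key)
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum counts per key

  def main(args: Array[String]): Unit = {
    // Illustrative input: the lines received during one batch interval.
    val batch = Seq("hello world", "hello spark streaming")
    countWords(batch).toSeq.sortBy(_._1).foreach(println)
    // prints:
    // (hello,2)
    // (spark,1)
    // (streaming,1)
    // (world,1)
  }
}
```

To try the streaming version itself, one option is to start a text server with `nc -lk 9999` in a terminal and run `NetworkWordCount` with the arguments `localhost 9999` (the port is just an example). Lines typed into `nc` should then show up as word counts in the next batch.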