1. Create a Maven project
Add the Spark Streaming dependency to pom.xml:
<properties>
    <!-- assumed version; use the Spark 1.x release that matches your cluster -->
    <spark.version>1.6.3</spark.version>
</properties>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
    <version>${spark.version}</version>
</dependency>
2. Code
package day05.d

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    // Build the StreamingContext with a 5-second batch interval.
    // local[2] matters: one thread runs the socket receiver,
    // the other processes the data.
    val conf = new SparkConf().setAppName("StreamingWordCount").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val ssc = new StreamingContext(sc, Seconds(5))
    // Receive data over a TCP socket
    val ds = ssc.socketTextStream("192.168.123.151", 8888)
    // A DStream is a continuous sequence of RDDs, one per batch
    val result = ds.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    // Print each batch's counts
    result.print()
    ssc.start()
    // Block until the computation terminates
    ssc.awaitTermination()
  }
}
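The comment above says a DStream is a continuous sequence of RDDs, one per batch interval. A minimal sketch of what that means in practice: foreachRDD hands each 5-second batch to you as an ordinary RDD, on which any RDD operation is available. This fragment would go before ssc.start() in the program above:

// Sketch: each batch of `result` arrives as a plain RDD,
// so regular RDD actions apply.
result.foreachRDD { rdd =>
  if (!rdd.isEmpty()) {
    // Pull this batch's counts to the driver and print them
    rdd.collect().foreach { case (word, count) =>
      println(s"$word -> $count")
    }
  }
}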
On Linux, start a test data source: nc -lk 8888
If nc is not installed, install it first:
yum install nc
Then type words into the nc session; every 5 seconds the program counts the words received in that batch and prints the result.
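Note that reduceByKey counts each 5-second batch independently; counts do not accumulate across batches. If a running total is wanted instead, updateStateByKey can carry state from batch to batch. A sketch (the checkpoint path "ck" is a placeholder):

// Running totals across batches require a checkpoint directory.
ssc.checkpoint("ck") // placeholder path
val totals = ds.flatMap(_.split(" ")).map((_, 1)).updateStateByKey(
  // Merge this batch's counts into the accumulated state
  (batch: Seq[Int], state: Option[Int]) => Some(batch.sum + state.getOrElse(0))
)
totals.print()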
3. Run the program locally (run StreamingWordCount's main method from the IDE)
4. Reduce the log output
Create a new object:
package day05.d

import org.apache.log4j.{Level, Logger}
import org.apache.spark.Logging

object LoggerLevels extends Logging {
  def setStreamingLogLevels(): Unit = {
    // Only adjust the level if the user has not configured log4j themselves
    val log4jInitialized = Logger.getRootLogger.getAllAppenders.hasMoreElements
    if (!log4jInitialized) {
      logInfo("Setting log level to [WARN] for streaming example."
        + " To override add a custom log4j.properties to the classpath.")
      Logger.getRootLogger.setLevel(Level.WARN)
    }
  }
}
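A caveat: org.apache.spark.Logging was made private in Spark 2.0, so the object above only compiles against Spark 1.x (matching the spark-streaming_2.10 dependency). On Spark 2.x, an equivalent without that trait is a few lines of plain log4j (a sketch):

package day05.d

import org.apache.log4j.{Level, Logger}

// Spark 2.x variant: no dependency on the (now private) Logging trait
object LoggerLevels {
  def setStreamingLogLevels(): Unit = {
    // Only override if the user has not supplied a log4j config
    if (!Logger.getRootLogger.getAllAppenders.hasMoreElements) {
      Logger.getRootLogger.setLevel(Level.WARN)
    }
  }
}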
Then add the following call at the top of main in the previous program:
LoggerLevels.setStreamingLogLevels()
That is, the final program:
package day05.d

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    // Reduce log output (see step 4)
    LoggerLevels.setStreamingLogLevels()
    // Build the StreamingContext with a 5-second batch interval.
    // local[2]: one thread for the receiver, one for processing.
    val conf = new SparkConf().setAppName("StreamingWordCount").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val ssc = new StreamingContext(sc, Seconds(5))
    // Receive data over a TCP socket
    val ds = ssc.socketTextStream("192.168.123.151", 8888)
    // A DStream is a continuous sequence of RDDs, one per batch
    val result = ds.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    // Print each batch's counts
    result.print()
    ssc.start()
    // Block until the computation terminates
    ssc.awaitTermination()
  }
}
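For reference, typing e.g. hello world hello into the nc session produces console output of this shape every 5 seconds (illustrative values; the timestamp and counts depend on your input):

-------------------------------------------
Time: 1497000000000 ms
-------------------------------------------
(hello,2)
(world,1)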
With output like the above appearing every 5 seconds, the test is successful.