import org.apache.log4j.{Level, Logger}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.dstream.{DStream, ReceiverInputDStream}
import org.apache.spark.streaming.{Seconds, StreamingContext}
/**
* Created by Administrator on 2017/8/9.
*/
object WordCountTest {
  def main(args: Array[String]): Unit = {
    /**
     * Use at least two threads here: one thread receives the data,
     * the other processes it.
     * With only a single thread the program does not fail; it just
     * keeps receiving data without ever processing it.
     */
    val conf = new SparkConf().setMaster("local[2]").setAppName("test")
    // Initialize a StreamingContext with a 1-second batch interval
    val ssc = new StreamingContext(conf, Seconds(1))
    // Choose which log levels get printed to the console
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
    // Input: obtain a DStream by listening on a socket
    val lines: ReceiverInputDStream[String] = ssc.socketTextStream("hadoop1", 9999)
    // Processing: split each line on commas and count the words
    val wordCountDStream: DStream[(String, Int)] = lines.flatMap(_.split(","))
      .map((_, 1))
      .reduceByKey(_ + _)
    // Output: print each batch's result to the console
    wordCountDStream.print()
    ssc.start()             // start the computation
    ssc.awaitTermination()  // block until the computation terminates
    ssc.stop()              // release resources
  }
}
Before running the program, first run the following on the host being listened to:
yum install -y nc   (not needed when testing locally)
nc -lk 9999
Then send data to Spark Streaming.
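For example, typing a line such as hello,world,hello into the nc session should produce, for that batch, console output roughly of this form (the exact timestamp will differ):

-------------------------------------------
Time: 1502246400000 ms
-------------------------------------------
(hello,2)
(world,1)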
Notes (from the Spark Streaming programming guide):
After a context is defined, you have to do the following.
1. Define the input sources by creating input DStreams.
2. Define the streaming computations by applying transformation and output operations to DStreams.
3. Start receiving data and processing it using streamingContext.start().
4. Wait for the processing to be stopped (manually or due to any error) using streamingContext.awaitTermination().
5. The processing can be manually stopped using streamingContext.stop() (see the sketch after this list).
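Steps 4 and 5 can be combined into a conditional shutdown. Below is a minimal sketch, assuming a user-supplied shouldShutDown() check (hypothetical, not part of the Spark API):

ssc.start()
var finished = false
while (!finished) {
  // awaitTerminationOrTimeout returns true once the context has terminated,
  // or false if the timeout (in milliseconds) elapses first
  finished = ssc.awaitTerminationOrTimeout(10000)
  if (!finished && shouldShutDown()) { // hypothetical external check
    // finish the in-flight batches before shutting down
    ssc.stop(stopSparkContext = true, stopGracefully = true)
    finished = true
  }
}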
Points to remember:
1. Once a context has been started, no new streaming computations can be set up or added to it.
2. Once a context has been stopped, it cannot be restarted.
3. Only one StreamingContext can be active in a JVM at the same time.
4. stop() on StreamingContext also stops the SparkContext. To stop only the StreamingContext, set the optional parameter of stop() called stopSparkContext to false (see the sketch below).
5. A SparkContext can be re-used to create multiple StreamingContexts, as long as the previous StreamingContext is stopped (without stopping the SparkContext) before the next StreamingContext is created.
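A minimal sketch of points 4 and 5 (the Seconds(5) batch interval is illustrative):

val sc = ssc.sparkContext               // keep a handle on the underlying SparkContext
ssc.stop(stopSparkContext = false)      // stops only the StreamingContext
val ssc2 = new StreamingContext(sc, Seconds(5)) // reuse the same SparkContext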