Spark Streaming Quick Start ---- WordCount

import org.apache.log4j.{Level, Logger}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.dstream.{DStream, ReceiverInputDStream}
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
  * Created by Administrator on 2017/8/9.
  */
object WordCountTest {
  def main(args: Array[String]): Unit = {
    /**
      * Set at least 2 threads here: one thread receives the data,
      * and the other processes it.
      * With only one thread the program still runs without errors,
      * but it only receives data and never processes it.
      */
    val conf = new SparkConf().setMaster("local[2]").setAppName("test")

    // Initialize a StreamingContext with a 1-second batch interval
    val ssc = new StreamingContext(conf, Seconds(1))

    // Limit the log level printed to the console
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN)

    // Input: listen on a socket to obtain a DStream
    val lines: ReceiverInputDStream[String] = ssc.socketTextStream("hadoop1", 9999)

    // Processing: split each line on commas, then count each word
    val wordCountDstream: DStream[(String, Int)] = lines.flatMap(_.split(","))
      .map((_, 1))
      .reduceByKey(_ + _)

    // Output: print the results to the console
    wordCountDstream.print()

    ssc.start()             // start the computation
    ssc.awaitTermination()  // wait for it to terminate (manually or on error)
    ssc.stop()              // release resources
  }
}


Before running the program, run the following on the host being listened to:

yum install -y nc   (not needed when testing locally)

nc -lk 9999


Then send data to Spark Streaming.
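
For example, with the job running and connected to hadoop1:9999, typing one comma-separated line into nc should produce a batch of counts on the application console (the timestamp below is illustrative):

$ nc -lk 9999
spark,hello,spark

Application console output:

-------------------------------------------
Time: 1502243750000 ms
-------------------------------------------
(spark,2)
(hello,1)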

Notes:

After a context is defined, you have to do the following.

1. Define the input sources by creating input DStreams.

2. Define the streaming computations by applying transformation and output operations to DStreams.

3. Start receiving data and processing it using streamingContext.start().

4. Wait for the processing to be stopped (manually or due to any error) using streamingContext.awaitTermination().

5. The processing can be manually stopped using streamingContext.stop() (see the sketch after this list).
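
A small variation on steps 3-5, assuming the ssc from the program above and an arbitrary 60-second timeout; awaitTerminationOrTimeout is the timed counterpart of awaitTermination:

    // Steps 3-5: start, wait at most 60 seconds, then stop manually
    ssc.start()
    val terminated = ssc.awaitTerminationOrTimeout(60 * 1000)  // true if stopped within the timeout
    if (!terminated) {
      ssc.stop()  // stop manually once the timeout expires
    }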

Points to remember:

1. Once a context has been started, no new streaming computations can be set up or added to it.

2. Once a context has been stopped, it cannot be restarted.

3. Only one StreamingContext can be active in a JVM at the same time.

4. stop() on StreamingContext also stops the SparkContext. To stop only the StreamingContext, set the optional parameter of stop() called stopSparkContext to false.

5. A SparkContext can be re-used to create multiple StreamingContexts, as long as the previous StreamingContext is stopped (without stopping the SparkContext) before the next StreamingContext is created. Points 4 and 5 are sketched below.
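
A minimal sketch of points 4 and 5, assuming an existing StreamingContext ssc like the one above (the 5-second batch interval of the second context is a placeholder):

    val sc = ssc.sparkContext            // keep a handle to the underlying SparkContext
    ssc.stop(stopSparkContext = false)   // point 4: stop only the StreamingContext
    val ssc2 = new StreamingContext(sc, Seconds(5))  // point 5: reuse the SparkContext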

