First, we import the Spark Streaming classes and some implicit conversions from StreamingContext into our environment, so that useful methods are added to other classes we need (such as DStream). StreamingContext is the main entry point for all streaming functionality. We create a local StreamingContext with four worker threads and a batch interval of 10 seconds.
val sparkConf = new SparkConf().setAppName("NetworkWordCount").setMaster("local[4]")
val ssc = new StreamingContext(sparkConf, Seconds(10)) // compute word counts every 10 seconds
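As a plain-Scala sketch of what a 10-second batch interval means (no Spark required; the timestamps and records below are made up for illustration): incoming records are grouped by the 10-second window their arrival time falls into, and each group is then processed as one small batch.

```scala
// Hypothetical arrival times (seconds since start) paired with records.
val arrivals = Seq((1L, "a"), (4L, "b"), (12L, "c"), (19L, "d"), (23L, "e"))

// Group records into 10-second batches, as a StreamingContext created with
// Seconds(10) conceptually does before running a job on each batch.
val batches: Map[Long, Seq[String]] =
  arrivals.groupBy { case (t, _) => t / 10 }          // batch index = t / 10
          .map { case (k, v) => k -> v.map(_._2) }

// batches(0) == Seq("a", "b"); batches(1) == Seq("c", "d"); batches(2) == Seq("e")
```

Each such batch is turned into one RDD, which is why a longer interval trades latency for larger, more efficient batches.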
Using this context, we can create a DStream that represents streaming data from a TCP source, specified as a hostname (e.g. localhost) and a port (e.g. 9998).
// create a DStream connected to 127.0.0.1:9998
val lines = ssc.socketTextStream("127.0.0.1", 9998, StorageLevel.MEMORY_AND_DISK_SER)
This lines DStream represents the stream of data that will be received from the data server. Each record in this DStream is a line of text. Next, we want to split each line into words by the space character.
val words = lines.flatMap(_.split(" "))
flatMap is a one-to-many DStream operation that creates a new DStream by generating multiple new records from each record in the source DStream. In this case, each line is split into multiple words, and the stream of words is represented as the words DStream. Next, we want to count these words.
val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
wordCounts.print()
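To see what these transformations compute on a single batch, here is the same pipeline run on a plain Scala collection (a hypothetical batch of two lines): flatMap splits lines into words, map pairs each word with 1, and the reduce step sums counts per key, which is what reduceByKey does within each batch. Collections have no reduceByKey, so groupBy plus a sum stands in for it here.

```scala
val lines = Seq("A B A", "B C")                      // one hypothetical batch

val words = lines.flatMap(_.split(" "))              // Seq("A", "B", "A", "B", "C")
val pairs = words.map(x => (x, 1))                   // Seq(("A",1), ("B",1), ...)
// groupBy + sum over values is the collections equivalent of reduceByKey(_ + _)
val wordCounts = pairs.groupBy(_._1).map { case (w, ps) => w -> ps.map(_._2).sum }

// wordCounts: Map("A" -> 2, "B" -> 2, "C" -> 1)
```

In the streaming job, this computation runs once per 10-second batch, so the printed counts cover only the words received in that interval.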
At this point Spark Streaming has only set up the computation it will perform; nothing has actually started. To begin processing after all the transformations have been set up, we call the following methods:
ssc.start() // start the computation
ssc.awaitTermination() // wait for manual termination; otherwise the job runs indefinitely
Next, we write data to socket port 9998:
package streaming.helloworld

import java.io.PrintWriter
import java.net.ServerSocket

/**
 * Created by Administrator on 2017/5/26.
 * A simple socket server that writes a random uppercase letter (A-G)
 * to every connected client once every 500 ms.
 */
object GenerateChar {
  // build the array 'A'..'Z' and return the letter at the given index
  def generateContext(index: Int): String = {
    import scala.collection.mutable.ListBuffer
    val charList = ListBuffer[Char]()
    for (i <- 65 to 90)
      charList += i.toChar
    val charArray = charList.toArray
    charArray(index).toString
  }

  // a random index in [0, 7), i.e. one of the first seven letters A-G
  def index = {
    import java.util.Random
    val rdm = new Random
    rdm.nextInt(7)
  }

  def main(args: Array[String]) {
    val listener = new ServerSocket(9998)
    while (true) {
      val socket = listener.accept()
      new Thread() {
        override def run() = {
          println("Got client connected from: " + socket.getInetAddress)
          val out = new PrintWriter(socket.getOutputStream, true)
          while (true) {
            Thread.sleep(500)
            val context = generateContext(index) // a random letter from the first seven letters of the alphabet
            println(context)
            out.write(context + '\n')
            out.flush()
          }
        }
      }.start()
    }
  }
}
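The generator above builds the full A-Z array but only ever indexes the first seven entries, because index is drawn with nextInt(7). A quick standalone check of that logic (same two pieces, minus the socket code; scala.util.Random stands in for java.util.Random here):

```scala
import scala.util.Random

// same construction as in generateContext: codes 65..90 are 'A'..'Z'
val charArray = (65 to 90).map(_.toChar).toArray

def generateContext(index: Int): String = charArray(index).toString

// nextInt(7) yields 0..6, so only the letters A-G can ever be produced
val samples = Seq.fill(1000)(Random.nextInt(7)).map(generateContext)
val distinct = samples.toSet
// distinct is always a subset of Set("A", "B", "C", "D", "E", "F", "G")
```

This also means the word-count output of the streaming job will only ever contain these seven letters as keys.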
The source of NetworkWordCount.scala:
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
// scalastyle:off println
package org.apache.spark.examples.streaming
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.storage.StorageLevel
object NetworkWordCount {
  def main(args: Array[String]) {
    StreamingExamples.setStreamingLogLevels()
    // create a local StreamingContext with four worker threads
    val sparkConf = new SparkConf().setAppName("NetworkWordCount").setMaster("local[4]")
    val ssc = new StreamingContext(sparkConf, Seconds(10)) // compute word counts every 10 seconds
    // create a DStream connected to 127.0.0.1:9998
    val lines = ssc.socketTextStream("127.0.0.1", 9998, StorageLevel.MEMORY_AND_DISK_SER)
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
    wordCounts.print()
    ssc.start() // start the computation
    ssc.awaitTermination() // wait for manual termination; otherwise the job runs indefinitely
  }
}
// scalastyle:on println