Spark Streaming Demo Exercises

Environment Setup
  1. Versions
    Scala version: 2.11.8
    Spark version: 2.1.0
  2. Add the dependency to pom.xml
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>${spark.version}</version>
</dependency>
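Here ${spark.version} assumes a Maven property is defined elsewhere in the same POM. A minimal sketch matching the versions listed above (the property names are the usual convention, not mandated by Maven):

<properties>
    <!-- versions as stated in the Environment Setup section -->
    <scala.version>2.11.8</scala.version>
    <spark.version>2.1.0</spark.version>
</properties>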

Spark Streaming WordCount Exercise

Word count over a socket: the job counts whatever is typed into a service started with nc -lk 9999.

  1. Code
package com.imooc.spark.sparkStreaming

import org.apache.spark._
import org.apache.spark.streaming._

object NetworkWordCount {

  def main(args: Array[String]): Unit = {


    // Create a local StreamingContext with two working threads and a batch interval of 5 seconds.
    // The master requires 2 cores to prevent a starvation scenario.
    val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
    val ssc = new StreamingContext(conf, Seconds(5))
    // Create a DStream that will connect to hostname:port, like localhost:9999
    val lines = ssc.socketTextStream("192.168.52.130", 9999)
    //val lines = ssc.textFileStream("E:///dept.txt")
    // Split each line into words
    val words = lines.flatMap(_.split(","))
    // Count each word in each batch
    val pairs = words.map(word => (word, 1))
    val wordCounts = pairs.reduceByKey(_ + _)
    // Only log errors so the batch output stays readable
    ssc.sparkContext.setLogLevel("ERROR")
    // Print the first ten elements of each RDD generated in this DStream to the console
    wordCounts.print()
    ssc.start() // Start the computation
    ssc.awaitTermination() // Wait for the computation to terminate
  }

}
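To try it, start nc -lk 9999 on the host the code connects to (192.168.52.130 above), then launch the application. A sketch of a spark-submit invocation, where the jar name is a placeholder:

spark-submit \
  --class com.imooc.spark.sparkStreaming.NetworkWordCount \
  --master local[2] \
  spark-streaming-demo.jar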


  2. Demo output:
[hadoop@hadoop001 conf]$ nc -lk 9999
hello,world
hello,world
hello,world,hello,world,hello,world


-------------------------------------------
Time: 1542353340000 ms
-------------------------------------------

-------------------------------------------
Time: 1542353345000 ms
-------------------------------------------
(hello,1)
(world,1)

-------------------------------------------
Time: 1542353350000 ms
-------------------------------------------
(hello,1)
(world,1)

-------------------------------------------
Time: 1542353355000 ms
-------------------------------------------

-------------------------------------------
Time: 1542353360000 ms
-------------------------------------------
(hello,3)
(world,3)

Basic Sources

1. File Stream (reading local files)
  • Code
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object FileStreams {

  def main(args: Array[String]): Unit = {
    // Point Spark at a local winutils install (required when running on Windows)
    System.setProperty("hadoop.home.dir", "E:\\soft\\winutils\\hadoop-common-2.2.0-bin-master")
    val conf = new SparkConf().setMaster("local[4]").setAppName("FileStreams")

    val ssc = new StreamingContext(conf, Seconds(6))
    ssc.sparkContext.setLogLevel("ERROR")
    // Monitor a local directory; only files that appear after the stream starts are read
    val fileStream = ssc.textFileStream("file:///E:/testData/SparkStreaming")
    fileStream.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
  • Writing a file
import java.io.{File, PrintWriter}

object WriteFile {
  def main(args: Array[String]): Unit = {
    // Create a brand-new file inside the monitored directory
    val writer = new PrintWriter(new File("E://testData/SparkStreaming/33.txt"))
    writer.write("哈喽啊,SparkStreamin,spark,java,spark,java")
    writer.close()
  }
}
  • Demo output

(screenshot omitted)

Important notes:

1. When reading local files with Spark Streaming, start the streaming job first so that it is already monitoring the target directory, then use the file-writing code above to create a new file inside that directory.

2. The textFileStream method only picks up files that newly appear in the monitored directory after monitoring has started. Files created in advance, files copied in, or data appended to existing files will not be detected; only files written into the directory after monitoring begins are read. (A safer write pattern is sketched below.)
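The Spark documentation recommends creating files elsewhere and then atomically moving (renaming) them into the monitored directory, so the stream never observes a half-written file. A minimal sketch, assuming the monitored directory above and a hypothetical temp directory E://testData/tmp that already exists:

import java.io.{File, PrintWriter}
import java.nio.file.{Files, Paths, StandardCopyOption}

object AtomicWriteFile {
  def main(args: Array[String]): Unit = {
    // Write the data to a temporary location first...
    val tmp = new File("E://testData/tmp/33.txt")
    val writer = new PrintWriter(tmp)
    writer.write("spark,java,spark,java")
    writer.close()
    // ...then move it into the monitored directory in a single step,
    // so textFileStream never sees a partially written file
    Files.move(
      Paths.get(tmp.getPath),
      Paths.get("E://testData/SparkStreaming/33.txt"),
      StandardCopyOption.ATOMIC_MOVE)
  }
}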

2. DataFrame and SQL Operations

  • Code
package com.imooc.spark.sparkStreaming

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext, Time}

/**
  * Spark Streaming with DataFrame and SQL operations
  */
object SqlNetworkWordCount {

  def main(args: Array[String]) {
    // Create a local StreamingContext with two working threads and a batch interval of 5 seconds.
    // The master requires 2 cores to prevent a starvation scenario.
    val conf = new SparkConf().setMaster("local[2]").setAppName("SqlNetworkWordCount")
    val ssc = new StreamingContext(conf, Seconds(5))
    // Only log errors so the batch output stays readable
    ssc.sparkContext.setLogLevel("ERROR")
    // Create a DStream that will connect to hostname:port, like localhost:9999
    val lines = ssc.socketTextStream("192.168.52.130", 9999)
    //val lines = ssc.textFileStream("E:///dept.txt")
    // Split each line into words
    val words = lines.flatMap(_.split(","))

    words.foreachRDD { (rdd: RDD[String], time: Time) =>
      // Get the singleton instance of SparkSession
      val spark = SparkSessionSingleton.getInstance(rdd.sparkContext.getConf)
      import spark.implicits._

      // Convert RDD[String] to RDD[case class] to DataFrame
      val wordsDataFrame = rdd.map(w => Record(w)).toDF()

      // Creates a temporary view using the DataFrame
      wordsDataFrame.createOrReplaceTempView("words")

      // Do word count on table using SQL and print it
      val wordCountsDataFrame =
        spark.sql("select word, count(*) as total from words group by word")
      println(s"========= $time =========")
      // show() prints the table itself and returns Unit, so don't wrap it in println
      wordCountsDataFrame.show()
    }
    ssc.start()             // Start the computation
    ssc.awaitTermination()  // Wait for the computation to terminate
  }
}


/** Case class for converting RDD to DataFrame */
case class Record(word: String)


/** Lazily instantiated singleton instance of SparkSession */
object SparkSessionSingleton {

  @transient  private var instance: SparkSession = _

  def getInstance(sparkConf: SparkConf): SparkSession = {
    if (instance == null) {
      instance = SparkSession
        .builder
        .config(sparkConf)
        .getOrCreate()
    }
    instance
  }
}
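A note on the pattern: getInstance lazily creates a single SparkSession and reuses it across batches, so each foreachRDD call does not construct a new session every batch interval.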

  • Demo output
[hadoop@hadoop001 conf]$ nc -lk 9999
hello,world
hello,world
hello,world,hello,world,hello,world
a,b,c,d,
e,a,c,a,

========= 1542353890000 ms =========
+----+-----+
|word|total|
+----+-----+
|   d|    1|
|   c|    1|
|   b|    1|
|   a|    1|
+----+-----+

========= 1542353895000 ms =========
+----+-----+
|word|total|
+----+-----+
|   e|    1|
|   c|    1|
|   a|    2|
+----+-----+

========= 1542353915000 ms =========

3. Reading from HDFS
  • Code

Same as 1. File Stream (reading local files) above; only the monitored path changes to an HDFS directory:

val fileStream = ssc.textFileStream("hdfs://hadoop001:9000/user/hadoop/spark-streaming-test/")
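To trigger the stream, put a new file into the monitored HDFS directory after the job has started. A sketch using standard HDFS shell commands (the local file name wc.txt is a placeholder):

hdfs dfs -mkdir -p /user/hadoop/spark-streaming-test
hdfs dfs -put wc.txt /user/hadoop/spark-streaming-test/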
  • Result
    (screenshot omitted)