spark-streaming

一、Creating a Spark Streaming application

1、StreamingContext

val ssc = new StreamingContext(sparkConf, Seconds(10))
  • The main entry point for all streaming functionality
  • Creates a SparkContext under the hood, which is used to process the data
  • The constructor takes a batch interval that specifies how often new data is processed; Seconds(10) above sets it to 10 seconds

2、socketTextStream

val lines = ssc.socketTextStream("localhost", 9999) 

Creates a DStream of text data received on local port 9999.

3、start()

  • Once the computation is set up, it will begin as soon as the system receives data
  • To start receiving data, however, you must explicitly call StreamingContext's start() method. Spark Streaming will then continuously submit Spark jobs to the underlying SparkContext for scheduling and execution

4、awaitTermination()

Execution happens in a separate thread, so call awaitTermination() to wait for the streaming computation to finish and keep the application from exiting.
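
Putting the four pieces together, a minimal end-to-end sketch (the object name NetworkPrint and the local[2] master are assumptions for local testing):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object NetworkPrint {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("NetworkPrint").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))    // 10-second batch interval
    val lines = ssc.socketTextStream("localhost", 9999)  // DStream of received text lines
    lines.print()          // print the first 10 elements of each batch
    ssc.start()            // start receiving data
    ssc.awaitTermination() // block the main thread until the stream is stopped
  }
}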

二、Running

1. Run the job

1> redirects standard output, 2> redirects standard error:

bash wc_local.sh 1>1.log 2>2.log

2. Monitor the log

tail -f 1.log

3. Open the port

nc -l 9999
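
Once the job is running, type lines into the nc session; each line is sent to the socket the stream reads from, e.g.:

nc -l 9999
hello world hello world

Note: some netcat builds close the listener after the first connection ends; on those, nc -lk 9999 (where the -k flag is supported) keeps it listening.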

4. Test WordCount

package com.albert.streaming.test

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

object WordCount {
  def main(args: Array[String]) {
    if (args.length < 2) {
      System.err.println("Usage: WordCount <hostname> <port>")
      System.exit(1)
    }

    val sparkConf = new SparkConf().setAppName("StreamingWordCount")
    val streamCtx = new StreamingContext(sparkConf, Seconds(5)) // 5-second batches

    // Receive text from host:port; serialized storage that spills to disk under memory pressure
    val lines = streamCtx.socketTextStream(args(0), args(1).toInt, StorageLevel.MEMORY_AND_DISK_SER)
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _) // per-batch counts

    wordCounts.print()
    // Writes one output directory per batch, named stream_out-<batch time in ms>.doc
    wordCounts.saveAsTextFiles("hdfs://master:9000/stream_out", "doc")
    streamCtx.start()
    streamCtx.awaitTermination()
  }
}

4.1 local

wc_local.sh

/usr/local/src/spark-1.6.0-bin-hadoop2.6/bin/spark-submit \
	--master local[2] \
	--class com.albert.streaming.test.WordCount /usr/local/src/learn/albert/25_spark_streaming/streaming-1.0-SNAPSHOT.jar \
	master \
	9999

Run the script and feed words through the nc session to test WordCount. Note that local[2] matters: a streaming job needs at least two threads, one to run the receiver and one to process the received data.

4.2 standalone

wc_standalone.sh

/usr/local/src/spark-1.6.0-bin-hadoop2.6/bin/spark-submit \
	--master spark://master:7077 \
	--num-executors 2 \
	--executor-memory 1g \
	--executor-cores 2 \
	--driver-memory 1g \
	--class com.albert.streaming.test.WordCount /usr/local/src/learn/albert/25_spark_streaming/streaming-1.0-SNAPSHOT.jar \
	master \
	9999


4.3 cluster (YARN)

wc_cluster.sh

/usr/local/src/spark-1.6.0-bin-hadoop2.6/bin/spark-submit \
	--master yarn-cluster \
	--num-executors 2 \
	--executor-memory 1g \
	--executor-cores 2 \
	--driver-memory 1g \
	--class com.albert.streaming.test.WordCount /usr/local/src/learn/albert/25_spark_streaming/streaming-1.0-SNAPSHOT.jar \
	master \
	9999


Kill the job:
yarn application -kill application_1590989536331_0001
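
If you don't know the application id, list the running applications first:

yarn application -list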

5. Test WordCountWithState

Keeps historical state: updateStateByKey folds each batch's counts into a running total per key, so the output is a cumulative word count over the whole stream. This requires checkpointing, enabled below.

package com.albert.streaming.test

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

object WordCountWithState {
  // Merge this batch's counts for a key (currentValues) with the running total (preValues)
  def updateFunction(currentValues: Seq[Int], preValues: Option[Int]): Option[Int] = {
    val current = currentValues.sum
    val pre = preValues.getOrElse(0)
    Some(current + pre)
  }

  def main(args: Array[String]) {
    if (args.length < 2) {
      System.err.println("Usage: WordCountWithState <hostname> <port>")
      System.exit(1)
    }

    val sparkConf = new SparkConf().setAppName("StreamingWordCountWithState")
    val streamCtx = new StreamingContext(sparkConf, Seconds(5))
    // updateStateByKey requires a checkpoint directory to persist the state
    streamCtx.checkpoint("hdfs://master:9000/hdfs_checkpoint")

    val lines = streamCtx.socketTextStream(args(0), args(1).toInt, StorageLevel.MEMORY_AND_DISK_SER)
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1)).updateStateByKey(updateFunction _)

    wordCounts.print()
    wordCounts.saveAsTextFiles("hdfs://master:9000/stream_state_out", "doc")
    streamCtx.start()
    streamCtx.awaitTermination()
  }
}
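
To see how the state evolves batch by batch, updateFunction can be exercised directly as plain Scala, outside Spark (the batch values below are made up for illustration):

package com.albert.streaming.test

object UpdateFunctionDemo {
  def main(args: Array[String]) {
    // Batch 1: the key appears twice, no previous state -> Some(2)
    assert(WordCountWithState.updateFunction(Seq(1, 1), None) == Some(2))
    // Batch 2: the key appears once, previous total is 2 -> Some(3)
    assert(WordCountWithState.updateFunction(Seq(1), Some(2)) == Some(3))
    println("updateFunction accumulates counts across batches")
  }
}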

6. Test WindowTest

Keeps only the data within a sliding window of time. For example, with a 5-second window, at 9:00:00 the data covers 8:59:55-9:00:00, and at 9:00:30 it covers 9:00:25-9:00:30. The code below counts over a 30-second window that slides every 10 seconds.

package com.albert.streaming.test

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

object WindowTest {
  def main(args: Array[String]) {
    if (args.length < 2) {
      System.err.println("Usage: WindowTest <hostname> <port>")
      System.exit(1)
    }

    val sparkConf = new SparkConf().setAppName("StreamingWindowTest")
    val streamCtx = new StreamingContext(sparkConf, Seconds(10)) // 10-second batches
    streamCtx.checkpoint("hdfs://master:9000/hdfs_checkpoint")   // enable checkpointing

    val lines = streamCtx.socketTextStream(args(0), args(1).toInt, StorageLevel.MEMORY_AND_DISK_SER)
    val words = lines.flatMap(_.split(" "))
    // Every 10 seconds, count words seen over the last 30 seconds (window 30s, slide 10s)
    val wordCounts = words.map(x => (x, 1)).reduceByKeyAndWindow((v1: Int, v2: Int) => v1 + v2, Seconds(30), Seconds(10))

    wordCounts.print()
    streamCtx.start()
    streamCtx.awaitTermination()
  }
}
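
For a long window with a short slide, Spark Streaming also offers an incremental variant of reduceByKeyAndWindow that takes an inverse reduce function: instead of recomputing the whole window each time, it adds the counts entering the window and subtracts those leaving it. This variant is what actually requires the checkpoint directory set above. A sketch of the one-line change:

val wordCounts = words.map(x => (x, 1)).reduceByKeyAndWindow(
  (v1: Int, v2: Int) => v1 + v2, // counts for data entering the window
  (v1: Int, v2: Int) => v1 - v2, // counts for data leaving the window
  Seconds(30), Seconds(10))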

Only tested locally here; the other deployment modes work the same way as in section 4.


Appendix: pom.xml dependencies (Scala 2.10 artifacts matching Spark 1.6.0)

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.10</artifactId>
    <version>1.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
    <version>1.6.0</version>
</dependency>
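
When submitting to a cluster that already provides Spark (standalone or YARN), these dependencies can optionally be marked provided so they are not bundled into the application jar; shown here for spark-streaming only:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
    <version>1.6.0</version>
    <scope>provided</scope>
</dependency>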