Spark Streaming Real-Time Stream Processing in Practice, Notes 7

Core Concepts of Spark Streaming

Core Concept: StreamingContext

In IDEA, search for StreamingContext.scala and look at its auxiliary constructors:

    def this(sparkContext: SparkContext, batchDuration: Duration) = {
      this(sparkContext, null, batchDuration)
    }

    def this(conf: SparkConf, batchDuration: Duration) = {
      this(StreamingContext.createNewSparkContext(conf), null, batchDuration)
    }

The batch interval should be chosen based on the latency requirements of your application and the resources available on the cluster.
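For example (a minimal sketch; the app name, master URL, and 5-second interval are all illustrative choices):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Second constructor: build from a SparkConf; the SparkContext is created internally.
    val conf = new SparkConf().setMaster("local[2]").setAppName("StreamingDemo")
    val ssc = new StreamingContext(conf, Seconds(5)) // 5-second batch interval

    // First constructor: reuse an existing SparkContext sc.
    // val ssc = new StreamingContext(sc, Seconds(5))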

After a context is defined, you have to do the following.

  1. Define the input sources by creating input DStreams.
  2. Define the streaming computations by applying transformation and
    output operations to DStreams.
  3. Start receiving data and processing it using
    streamingContext.start().
  4. Wait for the processing to be stopped (manually or due to any error)
    using streamingContext.awaitTermination().
  5. The processing can be manually stopped using
    streamingContext.stop().

Once the StreamingContext has been defined, these are the steps every streaming application follows; a minimal skeleton is sketched below.
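A sketch that walks through steps 1-5 (the host, port, and batch interval are placeholders):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object LifecycleSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setMaster("local[2]").setAppName("LifecycleSketch")
        val ssc = new StreamingContext(conf, Seconds(5))

        val lines = ssc.socketTextStream("localhost", 9999) // (1) define the input source
        lines.flatMap(_.split(" ")).print()                 // (2) transformations plus an output operation
        ssc.start()                                         // (3) start receiving and processing
        ssc.awaitTermination()                              // (4) wait until stopped manually or by an error
        // (5) ssc.stop() would stop the processing manually
      }
    }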

Discretized Streams (DStreams)

Internally, a DStream is represented by a continuous series of RDDs. Each RDD in a DStream contains data from a certain interval.

Operators applied to a DStream, such as map or flatMap, are translated under the hood into the same operation on every RDD inside the DStream, because a DStream is simply a sequence of RDDs from successive batches.
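This equivalence can be made explicit with transform, which exposes the underlying RDD of each batch (a sketch; lines stands for any DStream[String]):

    // These two produce the same result: map on the DStream is simply
    // applied to every RDD that makes up the stream.
    val upper1 = lines.map(_.toUpperCase)
    val upper2 = lines.transform(rdd => rdd.map(_.toUpperCase))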

Input DStreams and Receivers

Input DStreams are DStreams representing the stream of input data received from streaming sources.

Every input DStream (except file streams, discussed later in this section) is associated with a Receiver (Scala doc, Java doc) object, which receives the data from a source and stores it in Spark's memory for processing.
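For example (a sketch; host, port, and path are placeholders, and ssc is the StreamingContext created earlier): socketTextStream is receiver-based, while textFileStream is the receiver-less exception mentioned above. Since each receiver permanently occupies a core, a local master should be at least local[2] for receiver-based sources.

    // Receiver-based: one core is tied up by the receiver itself.
    val socketLines = ssc.socketTextStream("localhost", 9999)

    // No receiver: the driver periodically lists new files under the directory.
    val fileLines = ssc.textFileStream("hdfs://namenode:8020/streaming/input")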

Transformations
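A few common transformations over a DStream[String] named lines (a sketch; all of these are standard DStream operations):

    val words  = lines.flatMap(_.split(" "))                     // one record per word
    val longer = words.filter(_.length > 3)                      // drop short words
    val counts = words.map(word => (word, 1)).reduceByKey(_ + _) // per-batch word counts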

Output Operations
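Output operations are what actually trigger execution of each batch. Two typical ones, using counts from the previous sketch (the println sink is a stand-in for a real external system):

    counts.print() // print the first elements of each batch on the driver

    // foreachRDD pushes each batch's data out, partition by partition.
    counts.foreachRDD { rdd =>
      rdd.foreachPartition { partition =>
        partition.foreach(println) // replace with writes to a database, Kafka, etc.
      }
    }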

Hands-On Examples

Example 1: Processing socket data with Spark Streaming
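A minimal word count over socket data might look like this (a sketch; the host and port are placeholders, and the stream can be fed locally with nc -lk 9999):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object NetworkWordCount {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
        val ssc = new StreamingContext(conf, Seconds(5))

        val lines  = ssc.socketTextStream("localhost", 9999)
        val counts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
        counts.print()

        ssc.start()
        ssc.awaitTermination()
      }
    }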

Running this may fail with: java.lang.NoClassDefFoundError: net/jpountz/util/SafeUtils

The fix is to look up the artifact that contains the missing class in the Maven repository (found via a quick Baidu search) and add it to the project's dependencies.
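The net.jpountz.util package ships in the lz4 artifact, so adding it to pom.xml should resolve the error (a sketch; the version shown is an assumption, pick the one matching your Spark build):

    <!-- provides net.jpountz.util.SafeUtils; the version is illustrative -->
    <dependency>
        <groupId>net.jpountz.lz4</groupId>
        <artifactId>lz4</artifactId>
        <version>1.3.0</version>
    </dependency>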

Example 2: Processing HDFS file data with Spark Streaming


Notes on reading data from a file system (from the official docs):

  1. All files must be in the same data format.
  2. Files must be created in the monitored directory by atomically moving or renaming them into it.
  3. Once moved, the files must not be changed; data appended to a file afterwards will not be read.
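Under those constraints, the file-based example reduces to swapping the input source (a sketch, with the same imports as the socket example; the HDFS path is a placeholder, and textFileStream only picks up files that appear after the job has started):

    val conf = new SparkConf().setMaster("local").setAppName("FileWordCount")
    val ssc = new StreamingContext(conf, Seconds(5))

    // File streams have no receiver, so even a single-core local master works.
    val lines  = ssc.textFileStream("hdfs://namenode:8020/streaming/input")
    val counts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
    counts.print()

    ssc.start()
    ssc.awaitTermination()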
