spark streaming
zhifeng687
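Spark Streaming Execution Flow and Source Code Walkthrough
Some classes in the source: here are a few classes from the source code as a warm-up. StreamingContext: the entry point of a Spark Streaming program; it provides the runtime context. DStream: the Spark Streaming abstraction built on RDDs, a continuous sequence of RDDs (of the same type) that represents a continuous data stream. JobScheduler: generates and schedules jobs. DStreamGraph: stores the dependencies between DStreams. JobGenerator: generates jobs from the DStream dependencies. ReceiverTracker: used on the driver side to manage...
Reposted · 2017-01-15 09:56:37 · 1017 reads · 0 comments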
Spark Streaming Source Code Analysis: The Checkpoint Mechanism
Overview: the StreamingContext#checkpoint() method sets the checkpointDir directory. The storage system must be a fault-tolerant one such as HDFS. /** * Set the context to periodically checkpoint the DStream operations for driver * fault-tolerance. * @param ...
Original · 2016-04-07 09:23:48 · 702 reads · 0 comments
Checkpointing and Fault Tolerance in Spark Streaming
Checkpointing is the main mechanism that needs to be set up for fault tolerance in Spark Streaming. It allows Spark Streaming to periodically save data about the application to a reliable storage syst...
Translated · 2017-02-03 14:55:31 · 287 reads · 0 comments
Spark Streaming Checkpoint
1. Objective: This document covers Spark Streaming checkpointing. We start with what a streaming checkpoint is and how it helps achieve fault tolerance. There are two types of s...
Translated · 2017-02-02 17:18:00 · 322 reads · 0 comments
Spark DStream: Abstraction of Spark Streaming
1. Objective: Spark DStream (Discretized Stream) is the basic abstraction of Spark Streaming. In this blog, we will learn the concept of DStream in Spark: what a DStream is, operations of ...
Translated · 2017-02-02 17:12:47 · 538 reads · 0 comments
Stateful Transformations in Spark Streaming
1. Objective: As we know, there are various modules available in Apache Spark. Each module serves a different purpose, and the streaming API is one of its powerful modules. It provides power to the develop...
Translated · 2017-02-02 16:56:08 · 1088 reads · 0 comments
Faster Stateful Stream Processing in Apache Spark Streaming
Many complex stream processing pipelines must maintain state across a period of time. For example, if you are interested in understanding user behavior on your website in real-time, you will have to m...
Translated · 2017-02-02 16:39:09 · 2288 reads · 0 comments
Spark Streaming Window Operations
1. Objective: Spark Streaming takes advantage of windowed computations in Apache Spark. It lets you apply transformations over a sliding window of data. In this article, we will learn the whole c...
Translated · 2018-02-09 18:55:14 · 281 reads · 0 comments
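To make "transformations over a sliding window of data" concrete, here is a minimal plain-Python sketch. It does not use the Spark API: the function name windowed_counts and the parameters window_length and slide_interval are hypothetical, both are measured in batches rather than time, and the sketch only emits full windows, which is a simplification.

```python
from collections import Counter


def windowed_counts(batches, window_length, slide_interval):
    """Count items over a sliding window of batches.

    batches: list of lists, one inner list per batch interval.
    window_length, slide_interval: hypothetical parameters, in batches.
    """
    results = []
    # Emit one result every `slide_interval` batches. Simplification:
    # only windows that are completely filled are emitted.
    for end in range(window_length, len(batches) + 1, slide_interval):
        window = batches[end - window_length:end]
        counts = Counter(item for batch in window for item in batch)
        results.append(dict(counts))
    return results


batches = [["a", "b"], ["a"], ["b", "b"], ["c"]]
result = windowed_counts(batches, window_length=3, slide_interval=1)
print(result)
```

Each result covers the last three batches, and consecutive windows overlap by two batches because the slide interval is shorter than the window length.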
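Two Ways for Spark Streaming to Receive Kafka Data
Preface: in the WeTest public-opinion project, we need to compute word frequencies over tens of millions of game reviews every day. On the producer side, we store the data in Kafka according to each day's pull time; on the consumer side, we use Spark Streaming to continuously pull data from Kafka and compute the word frequencies. This article first summarizes the ways Spark Streaming integrates with Kafka, then briefly describes how Spark Streaming + Kafka is used in the public-opinion project, and finally...
Reposted · 2018-01-28 22:42:37 · 3537 reads · 0 comments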
Spark Streaming's High Fault-Tolerance Mechanism
Real-time stream processing systems must be operational 24/7, which requires them to recover from all kinds of failures in the system. Since its beginning, Apache Spark Streaming has included support f...
Translated · 2018-02-06 17:03:34 · 391 reads · 0 comments
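Spark Streaming Source Code Analysis: JobScheduler
Abstract: JobScheduler is the core of all scheduling in Spark Streaming; its role is comparable to that of DAGScheduler in Spark Core's scheduling. I. JobScheduler internals. Q: Where is JobScheduler created? A: JobScheduler is created when StreamingContext is instantiated; from line 1... of the StreamingContext source
Reposted · 2018-02-09 10:49:59 · 320 reads · 0 comments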
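Understanding Spark Streaming Flow Control and Backpressure in Depth
Contents: Introduction to flow control; Basic flow-control settings in Spark Streaming; How Spark Streaming backpressure is implemented; The dynamic rate controller; The PID-based rate estimator; Publishing the rate threshold over RPC; Flow control with Guava's token bucket; The End. Introduction to flow control: in stream processing systems, flow control (rate control/rate limit) is a very important topic. The main purpose of applying flow control to a system is to...
Reposted · 2018-02-09 18:48:14 · 7503 reads · 2 comments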
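The token-bucket idea named in the contents above can be sketched in a few lines of plain Python. This is a generic token bucket, not Guava's RateLimiter and not Spark's code: the class name TokenBucket, its parameters, and the explicit `now` argument (passed in so the example is deterministic) are all assumptions made for illustration.

```python
class TokenBucket:
    """Generic token bucket (illustrative; not Guava's or Spark's implementation)."""

    def __init__(self, rate, capacity):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum number of stored tokens (burst size)
        self.tokens = capacity    # start with a full bucket
        self.last = 0.0           # time of the previous refill, in seconds

    def try_acquire(self, now, n=1):
        # Refill according to the elapsed time, capped at the capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        # Admit the request only if enough tokens are available.
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False


bucket = TokenBucket(rate=2.0, capacity=2)
decisions = [bucket.try_acquire(t) for t in (0.0, 0.0, 0.0, 0.5, 0.5)]
print(decisions)
```

The first two requests drain the initial burst, the third is rejected, and half a second later exactly one more token has accumulated, so one request passes and the next is rejected.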