"Winning the Cloud Computing and Big Data Era"
Spark Asia-Pacific Research Institute 100-Session Public Lecture Series: Q&A Highlights from Session 6
Q1: Can Spark Streaming join different data streams?
Yes. Spark Streaming supports join operations between different data streams.
Spark Streaming is an extension of the core Spark API that enables high-throughput, fault-tolerant stream processing of live data streams. Data can be ingested from many sources like Kafka, Flume, Twitter, ZeroMQ, or plain old TCP sockets, and can be processed using complex algorithms expressed with high-level functions like map, reduce, join, and window.
join(otherStream, [numTasks]): When called on two DStreams of (K, V) and (K, W) pairs, return a new DStream of (K, (V, W)) pairs with all pairs of elements for each key.
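As an illustration of the join semantics described above, the following sketch mimics in plain Python how one batch of (K, V) pairs is joined with one batch of (K, W) pairs; the function name `pair_join` and the sample data are hypothetical, and this is not Spark code:

```python
from collections import defaultdict

def pair_join(left, right):
    """Join two lists of (key, value) pairs, producing (K, (V, W)) tuples
    with all pairings of values for each key, as DStream.join does for
    the matching batches of two keyed streams."""
    by_key = defaultdict(list)
    for k, w in right:
        by_key[k].append(w)
    # Emit one output pair for every combination of left and right values
    # that share a key; keys present on only one side are dropped.
    return [(k, (v, w)) for k, v in left for w in by_key[k]]

clicks = [("user1", "click"), ("user2", "click")]
views = [("user1", "view")]
joined = pair_join(clicks, views)  # only "user1" appears on both sides
```

In Spark Streaming itself, the same operation would be written as `stream1.join(stream2)` on two keyed DStreams, with the join applied batch by batch.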
Q2: Are Flume and Spark Streaming suitable for cluster mode?
Yes. Flume and Spark Streaming were built for cluster deployment.
For input streams that receive data over the network (such as Kafka, Flume, sockets, etc.), the default persistence level is set to replicate the data to two nodes for fault-tolerance.
Using any input source that receives data through a network: for network-based data sources like Kafka and Flume, the received input data is replicated in memory between nodes of the cluster (default replication factor is 2).
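The effect of a replication factor of 2 can be sketched as follows. This is a toy placement model, not Spark's actual block manager logic; the helper `place_block` and the node names are hypothetical:

```python
def place_block(block_id, nodes, replication=2):
    """Pick `replication` distinct nodes to hold one received data block,
    mimicking the default two-node replication for network input streams."""
    if replication > len(nodes):
        raise ValueError("not enough nodes to satisfy the replication factor")
    # Simple deterministic placement: start at a position derived from the
    # block id, then take consecutive nodes, wrapping around the list.
    start = hash(block_id) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replication)]

nodes = ["node-a", "node-b", "node-c"]
replicas = place_block("block-0", nodes)  # two distinct nodes per block
```

The point of the two copies is that if the node holding one replica fails, the batch can still be processed from the other replica.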
Q3: Does Spark have any drawbacks?
Spark's main drawback is its relatively heavy memory footprint.
In earlier versions, Spark handled resources in a coarse-grained way, making fine control difficult;
with the later addition of the FAIR scheduling mode, finer-grained control became possible.
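For reference, the fair scheduling mode mentioned above is enabled by setting `spark.scheduler.mode` to `FAIR`, optionally with a pool configuration file. The pool name and values below are illustrative, not a recommendation:

```xml
<?xml version="1.0"?>
<allocations>
  <pool name="production">
    <schedulingMode>FAIR</schedulingMode>
    <weight>2</weight>
    <minShare>3</minShare>
  </pool>
</allocations>
```

With pools configured this way, jobs submitted to different pools share cluster resources according to their weights instead of strictly queuing behind one another.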
Q4: Is Spark Streaming used in production today?
Spark Streaming is very easy to use in production.
No separate deployment is needed: once Spark is installed, Spark Streaming is installed as well.
In China, companies such as 皮皮网 are already using Spark Streaming in production.