[spark-src] 1-overview

what is spark

  "Apache Spark™ is a fast and general engine for large-scale data processing. ... Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk." (as stated by the Apache Spark project)

 

  Whether or not this holds in practice, I think a few key concepts/components support these claims:

a. It uses Resilient Distributed Datasets (RDDs). This programming model differs considerably from common approaches, e.g. MapReduce. Spark uses many optimizations (e.g. for iterative computation and data locality) to spread the workload across many workers in the cluster, and it is especially good at reusing data that has already been computed.

  RDD: "A resilient distributed dataset (RDD) is a read-only collection of objects partitioned across a set of machines that can be rebuilt if a partition is lost." [1]

 

b. It uses memory as much as possible. Most of Spark's intermediate results are kept in memory rather than on disk, so it does not need to suffer from disk I/O or from repeated serialization/deserialization.

  In fact, we already use many tools to do similar things, like memcached, redis..

c. It emphasizes parallelism.

d. It reduces the JVM management overhead, e.g. one executor runs a number of tasks, instead of one container per task as in YARN.
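
  To make points a and b more concrete, here is a minimal single-process sketch in plain Python. It is a toy model, not Spark's API: the class `ToyRDD` and its methods `from_list`, `map`, `partition`, `lose_partition` and `collect` are hypothetical names made up for illustration. Each dataset remembers how its partitions are computed (the lineage), keeps computed partitions in memory, and recomputes only the partition that was lost. One simplification to keep in mind: the toy always keeps partitions in memory, while in Spark caching is opt-in.

```python
from typing import Callable, Dict, Iterable, List


class ToyRDD:
    """Toy, single-process model of an RDD (hypothetical, not Spark's API)."""

    def __init__(self, num_partitions: int,
                 compute: Callable[[int], List[int]]):
        self.num_partitions = num_partitions
        self._compute = compute                 # lineage: how to build partition i
        self._cache: Dict[int, List[int]] = {}  # partitions currently in memory
        self.compute_calls = 0                  # how often a partition was (re)built

    @classmethod
    def from_list(cls, data: Iterable[int], num_partitions: int) -> "ToyRDD":
        items = list(data)
        # Partition i holds every num_partitions-th element, starting at index i.
        return cls(num_partitions, lambda i: items[i::num_partitions])

    def map(self, fn: Callable[[int], int]) -> "ToyRDD":
        # Read-only: a transformation returns a new dataset, the parent is untouched.
        return ToyRDD(self.num_partitions,
                      lambda i: [fn(x) for x in self.partition(i)])

    def partition(self, i: int) -> List[int]:
        # Serve from memory if possible, otherwise rebuild from the lineage.
        if i not in self._cache:
            self.compute_calls += 1
            self._cache[i] = self._compute(i)
        return self._cache[i]

    def lose_partition(self, i: int) -> None:
        # Simulate a worker failure that drops one in-memory partition.
        self._cache.pop(i, None)

    def collect(self) -> List[int]:
        out: List[int] = []
        for i in range(self.num_partitions):
            out.extend(self.partition(i))
        return out


base = ToyRDD.from_list(range(6), num_partitions=3)
squares = base.map(lambda x: x * x)

first = squares.collect()    # builds all 3 partitions
squares.lose_partition(1)    # pretend one partition is lost
second = squares.collect()   # rebuilds only partition 1

print(first == second, squares.compute_calls, base.compute_calls)
```

  In this sketch the second `collect` rebuilds a single partition, and it reads the parent partition from memory instead of recomputing it. That is the reuse idea from point a, in miniature.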

 

architecture

  (the core component serves as a platform for the other components)

 

 

usages of spark

1. iterative algorithms, e.g. machine learning, clustering..

2. interactive analytics, e.g. querying a large amount of data that has been loaded from disk into memory, to reduce I/O latency

3. batch processing
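
  Why usage 1 benefits from keeping data in memory can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions, not Spark code: `load_points` is a hypothetical stand-in for an expensive read from disk, and `fit_mean` is a tiny gradient descent that walks over the same data on every step.

```python
from typing import List

load_count = 0  # counts how often the "expensive" load runs


def load_points() -> List[float]:
    """Hypothetical stand-in for an expensive read from disk."""
    global load_count
    load_count += 1
    return [1.0, 2.0, 3.0, 4.0]


def fit_mean(points: List[float], steps: int = 50, lr: float = 0.1) -> float:
    """Gradient descent on the mean squared error; converges to the mean."""
    estimate = 0.0
    for _ in range(steps):
        # Every step scans the whole dataset again.
        grad = sum(2 * (estimate - p) for p in points) / len(points)
        estimate -= lr * grad
    return estimate


points = load_points()        # loaded once, then kept in memory
estimate = fit_mean(points)   # 50 passes over the in-memory data

print(round(estimate, 3), load_count)
```

  The data is read once and then scanned 50 times from memory. If every pass had to go back to disk instead, the load cost would be paid 50 times.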

 

programming language

  Most of the source code is written in Scala (I think many functions and ideas are inspired by Scala ;), but you can also write Spark programs in Java or Python.

 

flexible integrations

  Spark integrates with many popular frameworks, e.g. Hadoop, HBase, Mesos etc.

 

ref:

[1] some papers 

[spark-src]-source reading
