SparkCore (16): Spark Memory Management, Before 1.6 and in 1.6+

I. Before Spark 1.6 (fixed values)

1. Architecture Diagram

2. Memory Allocation

Memory used by application code (the code you write): 20%
Memory used for caching data: 60% => spark.storage.memoryFraction
Memory used during shuffle: 20% => spark.shuffle.memoryFraction

3. Official Documentation

spark.shuffle.memoryFraction (default: 0.2, deprecated): This is read only if spark.memory.useLegacyMode is enabled. Fraction of Java heap to use for aggregation and cogroups during shuffles. At any given time, the collective size of all in-memory maps used for shuffles is bounded by this limit, beyond which the contents will begin to spill to disk. If spills are often, consider increasing this value at the expense of spark.storage.memoryFraction.
spark.storage.memoryFraction (default: 0.6, deprecated): This is read only if spark.memory.useLegacyMode is enabled. Fraction of Java heap to use for Spark's memory cache. This should not be larger than the "old" generation of objects in the JVM, which by default is given 0.6 of the heap, but you can increase it if you configure your own old generation size.
spark.storage.unrollFraction (default: 0.2, deprecated): This is read only if spark.memory.useLegacyMode is enabled. Fraction of spark.storage.memoryFraction to use for unrolling blocks in memory. This is dynamically allocated by dropping existing blocks when there is not enough free storage space to unroll the new block in its entirety.
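
For completeness, here is a minimal Scala sketch of how an application could opt back into this legacy model and set the fractions above; the app name and master are only illustrative, and spark.memory.useLegacyMode only matters on 1.6+, where the unified model is the default.

    import org.apache.spark.{SparkConf, SparkContext}

    // Minimal sketch: enable the pre-1.6 (legacy) memory model and set the
    // fixed storage/shuffle fractions. App name and master are illustrative.
    object LegacyMemoryConfig extends App {
      val conf = new SparkConf()
        .setAppName("legacy-memory-demo")
        .setMaster("local[2]")
        .set("spark.memory.useLegacyMode", "true")   // fall back to the fixed-fraction model
        .set("spark.storage.memoryFraction", "0.6")  // cache share of the heap
        .set("spark.shuffle.memoryFraction", "0.2")  // shuffle share of the heap

      val sc = new SparkContext(conf)
      println(sc.getConf.get("spark.storage.memoryFraction"))  // verify the setting took effect
      sc.stop()
    }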

II. Spark 1.6+

1. Architecture Diagram

2. Memory Allocation

(1) Reserved Memory: fixed at 300 MB; it cannot be changed.
          Purpose: holds relatively fixed objects such as loaded classes, and it determines the minimum Executor memory = 1.5 * Reserved Memory = 450 MB.
  So setting a single Executor to 500 MB works fine, but setting it below 450 MB raises an error.
  
(2) User Memory: memory used by your own code; default share: 1 - spark.memory.fraction
    [the "1" here refers to what is left after Reserved Memory is subtracted, so in practice User Memory = (heap - 300 MB) * (1 - spark.memory.fraction)]
    
(3) Spark Memory: the memory the application uses for data caching and for shuffle (execution)
    spark.memory.fraction: 0.75 (the Spark 1.6 default; Spark 2.x lowers it to 0.6)
    1) Overview: the split between caching (Storage Memory) and shuffle (Execution Memory) is dynamic; see the sketch after this list.
    2) Caching: spark.memory.storageFraction: 0.5 ==> the minimum share of Spark Memory guaranteed to Storage (cached data within this share is not evicted)
            -a. If both Storage Memory and Execution Memory have free space:
                  when data needs to be cached, storage may borrow execution's free memory;
                  likewise, when execution needs memory, it may borrow storage's free memory.
            -b. If storage memory is full and execution memory has free space:
                  a caching operation may use execution's free memory;
                  when execution needs memory, it takes memory back from storage by dropping cached data.
            -c. If storage memory has free space and execution memory is full:
                  a caching operation cannot use execution's memory;
                  when execution needs memory, it may use storage's memory.
            Note: memory in use by execution can never be dropped, whereas storage data can be dropped.
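
To make rules -a/-b/-c concrete, below is a deliberately simplified Scala model of the unified pool's borrowing behavior. The class and method names are invented for illustration and this is not Spark's actual MemoryManager API; it only encodes the two asymmetries above: execution may evict cached blocks down to the protected storage region, while storage may never evict execution memory.

    // Simplified, illustrative model of the unified Storage/Execution pool.
    // Not Spark's real API; it only does byte bookkeeping, no real allocation.
    class UnifiedPoolModel(total: Long, storageFraction: Double = 0.5) {
      private var storageUsed = 0L    // bytes held by cached blocks
      private var executionUsed = 0L  // bytes held by running tasks

      private def free: Long = total - storageUsed - executionUsed
      private def protectedStorage: Long = (total * storageFraction).toLong

      // Storage may borrow free execution memory, but never evicts it (rules -a, -c).
      def acquireStorage(bytes: Long): Boolean =
        if (bytes <= free) { storageUsed += bytes; true } else false

      // Execution may borrow free memory and, if that is not enough, evict cached
      // blocks, but only those above the protected storage region (rules -a, -b).
      def acquireExecution(bytes: Long): Boolean = {
        if (bytes <= free) { executionUsed += bytes; return true }
        val evictable = math.max(0L, storageUsed - protectedStorage)
        if (bytes <= free + evictable) {
          storageUsed -= (bytes - free)  // drop cached blocks to make room
          executionUsed += bytes
          true
        } else false
      }
    }

For instance, with a 543 MB pool (the example in the next section), storage could cache 400 MB by borrowing past its 271 MB half, and a later 250 MB execution request would drop about 107 MB of cached blocks to get back above the protected boundary.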

3. Example

   Default executor memory: 1 GB
   Reserved Memory: 300 MB
   Spark Memory: (1 GB - 300 MB) * 0.75 = 543 MB
        minimum (protected) storage memory: 543 * 0.5 ≈ 271 MB
   User Memory: 1 GB - 300 MB - 543 MB = 181 MB
   Official docs: http://spark.apache.org/docs/1.6.2/configuration.html#memory-management
   See also: http://spark.apache.org/docs/2.1.0/configuration.html#memory-management
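
The arithmetic above is easy to reproduce; the following throwaway Scala snippet just restates the formulas from this section (values in MB, assuming a 1 GB executor):

    // Sanity check of the 1 GB example (all values in MB).
    object UnifiedMemorySizing extends App {
      val systemMemory    = 1024   // --executor-memory 1g
      val reservedMemory  = 300    // fixed Reserved Memory
      val memoryFraction  = 0.75   // spark.memory.fraction (Spark 1.6 default)
      val storageFraction = 0.5    // spark.memory.storageFraction

      val minSystemMemory = (reservedMemory * 1.5).toInt           // 450: below this the executor refuses to start
      val usable          = systemMemory - reservedMemory          // 724
      val sparkMemory     = (usable * memoryFraction).toInt        // 543
      val protectedStore  = (sparkMemory * storageFraction).toInt  // 271
      val userMemory      = usable - sparkMemory                   // 181

      println(s"min=$minSystemMemory MB, spark=$sparkMemory MB, storage>=$protectedStore MB, user=$userMemory MB")
    }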

4. Official Documentation

spark.memory.fraction (default: 0.6): Fraction of (heap space - 300MB) used for execution and storage. The lower this is, the more frequently spills and cached data eviction occur. The purpose of this config is to set aside memory for internal metadata, user data structures, and imprecise size estimation in the case of sparse, unusually large records. Leaving this at the default value is recommended. For more detail, including important information about correctly tuning JVM garbage collection when increasing this value, see this description.
spark.memory.storageFraction (default: 0.5): Amount of storage memory immune to eviction, expressed as a fraction of the size of the region set aside by spark.memory.fraction. The higher this is, the less working memory may be available to execution and tasks may spill to disk more often. Leaving this at the default value is recommended. For more detail, see this description.
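
On Spark 2.x (the 2.1.0 docs linked above) you can confirm which values are actually in effect from a SparkSession; this assumes a session named spark already exists, as it does in spark-shell:

    // In spark-shell, `spark` is the predefined SparkSession.
    // get(key, default) falls back to the documented default when the key was never set explicitly.
    println(spark.conf.get("spark.memory.fraction", "0.6"))
    println(spark.conf.get("spark.memory.storageFraction", "0.5"))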

5. Tuning Advice

    Tuning advice: if your Spark application does a lot of caching and relatively little shuffling, increase the storage share of the memory pool; if it shuffles heavily and caches little, do the opposite.
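
As a hedged sketch of that advice (the app name and the exact values are only illustrative, not recommendations), a cache-heavy and shuffle-light job could raise the eviction-protected storage share like this, as long as the settings are applied before the SparkContext is created:

    import org.apache.spark.SparkConf

    // Illustrative only: favor caching for a cache-heavy, shuffle-light job.
    val conf = new SparkConf()
      .setAppName("cache-heavy-job")                 // hypothetical name
      .set("spark.memory.fraction", "0.6")           // unified pool share of (heap - 300 MB)
      .set("spark.memory.storageFraction", "0.6")    // protect more of the pool for cached data (default 0.5)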
 
