Spark Tuning Guide (2): Memory Tuning

From the official documentation

Memory Tuning

There are three considerations in tuning memory usage: the amount of memory used by your objects (you may want your entire dataset to fit in memory), the cost of accessing those objects, and the overhead of garbage collection (if you have high turnover in terms of objects).

By default, Java objects are fast to access, but can easily consume a factor of 2-5x more space than the “raw” data inside their fields. This is due to several reasons:

  • Each distinct Java object has an “object header”, which is about 16 bytes and contains information such as a pointer to its class. For an object with very little data in it (say one Int field), this can be bigger than the data.
  • Java Strings have about 40 bytes of overhead over the raw string data (since they store it in an array of Chars and keep extra data such as the length), and store each character as two bytes due to String’s internal usage of UTF-16 encoding. Thus a 10-character string can easily consume 60 bytes.
  • Common collection classes, such as HashMap and LinkedList, use linked data structures, where there is a “wrapper” object for each entry (e.g. Map.Entry). This object not only has a header, but also pointers (typically 8 bytes each) to the next object in the list.
  • Collections of primitive types often store them as “boxed” objects such as java.lang.Integer.

This section will start with an overview of memory management in Spark, then discuss specific strategies the user can take to make more efficient use of memory in his/her application. In particular, we will describe how to determine the memory usage of your objects, and how to improve it, either by changing your data structures, or by storing data in a serialized format. We will then cover tuning Spark’s cache size and the Java garbage collector.

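In summary, memory tuning comes down to three considerations:
1. The amount of memory used by your objects
2. The cost of accessing those objects
3. The overhead of garbage collection (GC)

By default, Java objects are fast to access, but they can take up 2-5 times more space than the raw data they hold.
Reasons:
1. Every object has an object header, typically about 16 bytes.
2. A Java String typically carries about 40 bytes of overhead on top of its characters.
3. Collection classes wrap each entry in an extra object, and each reference in that object typically takes 8 bytes.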
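The 10-character string figure from the guide can be checked with simple arithmetic. The sketch below is only a back-of-the-envelope model: its constants are the approximate numbers quoted above, and the names `StringFootprint` and `estimatedBytes` are made up for illustration. Real footprints depend on the JVM version and flags; JDK 9 and later, for example, can store Latin-1 strings with one byte per character.

```scala
object StringFootprint {
  // Approximate figures quoted from the tuning guide, not measured values.
  val OverheadBytes: Int = 40 // headers, length, and other bookkeeping
  val BytesPerChar: Int = 2   // UTF-16 storage, two bytes per character

  /** Rough estimate of the heap footprint of a string with `length` characters. */
  def estimatedBytes(length: Int): Int = OverheadBytes + length * BytesPerChar

  def main(args: Array[String]): Unit = {
    // 40 bytes of overhead + 10 characters * 2 bytes = 60 bytes
    println(s"10-character string: about ${estimatedBytes(10)} bytes")
  }
}
```

Even an empty string pays the full fixed overhead under this model, which is why many short strings are expensive to keep in memory.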

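Memory strategy: first optimize memory usage by changing data structures, then tune Spark's cache size and the garbage collector.
  • Memory footprint of Java objects
    A Java object consists of an object header and instance data; arrays additionally store their length, giving three parts.
    On a 64-bit JVM, compressed pointers (-XX:+UseCompressedOops) are generally enabled by default.
    SizeEstimator.estimate reports the estimated in-memory size of an object.
    A few examples follow.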
    import java.util
    import org.apache.spark.util.SizeEstimator

    // With -XX:+UseCompressedOops (the default)
      println("String:"+SizeEstimator.estimate(new String()))//String:40
      println("HashMap:"+SizeEstimator.estimate(new util.HashMap()))//HashMap:48
      println("TreeMap:"+SizeEstimator.estimate(new util.TreeMap()))//TreeMap:48
      println("Array:"+SizeEstimator.estimate(Array()))//Array:16
      println("Object:"+SizeEstimator.estimate(new Object()))//Object:16
      // With -XX:-UseCompressedOops (compressed pointers disabled)
      println("String:"+SizeEstimator.estimate(new String()))//String:56
      println("HashMap:"+SizeEstimator.estimate(new util.HashMap()))//HashMap:64
      println("TreeMap:"+SizeEstimator.estimate(new util.TreeMap()))//TreeMap:80
      println("Array:"+SizeEstimator.estimate(Array()))//Array:24
      println("Object:"+SizeEstimator.estimate(new Object()))//Object:16
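The Object and Array figures in the listing above follow from the header-plus-data layout described earlier. The sketch below is a simplified model, not Spark's or the JVM's actual implementation: it assumes the typical 64-bit HotSpot layout (8-byte mark word, a 4-byte class pointer when compressed or 8 bytes otherwise, a 4-byte array length, 8-byte alignment), and the names `LayoutModel`, `objectBytes`, and `arrayBytes` are hypothetical.

```scala
object LayoutModel {
  /** Rounds `n` up to the next multiple of 8, the assumed object alignment. */
  def align8(n: Long): Long = (n + 7) / 8 * 8

  /** Assumed header size: 8-byte mark word plus a 4- or 8-byte class pointer. */
  def headerBytes(compressed: Boolean): Long = if (compressed) 12L else 16L

  /** Shallow size of an object whose fields occupy `fieldBytes` bytes. */
  def objectBytes(fieldBytes: Long, compressed: Boolean): Long =
    align8(headerBytes(compressed) + fieldBytes)

  /** Shallow size of an array: header, 4-byte length, then the elements. */
  def arrayBytes(length: Long, elemBytes: Long, compressed: Boolean): Long =
    align8(headerBytes(compressed) + 4 + length * elemBytes)

  def main(args: Array[String]): Unit = {
    for (compressed <- Seq(true, false)) {
      println(s"compressed=$compressed" +
        s" Object=${objectBytes(0, compressed)}" +
        s" emptyArray=${arrayBytes(0, 4, compressed)}")
    }
  }
}
```

Under these assumptions a bare Object is 16 bytes in both modes, while an empty array grows from 16 to 24 bytes once compressed pointers are turned off, which agrees with the measured values.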
    

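Once you understand how much memory Java objects occupy, avoid wrapper (boxed) types in your data structures where possible, and restructure your data, for example by using arrays of primitives, to reduce memory usage.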
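One way to see the cost of wrapper types is to estimate the same data in both forms. This is a minimal sketch, assuming spark-core is on the classpath so that `org.apache.spark.util.SizeEstimator` is available; the object name `BoxedVsPrimitive` and the helper `sizes` are made up for illustration, and the exact byte counts will vary with the JVM and its flags.

```scala
import org.apache.spark.util.SizeEstimator

object BoxedVsPrimitive {
  /** Returns (estimated bytes of Array[Int], estimated bytes of Array[Integer]). */
  def sizes(n: Int): (Long, Long) = {
    // Primitive array: the Int values are stored inline in the array.
    val primitive: Array[Int] = Array.range(0, n)
    // Boxed array: each slot references a separate java.lang.Integer object.
    val boxed: Array[java.lang.Integer] =
      Array.tabulate[java.lang.Integer](n)(i => java.lang.Integer.valueOf(i))
    (SizeEstimator.estimate(primitive), SizeEstimator.estimate(boxed))
  }

  def main(args: Array[String]): Unit = {
    val (primitiveBytes, boxedBytes) = sizes(1000)
    println(s"Array[Int]:     $primitiveBytes bytes")
    println(s"Array[Integer]: $boxedBytes bytes")
  }
}
```

The boxed version pays for one reference per slot plus an object header per element, so it should come out noticeably larger than the primitive array.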
