Memory Tuning

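Copyright notice: This is an original article by the blogger, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.
Article link: https://blog.csdn.net/qq_30130043/article/details/80321827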

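Use SizeEstimator.estimate(obj) to measure how much memory an object occupies. It is handy for experimenting with data layouts, and it also tells you how much space a broadcast variable will take up. Pass it the data itself (for example a sample of records, or the value you plan to broadcast): calling it on an RDD measures the driver-side RDD object, not the distributed data.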


2) Tuning Data Structures ------ not the main focus

The first way to reduce memory consumption is to avoid the Java features that add overhead, such as pointer-based data structures and wrapper objects. There are several ways to do this:

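    In short: cutting memory consumption starts with avoiding the Java features that carry overhead, such as pointers and wrapper types. Specifically: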

  1. Design your data structures to prefer arrays of objects, and primitive types, instead of the standard Java or Scala collection classes (e.g. HashMap). The fastutil library provides convenient collection classes for primitive types that are compatible with the Java standard library.

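     Note: where you would otherwise reach for a HashMap, prefer an array-based layout. The collection types from the fastutil library are another option and are leaner than the standard ones.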

  2. Avoid nested structures with a lot of small objects and pointers when possible.

     Note: avoid nested objects and pointer-heavy structures, that is, wrapper types and anything allocated with new.

  3. Consider using numeric IDs or enumeration objects instead of strings for keys.

     Note: use numeric IDs or enum values as keys in place of String keys.

  4. If you have less than 32 GB of RAM, set the JVM flag -XX:+UseCompressedOops to make pointers be four bytes instead of eight. You can add these options in spark-env.sh.

     Note: with less than 32 GB of RAM, this JVM flag is recommended because it reduces pointer overhead. It can be set in spark-env.sh.
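
Points 1 and 2 can be illustrated with a small sketch. The class, the method names and the "click count per user ID" scenario are hypothetical, made up only for illustration; it uses plain JDK types rather than fastutil, and it assumes the IDs are dense integers in [0, numUsers).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical example: click counts keyed by a dense integer user ID. */
final class ClickCounts {

    // Boxed layout: every key and value is a separate Integer object, and
    // each entry adds a HashMap node that holds pointers to both of them.
    static Map<Integer, Integer> countBoxed(int[] userIds) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int id : userIds) {
            counts.merge(id, 1, Integer::sum);
        }
        return counts;
    }

    // Primitive layout: one contiguous int[] indexed by ID, no per-entry objects.
    // Assumption: every ID lies in [0, numUsers).
    static int[] countPrimitive(int[] userIds, int numUsers) {
        int[] counts = new int[numUsers];
        for (int id : userIds) {
            counts[id]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] userIds = {0, 2, 2, 1, 2};
        System.out.println(countBoxed(userIds));
        System.out.println(Arrays.toString(countPrimitive(userIds, 3)));
    }
}
```

Both methods compute the same counts. The array version only works when keys are small dense integers; for sparse keys, a primitive-specialized map such as the ones fastutil provides is probably the closer fit.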

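Point 3 might look like the following sketch. The EventType enum and its values are hypothetical. The idea is to parse the string once at the edge and then carry the enum (or its ordinal) through the rest of the job, so no String object is kept per key.

```java
import java.util.Locale;

/** Hypothetical example: tally events by type without keeping String keys. */
final class EventTally {

    // Illustrative event types; not taken from any real schema.
    enum EventType { CLICK, VIEW, PURCHASE }

    // Convert the raw string to an enum constant once, at ingestion time.
    static EventType parse(String raw) {
        return EventType.valueOf(raw.trim().toUpperCase(Locale.ROOT));
    }

    // Counts live in a long[] indexed by the enum ordinal, so a key costs
    // neither a String object nor a string hash computation.
    static long[] tally(String[] rawEvents) {
        long[] counts = new long[EventType.values().length];
        for (String raw : rawEvents) {
            counts[parse(raw).ordinal()]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] counts = tally(new String[] {"click", "view", "click"});
        for (EventType t : EventType.values()) {
            System.out.println(t + " = " + counts[t.ordinal()]);
        }
    }
}
```

Note that parse throws IllegalArgumentException for an unknown name, so real input would need a fallback for unexpected values.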

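For point 4, it can help to check whether compressed pointers are already in effect before touching any configuration. The sketch below assumes a 64-bit HotSpot-based JVM, where the HotSpotDiagnosticMXBean is available. The class name is hypothetical, and other JVMs may not expose this bean or this option.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: report whether UseCompressedOops is active in this JVM. */
final class CompressedOopsCheck {

    // Returns the current value of the UseCompressedOops VM option as a
    // string ("true" or "false"). Assumption: 64-bit HotSpot JVM.
    static String useCompressedOops() {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption("UseCompressedOops").getValue();
    }

    public static void main(String[] args) {
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("max heap (MB): " + maxHeapMb);
        System.out.println("UseCompressedOops: " + useCompressedOops());
    }
}
```

If this prints true for the heap size you run with, the flag is already active and setting it explicitly should change nothing.
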
3) In production, the main focus is still data skew
