Spark Tuning Guide (3): Tuning Data Structures

潇洒-人生

Categories: Big Data, spark

Article link: https://blog.csdn.net/qq_35744460/article/details/90340237

Tuning Data Structures
The first way to reduce memory consumption is to avoid the Java features that add overhead, such as pointer-based data structures and wrapper objects. There are several ways to do this:
1 Design your data structures to prefer arrays of objects and primitive types over the standard Java or Scala collection classes (e.g. HashMap). The fastutil library provides convenient collection classes for primitive types that are compatible with the Java standard library.
2 Avoid nested structures with a lot of small objects and pointers when possible.
3 Consider using numeric IDs or enumeration objects instead of strings for keys.
4 If you have less than 32 GB of RAM (more precisely, a JVM heap below about 32 GB), set the JVM flag -XX:+UseCompressedOops to make pointers four bytes instead of eight. Recent 64-bit HotSpot JVMs already enable this flag by default for heaps of that size. You can add these options in spark-env.sh.
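
Points 1 and 3 above can be illustrated with a small Java sketch. It is not taken from Spark; the class name CompactStructures, the Metric enum, and all method names are hypothetical and exist only for this example. It contrasts a boxed List<Integer> with a primitive int[], and String-keyed counts with counts indexed by an enum's ordinal.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example class; none of these names come from Spark.
class CompactStructures {

    // Hypothetical fixed key set, used instead of free-form String keys.
    enum Metric { CLICKS, VIEWS, PURCHASES }

    // Boxed version: each element is a reference to an Integer object.
    static long sumBoxed(List<Integer> values) {
        long total = 0;
        for (Integer v : values) {
            total += v; // unboxes on every access
        }
        return total;
    }

    // Primitive version: one contiguous int[] with no per-element object.
    static long sumPrimitive(int[] values) {
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // String keys: a HashMap with one entry object, one String key and
    // one boxed Integer value per distinct key.
    static Map<String, Integer> countByName(String[] events) {
        Map<String, Integer> counts = new HashMap<>();
        for (String e : events) {
            counts.merge(e, 1, Integer::sum);
        }
        return counts;
    }

    // Enum keys: counts live in a plain int[] indexed by ordinal().
    static int[] countByOrdinal(Metric[] events) {
        int[] counts = new int[Metric.values().length];
        for (Metric m : events) {
            counts[m.ordinal()]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int n = 1000;
        List<Integer> boxed = new ArrayList<>(n);
        int[] primitive = new int[n];
        for (int i = 0; i < n; i++) {
            boxed.add(i);
            primitive[i] = i;
        }
        // Both sums are equal; only the memory layout differs.
        System.out.println(sumBoxed(boxed) + " " + sumPrimitive(primitive));

        Metric[] events = { Metric.CLICKS, Metric.VIEWS, Metric.CLICKS };
        int[] counts = countByOrdinal(events);
        System.out.println("CLICKS = " + counts[Metric.CLICKS.ordinal()]);
    }
}
```

The two variants compute the same results; the difference is that the primitive and enum-indexed versions avoid allocating a wrapper object per element or per key. For larger key spaces where an array index is not practical, the primitive collections in fastutil are the option the text mentions.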

To see the flags of a running JVM in detail, run jinfo -flags pid, where pid is the process ID of that JVM.
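
The same flag can also be checked from inside the JVM. The sketch below assumes a 64-bit HotSpot-based JVM, where the com.sun.management.HotSpotDiagnosticMXBean interface is available; the class name CompressedOopsCheck and the helper vmOption are hypothetical names chosen for this example. On other JVMs the lookup may throw an exception.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class; assumes a 64-bit HotSpot-based JVM.
class CompressedOopsCheck {

    // Returns the current value of a HotSpot VM option as a string,
    // for example "true" or "false" for a boolean flag.
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        System.out.println("UseCompressedOops = " + vmOption("UseCompressedOops"));
        // Maximum heap the JVM will attempt to use, in bytes.
        System.out.println("Max heap bytes = " + Runtime.getRuntime().maxMemory());
    }
}
```

Running this with and without -XX:-UseCompressedOops on the command line is a quick way to confirm that the setting actually reaches the JVM.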