This article attempts to analyze the enterprise-grade Hadoop source code of a large big-data solutions vendor (a modest attempt at best).
This installment focuses on HashMap vs TreeMap and on LightWeightHashSet;
the emphasis is on the access speed and memory footprint of these data structures.
Change 2
Index: org/apache/hadoop/hdfs/server/blockmanagement/InvalidateBlocks.java
===================================================================
--- org/apache/hadoop/hdfs/server/blockmanagement/InvalidateBlocks.java (revision 37)
+++ org/apache/hadoop/hdfs/server/blockmanagement/InvalidateBlocks.java (revision 42)
@@ -22,6 +22,7 @@
import java.util.ArrayList;
import java.util.Calendar;
import java.util.GregorianCalendar;
+import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
@@ -47,7 +48,8 @@
class InvalidateBlocks {
/** Mapping: DatanodeInfo -> Collection of Blocks */
private final Map<DatanodeInfo, LightWeightHashSet<Block>> node2blocks =
- new TreeMap<DatanodeInfo, LightWeightHashSet<Block>>();
+ // new TreeMap<DatanodeInfo, LightWeightHashSet<Block>>();
+ new HashMap<DatanodeInfo, LightWeightHashSet<Block>>();
/** The total number of blocks in the map. */
private long numBlocks = 0L;
Index: org/apache/hadoop/hdfs/server/blockmanagement/CorruptReplicasMap.java
===================================================================
--- org/apache/hadoop/hdfs/server/blockmanagement/CorruptReplicasMap.java (revision 37)
+++ org/apache/hadoop/hdfs/server/blockmanagement/CorruptReplicasMap.java (revision 42)
@@ -46,9 +46,9 @@
CORRUPTION_REPORTED // client or datanode reported the corruption
}
- private final SortedMap<Block, Map<DatanodeDescriptor, Reason>> corruptReplicasMap =
- new TreeMap<Block, Map<DatanodeDescriptor, Reason>>();
-
+ private final HashMap<Block, Map<DatanodeDescriptor, Reason>> corruptReplicasMap =
+ // new TreeMap<Block, Map<DatanodeDescriptor, Reason>>();
+ new HashMap<Block, Map<DatanodeDescriptor, Reason>>();
/**
* Mark the block belonging to datanode as corrupt.
*
The code changes involve two classes: InvalidateBlocks and CorruptReplicasMap.
InvalidateBlocks maintains the set of invalidated blocks, while CorruptReplicasMap maintains the corrupt replicas.
The change replaces TreeMap with HashMap; next, let's analyze why this is beneficial.
1. TreeMap vs HashMap
 | TreeMap | HashMap |
---|---|---|
Implementation | Based on a red-black tree (a self-balancing binary search tree) | Based on a hash table |
Time complexity | O(log n) on average | O(1) on average |
Ordering | Sorted by key | Unordered |
Thread safety | Not thread safe | Not thread safe |
Best for | Iterating keys in natural or custom sorted order | Fast point access: entries come out in no particular order, data is placed by the key's hashCode, and a value can be fetched directly by its key. For inserting, deleting, and locating entries, HashMap is the best choice. |
Drawback | Insertion and deletion must rebalance the tree, which costs some efficiency | - |
HashMap is generally somewhat faster than TreeMap (a consequence of hash table vs. tree structure), so prefer HashMap and reach for TreeMap only when you need a sorted Map.
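A minimal sketch of the behavioral difference: both maps expose the same get/put API, but only TreeMap guarantees iteration in key order.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class MapOrderDemo {
    public static void main(String[] args) {
        Map<Integer, String> tree = new TreeMap<>();
        Map<Integer, String> hash = new HashMap<>();
        int[] keys = {42, 7, 19, 3};
        for (int k : keys) {
            tree.put(k, "v" + k);
            hash.put(k, "v" + k);
        }
        // Point lookups behave the same: O(log n) for TreeMap, O(1) average for HashMap.
        System.out.println(tree.get(19)); // v19
        System.out.println(hash.get(19)); // v19
        // TreeMap iterates in sorted key order: [3, 7, 19, 42].
        System.out.println(tree.keySet());
        // HashMap makes no ordering promise; whatever comes out is an implementation detail.
        System.out.println(hash.keySet());
    }
}
```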
Simulating bulk data of up to 1.5 million entries, and benchmarking both insertion and lookup, gives the following results:
type | insert 100K | insert 500K | insert 1M | insert 1.5M | lookup 0-10K | lookup 0-250K | lookup 0-500K |
---|---|---|---|---|---|---|---|
HashMap | 18 ms | 93 ms | 217 ms | 303 ms | 2 ms | 13 ms | 45 ms |
ConcurrentSkipListMap | 62 ms | 227 ms | 433 ms | 689 ms | 7 ms | 80 ms | 119 ms |
TreeMap | 33 ms | 228 ms | 429 ms | 584 ms | 4 ms | 34 ms | 61 ms |
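A rough sketch of how such a benchmark might be reproduced; the `MapBench` harness below is illustrative, and absolute timings vary by JVM and hardware, so treat the table above as indicative only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class MapBench {
    // Time n sequential puts into the given map, in milliseconds.
    static long timeInserts(Map<Integer, Integer> map, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            map.put(i, i);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Time lookups of keys [0, n) against an already populated map.
    static long timeLookups(Map<Integer, Integer> map, int n) {
        long start = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += map.get(i); // keep the JIT from eliding the loop
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int n = 1_500_000;
        for (Map<Integer, Integer> map : java.util.List.of(
                new HashMap<Integer, Integer>(),
                new ConcurrentSkipListMap<Integer, Integer>(),
                new TreeMap<Integer, Integer>())) {
            long insertMs = timeInserts(map, n);
            long lookupMs = timeLookups(map, 500_000);
            System.out.printf("%s insert=%dms lookup=%dms%n",
                    map.getClass().getSimpleName(), insertMs, lookupMs);
        }
    }
}
```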
References:
HashMap和TreeMap区别详解以及底层实现
Java8系列之重新认识HashMap
2. LightWeightHashSet vs HashSet
HashSet is implemented on top of HashMap: it stores all of its elements in an internal HashMap, so the implementation is quite simple, and most HashSet operations just delegate to the corresponding methods of the underlying HashMap.
LightWeightHashSet
/**
* A low memory linked hash set implementation, which uses an array for storing
* the elements and linked lists for collision resolution. This class does not
* support null element.
*
* This class is not thread safe.
*
*/
public class LightWeightHashSet<T> implements Collection<T> {
/**
* An internal array of entries, which are the rows of the hash table. The
* size must be a power of two.
*/
protected LinkedElement<T>[] entries;
A low-memory implementation: elements are stored in an array, with linked lists for collision resolution.
Null elements are not supported. Not thread safe.
Yi Liu added a comment - 30/Jul/15 08:42
Arpit Agarwal, sorry for late response
do you have any estimates of the memory saved by using LightWeightHashSet?
Yes, compared to java HashSet, there are two advantages from memory point of view:
Java HashSet internally uses a HashMap, so there is one more reference (4 bytes) for each entry compared to LightWeightHashSet, so we can save 4 * size bytes of memory. In LightWeightHashSet, when elements become less, the size is shrinked a lot.
So we can see LightWeightHashSet is more better. The main issue is LightWeightHashSet#LinkedSetIterator doesn’t support remove currently, it’s easy to support it (similar to java HashSet). By the way, currently in Hadoop, we use LightWeightHashSet for all big objects required hash set except this one which needs to use remove.
Two takeaways from the comment above:
1. Because HashSet is built on HashMap, each element carries one extra reference, and each reference costs 4 bytes:
/**
* Adds the specified element to this set if it is not already present.
* More formally, adds the specified element <tt>e</tt> to this set if
* this set contains no element <tt>e2</tt> such that
* <tt>(e==null ? e2==null : e.equals(e2))</tt>.
* If this set already contains the element, the call leaves the set
* unchanged and returns <tt>false</tt>.
*
* @param e element to be added to this set
* @return <tt>true</tt> if this set did not already contain the specified
* element
*/
public boolean add(E e) {
    return map.put(e, PRESENT) == null;
}
2. Because LightWeightHashSet shrinks its internal array when the element count falls below a threshold, its memory usage also drops significantly as elements are removed.
Java's HashMap, surprisingly, never shrinks; part of the reason is that shrinking is hard to do well, and its performance impact is hard to keep predictable.
LightWeightHashSet is therefore trading CPU for memory: every shrink, like every expansion, is a full rehash.
/**
 * Resize the internal table to given capacity.
 */
@SuppressWarnings("unchecked")
private void resize(int cap) {
    int newCapacity = computeCapacity(cap);
    if (newCapacity == this.capacity) {
        return;
    }
    this.capacity = newCapacity;
    this.expandThreshold = (int) (capacity * maxLoadFactor);
    this.shrinkThreshold = (int) (capacity * minLoadFactor);
    this.hash_mask = capacity - 1;
    LinkedElement<T>[] temp = entries;
    entries = new LinkedElement[capacity];
    for (int i = 0; i < temp.length; i++) {
        LinkedElement<T> curr = temp[i];
        while (curr != null) {
            LinkedElement<T> next = curr.next;
            int index = getIndex(curr.hashCode);
            curr.next = entries[index];
            entries[index] = curr;
            curr = next;
        }
    }
}
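To make the expand/shrink trade-off concrete, here is a hypothetical, stripped-down array-plus-chaining set in the same spirit. The names `MiniSet`, `EXPAND_LOAD`, and `SHRINK_LOAD` are illustrative, not Hadoop's.

```java
// A miniature of the LightWeightHashSet idea: an array of singly linked
// chains that rehashes both when it grows past a load threshold and when
// it shrinks below one. Illustrative only.
public class MiniSet {
    static final double EXPAND_LOAD = 0.75, SHRINK_LOAD = 0.2;

    static class Node {
        final int value;
        Node next;
        Node(int value, Node next) { this.value = value; this.next = next; }
    }

    Node[] table = new Node[8]; // capacity stays a power of two
    int size;

    int index(int value, int capacity) {
        return value & (capacity - 1);
    }

    public boolean add(int value) {
        if (contains(value)) return false;
        int i = index(value, table.length);
        table[i] = new Node(value, table[i]);
        size++;
        if (size > table.length * EXPAND_LOAD) resize(table.length * 2);
        return true;
    }

    public boolean remove(int value) {
        int i = index(value, table.length);
        Node prev = null;
        for (Node n = table[i]; n != null; prev = n, n = n.next) {
            if (n.value == value) {
                if (prev == null) table[i] = n.next; else prev.next = n.next;
                size--;
                // Shrinking is the part java.util.HashMap never does.
                if (table.length > 8 && size < table.length * SHRINK_LOAD) {
                    resize(table.length / 2);
                }
                return true;
            }
        }
        return false;
    }

    public boolean contains(int value) {
        for (Node n = table[index(value, table.length)]; n != null; n = n.next) {
            if (n.value == value) return true;
        }
        return false;
    }

    // A resize in either direction is a full rehash: CPU traded for memory.
    void resize(int newCapacity) {
        Node[] old = table;
        table = new Node[newCapacity];
        for (Node head : old) {
            for (Node n = head; n != null; ) {
                Node next = n.next;
                int i = index(n.value, newCapacity);
                n.next = table[i];
                table[i] = n;
                n = next;
            }
        }
    }
}
```

Removing elements eventually triggers shrink rehashes, so a workload that repeatedly adds and removes around a threshold pays for the reclaimed memory in CPU time.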
References:
Shrinking HashMaps (was Re: Proposal: Better HashMap.resize() when memory is tight)
is-java-hashmap-clear-and-remove-memory-effective
3. LightWeightGSet vs LightWeightHashSet
LightWeightGSet never resizes its table no matter how much data it holds, whereas LightWeightHashSet expands once a threshold is crossed and, unlike the plain HashSet, also shrinks. Both expanding and shrinking are expensive, because every element's slot in the new table array must be recomputed and the element copied there. So if we know the number of elements in advance, presizing avoids that cost entirely and can substantially improve performance, and LightWeightGSet is an excellent fit for that case.
Since the size of LightWeightGSet's underlying array is fixed in the constructor and never grows, its initialization matters a great deal. Here is how BlocksMap computes the size of its LightWeightGSet:
/**
 * Let t = percentage of max memory.
 * Let e = round(log_2 t).
 * Then, we choose capacity = 2^e/(size of reference),
 * unless it is outside the close interval [1, 2^30].
 */
public static int computeCapacity(double percentage, String mapName) {
    return computeCapacity(Runtime.getRuntime().maxMemory(), percentage,
        mapName);
}

static int computeCapacity(long maxMemory, double percentage,
    String mapName) {
    ... // parameter validation
    // VM detection
    // See http://java.sun.com/docs/hotspot/HotSpotFAQ.html#64bit_detection
    final String vmBit = System.getProperty("sun.arch.data.model");
    // Percentage of max memory
    final double percentDivisor = 100.0 / percentage;
    // maxMemory is Runtime.getRuntime().maxMemory() at runtime
    final double percentMemory = maxMemory / percentDivisor;
    // compute capacity
    final int e1 = (int) (Math.log(percentMemory) / Math.log(2.0) + 0.5);
    final int e2 = e1 - ("32".equals(vmBit) ? 2 : 3);
    final int exponent = e2 < 0 ? 0 : e2 > 30 ? 30 : e2;
    final int c = 1 << exponent;
    ...
    return c;
}
The main logic is to take the runtime's max memory, apply the percentage, and from that compute the largest number of elements the map should hold.
By default, 2% of memory is used to store block information.
For the derivation of the formula, study the official design pdf.
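A worked example of the formula above, assuming a 4 GB heap on a 64-bit VM and the default 2%. The standalone `capacityFor` method below re-derives the same arithmetic for illustration, skipping parameter validation:

```java
public class CapacityExample {
    // Re-derivation of the computeCapacity arithmetic shown above,
    // for illustration only (64-bit VM assumed, no validation).
    static int capacityFor(long maxMemory, double percentage) {
        double percentMemory = maxMemory / (100.0 / percentage);
        int e1 = (int) (Math.log(percentMemory) / Math.log(2.0) + 0.5);
        int e2 = e1 - 3; // divide by 8-byte references on a 64-bit VM
        int exponent = e2 < 0 ? 0 : e2 > 30 ? 30 : e2;
        return 1 << exponent;
    }

    public static void main(String[] args) {
        long fourGb = 4L << 30;
        // 2% of 4 GB is ~86 MB; log2(86e6) rounds to 26, minus 3 gives 23,
        // so the table gets 2^23 = 8388608 slots.
        System.out.println(capacityFor(fourGb, 2.0)); // 8388608
    }
}
```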
shv Konstantin Shvachko added a comment - 30/Apr/10 01:43
Do you have an estimate on how much space this will save in NN’s memory footprint?
szetszwo Tsz Wo Nicholas Sze added a comment - 30/Apr/10 08:33
I believe we can save from 24 to 40 bytes per entry. It depends on the chosen implementation (will give more details later).
In a large cluster, there are ~60m blocks. Then, we may save from 1.5GB to 2.5GB NN memory.
References:
Reducing NameNode memory usage by an alternate hash table
HDFS中LightWeightGSet与HashMap结构解析
4. Summary
That concludes the data-structure analysis. The Hadoop code base contains many purpose-built basic data structures like these,
e.g. LightWeightGSet, LightWeightHashSet, and LightWeightLinkedSet; think of them as customizations for specific uses, and be sure to understand their strengths and weaknesses before using them.
Below is a simple sketch of the inheritance relationships between the JDK's basic collections and the Hadoop classes.
graph LR
LightWeightGSet-->GSet
LightWeightHashSet-->Collection
LightWeightLinkedSet-->LightWeightHashSet
JDK collection classes and how they relate
graph LR
Collection-->List
List-->LinkedList
List-->ArrayList
List-->Vector
Vector-->Stack
Collection-->Set
Map-->Hashtable
Map-->HashMap
Map-->WeakHashMap
LightWeightGSet
/**
* A low memory footprint {@link GSet} implementation,
* which uses an array for storing the elements
* and linked lists for collision resolution.
*
* No rehash will be performed.
* Therefore, the internal array will never be resized.
*
* This class does not support null element.
*
* This class is not thread safe.
*
* @param <K> Key type for looking up the elements
* @param <E> Element type, which must be
* (1) a subclass of K, and
* (2) implementing {@link LinkedElement} interface.
*/
@InterfaceAudience.Private
public class LightWeightGSet<K, E extends K> implements GSet<K, E> {
A low-memory implementation: elements are stored in an array, with linked lists for collision resolution.
No rehash ever happens, so the internal array never changes size.
Null elements are not supported. Not thread safe.
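The memory saving comes from the intrusive design: the stored element itself implements the `LinkedElement` interface and carries the collision-chain pointer, so no per-entry wrapper object is allocated. A hypothetical minimal version of that idea (the names `IntrusiveSet` and `Elem` are illustrative, not Hadoop's):

```java
public class IntrusiveSet {
    // The element itself carries the chain pointer (an intrusive list):
    // no Entry wrapper per element, unlike java.util.HashMap.
    interface LinkedElement {
        void setNext(LinkedElement next);
        LinkedElement getNext();
    }

    static class Elem implements LinkedElement {
        final long id;
        LinkedElement next;
        Elem(long id) { this.id = id; }
        public void setNext(LinkedElement next) { this.next = next; }
        public LinkedElement getNext() { return next; }
    }

    final LinkedElement[] table; // fixed at construction; never resized

    IntrusiveSet(int capacity) {
        // capacity must be a power of two so (id & mask) works as modulo
        table = new LinkedElement[capacity];
    }

    int index(long id) {
        return (int) (id & (table.length - 1));
    }

    void put(Elem e) {
        int i = index(e.id);
        e.setNext(table[i]); // link into the chain through the element itself
        table[i] = e;
    }

    Elem get(long id) {
        for (LinkedElement n = table[index(id)]; n != null; n = n.getNext()) {
            if (((Elem) n).id == id) return (Elem) n;
        }
        return null;
    }

    public static void main(String[] args) {
        IntrusiveSet set = new IntrusiveSet(16);
        set.put(new Elem(42));
        set.put(new Elem(58)); // 58 & 15 == 42 & 15 == 10: same chain
        System.out.println(set.get(42).id); // 42
        System.out.println(set.get(58).id); // 58
    }
}
```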
LightWeightHashSet
/**
* A low memory linked hash set implementation, which uses an array for storing
* the elements and linked lists for collision resolution. This class does not
* support null element.
*
* This class is not thread safe.
*
*/
public class LightWeightHashSet<T> implements Collection<T> {
A low-memory implementation: elements are stored in an array, with linked lists for collision resolution.
Null elements are not supported. Not thread safe.
LightWeightLinkedSet
/**
* A low memory linked hash set implementation, which uses an array for storing
* the elements and linked lists for collision resolution. In addition it stores
* elements in a linked list to ensure ordered traversal. This class does not
* support null element.
*
* This class is not thread safe.
*
*/
public class LightWeightLinkedSet<T> extends LightWeightHashSet<T> {
A low-memory implementation: elements are stored in an array, with linked lists for collision resolution.
In addition, elements are kept in a linked list to guarantee ordered traversal.
Null elements are not supported. Not thread safe.
Related Apache JIRA issues
Follow the community closely and try to stand on the shoulders of giants, rather than playing forever at the bottom of your own well.
https://issues.apache.org/jira/browse/HDFS-8792
BlockManager#postponedMisreplicatedBlocks should use a LightWeightHashSet to save memory
Description
LightWeightHashSet requires fewer memory than java hashset.
https://issues.apache.org/jira/browse/HDFS-8793
https://issues.apache.org/jira/browse/HDFS-8794
Improve CorruptReplicasMap#corruptReplicasMap
Description
Currently we use TreeMap for corruptReplicasMap, actually the only need sorted place is getCorruptReplicaBlockIds which is used by test.
So we can use HashMap.
From memory and performance view, HashMap is better than TreeMap, a similar optimization HDFS-7433. Of course we need to make few change to getCorruptReplicaBlockIds.
https://issues.apache.org/jira/browse/HDFS-1890
A few improvements on the LeaseRenewer.pendingCreates map
Description
The class is better to be just a Map instead of a SortedMap.
The value type is better to be DFSOutputStream instead of OutputStream.
The variable name is better to be filesBeingWritten instead of pendingCreates since we have append.