I use a hashmap to store a QTable for an implementation of a reinforcement learning algorithm. My hashmap should store 15000000 entries. When I ran my algorithm I saw that the memory used by the process is over 1000000K. When I calculated the memory, I would expect it to use not more than 530000K. I tried to write an example and I got the same high memory usage:
public static void main(String[] args) {
HashMap map = new HashMap<>(16_000_000, 1);
for(int i = 0; i < 15_000_000; i++){
map.put(i, i);
}
}
My memory calulation:
Each entryset is 32 bytes
Capacity is 15000000
HashMap Instance uses: 32 * SIZE + 4 * CAPACITY
memory = (15000000 * 32 + 15000000 * 4) / 1024 = 527343.75K
Where I'm wrong in my memory calculations?
解决方案
Well, in the best case, we assume a word size of 32 bits/4 bytes (with CompressedOops and CompressedClassesPointers). Then, a map entry consists of two words JVM overhead (klass pointer and mark word), key, value, hashcode and next pointer, making 6 words total, in other words, 24 bytes. So having 15,000,000 entry instances will consume 360 MB.
Additionally, there’s the array holding the entries. The HashMap uses capacities that are a power of two, so for 15,000,000 entries, the array size is at least 16,777,216, consuming 64 MiB.
Then, you have 30,000,000 Integer instances. The problem is that map.put(i, i) performs two boxing operations and while the JVM is encouraged to reuse objects when boxing, it is not required to do so and reusing won’t happen in your simple program that is likely to complete before the optimizer ever interferes.
To be precise, the first 128 Integer instances are reused, because for values in the -128 … +127 range, sharing is mandatory, but the implementation does this by initializing the entire cache on the first use, so for the first 128 iterations, it doesn’t create two instances, but the cache consists of 256 instances, which is twice that number, so we end up again with 30,000,000 Integer instances total. An Integer instance consist of at least the two JVM specific words and the actual int value, which would make 12 bytes, but due to the default alignment, the actually consumed memory will be 16 bytes, dividable by eight.
So the 30,000,000 created Integer instances consume 480 MB.
This makes a total of 360 MB + 64 MiB + 480 MB, which is more than 900 MB, making a heap size of 1 GB entirely plausible.
But that’s what profiling tools are for. After running your program, I got
Note that this tool only reports the used size of the objects, i.e. the 12 bytes for an Integer object without considering the padding that you will notice when looking at the total memory allocated by the JVM.