One thing about Java that has always
bothered me, given my C/C++ roots, is the lack of a way to find
out how much memory an object uses. C and C++ have the sizeof
operator, which lets you query the size of primitive types and
the size of objects of a given class. In C and C++ this operator is
useful for pointer arithmetic, copying memory around, and IO, for
example.
Java doesn't have a corresponding
operator. In reality, Java doesn't need one. The size of primitive
types in Java is defined in the language specification, whereas in
C and C++ it depends on the platform. Java has its own IO
infrastructure built around serialization. And neither pointer
arithmetic nor bulk memory copy applies, because Java doesn't
have pointers.
But every Java developer has at some point
wondered how much memory a Java object uses. The answer, it
turns out, is not so simple.
The first distinction to be made is
between shallow size and deep size. The shallow size of an object
is the space occupied by the object alone, not taking into account
size of other objects that it references. The deep size, on the
other hand, takes into account the shallow size of the object, plus
the deep size of each object referenced by this object,
recursively. Most of the time you will want to know
the deep size of an object, but to compute it you first need to
know how to calculate the shallow size, which is what I'm
going to talk about here.
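To make the distinction concrete, here is a minimal sketch of how a deep size could be accumulated once shallow sizes are known. Everything in it is an assumption made for illustration: the Node class is hypothetical, its shallow size is a made-up constant of 16 bytes, and counting a shared object only once is my own design choice rather than something any JVM defines.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Set;

class DeepSizeSketch {
    // Hypothetical node type used only for this illustration.
    static class Node {
        Node left;
        Node right;
    }

    // Assumed shallow size of one Node, in bytes (a made-up constant here).
    static final int NODE_SHALLOW_SIZE = 16;

    // Deep size: sum of the shallow sizes of every Node reachable from root,
    // counting each distinct object once even if it is referenced twice.
    static int deepSize(Node root) {
        if (root == null) {
            return 0;
        }
        Set<Node> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> pending = new ArrayDeque<>();
        pending.push(root);
        int total = 0;
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (!visited.add(node)) {
                continue; // already counted (shared reference or cycle)
            }
            total += NODE_SHALLOW_SIZE;
            if (node.left != null) pending.push(node.left);
            if (node.right != null) pending.push(node.right);
        }
        return total;
    }

    public static void main(String[] args) {
        Node root = new Node();
        root.left = new Node();
        root.right = root.left; // two references to the same child
        System.out.println("shallow: " + NODE_SHALLOW_SIZE);
        System.out.println("deep:    " + deepSize(root));
    }
}
```

The identity-based visited set is what keeps cycles from looping forever and shared children from being counted twice.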
One complication is that the runtime
in-memory structure of Java objects is not defined by the virtual
machine specification, which means that virtual machine providers
can implement it as they please. The consequence is that you can
write a class, and instances of that class in one VM can occupy a
different amount of memory than instances of that same class when
run in another VM. Most of the world, including myself, uses the
Sun HotSpot virtual machine though, which simplifies things a lot.
The remainder of the discussion will focus on the 32-bit Sun JVM. I
will lay down a few rules that will help explain how the JVM
organizes the objects' layout in memory.
Memory layout of classes that have no instance attributes
In the Sun JVM, every object (except
arrays) has a two-word header. The first word contains the object's
identity hash code plus some flags like lock state and age, and the
second word contains a reference to the object's class. Also, every
object is aligned to an 8-byte granularity. This is the first rule
of object memory layout:
Rule 1: every object is aligned to an 8-byte
granularity.
Now we know that if we call new Object(),
we will be using 8 bytes of the heap for the two header words and
nothing else, since the Object class doesn't have any
fields.
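Rule 1 boils down to rounding a size up to the next multiple of 8. The following is a minimal sketch of that arithmetic; the class and method names are my own and are not part of any JVM API.

```java
// Hypothetical helper, not part of any JVM API.
class Alignment {
    // Rounds size up to the next multiple of alignment.
    // The alignment must be a power of two (1, 2, 4, 8, ...).
    static int alignUp(int size, int alignment) {
        return (size + alignment - 1) & ~(alignment - 1);
    }

    public static void main(String[] args) {
        // Header only, as for new Object(): already a multiple of 8.
        System.out.println(alignUp(8, 8));
        // Header plus a single byte field: rounded up to the next multiple of 8.
        System.out.println(alignUp(9, 8));
    }
}
```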
Memory layout of classes that extend Object
After the 8-byte header, the class
attributes follow. Attributes are always aligned in memory to their
own size. For instance, ints are aligned to a 4-byte granularity, and
longs are aligned to an 8-byte granularity. There is a performance
reason to do it this way: reading a 4-byte word from memory into a
4-byte processor register is usually much cheaper when the word is
aligned to a 4-byte boundary.
In order to save some memory, the Sun VM
doesn't lay out an object's attributes in the order in which they
are declared. Instead, the attributes are organized in memory in the
following order:
doubles and longs
ints and floats
shorts and chars
booleans and bytes
references
This scheme allows for a good
optimization of memory usage. For example, imagine you declared the
following class:
class MyClass {
    byte a;
    int c;
    boolean d;
    long e;
    Object f;
}
If the JVM didn't reorder the attributes,
the object memory layout would be like this:
[HEADER:  8 bytes]  8
[a:       1 byte ]  9
[padding: 3 bytes] 12
[c:       4 bytes] 16
[d:       1 byte ] 17
[padding: 7 bytes] 24
[e:       8 bytes] 32
[f:       4 bytes] 36
[padding: 4 bytes] 40
Notice that 14 bytes would have been
wasted on padding and the object would use 40 bytes of memory. By
reordering the fields using the rules above, the in-memory
structure of the object becomes:
[HEADER:  8 bytes]  8
[e:       8 bytes] 16
[c:       4 bytes] 20
[a:       1 byte ] 21
[d:       1 byte ] 22
[padding: 2 bytes] 24
[f:       4 bytes] 28
[padding: 4 bytes] 32
This time, only 6 bytes are used for
padding and the object uses only 32 bytes of memory.
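The two layouts above can be reproduced with a few lines of arithmetic. This is only a sketch under the assumptions stated in this article (8-byte header, 4-byte references, each field aligned to its own size); the class and method names are mine.

```java
class FieldOrderComparison {
    static final int HEADER = 8; // two 4-byte header words on the 32-bit VM

    static int alignUp(int offset, int alignment) {
        return (offset + alignment - 1) & ~(alignment - 1);
    }

    // Lays fields out in the given order, aligning each one to its own size,
    // and returns the total object size rounded up to a multiple of 8.
    static int objectSize(int... fieldSizes) {
        int offset = HEADER;
        for (int size : fieldSizes) {
            offset = alignUp(offset, size) + size;
        }
        return alignUp(offset, 8);
    }

    public static void main(String[] args) {
        // MyClass in declaration order: byte a, int c, boolean d, long e, Object f
        System.out.println(objectSize(1, 4, 1, 8, 4)); // 40
        // MyClass after reordering: long e, int c, byte a, boolean d, Object f
        System.out.println(objectSize(8, 4, 1, 1, 4)); // 32
    }
}
```

The fields themselves take 18 bytes in both cases, so the difference between 40 and 32 is entirely padding.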
So here is rule 2 of object memory
layout:
Rule 2: class attributes are ordered like
this: first longs and doubles; then ints and floats; then chars and
shorts; then bytes and booleans, and last the references. The
attributes are aligned to their own granularity.
Now we know how to calculate the memory
used by any instance of a class that extends Object directly. One
practical example is the java.lang.Boolean class. Here is its
memory layout:
[HEADER:  8 bytes]  8
[value:   1 byte ]  9
[padding: 7 bytes] 16
An instance of the Boolean class takes 16
bytes of memory! Surprised? (Notice the padding at the end, which
aligns the object size to an 8-byte granularity.)
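Rules 1 and 2 are enough to estimate the size of a class that extends Object directly, and reflection can supply the field types. The sketch below does that under this article's assumptions (8-byte header, 4-byte references). It is an estimate from the rules, not a measurement of a running VM, and it ignores superclass fields. All names in it are my own.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ShallowSizeRule2 {
    static final int HEADER = 8;    // assumed: two 4-byte header words
    static final int REFERENCE = 4; // assumed: 32-bit references

    static class MyClass {
        byte a;
        int c;
        boolean d;
        long e;
        Object f;
    }

    static int alignUp(int offset, int alignment) {
        return (offset + alignment - 1) & ~(alignment - 1);
    }

    static int sizeOf(Class<?> type) {
        if (type == long.class || type == double.class) return 8;
        if (type == int.class || type == float.class) return 4;
        if (type == short.class || type == char.class) return 2;
        if (type == byte.class || type == boolean.class) return 1;
        return REFERENCE;
    }

    // Rule 2 ordering: larger primitives first, references last.
    static int rank(Class<?> type) {
        return type.isPrimitive() ? -sizeOf(type) : 0;
    }

    // Estimated size of an instance, looking only at the fields declared
    // by the class itself (so it fits classes that extend Object directly).
    static int shallowSize(Class<?> type) {
        List<Field> fields = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers())) {
                fields.add(f);
            }
        }
        fields.sort(Comparator.comparingInt((Field f) -> rank(f.getType())));
        int offset = HEADER;
        for (Field f : fields) {
            int size = sizeOf(f.getType());
            offset = alignUp(offset, size) + size; // align field to its own size
        }
        return alignUp(offset, 8); // rule 1
    }

    public static void main(String[] args) {
        System.out.println(shallowSize(MyClass.class)); // 32
        System.out.println(shallowSize(Boolean.class)); // 16
    }
}
```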
Memory layout of subclasses of other classes
The next three rules are followed by the
JVM to organize the fields of classes that have superclasses.
Rule 3 of object memory layout is the following:
Rule 3: Fields that belong to different
classes of the hierarchy are NEVER mixed up together. Fields of the
superclass come first, obeying rule 2, followed by the fields of
the subclass.
Here is an example:
class A {
    long a;
    int b;
    int c;
}
class B extends A {
    long d;
}
An instance of B looks like this in
memory:
[HEADER:  8 bytes]  8
[a:       8 bytes] 16
[b:       4 bytes] 20
[c:       4 bytes] 24
[d:       8 bytes] 32
The next rule is used when the fields of
the superclass don't end on a 4-byte boundary. Here is what it
says:
Rule 4: Between the last field of the
superclass and the first field of the subclass there must be
padding to align to a 4-byte boundary.
Here is an example:
class A {
    byte a;
}
class B extends A {
    byte b;
}
An instance of B looks like this in memory:
[HEADER:  8 bytes]  8
[a:       1 byte ]  9
[padding: 3 bytes] 12
[b:       1 byte ] 13
[padding: 3 bytes] 16
Notice the 3 bytes of padding after field a
to align b to a 4-byte granularity. That space is lost and cannot
be used by fields of class B.
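Rules 3 and 4 can be sketched by laying out one class at a time, from the root of the hierarchy down, and padding to a 4-byte boundary before each class. The code below assumes the field sizes of each class are already given in rule 2 order. It does not apply any gap-filling optimization, and its names are hypothetical.

```java
class HierarchyLayout {
    static final int HEADER = 8; // two 4-byte header words on the 32-bit VM

    static int alignUp(int offset, int alignment) {
        return (offset + alignment - 1) & ~(alignment - 1);
    }

    // Each argument holds the field sizes of one class, superclass first,
    // already sorted according to rule 2.
    static int objectSize(int[]... fieldSizesPerClass) {
        int offset = HEADER;
        for (int[] sizes : fieldSizesPerClass) {
            offset = alignUp(offset, 4);               // rule 4: pad between classes
            for (int size : sizes) {
                offset = alignUp(offset, size) + size; // rule 3: never mix classes
            }
        }
        return alignUp(offset, 8);                     // rule 1
    }

    public static void main(String[] args) {
        // A { long a; int b; int c; }  and  B extends A { long d; }
        System.out.println(objectSize(new int[] {8, 4, 4}, new int[] {8})); // 32
        // A { byte a; }  and  B extends A { byte b; }
        System.out.println(objectSize(new int[] {1}, new int[] {1}));       // 16
    }
}
```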
The final rule is applied to save some
space when the first field of the subclass is a long or double and
the parent class doesn't end on an 8-byte boundary.
Rule 5: When the first field of a
subclass is a double or long and the superclass doesn't end on an
8-byte boundary, the JVM will break rule 2 and try to put an int, then
shorts, then bytes, and then references at the beginning of the
space reserved for the subclass until it fills the gap.
Here is an example:
class A {
    byte a;
}
class B extends A {
    long b;
    short c;
    byte d;
}
Here is the memory layout:
[HEADER:  8 bytes]  8
[a:       1 byte ]  9
[padding: 3 bytes] 12
[c:       2 bytes] 14
[d:       1 byte ] 15
[padding: 1 byte ] 16
[b:       8 bytes] 24
At byte 12, which is where class A
'ends', the JVM broke rule 2 and stuck a short and a byte before a
long, to save 3 out of 4 bytes that would otherwise have been
wasted.
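Here is a rough sketch of the gap filling described by rule 5, written from the rule as stated in this article rather than from the VM sources. The method takes the offset where the superclass fields end and the number of subclass fields of each kind. The fillGap flag is my own addition, so the result can be compared with a layout that leaves the gap empty.

```java
class Rule5GapFill {
    static int alignUp(int offset, int alignment) {
        return (offset + alignment - 1) & ~(alignment - 1);
    }

    // Estimated object size for a subclass whose superclass fields end at
    // superEnd. The counts describe the subclass's own fields.
    static int objectSize(int superEnd, int longs, int ints, int shorts,
                          int bytes, int refs, boolean fillGap) {
        int start = alignUp(superEnd, 4);                      // rule 4
        int firstLong = longs > 0 ? alignUp(start, 8) : start; // longs need 8-byte alignment
        int gap = fillGap ? firstLong - start : 0;             // 0 or 4 bytes

        // Rule 5: move smaller fields into the gap, in this order.
        if (gap >= 4 && ints > 0) { ints--; gap -= 4; }
        while (gap >= 2 && shorts > 0) { shorts--; gap -= 2; }
        while (gap >= 1 && bytes > 0) { bytes--; gap -= 1; }
        if (gap >= 4 && refs > 0) { refs--; gap -= 4; }

        // The remaining fields follow in rule 2 order, starting at the first long.
        int offset = firstLong + 8 * longs + 4 * ints + 2 * shorts + bytes;
        if (refs > 0) {
            offset = alignUp(offset, 4) + 4 * refs;
        }
        return alignUp(offset, 8);                             // rule 1
    }

    public static void main(String[] args) {
        // A { byte a; } ends at offset 9; B adds one long, one short, one byte.
        System.out.println(objectSize(9, 1, 0, 1, 1, 0, true));  // 24
        System.out.println(objectSize(9, 1, 0, 1, 1, 0, false)); // 32
    }
}
```

With gap filling the short and the byte land at offsets 12 and 14, and the object needs 24 bytes instead of 32.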
Memory layout of arrays
Arrays have an extra header field that
contains the value of the 'length' variable. The array elements
follow, and arrays, like any regular object, are also aligned to
an 8-byte boundary.
Here is the layout of a byte array with 3
elements:
[HEADER:  12 bytes] 12
[[0]:      1 byte ] 13
[[1]:      1 byte ] 14
[[2]:      1 byte ] 15
[padding:  1 byte ] 16
And here is the layout of a long array
with 3 elements:
[HEADER:  12 bytes] 12
[padding:  4 bytes] 16
[[0]:      8 bytes] 24
[[1]:      8 bytes] 32
[[2]:      8 bytes] 40
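Both array layouts follow from the same arithmetic: a 12-byte header, elements aligned to their own size, and a total rounded up to 8 bytes. A minimal sketch under those assumptions, with names of my own choosing:

```java
class ArraySizeSketch {
    static final int ARRAY_HEADER = 12; // 8-byte object header + 4-byte length

    static int alignUp(int offset, int alignment) {
        return (offset + alignment - 1) & ~(alignment - 1);
    }

    // Estimated size of an array with the given element size and length.
    static int arraySize(int elementSize, int length) {
        int dataStart = alignUp(ARRAY_HEADER, elementSize); // pads 8-byte elements to offset 16
        return alignUp(dataStart + elementSize * length, 8);
    }

    public static void main(String[] args) {
        System.out.println(arraySize(1, 3)); // byte[3]: 16
        System.out.println(arraySize(8, 3)); // long[3]: 40
    }
}
```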
Memory layout of inner classes
Non-static inner classes have an extra
'hidden' field that holds a reference to the enclosing instance of
the outer class. This field is a regular reference and it follows
the same in-memory layout rules as any other reference. Inner
classes, for this reason, have an extra cost of 4 bytes.
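The hidden field can be observed with reflection, because the compiler marks it as synthetic. The sketch below assumes javac. Note that newer javac versions may drop the field when the inner class never touches the enclosing instance, so the inner class here deliberately reads an outer field. The class names are hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

class HiddenOuterReference {
    int outerValue = 42;

    // Non-static inner class: it reads a field of the enclosing instance,
    // so the compiler has to keep a reference to that instance.
    class Inner {
        int read() {
            return outerValue;
        }
    }

    // Static nested class: no enclosing instance, hence no hidden field.
    static class Nested {
        int x;
    }

    // Counts compiler-generated (synthetic) instance fields of a class.
    static int syntheticInstanceFields(Class<?> type) {
        int count = 0;
        for (Field f : type.getDeclaredFields()) {
            if (f.isSynthetic() && !Modifier.isStatic(f.getModifiers())) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(syntheticInstanceFields(Inner.class));
        System.out.println(syntheticInstanceFields(Nested.class));
    }
}
```

Declaring a nested class static when it does not need the enclosing instance avoids the extra reference.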
Final thoughts
We have learned how to calculate the
shallow size of any Java object in the 32-bit Sun JVM. Knowing how
memory is structured can help you understand how much memory is
used by instances of your classes.