

This page is about a technique for reducing the memory footprint of programs
in compiled languages with C-like structures:
manually reordering structure declarations so that they occupy less space.
To read it, you will require basic knowledge of the C programming language.


You need to know this technique if you intend to write code
for memory-constrained embedded systems, or operating-system kernels.
It is useful if you are working with application data sets so large
that your programs routinely hit memory limits.
It is good to know in any application where you really,
really care about optimizing your use of memory bandwidth and minimizing cache-line misses.


Finally, knowing this technique is a gateway to other esoteric C topics.
You are not an advanced C programmer until you have grasped these rules.
You are not a master of C until
you could have written this document yourself and can criticize it intelligently.


This document originated with “C” in the title,
but many of the techniques discussed here also apply to the Go language,
and should generalize to any compiled language with C-like structures.
There is a note discussing Go and Rust towards the end.


Alignment requirements

The first thing to understand is that, on modern processors,
the way your compiler lays out basic datatypes in memory is constrained
in order to make memory accesses faster.
Our examples are in C, but any compiled language generates code under the same constraints.


Storage for the basic C datatypes on an x86 or ARM processor doesn’t normally start at arbitrary byte addresses in memory.
Rather, each type except char has an alignment requirement;
chars can start on any byte address, but 2-byte shorts must start on an even address,
4-byte ints or floats must start on an address divisible by 4, and 8-byte longs or doubles must start on an address divisible by 8.
Signed or unsigned makes no difference.


The jargon for this is that basic C types on x86 and ARM are self-aligned.
Pointers, whether 32-bit (4-byte) or 64-bit (8-byte), are self-aligned too.


Self-alignment makes access faster because it facilitates generating single-instruction fetches and puts of the typed data.
Without alignment constraints, on the other hand, the code might end up having to do two or more accesses spanning machine-word boundaries. Characters are a special case; they’re equally expensive from anywhere they live inside a single machine word. That’s why they don’t have a preferred alignment.


I said “on modern processors” because on some older ones forcing your C program to violate alignment rules
(say, by casting an odd address into an int pointer and trying to use it) didn’t just slow your code down,
it caused an illegal instruction fault.
This was the behavior, for example, on Sun SPARC chips.
In fact, with sufficient determination and the right (e18) hardware flag set on the processor,
you can still trigger this on x86.

Also, self-alignment is not the only possible rule.
Historically, some processors (especially those lacking barrel shifters) have had more restrictive ones.
If you do embedded systems, you might trip over one of these lurking in the underbrush. Be aware this is possible.


From when it was first written at the beginning of 2014 until late 2016, this section ended with the last paragraph.
During that period I’ve learned something rather reassuring from working with the source code for the reference implementation of NTP.
It does packet analysis by reading packets off the wire directly into memory
that the rest of the code sees as a struct, relying on the assumption of minimal self-aligned padding.


The interesting news is that NTP has apparently been getting away with this for decades across a very wide span of hardware, operating systems, and compilers, including not just Unixes but Windows variants as well.
This suggests that platforms with padding rules other than self-alignment are either nonexistent
or confined to such specialized niches that they’re never either NTP servers or clients.

Padding

Now we’ll look at a simple example of variable layout in memory.
Consider the following series of variable declarations in the top level of a C module:


char *p;
char c;
int x;

If you didn’t know anything about data alignment,
you might assume that these three variables would occupy a continuous span of bytes in memory.
That is, on a 32-bit machine 4 bytes of pointer would be immediately followed by 1 byte of char and that immediately followed by 4 bytes of int. And a 64-bit machine would be different only in that the pointer would be 8 bytes.

In fact, the hidden assumption that the allocated order of static variables is their source order is not necessarily valid; the C standards don’t mandate it.
I’m going to ignore this detail because (a) that hidden assumption is usually correct anyway,
and (b) the actual purpose of talking about padding and packing outside structures is to prepare you for what happens inside them.

Here’s what actually happens (on an x86 or ARM or anything else with self-aligned types).
The storage for p starts on a self-aligned 4- or 8-byte boundary depending on the machine word size.
This is pointer alignment - the strictest possible.

The storage for c follows immediately. But the 4-byte alignment requirement of x forces a gap in the layout;
it comes out as though there were a fourth intervening variable, like this:

char *p;      /* 4 or 8 bytes */
char c;       /* 1 byte */
char pad[3];  /* 3 bytes */
int x;        /* 4 bytes */

The pad[3] character array represents the fact that there are three bytes of waste space in the structure. The old-school term for this was “slop”. The value of the padding bits is undefined; in particular it is not guaranteed that they will be zeroed.

Compare what happens if x is a 2-byte short:

char *p;
char c;
short x;

In that case, the actual layout will be this:

char *p;      /* 4 or 8 bytes */
char c;       /* 1 byte */
char pad[1];  /* 1 byte */
short x;      /* 2 bytes */

On the other hand, if x is a long on a 64-bit machine

char *p;
char c;
long x;

we end up with this:

char *p;     /* 8 bytes */
char c;      /* 1 byte */
char pad[7]; /* 7 bytes */
long x;      /* 8 bytes */

If you have been following carefully, you are probably now wondering about the case where the shorter variable declaration comes first:

char c;
char *p;
int x;

If the actual memory layout were written like this

char c;
char pad1[M];
char *p;
char pad2[N];
int x;

what can we say about M and N?

First, in this case N will be zero.
The address of x, coming right after p, is guaranteed to be pointer-aligned,
which is never less strict than int-aligned.

The value of M is less predictable. If the compiler happened to map c to the last byte of a machine word,
the next byte (the first of p) would be the first byte of the next one and properly pointer-aligned. M would be zero.

It is more likely that c will be mapped to the first byte of a machine word.
In that case M will be whatever padding is needed to ensure that p has pointer alignment - 3 on a 32-bit machine, 7 on a 64-bit machine.

Intermediate cases are possible. M can be anything from 0 to 7 (0 to 3 on 32-bit) because a char can start on any byte boundary in a machine word.

If you wanted to make those variables take up less space, you could get that effect by swapping x with c in the original sequence (shown here for the case where x is a long on a 64-bit machine):

char *p;     /* 8 bytes */
long x;      /* 8 bytes */
char c;      /* 1 byte */

Usually, for the small number of scalar variables in your C programs, bumming out the few bytes you can get by changing the order of declaration won’t save you enough to be significant. The technique becomes more interesting when applied to nonscalar variables - especially structs.

Before we get to those, let’s dispose of arrays of scalars. On a platform with self-aligned types, arrays of char/short/int/long/pointer have no internal padding; each member is automatically self-aligned at the end of the previous one.

All these rules and examples map over to Go with only syntactic changes.


In the next section we will see that the same is not necessarily true of structure arrays.


Structure alignment and padding

In general, a struct instance will have the alignment of its widest scalar member.
Compilers do this as the easiest way to ensure that all the members are self-aligned for fast access.


Also, in C (and Go, and Rust) the address of a struct is the same as the address of its first member - there is no leading padding.
Beware: in C++, classes that look like structs may break this rule!
(Whether they do or not depends on how base classes and virtual member functions are implemented, and varies by compiler.)

(When you’re in doubt about this sort of thing, ANSI C provides an offsetof() macro which can be used to read out structure member offsets.)

Consider this struct:

struct foo1 {
    char *p;
    char c;
    long x;
};

Assuming a 64-bit machine, any instance of struct foo1 will have 8-byte alignment.
The memory layout of one of these looks unsurprising, like this:


struct foo1 {
    char *p;     /* 8 bytes */
    char c;      /* 1 byte */
    char pad[7]; /* 7 bytes */
    long x;      /* 8 bytes */
};

It’s laid out exactly as though variables of these types had been separately declared. But if we put c first, that’s no longer true.

struct foo2 {
    char c;      /* 1 byte */
    char pad[7]; /* 7 bytes */
    char *p;     /* 8 bytes */
    long x;      /* 8 bytes */
};

If the members were separate variables, c could start at any byte boundary and the size of pad might vary.
Because struct foo2 has the pointer alignment of its widest member, that’s no longer possible.
Now c has to be pointer-aligned, and the following 7 bytes of padding are locked in.

Now let’s talk about trailing padding on structures. To explain this,
I need to introduce a basic concept which I’ll call the stride address of a structure.
It is the first address following the structure data that has the same alignment as the structure.


The general rule of trailing structure padding is this:
the compiler will behave as though the structure has trailing padding out to its stride address. This rule controls what sizeof() will return.


Consider this example on a 64-bit x86 or ARM machine:

struct foo3 {
    char *p;     /* 8 bytes */
    char c;      /* 1 byte */
};

struct foo3 singleton;
struct foo3 quad[4];

You might think that sizeof(struct foo3) should be 9, but it’s actually 16.
The stride address is that of (&p)[2].
Thus, in the quad array, each member has 7 bytes of trailing padding,
because the first member of each following struct wants to be self-aligned on an 8-byte boundary.

The memory layout is as though the structure had been declared like this:

struct foo3 {
    char *p;     /* 8 bytes */
    char c;      /* 1 byte */
    char pad[7]; /* 7 bytes */
};

For contrast, consider the following example:

struct foo4 {
    short s;     /* 2 bytes */
    char c;      /* 1 byte */
};

Because s only needs to be 2-byte aligned, the stride address is just one byte after c,
and struct foo4 as a whole only needs one byte of trailing padding. It will be laid out like this:

struct foo4 {
    short s;     /* 2 bytes */
    char c;      /* 1 byte */
    char pad[1]; /* 1 byte */
};

and sizeof(struct foo4) will return 4.

Here’s a last important detail: If your structure has structure members,
the inner structs want to have the alignment of their longest scalar too. Suppose you write this:

struct foo5 {
    char c;
    struct foo5_inner {
        char *p;
        short x;
    } inner;
};

The char *p member in the inner struct forces the outer struct to be pointer-aligned as well as the inner.
Actual layout will be like this on a 64-bit machine:

struct foo5 {
    char c;           /* 1 byte */
    char pad1[7];     /* 7 bytes */
    struct foo5_inner {
        char *p;      /* 8 bytes */
        short x;      /* 2 bytes */
        char pad2[6]; /* 6 bytes */
    } inner;
};

This structure gives us a hint of the savings that might be possible from repacking structures. Of 24 bytes, 13 of them are padding. That’s more than 50% waste space!


Bitfields

Now let’s consider C bitfields. They let you declare structure fields narrower than a character, down to 1 bit, like this:

struct foo6 {
    short s;
    char c;
    int flip:1;
    int nybble:4;
    int septet:7;
};

The thing to know about bitfields is that they are implemented with word- and byte-level mask
and rotate instructions operating on machine words, and cannot cross word boundaries.
C99 guarantees that bit-fields will be packed as tightly as possible, provided they don’t cross storage unit boundaries ( #10).


