The Lost Art of Structure Packing

Who should read this

This page is about a technique for reducing the memory footprint of programs
in compiled languages with C-like structures -
manually repacking these declarations for reduced size.
To read it, you will require basic knowledge of the C programming language.

You need to know this technique if you intend to write code
for memory-constrained embedded systems, or operating-system kernels.
It is useful if you are working with application data sets so large
that your programs routinely hit memory limits.
It is good to know in any application where you really,
really care about optimizing your use of memory bandwidth and minimizing cache-line misses.

Finally, knowing this technique is a gateway to other esoteric C topics.
You are not an advanced C programmer until you have grasped these rules.
You are not a master of C until
you could have written this document yourself and can criticize it intelligently.

This document originated with “C” in the title,
but many of the techniques discussed here also apply to the Go language,
and should generalize to any compiled language with C-like structures.
There is a note discussing Go and Rust towards the end.

Alignment requirements

The first thing to understand is that, on modern processors,
the way your compiler lays out basic datatypes in memory is constrained
in order to make memory accesses faster.
Our examples are in C, but any compiled language generates code under the same constraints.

Storage for the basic C datatypes on an x86 or ARM processor doesn’t normally start at arbitrary byte addresses in memory.
Rather, each type except char has an alignment requirement;
chars can start on any byte address, but 2-byte shorts must start on an even address,
4-byte ints or floats must start on an address divisible by 4, and 8-byte longs or doubles must start on an address divisible by 8.
Signed or unsigned makes no difference.

The jargon for this is that basic C types on x86 and ARM are self-aligned.
Pointers, whether 32-bit (4-byte) or 64-bit (8-byte), are self-aligned too.

Self-alignment makes access faster because it facilitates generating single-instruction fetches and puts of the typed data.
Without alignment constraints, on the other hand, the code might end up having to do two or more accesses spanning machine-word boundaries. Characters are a special case; they’re equally expensive from anywhere they live inside a single machine word. That’s why they don’t have a preferred alignment.

I said “on modern processors” because on some older ones forcing your C program to violate alignment rules
(say, by casting an odd address into an int pointer and trying to use it) didn’t just slow your code down,
it caused an illegal instruction fault.
This was the behavior, for example, on Sun SPARC chips.
In fact, with sufficient determination and the right hardware flag set on the processor
(the alignment-check flag, bit 18 of EFLAGS), you can still trigger this on x86.

Also, self-alignment is not the only possible rule.
Historically, some processors (especially those lacking barrel shifters) have had more restrictive ones.
If you do embedded systems, you might trip over one of these lurking in the underbrush. Be aware this is possible.

From when it was first written at the beginning of 2014 until late 2016, this section ended with the preceding paragraph.
During that period I learned something rather reassuring from working with the source code for the reference implementation of NTP.
It does packet analysis by reading packets off the wire directly into memory
that the rest of the code sees as a struct, relying on the assumption of minimal self-aligned padding.

The interesting news is that NTP has apparently been getting away with this for decades across a very wide span of hardware, operating systems, and compilers, including not just Unixes but Windows variants as well.
This suggests that platforms with padding rules other than self-alignment are either nonexistent
or confined to such specialized niches that they’re never either NTP servers or clients.

Padding

Now we’ll look at a simple example of variable layout in memory.
Consider the following series of variable declarations in the top level of a C module:

char *p;
char c;
int x;

If you didn’t know anything about data alignment,
you might assume that these three variables would occupy a continuous span of bytes in memory.
That is, on a 32-bit machine 4 bytes of pointer would be immediately followed by 1 byte of char and that immediately followed by 4 bytes of int. And a 64-bit machine would be different only in that the pointer would be 8 bytes.

In fact, the hidden assumption that the allocated order of static variables is their source order is not necessarily valid; the C standards don’t mandate it.
I’m going to ignore this detail because (a) that hidden assumption is usually correct anyway,
and (b) the actual purpose of talking about padding and packing outside structures is to prepare you for what happens inside them.

Here’s what actually happens (on an x86 or ARM or anything else with self-aligned types).
The storage for p starts on a self-aligned 4- or 8-byte boundary depending on the machine word size.
This is pointer alignment - the strictest possible.

The storage for c follows immediately. But the 4-byte alignment requirement of x forces a gap in the layout;
it comes out as though there were a fourth intervening variable, like this:

char *p;      /* 4 or 8 bytes */
char c;       /* 1 byte */
char pad[3];  /* 3 bytes */
int x;        /* 4 bytes */

The pad[3] character array represents the fact that there are three bytes of wasted space in the layout. The old-school term for this was “slop”. The value of the padding bytes is undefined; in particular it is not guaranteed that they will be zeroed.

Compare what happens if x is a 2-byte short:

char *p;
char c;
short x;

In that case, the actual layout will be this:

char *p;      /* 4 or 8 bytes */
char c;       /* 1 byte */
char pad[1];  /* 1 byte */
short x;      /* 2 bytes */

On the other hand, if x is a long on a 64-bit machine

char *p;
char c;
long x;

we end up with this:

char *p;     /* 8 bytes */
char c;      /* 1 byte */
char pad[7]; /* 7 bytes */
long x;      /* 8 bytes */

If you have been following carefully, you are probably now wondering about the case where the shorter variable declaration comes first:

char c;
char *p;
int x;

If the actual memory layout were written like this

char c;
char pad1[M];
char *p;
char pad2[N];
int x;

what can we say about M and N?

First, in this case N will be zero.
The address of x, coming right after p, is guaranteed to be pointer-aligned,
which is never less strict than int-aligned.

The value of M is less predictable. If the compiler happened to map c to the last byte of a machine word,
the next byte (the first of p) would be the first byte of the next one and properly pointer-aligned. M would be zero.

It is more likely that c will be mapped to the first byte of a machine word.
In that case M will be whatever padding is needed to ensure that p has pointer alignment - 3 on a 32-bit machine, 7 on a 64-bit machine.

Intermediate cases are possible. M can be anything from 0 to 7 (0 to 3 on 32-bit) because a char can start on any byte boundary in a machine word.

If you wanted to make those variables take up less space, you could get that effect by swapping x with c in the original sequence.

char *p;     /* 8 bytes */
long x;      /* 8 bytes */
char c;      /* 1 byte */

Usually, for the small number of scalar variables in your C programs, bumming out the few bytes you can get by changing the order of declaration won’t save you enough to be significant. The technique becomes more interesting when applied to nonscalar variables - especially structs.

Before we get to those, let’s dispose of arrays of scalars. On a platform with self-aligned types, arrays of char/short/int/long/pointer have no internal padding; each member is automatically self-aligned at the end of the previous one.

All these rules and examples map over to Go with only syntactic changes.

In the next section we will see that the same freedom from padding does not necessarily hold for arrays of structures.

Structure alignment and padding

In general, a struct instance will have the alignment of its widest scalar member.
Compilers do this as the easiest way to ensure that all the members are self-aligned for fast access.

Also, in C (and Go, and Rust) the address of a struct is the same as the address of its first member - there is no leading padding.
Beware: in C++, classes that look like structs may break this rule!
(Whether they do or not depends on how base classes and virtual member functions are implemented, and varies by compiler.)

(When you’re in doubt about this sort of thing, ANSI C provides an offsetof() macro, declared in <stddef.h>, which can be used to read out structure member offsets.)

Consider this struct:

struct foo1 {
    char *p;
    char c;
    long x;
};

Assuming a 64-bit machine, any instance of struct foo1 will have 8-byte alignment.
The memory layout of one of these looks unsurprising, like this:

struct foo1 {
    char *p;     /* 8 bytes */
    char c;      /* 1 byte */
    char pad[7]; /* 7 bytes */
    long x;      /* 8 bytes */
};

It’s laid out exactly as though variables of these types had been separately declared. But if we put c first, that’s no longer true.

struct foo2 {
    char c;      /* 1 byte */
    char pad[7]; /* 7 bytes */
    char *p;     /* 8 bytes */
    long x;      /* 8 bytes */
};

If the members were separate variables, c could start at any byte boundary and the size of pad might vary.
Because struct foo2 has the pointer alignment of its widest member, that’s no longer possible.
Now c has to be pointer-aligned, and the following padding of 7 bytes is locked in.

Now let’s talk about trailing padding on structures. To explain this,
I need to introduce a basic concept which I’ll call the stride address of a structure.
It is the first address following the structure data that has the same alignment as the structure.

The general rule of trailing structure padding is this:
the compiler will behave as though the structure has trailing padding out to its stride address. This rule controls what sizeof() will return.

Consider this example on a 64-bit x86 or ARM machine:

struct foo3 {
    char *p;     /* 8 bytes */
    char c;      /* 1 byte */
};

struct foo3 singleton;
struct foo3 quad[4];

You might think that sizeof(struct foo3) should be 9, but it’s actually 16.
The stride address is that of (&p)[2].
Thus, in the quad array, each member has 7 bytes of trailing padding,
because the first member of each following struct wants to be self-aligned on an 8-byte boundary.

The memory layout is as though the structure had been declared like this:

struct foo3 {
    char *p;     /* 8 bytes */
    char c;      /* 1 byte */
    char pad[7];
};

For contrast, consider the following example:

struct foo4 {
    short s;     /* 2 bytes */
    char c;      /* 1 byte */
};

Because s only needs to be 2-byte aligned, the stride address is just one byte after c,
and struct foo4 as a whole only needs one byte of trailing padding. It will be laid out like this:

struct foo4 {
    short s;     /* 2 bytes */
    char c;      /* 1 byte */
    char pad[1];
};

and sizeof(struct foo4) will return 4.

Here’s a last important detail: If your structure has structure members,
the inner structs want to have the alignment of their longest scalar too. Suppose you write this:

struct foo5 {
    char c;
    struct foo5_inner {
        char *p;
        short x;
    } inner;
};

The char *p member in the inner struct forces the outer struct to be pointer-aligned as well as the inner.
Actual layout will be like this on a 64-bit machine:

struct foo5 {
    char c;           /* 1 byte */
    char pad1[7];     /* 7 bytes */
    struct foo5_inner {
        char *p;      /* 8 bytes */
        short x;      /* 2 bytes */
        char pad2[6]; /* 6 bytes */
    } inner;
};

This structure gives us a hint of the savings that might be possible from repacking structures. Of 24 bytes, 13 of them are padding. That’s more than 50% waste space!

Bitfields

Now let’s consider C bitfields. What they give you the ability to do is declare structure fields of smaller than character width, down to 1 bit, like this:

struct foo6 {
    short s;
    char c;
    int flip:1;
    int nybble:4;
    int septet:7;
};

The thing to know about bitfields is that they are implemented with word- and byte-level mask
and rotate instructions operating on machine words, and cannot cross word boundaries.
C99 guarantees that bit-fields will be packed as tightly as possible, provided they don’t cross storage unit boundaries (6.7.2.1 #10).
