CPU cache basics (4): why cache line alignment matters

denglin12315

Article link: https://blog.csdn.net/denglin12315/article/details/117822364

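This article explains why CPU cache line alignment matters when different processes on a multi-core processor access a shared memory region. When data structures are not aligned, unrelated structures can share one cache line, and the cores end up contending for that line even though they never touch the same data, which hurts performance. Padding the end of each structure so that it occupies a cache line of its own avoids this and improves parallel efficiency. Under a cache coherence protocol such as MESI, a write to a cache line invalidates the copies held by other cores, so their next access to that line stalls until the line is transferred again; this is the "cache line false sharing" problem. Aligning data structures to cache lines can significantly reduce this bottleneck.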

What does “cacheline aligned” mean?
CPU caches transfer data from and to main memory in chunks called cache lines;
a typical size for this is 64 bytes.

Data that are located closer to each other than this (that is, less than 64 bytes apart) may end up on the same cache line.
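
To make "end up on the same cache line" concrete, here is a minimal sketch of the address arithmetic. It assumes a 64-byte line; `CACHE_LINE_SIZE`, `cache_line_index` and `same_cache_line` are names made up for this illustration, and real hardware may use a different line size.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed line size: 64 bytes is typical, but not guaranteed. */
#define CACHE_LINE_SIZE 64u

/* Index of the cache-line-sized block that contains address addr. */
static inline uintptr_t cache_line_index(uintptr_t addr)
{
    return addr / CACHE_LINE_SIZE;
}

/* True when both addresses fall into the same cache-line-sized block. */
static inline bool same_cache_line(uintptr_t a, uintptr_t b)
{
    return cache_line_index(a) == cache_line_index(b);
}
```

Under this assumption, addresses 0x0 and 0x4 map to the same line, while 0x0 and 0x40 map to different lines.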

If these data are needed by different cores, the system has to work hard to keep the
data consistent between the copies residing in the cores' caches. Essentially, while
one core modifies the line, the other core has to wait for it, much as if it were blocked by a lock.

The article you reference talks about one such problem that was found in PostgreSQL
in a data structure in shared memory that is frequently updated by different processes.
By introducing padding into the structure to inflate it to 64 bytes, it is guaranteed
that no two such data structures end up in the same cache line, and the processes that
access them are not blocked more than absolutely necessary.

This is only relevant if your program parallelizes execution and accesses a shared
memory region, either by multithreading or by multiprocessing with shared memory.
In this case you can benefit by making sure that data that are frequently accessed by
different execution threads are not located close enough in memory that they can end
up in the same cache line.

The typical way to do that is by adding “dead” padding space at the end of a data structure.
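
A minimal sketch of such padding in C, assuming a 64-byte cache line. The names `counter` and `padded_counter` are hypothetical and are not taken from PostgreSQL or any other project.

```c
#include <stddef.h>

/* Assumed line size: 64 bytes is typical, but not guaranteed. */
#define CACHE_LINE_SIZE 64

/* Hypothetical hot field that one thread or process updates often. */
struct counter {
    unsigned long value;
};

/* The same data, inflated to exactly one cache line with dead padding. */
struct padded_counter {
    struct counter c;
    char pad[CACHE_LINE_SIZE - sizeof(struct counter)];
};

/* Fail the build if the padding arithmetic ever goes wrong. */
_Static_assert(sizeof(struct padded_counter) == CACHE_LINE_SIZE,
               "padded_counter must occupy exactly one cache line");
```

In an array of `struct padded_counter`, the `value` fields of neighbouring elements are 64 bytes apart, so under the assumed line size they never land on the same line. Padding fixes only the size; if each element should also start on a line boundary, the array itself has to be allocated at a 64-byte-aligned address.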

1. Suppose two data structures are laid out in memory as follows:
---------------------------- <--- 0x0
A{
unsigned int a;
}
---------------------------- <--- 0x4
B{
unsigned int b;
}
---------------------------- <--- 0x8

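2. On a dual-core processor where each CPU's cache line is 64 bytes
If process A on CPU0 accesses structure A, CPU0's cache loads the memory range 0x0~0x40 (end exclusive) into one of its cache lines.
If process A on CPU0 then modifies structure A, CPU0 takes exclusive ownership of that cache line, which in effect locks the memory block 0x0~0x40 against access by other cores. The unacceptable side effect is that structure B lies in the same block and is locked out too, so process B on CPU1, which only wants structure B, is stalled as well.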

3. If both data structures are cache line aligned, they are laid out in memory as follows:
---------------------------- <--- 0x0
A{
unsigned int a;
} __cacheline_aligned
pad
pad
pad
...
pad
---------------------------- <--- 0x40
B{
unsigned int b;
} __cacheline_aligned
pad
pad
pad
...
pad
---------------------------- <--- 0x80
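
The aligned layout can be approximated in user space with C11 `alignas`. This is only a sketch under the assumption of a 64-byte line: `__cacheline_aligned` itself is a Linux kernel macro, and `shared_area` is a hypothetical wrapper added here just to place A and B next to each other.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed line size: 64 bytes is typical, but not guaranteed. */
#define CACHE_LINE_SIZE 64

/* Aligning the member raises the alignment of the whole struct to 64,
 * and the compiler pads sizeof(struct A) up to a multiple of 64. */
struct A {
    alignas(CACHE_LINE_SIZE) unsigned int a;
};

struct B {
    alignas(CACHE_LINE_SIZE) unsigned int b;
};

/* Hypothetical shared region holding both structures back to back. */
struct shared_area {
    struct A a;   /* occupies 0x00 .. 0x3F */
    struct B b;   /* occupies 0x40 .. 0x7F */
};

_Static_assert(sizeof(struct A) == CACHE_LINE_SIZE, "A fills one line");
_Static_assert(offsetof(struct shared_area, b) == 0x40, "B starts at 0x40");
```

With this layout the compiler inserts the padding itself, so no explicit pad array is needed.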

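4. Again on a dual-core processor where each CPU's cache line is 64 bytes
If process A on CPU0 accesses structure A, CPU0's cache loads the memory range 0x0~0x40 (end exclusive) into one of its cache lines.
If process A on CPU0 modifies structure A, CPU0 takes exclusive ownership of that cache line, which in effect locks the memory block 0x0~0x40 against access by other cores.
When process B on CPU1 now accesses structure B, CPU1's cache loads the memory range 0x40~0x80 into one of its own cache lines.
This access is not delayed, because locking 0x0~0x40 has no effect on 0x40~0x80.
This is why data structures are marked with __cacheline_aligned.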

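5. It has been said for years that data not aligned to cache lines can hurt performance. The reason can now be summarized.
When a CPU core accesses structure A, it reads the entire cache line that contains A into its own cache. If it modifies that line (that is, the contents of A), the memory the line maps is in effect locked against other cores. If the structures are not cache line aligned, the same line may also hold another structure B that a process on a different CPU wants to access, and that process is stalled by the lock as well, which lowers system performance.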

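6. Why does this memory have to be locked?

Under the cache coherence protocol (MESI), when CPU0 modifies structure A, CPU1's copy of that cache line is invalidated; likewise, when CPU1 modifies structure B, CPU0's copy is invalidated. If CPU0 and CPU1 keep modifying their structures in turn, the line bounces back and forth between the two caches, with the hardware repeatedly invalidating and reloading it, which inevitably degrades system performance. This phenomenon is called "cache line false sharing": the two CPUs share no data, but because both access the same cache line they end up sharing it in practice. One way to solve the problem is to align the structures to cache lines, a typical trade of space for time.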
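
The ping-pong effect can be illustrated with a toy model. This is a deliberately simplified sketch, not a real MESI implementation: it models writes only, tracks just the Invalid and Modified states, and assumes two cores and 64-byte lines. All names (`toy_system`, `toy_write`, `ping_pong`) are invented for this example.

```c
#include <stddef.h>

/* Only INVALID and MODIFIED are exercised by this write-only model. */
enum mesi { INVALID = 0, SHARED, EXCLUSIVE, MODIFIED };

#define NCORES    2
#define NLINES    4
#define LINE_SIZE 64   /* assumed line size */

struct toy_system {
    enum mesi state[NCORES][NLINES];  /* per-core state of each line */
    unsigned long invalidations;      /* copies invalidated so far */
};

/* Core `core` writes to address `addr`: every other core's copy of the
 * affected line is invalidated, then the writer holds it as MODIFIED. */
static void toy_write(struct toy_system *sys, int core, size_t addr)
{
    size_t line = (addr / LINE_SIZE) % NLINES;
    for (int other = 0; other < NCORES; other++) {
        if (other != core && sys->state[other][line] != INVALID) {
            sys->state[other][line] = INVALID;
            sys->invalidations++;
        }
    }
    sys->state[core][line] = MODIFIED;
}

/* CPU0 writes structure A and CPU1 writes structure B, alternating for
 * `rounds` rounds. Returns the number of invalidations observed. */
static unsigned long ping_pong(size_t addr_a, size_t addr_b, int rounds)
{
    struct toy_system sys = {0};
    for (int i = 0; i < rounds; i++) {
        toy_write(&sys, 0, addr_a);
        toy_write(&sys, 1, addr_b);
    }
    return sys.invalidations;
}
```

In this model, `ping_pong(0x0, 0x4, 1000)` (A and B on the same line) counts 1999 invalidations, while `ping_pong(0x0, 0x40, 1000)` (A and B on separate lines) counts 0. The numbers say nothing about real timing; they only show that the aligned layout removes the coherence traffic between the two cores.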