Ordering and Barriers

qinzhonghello

Category: Linux Kernel Design and Implementation. Tags: compiler, memory, optimization, x86, c

Article link: https://blog.csdn.net/qinzhonghello/article/details/3583397

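When synchronizing between multiple processors, or between a processor and a hardware device, memory reads (loads) and memory writes (stores) sometimes have to be issued in exactly the order the program code specifies. When talking to hardware, you often need to guarantee that a given read happens before some other read or write. On multiprocessor systems, writes may also need to become visible in the order the code issues them, usually so that later reads observe the data in that same order. The complication is that both the compiler and the processor may reorder reads and writes for performance.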

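Every processor that can reorder reads and writes provides machine instructions to enforce ordering requirements. The compiler can likewise be told not to reorder instructions around a given point. These order-enforcing instructions are called barriers.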

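A compiler does its reordering at compile time, so that reordering is static: the generated object code simply contains the instructions in the new order. A processor reorders dynamically. While executing, it fetches and dispatches seemingly unrelated instructions in whatever order it considers best. This happens because modern processors dispatch and commit instructions out of order to keep their pipelines full.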

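Neither the compiler nor the processor knows anything about related code running in other contexts. Occasionally, writes must be seen by other code, and by the world outside the processor, in exactly the order you intend. This is most often the case with hardware devices, but it is also common on multiprocessor machines.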

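The rmb() function provides a read memory barrier. It ensures that no loads are reordered across the rmb() call: no load before the call is moved after it, and no load after the call is moved before it.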

 
 In include/asm-i386/system.h:
#define rmb() alternative("lock; addl $0,0(%%esp)", "lfence", X86_FEATURE_XMM2)

 
 In include/asm-i386/alternative.h:
/*
 * Alternative instructions for different CPU types or capabilities.
 *
 * This allows to use optimized instructions even on generic binary
 * kernels.
 *
 * length of oldinstr must be longer or equal the length of newinstr
 * It can be padded with nops as needed.
 *
 * For non barrier like inlines please define new variants
 * without volatile and memory clobber.
 */
#define alternative(oldinstr, newinstr, feature)    \
    asm volatile ("661:\n\t" oldinstr "\n662:\n"             \
              ".section .altinstructions,\"a\"\n"            \
              "  .align 8\n"                       \
              "  .quad 661b\n"            /* label */          \
              "  .quad 663f\n"        /* new instruction */ \
              "  .byte %c0\n"             /* feature bit */    \
              "  .byte 662b-661b\n"       /* sourcelen */      \
              "  .byte 664f-663f\n"       /* replacementlen */ \
              ".previous\n"                 \
              ".section .altinstr_replacement,\"ax\"\n"     \
              "663:\n\t" newinstr "\n664:\n"   /* replacement */ \
              ".previous" :: "i" (feature) : "memory")
 

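The wmb() function provides a write memory barrier. It works like rmb(), except that it applies to stores rather than loads: no stores are reordered across the barrier. On an architecture that never reorders stores (Intel x86 chips, for example), wmb() emits no instruction and acts only as a compiler barrier, as the non-CONFIG_X86_OOSTORE definition below shows.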

 
 /*
 * Force strict CPU ordering.
 * And yes, this is required on UP too when we're talking
 * to devices.
 *
 * For now, "wmb()" doesn't actually do anything, as all
 * Intel CPU's follow what Intel calls a *Processor Order*,
 * in which all writes are seen in the program order even
 * outside the CPU.
 *
 * I expect future Intel CPU's to have a weaker ordering,
 * but I'd also expect them to finally get their act together
 * and add some real memory barriers if so.
 *
 * Some non intel clones support out of order store. wmb() ceases to be a
 * nop for these.
 */
 

 
 #ifdef CONFIG_X86_OOSTORE 
/* Actually there are no OOO store capable CPUs for now that do SSE, 
   but make it already an possibility. */
#define wmb() alternative("lock; addl $0,0(%%esp)", "sfence", X86_FEATURE_XMM) 
#else 
#define wmb()   __asm__ __volatile__ ("": : :"memory") 
#endif
 

rmb() and wmb() correspond to machine instructions that tell the processor to commit all pending loads or stores, respectively, before it continues.
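
A write barrier on the producing side is normally paired with a read barrier on the consuming side. The following is a minimal userspace sketch of that pairing, not kernel code. The names wmb_sketch(), rmb_sketch(), publish() and try_consume() are hypothetical, and the C11 release and acquire fences are stand-ins assumed to be at least as strong as wmb() and rmb().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins (assumptions, not the kernel macros): C11 fences that
 * are at least as strong as a write barrier and a read barrier. */
#define wmb_sketch() atomic_thread_fence(memory_order_release)
#define rmb_sketch() atomic_thread_fence(memory_order_acquire)

static int payload;        /* ordinary data, written before the flag */
static atomic_int ready;   /* flag that announces the payload */

/* Writer side: store the data, then the flag, with a write barrier between. */
void publish(int value)
{
    payload = value;
    wmb_sketch();          /* the payload store may not sink below this point */
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Reader side: load the flag, then the data, with a read barrier between. */
bool try_consume(int *out)
{
    assert(out != NULL);
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return false;      /* nothing published yet */
    rmb_sketch();          /* the payload load may not rise above this point */
    *out = payload;
    return true;
}
```

publish() is meant to run on one thread and try_consume() on another. If either barrier is dropped, a weakly ordered processor could let the reader see the flag set while still reading a stale payload.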

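The mb() function provides both a read barrier and a write barrier: neither loads nor stores are reordered across it. It exists because a single instruction (often the same one rmb() uses) can provide both the load barrier and the store barrier.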

 
 /* 
 * Actually only lfence would be needed for mb() because all stores done 
 * by the kernel should be already ordered. But keep a full barrier for now. 
 */
#define mb() alternative("lock; addl $0,0(%%esp)", "mfence", X86_FEATURE_XMM2)
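
One case where only a full barrier helps is a store followed by a load of a different location, since a read barrier orders only loads and a write barrier orders only stores. The sketch below illustrates this in userspace. The names mb_sketch(), wants, try_enter() and leave() are hypothetical, and a C11 sequentially consistent fence is assumed as the stand-in for mb().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in (an assumption): a sequentially consistent fence plays
 * the role of mb(). */
#define mb_sketch() atomic_thread_fence(memory_order_seq_cst)

static atomic_int wants[2];   /* wants[i] != 0: thread i asked to enter */

/* Announce interest, then check the peer. The store has to be ordered before
 * the load; that is store-to-load ordering, which neither a read barrier nor
 * a write barrier provides on its own. */
bool try_enter(int me)
{
    assert(me == 0 || me == 1);
    atomic_store_explicit(&wants[me], 1, memory_order_relaxed);
    mb_sketch();
    return atomic_load_explicit(&wants[1 - me], memory_order_relaxed) == 0;
}

/* Withdraw interest, whether or not try_enter() succeeded. */
void leave(int me)
{
    assert(me == 0 || me == 1);
    atomic_store_explicit(&wants[me], 0, memory_order_release);
}
```

With the fence in place, two threads calling try_enter(0) and try_enter(1) concurrently cannot both get true. This is only the ordering core of such a handshake, not a complete lock: a caller that gets false still has to call leave() and retry.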
 

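read_barrier_depends() is a variant of rmb() that provides a read barrier only for loads on which subsequent loads depend. Because the reads after the barrier depend on the reads before it, the barrier ensures that the earlier reads complete before the later ones. In short, it sets up a read barrier like rmb(), but only for specific reads: those that depend on one another.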

 
 /**
 * read_barrier_depends - Flush all pending reads that subsequents reads
 * depend on.
 *
 * No data-dependent reads from memory-like regions are ever reordered
 * over this barrier.  All reads preceding this primitive are guaranteed
 * to access memory (but not necessarily other CPUs' caches) before any
 * reads following this primitive that depend on the data return by
 * any of the preceding reads.  This primitive is much lighter weight than
 * rmb() on most CPUs, and is never heavier weight than is
 * rmb().
 *
 * These ordering constraints are respected by both the local CPU
 * and the compiler.
 *
 * Ordering is not guaranteed by anything other than these primitives,
 * not even by data dependencies.  See the documentation for
 * memory_barrier() for examples and URLs to more information.
 *
 * For example, the following code would force ordering (the initial
 * value of "a" is zero, "b" is one, and "p" is "&a"):
 *
 * <programlisting>
 *  CPU 0               CPU 1
 *
 *  b = 2;
 *  memory_barrier();
 *  p = &b;             q = p;
 *                      read_barrier_depends();
 *                      d = *q;
 * </programlisting>
 *
 * because the read of "*q" depends on the read of "p" and these
 * two reads are separated by a read_barrier_depends().  However,
 * the following code, with the same initial values for "a" and "b":
 *
 * <programlisting>
 *  CPU 0               CPU 1
 *
 *  a = 2;
 *  memory_barrier();
 *  b = 3;              y = b;
 *                      read_barrier_depends();
 *                      x = a;
 * </programlisting>
 *
 * does not enforce ordering, since there is no data dependency between
 * the read of "a" and the read of "b".  Therefore, on some CPUs, such
 * as Alpha, "y" could be set to 3 and "x" to 0.  Use rmb()
 * in cases like this where there are no data dependencies.
 **/
#define read_barrier_depends()  do { } while(0)
 

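The second <programlisting> shows that read_barrier_depends() only orders reads that have a data dependency. The reads of b and a do not depend on each other, so the call imposes no ordering between them: x = a may execute before y = b, and x may be assigned while a is still 0. Reads without a data dependency need rmb() to get a read barrier.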
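
The dependent case from the first listing can be sketched in userspace with C11 atomics. This is an illustration under assumptions, not the kernel's implementation: struct config, slot, current, publish_config() and read_config() are hypothetical names, and memory_order_consume is used as the closest C11 counterpart of read_barrier_depends() (current compilers typically treat it as the stronger memory_order_acquire).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct config {
    int value;
};

static struct config slot;                 /* hypothetical storage */
static _Atomic(struct config *) current;   /* NULL until something is published */

/* Writer: initialise the object, then publish its address. The release store
 * stands in for "memory_barrier(); p = &b;" in the kernel comment. */
void publish_config(int value)
{
    slot.value = value;
    atomic_store_explicit(&current, &slot, memory_order_release);
}

/* Reader: the dereference depends on the pointer that was just loaded, which
 * mirrors "q = p; read_barrier_depends(); d = *q;". */
bool read_config(int *out)
{
    assert(out != NULL);
    struct config *q = atomic_load_explicit(&current, memory_order_consume);
    if (q == NULL)
        return false;      /* nothing published yet */
    *out = q->value;       /* data-dependent read through q */
    return true;
}
```

The ordering here rests entirely on the fact that q->value cannot be computed without q. Reading an unrelated variable after the pointer load would get no such guarantee, which is the situation in the second listing.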

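The macros smp_rmb(), smp_wmb(), smp_mb(), and smp_read_barrier_depends() provide a useful optimization. On SMP kernels they are defined as the usual memory barriers; on uniprocessor kernels they are defined as compiler barriers only.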

#ifdef CONFIG_SMP    /* SMP kernel: the smp_*() macros are the usual memory barriers */
#define smp_mb()    mb()
#define smp_rmb()   rmb()
#define smp_wmb()   wmb()
#define smp_read_barrier_depends()  read_barrier_depends()
#define set_mb(var, value) do { (void) xchg(&var, value); } while (0)
#else    /* uniprocessor kernel: the smp_*() macros are compiler barriers only */
#define smp_mb()    barrier()
#define smp_rmb()   barrier()
#define smp_wmb()   barrier()
#define smp_read_barrier_depends()  do { } while(0)
#define set_mb(var, value) do { var = value; barrier(); } while (0)
#endif
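
The same build-time switch can be imitated in userspace to see what set_mb() buys. The sketch below is an assumption-laden illustration rather than kernel code: CONFIG_SMP is just a macro passed with -DCONFIG_SMP, atomic_exchange() stands in for xchg(), and task_state, condition, signal_condition() and should_sleep() are hypothetical names modelled on the "set state, then test condition" pattern.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Compiler-only barrier, GCC/Clang syntax. */
#define barrier() __asm__ __volatile__("" ::: "memory")

#ifdef CONFIG_SMP
/* SMP build: an atomic exchange both stores the value and acts as a full fence. */
#define set_mb(var, value) \
    do { (void)atomic_exchange(&(var), (value)); } while (0)
#else
/* UP build: a plain store, then keep the compiler from moving code across it. */
#define set_mb(var, value) \
    do { atomic_store_explicit(&(var), (value), memory_order_relaxed); barrier(); } while (0)
#endif

enum { STATE_RUNNING = 0, STATE_SLEEPING = 1 };   /* hypothetical states */

static atomic_int task_state;   /* hypothetical per-task state word */
static atomic_int condition;    /* nonzero once the awaited event has happened */

/* Waker side: record whether the awaited event has happened. */
void signal_condition(int happened)
{
    assert(happened == 0 || happened == 1);
    atomic_store(&condition, happened);
}

/* Sleeper side: the state must be stored before the condition is tested,
 * otherwise a wakeup arriving in between could be missed. */
bool should_sleep(void)
{
    set_mb(task_state, STATE_SLEEPING);
    return atomic_load(&condition) == 0;
}
```

Built without -DCONFIG_SMP, the macro costs nothing at run time beyond the store itself; built with it, the exchange adds the hardware ordering that a second processor requires.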

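The barrier() function prevents the compiler from optimizing loads or stores across the call. Even without it, the compiler will not rearrange loads and stores in a way that changes the effect of the C code or breaks existing data dependencies. What it cannot know is what happens outside the current context. The memory barriers discussed earlier also act as compiler barriers, but a compiler barrier is much lighter than a memory barrier. In fact a compiler barrier is nearly free, because all it does is stop the compiler from rearranging instructions.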

 
 In include/linux/compiler.h:
/* Optimization barrier */
#ifndef barrier 
# define barrier() __memory_barrier() 
#endif 
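
The definition above is only the generic fallback. With GCC, a compiler barrier is usually spelled as an empty asm statement with a "memory" clobber, the same form as the non-CONFIG_X86_OOSTORE wmb() shown earlier. The following userspace sketch shows what such a barrier changes in a polling loop; device_status and poll_status() are hypothetical names used only for illustration.

```c
#include <assert.h>

/* GCC/Clang spelling of a compiler-only barrier: an empty asm statement with
 * a "memory" clobber. It emits no machine instruction. */
#define barrier() __asm__ __volatile__("" ::: "memory")

static int device_status;   /* hypothetical: imagine an interrupt handler sets it */

/* Poll until the status becomes nonzero or the spin budget runs out.
 * Without barrier(), the compiler may read device_status once and reuse the
 * cached value on every iteration. */
int poll_status(int max_spins)
{
    assert(max_spins >= 0);
    int spins = 0;
    while (device_status == 0 && spins < max_spins) {
        barrier();   /* device_status must be reloaded on the next pass */
        spins++;
    }
    return spins;
}
```

The barrier constrains only the compiler. It does nothing about reordering by the processor, which is why code shared between processors still needs the memory barriers described earlier.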
 

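Use the memory barrier that is appropriate for the worst case, that is, for the processor with the weakest ordering. The code will then still compile to the optimal form for each architecture.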
