Learning Memory Barriers

Latest recommended article published on 2023-01-12 14:40:38

make-n

Category: embed   Tags: mb

Original link: https://blog.csdn.net/world_hello_100/article/details/50131497

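These are my notes on memory barriers, written after reading the blog post below and its comments.
https://blog.csdn.net/world_hello_100/article/details/50131497
[1] Compiler barriers: at optimization levels -O2 and -O3 the compiler may reorder memory accesses, so the generated instructions no longer follow the order written in the source.
Solution 1: add a compiler barrier: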

    #define barrier() __asm__ __volatile__("" ::: "memory")
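
A minimal sketch of how the barrier might be used, assuming GCC or Clang (the inline-asm syntax is a compiler extension). The names `payload`, `ready`, and `publish` are hypothetical and exist only for illustration. The barrier constrains the compiler only; it emits no CPU instruction.

```c
/* Compiler-only barrier: emits no instruction, but the "memory" clobber
 * tells GCC/Clang that memory may have changed, so it must not move
 * loads or stores across this point or keep values cached in registers. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical shared variables, for illustration only. */
int payload;
int ready;

/* Store the payload first, then raise the flag. Without barrier(), the
 * compiler would be free to emit the two stores in either order. */
void publish(int value)
{
    payload = value;
    barrier();      /* compile-time ordering only; no CPU-level guarantee */
    ready = 1;
}
```

On a weakly ordered multi-core machine this is not enough by itself, because the CPU may still make the two stores visible in a different order.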

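Solution 2:
The volatile keyword can also prevent compile-time reordering of memory accesses (but not the runtime reordering discussed later). Note that the compiler only keeps volatile accesses in order relative to each other; it may still move ordinary accesses around them.
The Linux kernel provides the ACCESS_ONCE macro to stop the compiler from reordering consecutive ACCESS_ONCE instances. ACCESS_ONCE(x) can also be used as an lvalue. (Newer kernels replace it with READ_ONCE and WRITE_ONCE.)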

    #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
    /******* separator ******/
    ACCESS_ONCE(x) = r;
    ACCESS_ONCE(y) = x;
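
The macro can be tried outside the kernel as well. The following is a sketch under assumptions: it spells `typeof` as `__typeof__` so that it compiles with GCC or Clang in strict ISO mode, and the variables `x`, `y` and the functions `store_both`, `load_x` are hypothetical names chosen for illustration.

```c
/* Same shape as the kernel macro quoted above; __typeof__ is a
 * GCC/Clang extension. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical variables, for illustration only. */
int x, y;

/* Used as an lvalue: both stores are volatile accesses, so the compiler
 * emits them in program order, x first and then y. */
void store_both(int r)
{
    ACCESS_ONCE(x) = r;
    ACCESS_ONCE(y) = r + 1;
}

/* Used as an rvalue: each call performs exactly one real load of x;
 * the compiler may not merge it with other loads or reuse a cached copy. */
int load_x(void)
{
    return ACCESS_ONCE(x);
}
```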

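[2] Runtime reordering
With out-of-order execution, the order in which a processor actually executes instructions is determined by when their input data becomes available, not by the order in which the programmer wrote them.

An out-of-order processor typically handles instructions in the following steps:
    1. The instruction is fetched.
    2. The instruction is dispatched to an instruction queue.
    3. The instruction waits in the queue until its input operands are available (once they are, it can leave the queue, even ahead of earlier instructions that have not executed yet).
    4. The instruction is issued to an appropriate functional unit and executed.
    5. The result is placed in a queue (rather than written to the register file immediately).
    6. The result is written to the register file only after the results of all earlier instructions have been written back (results are reordered so that execution appears to be in order).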

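On a single CPU, instructions are fetched and their results written back in order, so code running on that CPU never observes its own accesses out of order. On a multiprocessor, however, each CPU has its own cache, and a write first lands in the local cache (on most designs, after passing through a store buffer). A cache coherence protocol is needed to keep the caches from disagreeing, and the messaging that protocol involves takes time, so other CPUs may observe the writes in a different order from the one in which they were issued. In other words, the runtime reordering discussed here is a multi-core problem: it arises from the delay in propagating writes between per-CPU caches.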

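In everyday application development, a developer can write correct multithreaded programs without knowing anything about memory barriers. This is mainly because the usual synchronization mechanisms already imply memory barriers (though with subtle differences from an explicit memory barrier), so not using memory barriers directly causes no problems. If you want to write something like a lock-free data structure, though, memory barriers are very useful.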

Memory barriers are commonly used for:
    implementing synchronization primitives
    implementing lock-free data structures
    device drivers

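Memory barrier interfaces

mb(): general barrier, orders both reads and writes (when a read sits on one side of the barrier and a write on the other, the two are kept in order).
wmb(): write barrier, orders writes only (a write before the barrier is kept ahead of a write after it).
rmb(): read barrier, orders reads only (a read before the barrier is kept ahead of a read after it).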
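
To experiment with the three barriers outside the kernel, one option is to approximate them with C11 fences. This is a sketch under stated assumptions: the macro mapping below is my own stand-in, not the kernel's definition, and the C11 fences are somewhat stronger than wmb() and rmb(). The names `data`, `flag`, `producer`, and `consumer` are hypothetical.

```c
#include <stdatomic.h>

/* Rough userspace stand-ins (an assumption of this sketch, not the
 * kernel's definitions). A release fence also orders earlier loads,
 * and an acquire fence also orders later stores. */
#define mb()  atomic_thread_fence(memory_order_seq_cst)
#define wmb() atomic_thread_fence(memory_order_release)
#define rmb() atomic_thread_fence(memory_order_acquire)

static int data;            /* payload */
static atomic_int flag;     /* 0 = empty, 1 = payload published */

/* Writer: store the payload, then the flag; wmb() keeps that order. */
void producer(int value)
{
    data = value;
    wmb();
    atomic_store_explicit(&flag, 1, memory_order_relaxed);
}

/* Reader: load the flag, then the payload; rmb() keeps that order.
 * Returns 1 and fills *out if a payload was available, else 0. */
int consumer(int *out)
{
    if (!atomic_load_explicit(&flag, memory_order_relaxed))
        return 0;
    rmb();
    *out = data;
    return 1;
}
```

The write barrier and the read barrier come as a pair: the wmb() in the writer is only useful if the reader has a matching rmb() between its two loads.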

Let's analyze a lock-free structure:

/**
 * __kfifo_put - puts some data into the FIFO, no locking version
 * @fifo: the fifo to be used.
 * @buffer: the data to be added.
 * @len: the length of the data to be added.
 *
 * This function copies at most @len bytes from the @buffer into
 * the FIFO depending on the free space, and returns the number of
 * bytes copied.
 *
 * Note that with only one concurrent reader and one concurrent
 * writer, you don't need extra locking to use these functions.
 */
unsigned int __kfifo_put(struct kfifo *fifo,
                         const unsigned char *buffer, unsigned int len)
{
    unsigned int l;
    len = min(len, fifo->size - fifo->in + fifo->out);
    
    /** Ensure that we sample the fifo->out index -before- we
     * start putting bytes into the kfifo.*/
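    /* This guarantees that fifo->out is read, and len computed from it,
     * before any data is written into the kfifo. A stale fifo->out only
     * makes the computed free space smaller than it really is. */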
    smp_mb();
    
    /* first put the data starting from fifo->in to buffer end */
    l = min(len, fifo->size - (fifo->in & (fifo->size - 1)));
    memcpy(fifo->buffer + (fifo->in & (fifo->size - 1)), buffer, l);
    /* then put the rest (if any) at the beginning of the buffer */
    memcpy(fifo->buffer, buffer + l, len - l);
    /** Ensure that we add the bytes to the kfifo -before-
     * we update the fifo->in index. */
    /* This keeps the two writes ordered: write the data first,
     * then update the in index. */
    smp_wmb();
    fifo->in += len;
    
    return len;
}
EXPORT_SYMBOL(__kfifo_put);
 
/**
 * __kfifo_get - gets some data from the FIFO, no locking version
 * @fifo: the fifo to be used.
 * @buffer: where the data must be copied.
 * @len: the size of the destination buffer.
 *
 * This function copies at most @len bytes from the FIFO into the
 * @buffer and returns the number of copied bytes.
 *
 * Note that with only one concurrent reader and one concurrent
 * writer, you don't need extra locking to use these functions.
 */
unsigned int __kfifo_get(struct kfifo *fifo,
                         unsigned char *buffer, unsigned int len)
{
    unsigned int l;
    len = min(len, fifo->in - fifo->out);
    /** Ensure that we sample the fifo->in index -before- we
     * start removing bytes from the kfifo.*/
    /* Read fifo->in and compute the data length first, then read the
     * data from the kfifo; this keeps the two reads ordered. */
    smp_rmb();
    /* first get the data from fifo->out until the end of the buffer */
    l = min(len, fifo->size - (fifo->out & (fifo->size - 1)));
    memcpy(buffer, fifo->buffer + (fifo->out & (fifo->size - 1)), l);
 
    /* then get the rest (if any) from the beginning of the buffer */
    memcpy(buffer + l, fifo->buffer, len - l);
    /** Ensure that we remove the bytes from the kfifo -before-
     * we update the fifo->out index.*/
    /* Read the data out of the kfifo first, and only then write the
     * fifo->out index: a read followed by a write, hence smp_mb(). */
    smp_mb();
    fifo->out += len;
    return len;
}
EXPORT_SYMBOL(__kfifo_get);
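
For comparison, here is a userspace sketch of the same single-producer, single-consumer idea, modeled loosely on the kfifo code above. It is not the kernel implementation: it swaps smp_mb()/smp_rmb()/smp_wmb() for C11 acquire/release operations on the indices, and `struct ring`, `RING_SIZE`, `min_u`, `ring_put`, and `ring_get` are hypothetical names.

```c
#include <stdatomic.h>
#include <string.h>

#define RING_SIZE 8u    /* must be a power of two */

struct ring {
    unsigned char buffer[RING_SIZE];
    atomic_uint in;     /* written only by the producer */
    atomic_uint out;    /* written only by the consumer */
};

static unsigned int min_u(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}

/* Producer side: copy up to len bytes in, return the number copied. */
unsigned int ring_put(struct ring *r, const unsigned char *src,
                      unsigned int len)
{
    unsigned int in = atomic_load_explicit(&r->in, memory_order_relaxed);
    /* acquire: sample out before writing into the space it frees */
    unsigned int out = atomic_load_explicit(&r->out, memory_order_acquire);
    unsigned int off = in & (RING_SIZE - 1);
    unsigned int l;

    len = min_u(len, RING_SIZE - (in - out));
    l = min_u(len, RING_SIZE - off);
    memcpy(r->buffer + off, src, l);        /* up to the end of the buffer */
    memcpy(r->buffer, src + l, len - l);    /* the rest, from the start */
    /* release: the data becomes visible before the new in index */
    atomic_store_explicit(&r->in, in + len, memory_order_release);
    return len;
}

/* Consumer side: copy up to len bytes out, return the number copied. */
unsigned int ring_get(struct ring *r, unsigned char *dst, unsigned int len)
{
    unsigned int out = atomic_load_explicit(&r->out, memory_order_relaxed);
    /* acquire: sample in before reading the data it publishes */
    unsigned int in = atomic_load_explicit(&r->in, memory_order_acquire);
    unsigned int off = out & (RING_SIZE - 1);
    unsigned int l;

    len = min_u(len, in - out);
    l = min_u(len, RING_SIZE - off);
    memcpy(dst, r->buffer + off, l);
    memcpy(dst + l, r->buffer, len - l);
    /* release: the copy out finishes before the space is handed back */
    atomic_store_explicit(&r->out, out + len, memory_order_release);
    return len;
}
```

As with kfifo, this is only safe with exactly one thread calling ring_put and one thread calling ring_get; a zero-initialized `struct ring` (for example a static one) is an empty ring.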

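Finally, a few tricks used in this implementation that are unrelated to the main topic:

1. It uses a bitwise AND (&) to compute the ring buffer index, which is considerably cheaper than computing it with the modulo operation. This requires the buffer size to be a power of two, in other words a binary number with exactly one bit set; the slot is then index & (size - 1).
2. It uses two indices, in and out, that only ever increase (a neat trick). This avoids some complicated conditionals (in some implementations, in == out cannot tell an empty buffer from a full one).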
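
Both tricks fit in a few lines. A small sketch, with hypothetical helper names (`ring_index`, `ring_is_empty`, `ring_is_full`) that do not appear in kfifo itself:

```c
/* Map a free-running counter to a slot in a ring of `size` entries.
 * Only valid when size is a power of two; it then equals counter % size. */
unsigned int ring_index(unsigned int counter, unsigned int size)
{
    return counter & (size - 1);
}

/* With free-running in/out counters, empty and full are distinct states,
 * without sacrificing a slot or keeping a separate flag. */
int ring_is_empty(unsigned int in, unsigned int out)
{
    return in == out;
}

int ring_is_full(unsigned int in, unsigned int out, unsigned int size)
{
    return in - out == size;
}
```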

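[Question]:
in and out only ever increase. If in overflows and wraps back to 0 while out has not yet overflowed, will the amount of valid data computed by
len = min(len, fifo->in - fifo->out); be wrong?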
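
As far as I can tell the answer is no. Both indices are unsigned int, and C defines unsigned subtraction modulo UINT_MAX + 1, so in - out still gives the right count as long as the real amount of data never exceeds the buffer size. A small sketch to check this, with hypothetical helper names (`fifo_used`, `wrapped_example`):

```c
#include <limits.h>

/* Number of bytes currently stored, given free-running counters. */
unsigned int fifo_used(unsigned int in, unsigned int out)
{
    return in - out;    /* unsigned subtraction wraps modulo UINT_MAX + 1 */
}

/* Example: out is near the top of the range and in has already wrapped. */
unsigned int wrapped_example(void)
{
    unsigned int out = UINT_MAX - 2;
    unsigned int in = out + 5;      /* wraps around to 2 */
    return fifo_used(in, out);      /* still 5 */
}
```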
