SEBR: A Memory Reclamation Scheme for Multithreaded Code (0)

rossdawson

Article link: https://blog.csdn.net/weixin_42469649/article/details/113371819

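This post introduces my own memory reclamation scheme, SEBR. It is written in C++17 and acts as a Safe Memory Reclamation scheme for concurrent code. It differs noticeably from classic epoch-based reclamation (5.2.3) and from other concrete implementations.

How it works:

How should we understand epoch-based reclamation? www.zhihu.com

I call it Scalable Epoch Based Reclamation (SEBR).
The acronym can also be read as Small Epoch Based Reclamation, a name that hints at its core idea: reclaim the memory that lies below the smallest epoch.
It can further be read as Self-Management Epoch Based Reclamation, because its design lets both threads and concurrent data structures manage themselves over their whole lifetime, which keeps the scheme very unintrusive to the algorithm itself.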

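Technical background:

As CPU core counts keep growing, multithreaded programming has become a must-have skill for developers, and building better concurrent data structures is an important way to improve system scalability. Developing a concurrent data structure involves at least two difficulties. The first is providing semantics that are the same as, or close to, the single-threaded version (what that means depends on whether it is a concurrent queue, stack, hash table, skip list, or something else); this part is specific to each algorithm. The second is memory reclamation (retire/reclaim). It is hard to imagine a concurrent data structure that needs no delete operation, so nearly every concurrent data structure has to deal with it. This is especially true in languages with manual memory management such as C/C++: for every well-implemented concurrent structure you pay a considerable price to build a reclamation scheme for it, and that scheme is specific to that one structure, which costs a lot of effort (design, implementation, testing, ...). Is there a way to abstract a lightweight scheme that handles memory reclamation uniformly? If that scheme is known to be correct, any concurrent data structure built on it gets much safer memory reclamation.

SEBR aims to achieve this goal.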

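Technical difficulty: why is memory reclamation so hard with multiple threads?

Let us abstract the problem:

Suppose there is a type T and a variable address that holds 0x773240c4, and several threads (A/B/C) are accessing it. The value of address does not stay fixed: thread D may write 0x773240d4 into it, after which A/B/C can only read the data at 0x773240d4, and later someone may write 0x773240e4, and so on. In total there are then three pieces of data in memory: V1 (0x773240c4), V2 (0x773240d4) and V3 (0x773240e4), and the number of versions grows with every write to address. Since threads A/B/C only ever read the latest data, there is a pressing need to reclaim the versions that are no longer in use. But how? Even after a new address has been written, other threads may still be reading the old data.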
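To make the version problem concrete, here is a minimal sketch. The names `VersionedCell`, `write`, `read` and `stale_versions` are made up for this illustration and are not part of SEBR. It assumes a single writer, and it simply parks every replaced version in a list, because nothing in this sketch can tell when readers are done with it.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration: one shared "address" plus every version it used to point to.
struct VersionedCell {
    std::atomic<int*> address;
    std::vector<int*> stale;  // replaced versions; touched by the single writer only

    explicit VersionedCell(int first) : address(new int(first)) {}
    VersionedCell(const VersionedCell&) = delete;
    VersionedCell& operator=(const VersionedCell&) = delete;

    ~VersionedCell() {
        // Once the cell itself is being destroyed no reader can exist, so freeing is safe.
        for (int* p : stale) delete p;
        delete address.load();
    }

    // Writer: publish a new version. The old one cannot be deleted right here,
    // because a reader may have loaded the old pointer just before the exchange.
    void write(int value) {
        int* old = address.exchange(new int(value));
        assert(old != nullptr);
        stale.push_back(old);
    }

    // Reader: always sees the latest published version.
    int read() const { return *address.load(); }

    std::size_t stale_versions() const { return stale.size(); }
};
```

Constructing the cell with V1 and then writing V2 and V3 leaves `read()` returning V3 while two stale versions sit in the list. That growing list is exactly the memory a reclamation scheme has to free safely.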

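One obvious approach is to guard reads and writes with the same mutex. But a mutex makes threads depend on one another and causes context switches, which hurts the system's scalability. Is there a more scalable approach?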
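For comparison, the mutex baseline can be sketched as follows. `LockedCell` is a hypothetical name used only here. Because the reader copies the value while holding the same lock the writer takes, the old version can be freed immediately, at the cost of serializing every access.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical baseline: readers and writers share one mutex.
template <typename T>
class LockedCell {
public:
    explicit LockedCell(T first) : value_(std::make_unique<T>(std::move(first))) {}

    // Copy the current version out while holding the lock.
    T read() const {
        std::lock_guard<std::mutex> guard(mutex_);
        assert(value_ != nullptr);  // invariant: some version is always published
        return *value_;
    }

    // Replace the version; the old one is destroyed here, which is safe
    // only because no reader can be inside read() at the same time.
    void write(T next) {
        std::lock_guard<std::mutex> guard(mutex_);
        value_ = std::make_unique<T>(std::move(next));
    }

private:
    mutable std::mutex mutex_;
    std::unique_ptr<T> value_;
};
```

This is correct and simple, but every reader blocks every other reader and the writer, which is the scalability cost described above.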

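Let us look at a concrete example:

Concurrent queue: the MS lock-free queue

enqueue first atomically compares and updates the next field of the node that Tail points to, then compares and updates the Tail pointer itself. Nothing looks wrong here. Now look at dequeue:

dequeue first makes sure the queue contains data, then advances the head pointer to remove an element.

This example has a problem:

Problem 1: line D19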

free(head.ptr)

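Freeing head.ptr here is not safe. Suppose thread A modifies head at D13 and then frees head.ptr, while thread B runs slower and only reads value at D12 after a delay. That is a use-after-free, and the same can happen at lines E6/D4.

To verify the problem, I implemented this concurrent queue in C++, staying as close to the original paper as possible: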
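The interleaving can be replayed by hand. The sketch below is a single-threaded model, not the real queue: `ModelNode` and `replay_dequeue_race` are hypothetical names, and a `freed` flag stands in for the real `free(head.ptr)` so the replay stays well-defined. It follows the D4 variant of the race, where thread B dereferences a head pointer it loaded before thread A freed that node.

```cpp
#include <cassert>

// Model of a queue node; `freed` stands in for a real free().
struct ModelNode {
    int value = 0;
    ModelNode* next = nullptr;
    bool freed = false;
};

// Replays one bad interleaving step by step in a single thread.
// Returns true if "thread B" touches a node after "thread A" freed it.
bool replay_dequeue_race() {
    ModelNode dummy, first;
    first.value = 53211;
    dummy.next = &first;
    ModelNode* Head = &dummy;  // queue state: dummy -> first

    // Thread B starts a dequeue and loads Head, then gets delayed.
    ModelNode* b_head = Head;

    // Thread A runs a complete dequeue: advance Head (D13), free the old head (D19).
    ModelNode* a_head = Head;
    assert(a_head == b_head);  // both threads hold the same node
    Head = a_head->next;
    a_head->freed = true;      // stands in for free(head.ptr)

    // Thread B resumes and is about to read head->next (D4) through its stale pointer.
    return b_head->freed;      // true: in the real queue this read is a use-after-free
}
```

In the real queue nothing tells thread B that its pointer has gone stale before it dereferences it, which is why the check at D5 (comparing head with Q->Head again) comes too late to help.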

#include <cassert>
#include <atomic>
#include <functional>
#include <iostream>
#include <thread>
#include <vector>

template <typename T>
class ms_queue
{
struct Node {
    Node() : data(), next(nullptr) {}
    Node(const T& data) : data(data), next(nullptr) {}
    T data;
    std::atomic<Node*> next;
};

public:
    ms_queue() : Head(new Node()), Tail(Head.load()) { }

    ~ms_queue() {
        Node* end = Tail.load();
        Node* node = Head.load();
        for(;;) {
            Node* next = node->next.load();
            // Compare before deleting: the pointer value is invalid after delete.
            const bool last = (node == end);
            delete node;
            if (last) return;
            node = next;
        }
    }

    void push(const T& data) {
        Node* node = new Node(data);
        Node* tail = nullptr;
        for (;;) {
            tail = Tail.load();
            Node* next = tail->next.load();
            if (tail == Tail.load()) {
                if (next == nullptr) {
                    if (tail->next.compare_exchange_strong(next, node)) {
                        break;
                    }
                } else {
                    Tail.compare_exchange_strong(tail, next);
                }
            }
        }
        Tail.compare_exchange_strong(tail, node);
    }

    bool pop(T* ptr) {
        Node* head = nullptr;
        Node* next = nullptr;
        for (;;) {
            head = Head.load();
            Node* tail = Tail.load();
            next = head->next.load();
            if (head == Head.load()) {
                if (head == tail) {
                    if (next == nullptr) {
                        return false;
                    }
                    Tail.compare_exchange_strong(tail, next);
                } else {
                    *ptr = next->data;
                    if (Head.compare_exchange_strong(head, next)) {
                        break;
                    }
                }
            }
        }
        delete head;
        return true;
    }

private:
    std::atomic<Node*> Head;
    std::atomic<Node*> Tail;
};

Run an example (ms_queue_error.cpp):

#include <chrono>   // std::chrono timing used below
#include <cstdlib>  // std::atoi

long n_const;
long nthreads_const;

void test_scalable_queue(int count, int num) {
    ms_queue<int> queue;
    std::vector<std::thread> threads;

    {
        auto beginTime = std::chrono::high_resolution_clock::now();
        for (int i = 0; i < num; ++i) {
            threads.emplace_back([&queue, count, num] () -> void {
                for (int j = 0; j < (count / num); ++j) {
                    queue.push(53211);
                    int value;
                    queue.pop(&value);
                    assert(value == 53211);
                }
            });
        }
        for (std::thread& th : threads) th.join();
        threads.clear();
        auto endTime = std::chrono::high_resolution_clock::now();
        auto elapsedTime = std::chrono::duration_cast<std::chrono::milliseconds>(endTime - beginTime);
        std::cout << "push/pop elapsed time is " << elapsedTime.count() << " milliseconds" << std::endl;
    }
}

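int main(int argc, char* argv[]) {
    if (argc < 4) {
        std::cerr << "usage: " << argv[0] << " <times> <count> <nthreads>" << std::endl;
        return 1;
    }
    int times = std::atoi(argv[1]);
    n_const = std::atoi(argv[2]);
    nthreads_const = std::atoi(argv[3]);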
    for (int i = 0; i < times; ++i) {
        std::thread thread([]() -> void {
            test_scalable_queue(n_const, nthreads_const);
        });
        thread.join();
    }

    return 0;
}

Compile with ASAN:

clang++ -std=c++17 -fsanitize=address -fno-omit-frame-pointer -Wall -g -O0 -o test testMSQueue.cpp
./test 5 100000 8

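Sure enough, ASAN reports an error:

It points at line 39:

On this line, the memory that the head address refers to has already been freed, so this is a UAF (use-after-free).

What now? Suppose we simply stop reclaiming and comment out the delete: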

//delete head;

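Recompile and run the example again. This time ASAN no longer reports a memory error, but what happens to that memory? It obviously leaks.

Recompile once more, this time without ASAN: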

clang++ -std=c++17 -Wall -g -O0 -o test testMSQueue.cpp

Run it under valgrind:

valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes --verbose ./test 5 10000 8

As expected, a large amount of memory leaks:

definitely lost AND indirectly lost

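The report points at line 35, Node* node = new Node(data);

So this memory has to be reclaimed correctly. How?

This is what SEBR was built for.

Below is MSQueue implemented with SEBR so that memory is reclaimed correctly: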

#include <cassert>
#include <atomic>
#include <functional>
#include <iostream>
#include <thread>
#include <vector>
#include "sebr_local.hpp"

template <typename T>
class ms_queue : sebr::ConcurrentBridge<ms_queue<T>>
{
struct Node {
    Node() : data(), next(nullptr) {}
    Node(const T& data) : data(data), next(nullptr) {}
    T data;
    std::atomic<Node*> next;
};

class RecLockFreeNode : public sebr::ReclaimBridge<RecLockFreeNode> {
    Node* node;
public:
    RecLockFreeNode(Node* node) : node(node) { }

    void reclaim() {
        delete node;
    }
};

public:
    ms_queue() : sebr::ConcurrentBridge<ms_queue<T>>(), Head(new Node()), Tail(Head.load()) { }

    ~ms_queue() {
        Node* end = Tail.load();
        Node* node = Head.load();
        for(;;) {
            Node* next = node->next.load();
            // Compare before deleting: the pointer value is invalid after delete.
            const bool last = (node == end);
            delete node;
            if (last) return;
            node = next;
        }
    }

    void push(const T& data) {
        Node* node = new Node(data);
        Node* tail = nullptr;
        sebr::Pin pin(this);
        for (;;) {
            tail = Tail.load();
            Node* next = tail->next.load();
            if (tail == Tail.load()) {
                if (next == nullptr) {
                    if (tail->next.compare_exchange_strong(next, node)) {
                        break;
                    }
                } else {
                    Tail.compare_exchange_strong(tail, next);
                }
            }
        }
        Tail.compare_exchange_strong(tail, node);
    }

    bool pop(T* ptr) {
        Node* head = nullptr;
        Node* next = nullptr;
        sebr::Pin pin(this);
        for (;;) {
            head = Head.load();
            Node* tail = Tail.load();
            next = head->next.load();
            if (head == Head.load()) {
                if (head == tail) {
                    if (next == nullptr) {
                        return false;
                    }
                    Tail.compare_exchange_strong(tail, next);
                } else {
                    *ptr = next->data;
                    if (Head.compare_exchange_strong(head, next)) {
                        break;
                    }
                }
            }
        }
        pin.retire<RecLockFreeNode> (sizeof(Node), head);
        return true;
    }

private:
    std::atomic<Node*> Head;
    std::atomic<Node*> Tail;
};

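Running the same ASAN and Valgrind checks against this ms_queue shows that the memory operations are correct and that the memory is freed cleanly.

The still reachable and suppressed entries here are unrelated to the example; please ignore them.

So what exactly did we change compared with the original ms_queue? The answer is in the figure below, which compares the two versions: the class now inherits from sebr::ConcurrentBridge, a small RecLockFreeNode class wraps the delete, push and pop each create a sebr::Pin, and pop calls pin.retire instead of delete head.

The comparison shows how unintrusive the change is.

SEBR's primary design goal is usability.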

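It provides three classes (abstractions):

sebr::ConcurrentBridge<T>: your concurrent data structure (T) inherits from it. (CRTP)
sebr::ReclaimBridge<T>: used to build each of your own reclamation strategies, which should be designed together with the data structure's implementation and its performance considerations. You free the memory by implementing reclaim. T is the type you define yourself. (CRTP)
sebr::Pin: after a block of memory has been erased, call its retire method to schedule the block for release.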
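For readers unfamiliar with CRTP, the pattern behind the two bridge classes can be sketched like this. Everything here lives in a made-up `demo` namespace and is an assumption for illustration only: it is not the contents of sebr_local.hpp, and the `run` method is hypothetical. It only shows how a base class template can call the derived class's `reclaim` without virtual functions.

```cpp
#include <cassert>

namespace demo {  // hypothetical namespace; this is NOT the sebr header

// CRTP base: calls into the derived class through a static_cast, no vtable needed.
template <typename Derived>
class ReclaimBridge {
public:
    void run() { static_cast<Derived*>(this)->reclaim(); }
};

}  // namespace demo

struct Payload { int value; };

// User-defined reclaimer: knows how to free exactly one Payload.
class DeletePayload : public demo::ReclaimBridge<DeletePayload> {
    Payload* payload_;
    int* freed_count_;
public:
    DeletePayload(Payload* payload, int* freed_count)
        : payload_(payload), freed_count_(freed_count) {}

    void reclaim() {
        assert(payload_ != nullptr);
        delete payload_;
        payload_ = nullptr;
        ++*freed_count_;
    }
};

// Builds one reclaimer, runs it through the base class, and reports how many frees happened.
int crtp_demo() {
    int freed = 0;
    DeletePayload reclaimer(new Payload{7}, &freed);
    reclaimer.run();  // dispatches to DeletePayload::reclaim at compile time
    return freed;
}
```

The RecLockFreeNode class in the listing above has the same shape: it stores the node pointer and deletes it inside reclaim.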

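To keep things simple, it currently exposes only one call (retire):

Within the epoch-based workflow, retire is called after an address has been erased (meaning it can no longer be read from then on) and before its memory is actually freed.

Later posts will describe SEBR's usage, design goals, implementation, extension directions, performance tuning and more in detail.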
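The ordering (erase, then retire, then the actual free) together with the "reclaim what lies below the smallest epoch" idea can be modeled in a few lines. This is a single-threaded model with hypothetical names (`EpochModel`, `pin`, `unpin`, `retire`, `collect`). It is not SEBR's implementation and is not thread-safe. It only demonstrates the rule that a retired object may be freed once every pinned reader entered at a later epoch than the one the object was retired in.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <limits>
#include <utility>
#include <vector>

// Marker for "this reader is not inside the data structure".
constexpr std::uint64_t kIdle = std::numeric_limits<std::uint64_t>::max();

// Single-threaded model of min-epoch reclamation (illustrative only).
class EpochModel {
    struct Retired {
        std::uint64_t epoch;            // global epoch at retire time
        std::function<void()> reclaim;  // how to free the object
    };

public:
    explicit EpochModel(std::size_t readers) : pinned_(readers, kIdle) {}

    // A reader announces the epoch at which it entered.
    void pin(std::size_t reader) {
        assert(reader < pinned_.size());
        pinned_[reader] = global_;
    }
    void unpin(std::size_t reader) {
        assert(reader < pinned_.size());
        pinned_[reader] = kIdle;
    }

    // Call only after the object is unlinked: tag it with the current epoch, then advance.
    void retire(std::function<void()> reclaim) {
        retired_.push_back({global_, std::move(reclaim)});
        ++global_;
    }

    // Free everything retired in an epoch strictly below the smallest pinned epoch.
    std::size_t collect() {
        std::uint64_t min_epoch = kIdle;
        for (std::uint64_t e : pinned_) min_epoch = std::min(min_epoch, e);
        std::size_t freed = 0;
        std::vector<Retired> keep;
        for (Retired& r : retired_) {
            if (r.epoch < min_epoch) { r.reclaim(); ++freed; }
            else keep.push_back(std::move(r));
        }
        retired_.swap(keep);
        return freed;
    }

private:
    std::uint64_t global_ = 0;
    std::vector<std::uint64_t> pinned_;
    std::vector<Retired> retired_;
};
```

With two readers, if reader 0 pins, an object is retired, and reader 1 pins afterwards, `collect()` frees nothing while reader 0 stays pinned, since reader 0 may still hold a pointer to the object. After reader 0 unpins, `collect()` frees it, because reader 1 entered only after the object had been unlinked.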
