Concurrency and Multithreading Basics: Sharing Data Between Threads

Column: Processes and Multithreading

Article link: https://blog.csdn.net/wymtqq/article/details/80039425

1. What problems does sharing data cause?

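A. Race conditions: In concurrency, a race condition arises whenever the outcome depends on the relative order in which operations on two or more threads execute; each thread races to finish its own work. Most of the time the race is benign: the result is acceptable whichever order occurs. For example, when two threads add tasks to a processing queue at the same time, it does not matter which task goes in first, as long as the system's invariants still hold. A race becomes a problem only when it breaks an invariant, as when one thread reads a doubly linked list while another thread has updated only one of the two links around a node it is removing. In discussions of concurrency, "race condition" usually means this problematic kind.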

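B. Avoiding problematic race conditions: There are several ways to deal with problematic race conditions. The simplest is to wrap the data structure in a protection mechanism that ensures only the thread performing a modification can see the intermediate states in which the invariants are broken. From the point of view of every other thread that accesses the structure, the modification either has not started or has already completed.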
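
To make the "either not started or already completed" guarantee concrete, here is a minimal sketch. The type `Pair`, its invariant `a + b == 100`, and the helper `invariant_always_held()` are made up for illustration and do not come from the original article; a mutex is used as the protection mechanism.

```cpp
#include <mutex>
#include <thread>

// Hypothetical example type. Invariant: a + b == 100.
struct Pair {
    int a = 100;
    int b = 0;
    std::mutex m;

    // Moves one unit from a to b. Between the two writes the invariant is
    // broken, but only this thread can observe that state because it holds m.
    void move_one() {
        std::lock_guard<std::mutex> guard(m);
        --a;
        ++b;
    }

    // Readers take the same lock, so they see the pair either before or
    // after a move, never in the middle of one.
    int total() {
        std::lock_guard<std::mutex> guard(m);
        return a + b;
    }
};

// Runs one writer and one reader concurrently. Returns true if the reader
// never observed a broken invariant.
bool invariant_always_held() {
    Pair p;
    bool ok = true;  // written only by the reader thread, read after join()
    std::thread writer([&p] {
        for (int i = 0; i < 10000; ++i) p.move_one();
    });
    std::thread reader([&p, &ok] {
        for (int i = 0; i < 10000; ++i)
            if (p.total() != 100) ok = false;
    });
    writer.join();
    reader.join();
    return ok;
}
```

If `total()` read `a` and `b` without taking the lock, the reader could catch the pair between `--a` and `++b`, and the program would also contain a data race.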

2. Protecting shared data with mutexes

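A. In C++ you create a mutex by constructing an instance of std::mutex, lock it by calling the member function lock(), and unlock it by calling unlock(). Calling these member functions directly is not recommended in practice, though, because you would have to remember to call unlock() on every path out of the function, including paths taken because of an exception. The C++ standard library provides the RAII class template std::lock_guard, which locks the supplied mutex in its constructor and unlocks it in its destructor, so a locked mutex is always unlocked correctly.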
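
The exception case is worth seeing in code. The following is a small sketch under assumed names (`demo_mutex`, `update_that_throws`, and `mutex_released_after_exception` are hypothetical): a function throws while holding a std::lock_guard, and the caller then checks that the mutex is free again.

```cpp
#include <mutex>
#include <stdexcept>

std::mutex demo_mutex;  // hypothetical mutex used only in this sketch

// Throws while holding the lock. Stack unwinding runs guard's destructor,
// which unlocks demo_mutex without any explicit unlock() call.
void update_that_throws() {
    std::lock_guard<std::mutex> guard(demo_mutex);
    throw std::runtime_error("update failed");
}

// Returns true if demo_mutex can be locked again after the exception.
bool mutex_released_after_exception() {
    try {
        update_that_throws();
    } catch (const std::runtime_error&) {
        // The exception has left update_that_throws(); guard is destroyed.
    }
    // try_lock() is allowed to fail spuriously, so retry a few times.
    for (int attempt = 0; attempt < 100; ++attempt) {
        if (demo_mutex.try_lock()) {
            demo_mutex.unlock();
            return true;
        }
    }
    return false;
}
```

With manual lock()/unlock() calls, the same function would need a try/catch block (or would leave the mutex locked forever) to get the same behavior.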

#include <list>
#include <mutex>
#include <algorithm>
std::list<int> some_list; // 1
std::mutex some_mutex; // 2
void add_to_list(int new_value)
{
    std::lock_guard<std::mutex> guard(some_mutex); // 3
    some_list.push_back(new_value);
}
bool list_contains(int value_to_find)
{
    std::lock_guard<std::mutex> guard(some_mutex); // 4
    return std::find(some_list.begin(),some_list.end(),value_to_find) !=
                    some_list.end();
}

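The list is a global variable ①, and it is protected by a global mutex ②. add_to_list() ③ and list_contains() ④ each use std::lock_guard<std::mutex>, which makes their accesses to the list mutually exclusive: list_contains() can never see the list partway through a modification by add_to_list().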

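B. Protecting data with a mutex is not as simple as putting a std::lock_guard object in every member function; a single stray pointer or reference to the protected data makes the protection useless.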

#include <mutex>
#include <string>

class some_data
{
	int a;
	std::string b;
public:
	void do_something();
};
class data_wrapper
{
private:
	some_data data;
	std::mutex m;
public:
	template<typename Function>
	void process_data(Function func)
	{
		std::lock_guard<std::mutex> l(m);
		func(data); // 1 pass the "protected" data to a user-supplied function
	}
};
some_data* unprotected;
void malicious_function(some_data& protected_data)
{
	unprotected=&protected_data; // the pointer outlives the lock
}
data_wrapper x;
void foo()
{
	x.process_data(malicious_function); // 2 pass in a malicious function
	unprotected->do_something(); // 3 unprotected access to the protected data
}

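In this example process_data looks fine: std::lock_guard protects the data properly. But because it calls the user-supplied function func ①, foo can bypass the protection by passing in malicious_function ②, which saves a pointer to the data, and then call do_something() ③ without the mutex locked. The data we meant to protect can then be corrupted.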
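
One possible way to avoid this kind of leak, sketched here as an assumption rather than as something the original article prescribes, is to never hand out a reference or pointer to the protected data at all: expose only narrow operations and return copies. The class `guarded_text` and its members are hypothetical names.

```cpp
#include <mutex>
#include <string>

// Hypothetical wrapper: no member function returns or passes out a
// reference or pointer to the protected string.
class guarded_text {
    std::string text;       // protected data
    mutable std::mutex m;   // guards text
public:
    // Modifies the protected data entirely inside the lock.
    void append(const std::string& s) {
        std::lock_guard<std::mutex> l(m);
        text += s;
    }

    // Returns a copy made while the lock is held. The caller can do
    // anything with the copy without touching the protected data.
    std::string snapshot() const {
        std::lock_guard<std::mutex> l(m);
        return text;
    }
};
```

The trade-off is the cost of the copy and a less flexible interface; callers can no longer run arbitrary code against the data, which is exactly what made `process_data` unsafe.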

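C. Spotting race conditions inherent in an interface: The following example implements a stack on top of std::vector. Two threads take turns popping elements from it.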

#include <iostream>  
#include <thread>  
#include <mutex>  
#include <string>  
#include <vector>  
  
std::mutex myMutex;  
  
class Stack  
{  
public:  
    Stack() {};  
    ~Stack() {};  
    void pop();  
    int top() { return data.back(); }  
    void push(int);  
    void print();  
    int getSize() { return data.size(); }  
private:  
    std::vector<int> data;  
};  
  
void Stack::pop()  
{  
    std::lock_guard<std::mutex> guard(myMutex);  
    data.erase(data.end()-1);  
}  
  
void Stack::push(int n)  
{  
    std::lock_guard<std::mutex> guard(myMutex);  
    data.push_back(n);  
}  
  
void Stack::print()  
{  
    std::cout << "initial Stack : " ;  
    for(int item : data)  
        std::cout << item << " ";  
    std::cout << std::endl;  
}  
  
void process(int val, std::string s)  
{  
    std::lock_guard<std::mutex> guard(myMutex);  
    std::cout << s << " : " << val << std::endl;  
}  
  
void thread_function(Stack& st, std::string s)  
{  
    int val = st.top();  
    st.pop();  
    process(val, s);  
}  
  
int main()  
{  
    Stack st;  
    for (int i = 0; i < 10; i++)    
        st.push(i);  
  
    st.print();  
  
    while(true) {  
        if(st.getSize() > 0) {  
            std::thread t1(&thread_function, std::ref(st), std::string("thread1"));  
            t1.join();  
        }  
        else  
            break;  
        if(st.getSize() > 0) {  
            std::thread t2(&thread_function, std::ref(st), std::string("thread2"));  
            t2.join();  
        }  
        else  
            break;  
    }  
  
    return 0;  
}

One possible output:
initial Stack : 0 1 2 3 4 5 6 7 8 9
thread1 : 9
thread2 : 8
thread1 : 7
thread2 : 6
thread1 : 5
thread2 : 4
thread1 : 3
thread2 : 2
thread1 : 1
thread2 : 0

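This code looks thread-safe, but it is not. A race remains, and whether it shows up depends on the order of execution. Note that main() above joins each thread before starting the next, so the two threads never overlap in this particular program; the problem appears as soon as two threads run thread_function() at the same time. One possible interleaving, with 6 on top of the stack:

thread1 calls top() and reads 6.
thread2 calls top() and also reads 6.
thread1 calls pop(), which removes 6.
thread2 calls pop(), which removes 5.

Element 6 is processed twice, and element 5 is skipped.
So although the output above is correct, the code can still trigger a race: nothing stops another thread from running between a call to top() and the following call to pop(). In other words, this code is not thread-safe.
One fix is to merge top() and pop() into a single operation performed under one mutex lock: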

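// Declared in class Stack as `int pop();` in place of `void pop();`.
int Stack::pop()
{
    std::lock_guard<std::mutex> guard(myMutex);
    int val = data.back();   // read the top element...
    data.pop_back();         // ...and remove it under the same lock
    return val;
}


void thread_function(Stack& st, std::string s)
{
    int val = st.pop();
    process(val, s);
}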

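Trimming the interface gives the greatest safety, even at the cost of restricting some operations on the stack. The stack cannot be assigned, because the assignment operator is deleted, and there is no swap() function. It can be copied, provided its elements can be copied. pop() throws an empty_stack exception when the stack is empty, so everything still works even if another thread modifies the stack after empty() has returned. Returning the popped value through a std::shared_ptr (the "return a pointer to the popped item" option) lets the stack take care of memory management and avoids repeated calls to new and delete. The five stack operations are now down to three: push(), pop(), and empty() (and even empty() is somewhat redundant). Simplifying the interface makes the data easier to control, because the mutex can be held for the whole of each operation. The code below shows a simple implementation: a thread-safe stack that wraps std::stack<>.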

#include <exception>
#include <memory>
#include <mutex>
#include <stack>
#include <utility>
struct empty_stack: std::exception
{
	const char* what() const noexcept override
	{
		return "empty stack!";
	}
};
template<typename T>
class threadsafe_stack
{
private:
	std::stack<T> data;
	mutable std::mutex m;
public:
	threadsafe_stack() = default;
	threadsafe_stack(const threadsafe_stack& other)
	{
		std::lock_guard<std::mutex> lock(other.m);
		data = other.data; // 1 copy performed in the constructor body
	}
	threadsafe_stack& operator=(const threadsafe_stack&) = delete;

	void push(T new_value)
	{
		std::lock_guard<std::mutex> lock(m);
		data.push(std::move(new_value));
	}
	std::shared_ptr<T> pop()
	{
		std::lock_guard<std::mutex> lock(m);

		if(data.empty()) throw empty_stack(); // check for an empty stack before popping

		std::shared_ptr<T> const res(std::make_shared<T>(data.top())); // allocate the return value before modifying the stack
		data.pop();
		return res;
	}
	void pop(T& value)
	{
		std::lock_guard<std::mutex> lock(m);
		if(data.empty()) throw empty_stack();
		value=data.top();
		data.pop();
	}
	bool empty() const
	{
		std::lock_guard<std::mutex> lock(m);
		return data.empty();
	}
};

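The stack can be copied: the copy constructor locks the source object's mutex and then copies the internal stack. The copy ① is done in the constructor body rather than in the member initializer list so that the mutex is held for the whole copy.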
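
As a usage sketch, the code below drains such a stack from several threads and counts how often each value is popped. It repeats a condensed version of threadsafe_stack (push and the shared_ptr pop only) so that it compiles on its own; the driver `drain_concurrently` and its parameters are hypothetical names introduced for this illustration.

```cpp
#include <exception>
#include <memory>
#include <mutex>
#include <stack>
#include <thread>
#include <utility>
#include <vector>

struct empty_stack : std::exception {
    const char* what() const noexcept override { return "empty stack!"; }
};

// Condensed copy of the threadsafe_stack above (push and one pop only).
template <typename T>
class threadsafe_stack {
    std::stack<T> data;
    mutable std::mutex m;
public:
    void push(T value) {
        std::lock_guard<std::mutex> lock(m);
        data.push(std::move(value));
    }
    std::shared_ptr<T> pop() {
        std::lock_guard<std::mutex> lock(m);
        if (data.empty()) throw empty_stack();
        auto res = std::make_shared<T>(data.top());
        data.pop();
        return res;
    }
};

// Hypothetical driver: `workers` threads drain a stack holding 0..n-1.
// Returns, for each value, how many times it was popped.
std::vector<int> drain_concurrently(int n, int workers) {
    threadsafe_stack<int> st;
    for (int i = 0; i < n; ++i) st.push(i);

    std::vector<int> seen(n, 0);
    std::mutex seen_mutex;  // guards seen
    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w) {
        pool.emplace_back([&st, &seen, &seen_mutex] {
            try {
                for (;;) {
                    int v = *st.pop();  // throws once the stack is empty
                    std::lock_guard<std::mutex> g(seen_mutex);
                    ++seen[v];
                }
            } catch (const empty_stack&) {
                // Stack drained: this worker is done.
            }
        });
    }
    for (auto& t : pool) t.join();
    return seen;
}
```

Because reading the top element and removing it happen under one lock, every value should be popped exactly once, however the threads interleave. The workers do not call empty() first; they rely on the exception, which avoids the check-then-act gap described earlier.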

3. Alternative facilities for protecting shared data

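A. Protecting shared data during initialization: Suppose a shared resource is expensive to construct, perhaps because it opens a database connection or allocates a lot of memory. Lazy initialization is a common optimization here: each operation that needs the resource first checks whether it has been initialized, and initializes it before use if not.

A typical mutex-based version: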

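#include <memory>
#include <mutex>

// some_resource is assumed to be defined elsewhere.
std::shared_ptr<some_resource> resource_ptr;
std::mutex resource_mutex;
void foo()
{
    std::unique_lock<std::mutex> lk(resource_mutex); // all threads are serialized here
    if(!resource_ptr)
    {
        resource_ptr.reset(new some_resource); // only the initialization needs protection
    }
    lk.unlock();
    resource_ptr->do_something();
}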

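This code is common enough, and it shows the unnecessary serialization clearly: every thread must wait on the mutex just to check whether the resource has been initialized. Many people have tried to come up with something better, including the infamous double-checked locking pattern: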

void undefined_behaviour_with_double_checked_locking()
{
    if(!resource_ptr) // 1
    {
        std::lock_guard<std::mutex> lk(resource_mutex);
        if(!resource_ptr) // 2
        {
            resource_ptr.reset(new some_resource); // 3
        }
    }
    resource_ptr->do_something(); // 4
}

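The pointer is first read without acquiring the lock ①, and the lock is acquired only if the pointer is null. Once the lock is held, the pointer is checked again ② (this is the double-checked part), in case another thread performed the initialization between the first check and this thread acquiring the lock.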

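Why is the pattern infamous? Because it has a potential race condition: the read outside the lock ① is not synchronized with the write that another thread performs inside the lock ③. The race covers not only the pointer itself but also the object it points to; even if a thread sees the pointer written by another thread, it might not see the newly created some_resource instance, so the call to do_something() ④ can operate on incorrect values. This is a typical example of the kind of race condition known as a data race, which the C++ standard specifies as undefined behavior. This race can and should be avoided.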

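The C++ standards committee also considered this case important, so the C++ standard library provides std::once_flag and std::call_once to handle it. Rather than locking a mutex and explicitly checking the pointer, every thread simply calls std::call_once, safe in the knowledge that by the time std::call_once returns, the pointer has been initialized by some thread in a properly synchronized way.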
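
Applied to the resource_ptr example, that might look like the sketch below. The stand-in definition of `some_resource`, the counter `construction_count`, and the driver `run_foo_from_threads` are assumptions added so the example compiles and can be checked; they are not part of the original code.

```cpp
#include <atomic>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

std::atomic<int> construction_count{0};  // illustration only: counts constructions

struct some_resource {                   // hypothetical stand-in for the real resource
    some_resource() { ++construction_count; }
    void do_something() {}
};

std::shared_ptr<some_resource> resource_ptr;
std::once_flag resource_flag;

void init_resource() {
    resource_ptr.reset(new some_resource);
}

void foo() {
    // init_resource runs exactly once, no matter how many threads get here.
    // Every call returns only after the initialization has completed.
    std::call_once(resource_flag, init_resource);
    resource_ptr->do_something();
}

// Calls foo() from n threads and returns how many some_resource objects
// were constructed in total.
int run_foo_from_threads(int n) {
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) threads.emplace_back(foo);
    for (auto& t : threads) t.join();
    return construction_count.load();
}
```

Unlike the double-checked locking version, there is no unsynchronized read of resource_ptr: each thread reads the pointer only after its own std::call_once call has returned.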

B. Thread-safe lazy initialization of a class member using std::call_once

class X
{
private:
	connection_info connection_details;
	connection_handle connection;
	std::once_flag connection_init_flag;
	void open_connection()
	{
	   connection=connection_manager.open(connection_details);
	}
public:
	X(connection_info const& connection_details_):
	connection_details(connection_details_)
	{}
	void send_data(data_packet const& data) // 1
	{
		std::call_once(connection_init_flag,&X::open_connection,this);// 2
		connection.send_data(data);
	}
	data_packet receive_data() // 3
	{
		std::call_once(connection_init_flag,&X::open_connection,this);// 2
		return connection.receive_data();
	}
};

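In this example, initialization is performed by whichever thread first calls send_data() ① or receive_data() ③. Because the initialization is done by the member function open_connection(), the this pointer must be passed in as well. As with other standard library facilities that accept callable objects, such as the std::thread constructor and std::bind(), this is done by passing an additional argument to std::call_once() ②.
Note that, like std::mutex, std::once_flag instances can be neither copied nor moved, so if you use them as class members and need copy or move operations for the class, you have to define those special member functions explicitly.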
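
As a minimal sketch of what "define them explicitly" can mean, the hypothetical class `settings` below has a std::mutex member and a hand-written copy constructor. The class and its member names are assumptions made for this illustration.

```cpp
#include <mutex>
#include <string>
#include <utility>

// Hypothetical class with a mutex member.
class settings {
    std::string name;       // protected data
    mutable std::mutex m;   // can be neither copied nor moved
public:
    explicit settings(std::string n) : name(std::move(n)) {}

    // The implicitly generated copy constructor is deleted because of m,
    // so one is written by hand: lock the source, copy the data, and let
    // the new object default-construct its own, unlocked mutex.
    settings(const settings& other) {
        std::lock_guard<std::mutex> lock(other.m);
        name = other.name;
    }
    settings& operator=(const settings&) = delete;

    std::string get() const {
        std::lock_guard<std::mutex> lock(m);
        return name;
    }
    void set(const std::string& n) {
        std::lock_guard<std::mutex> lock(m);
        name = n;
    }
};
```

The copy gets its own mutex rather than sharing the original's, so the two objects are locked independently after the copy. A std::once_flag member needs the same care: a copied object must decide for itself whether it starts out initialized.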