A General-Purpose C++ Multi-Threaded Sort Template

Latest recommended article published on 2022-03-20 20:22:56

Skuaka (pinned post)

Categories: C_C++, Algorithm

Article link: https://blog.csdn.net/Skuaka/article/details/89192991

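A C++ function template for multi-threaded, multi-way merge sort

Design

Approach

Each thread sorts the data chunk assigned to it with std::sort.
After sorting its own chunk, the main thread waits for every worker thread to finish (std::future, std::promise).
The main thread then performs a multi-way merge of the sorted chunks (implemented by hand, because the standard library only provides two-way merging).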
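The merge step above can be written in more than one way. The implementation later in this article scans the head of every chunk for each output element; the sketch below shows a common alternative that keeps the chunk heads in a min-heap. The helper name kway_merge and the bounds layout are assumptions made for illustration and are not part of the article's code.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical helper: merges k sorted runs stored back to back in `data`.
// `bounds` has k+1 entries; run j occupies [bounds[j], bounds[j+1]).
template <typename T>
std::vector<T> kway_merge(const std::vector<T>& data,
                          const std::vector<std::size_t>& bounds)
{
    assert(!bounds.empty() && bounds.back() <= data.size());
    using Entry = std::pair<T, std::size_t>;   // (value, run number)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;

    // One read cursor per run, starting at the beginning of that run.
    std::vector<std::size_t> pos(bounds.begin(), bounds.end() - 1);
    for (std::size_t j = 0; j < pos.size(); ++j)
        if (pos[j] < bounds[j + 1])
            heap.emplace(data[pos[j]], j);

    std::vector<T> out;
    out.reserve(bounds.back() - bounds.front());
    while (!heap.empty()) {
        Entry top = heap.top();                // smallest head among all runs
        heap.pop();
        out.push_back(top.first);
        const std::size_t j = top.second;
        if (++pos[j] < bounds[j + 1])          // refill from the same run
            heap.emplace(data[pos[j]], j);
    }
    return out;
}
```

With k runs and n elements this takes O(n log k) comparisons instead of O(n k). For the small thread counts discussed in this article the difference is likely to be minor.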

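Implementation notes

This implementation relies heavily on C++11 features.
The C library's qsort is reported to use static variables internally in some implementations, which makes it not thread-safe. A known workaround is to call qsort once on the main thread first. For details, search online or see the article linked from the original post. std::sort, which is used here, does not have this problem.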
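As a small illustration of the last point, the sketch below runs two std::sort calls at the same time on disjoint halves of one array. The helper name sort_halves_concurrently is hypothetical. It assumes each thread only touches its own half, which is what keeps the two calls from racing.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: sorts the two halves of `v` concurrently.
// Each thread works on its own half only, so the std::sort calls do not race.
inline void sort_halves_concurrently(std::vector<int>& v)
{
    int* const first = v.data();
    const std::size_t mid = v.size() / 2;

    std::thread worker([first, mid] { std::sort(first, first + mid); });
    std::sort(first + mid, first + v.size());   // calling thread: second half
    worker.join();                              // wait before the first half is read
}
```

After the call each half is sorted on its own; the halves still have to be merged to obtain a fully sorted array.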

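Function parameters

A pointer to the first element of the data
The number of elements
The number of new threads to create

Applicability

Only integer types are supported (enforced by a static_assert); readers can extend it themselves
The data must support random access (it is passed as a contiguous array)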
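One possible way to extend the idea to other element types is to take random-access iterators and a comparator instead of an integer pointer. The sketch below is not the article's implementation: the name chunked_sort is hypothetical, the workers are started with std::async rather than std::promise, and the chunks are folded together with the standard two-way std::inplace_merge instead of a hand-written multi-way merge.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <iterator>
#include <vector>

// Hypothetical generalization: any random-access range, any strict weak ordering.
template <typename RandomIt,
          typename Compare = std::less<typename std::iterator_traits<RandomIt>::value_type>>
void chunked_sort(RandomIt first, RandomIt last, std::size_t thread_num,
                  Compare comp = Compare())
{
    using Diff = typename std::iterator_traits<RandomIt>::difference_type;
    assert(!(last < first));
    const std::size_t len = static_cast<std::size_t>(last - first);
    if (len <= 1 || thread_num == 0 || len < (thread_num + 1) * (thread_num + 1)) {
        std::sort(first, last, comp);           // too small to split
        return;
    }
    // ceil(len / (thread_num + 1)), the same rounding as in the article
    const Diff chunk = static_cast<Diff>((len + thread_num) / (thread_num + 1));
    const Diff workers = static_cast<Diff>(thread_num);

    // Workers sort the first thread_num chunks.
    std::vector<std::future<void>> pending;
    for (Diff i = 0; i < workers; ++i) {
        RandomIt b = first + i * chunk;
        pending.push_back(std::async(std::launch::async,
            [b, chunk, comp] { std::sort(b, b + chunk, comp); }));
    }
    std::sort(first + workers * chunk, last, comp);   // calling thread: last chunk
    for (auto& f : pending) f.get();                  // wait for every worker

    // Fold the sorted chunks together, one two-way merge at a time.
    for (Diff i = 1; i <= workers; ++i) {
        RandomIt mid = first + i * chunk;
        RandomIt end = (i == workers) ? last : mid + chunk;
        std::inplace_merge(first, mid, end, comp);
    }
}
```

For example, chunked_sort(v.begin(), v.end(), 2, std::greater<double>()) would sort a std::vector<double> in descending order using two extra threads.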

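Chunk size

Let the data length (data_len) be D, the number of new threads (thread_num) be T, the size of one chunk (chunk_size) be C, and the size of the last chunk (last_chunk_size) be LC. All divisions below are integer divisions.

Intuitively, if the total number of threads, T+1, divides the data length evenly, the quotient is the chunk size:

LC = C = D/(T+1)

If it does not divide evenly, the chunk size is rounded up. K below is the remainder of D divided by T+1: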

C = D/(T+1)+1
=> (C-1) = D/(T+1)
=> (C-1)*(T+1) + K = D, K ∈ (0,T+1)

LC = D - C*T
   = (C-1)*(T+1) + K - C*T
   = C-(T+1) + K
   ∈ (C-(T+1), C)

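Because T is very small, the last chunk is at most T elements shorter than the other chunks, so this implementation simply lets the main thread process the last chunk.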
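The arithmetic above can be checked with a few lines of code. This is a minimal sketch: the names ChunkLayout and chunk_layout are hypothetical, and it assumes D >= (T+1)*(T+1), the guard the implementation applies before going multi-threaded, which ensures the last chunk is not empty.

```cpp
#include <cassert>
#include <cstddef>

struct ChunkLayout {            // hypothetical name, for illustration only
    std::size_t chunk_size;     // C: size of each of the first T chunks
    std::size_t last_chunk;     // LC: size of the chunk left for the main thread
};

// data_len is D, thread_num is T (the number of extra threads).
inline ChunkLayout chunk_layout(std::size_t data_len, std::size_t thread_num)
{
    assert(data_len >= (thread_num + 1) * (thread_num + 1));
    std::size_t c = data_len / (thread_num + 1);
    if (data_len % (thread_num + 1) != 0)
        ++c;                                    // round up when T+1 does not divide D
    const std::size_t lc = data_len - c * thread_num;
    assert(lc + thread_num >= c && lc <= c);    // C - T <= LC <= C
    return {c, lc};
}
```

For D = 10 and T = 2 this gives C = 4 and LC = 2, matching the formula with K = 1: LC = C-(T+1)+K = 4-3+1 = 2.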

Implementation

// multi_threaded_sort.h
#ifndef MULTI_THREADED_SORT_H
#define MULTI_THREADED_SORT_H

#include <cstddef>      // size_t
#include <type_traits>
#include <limits>
#include <algorithm>
#include <thread>
#include <future>

template <typename IntegerType>
void multi_threaded_sort(IntegerType* in_data, size_t data_len, size_t thread_num)
{
    // compile time type check
    static_assert(std::is_integral<IntegerType>::value,
                  "Data type must be integer");

    // under such conditions, multi-thread makes no sense
    // call std::sort directly
    if(data_len <= 1 || thread_num == 0
       || data_len < (thread_num+1)*(thread_num+1))
    {
        std::sort(in_data, in_data+data_len);
        return;
    }

    /* each worker thread sorts one chunk;
     * the main thread sorts the last chunk */
    size_t chunk_size = data_len/(thread_num+1);
    if(data_len%(thread_num+1) != 0)
        ++chunk_size;

    // promise/future pairs used to wait for the worker threads
    auto sort_promise = new std::promise<void>[thread_num];
    auto sort_future = new std::future<void>[thread_num];
    for(size_t i=0; i<thread_num; ++i)
        sort_future[i] = sort_promise[i].get_future();

    // create threads
    for(size_t i=0; i<thread_num; ++i){
        std::thread th([=]{
            std::sort(in_data + i*chunk_size, in_data + (i+1)*chunk_size);
            sort_promise[i].set_value();
        });
        th.detach();
    }

    // sort the last chunk
    std::sort(in_data + chunk_size*thread_num, in_data + data_len);

    // before blocking on the workers, do the setup that does not depend on sorted data
    auto out_data = new IntegerType[data_len];
    auto index = new size_t[thread_num + 1];
    for (size_t i=0; i<thread_num + 1; ++i)
        index[i] = i * chunk_size;

    // wait for all threads
    for(size_t i=0; i<thread_num; ++i)
        sort_future[i].wait();

    delete[] sort_future;
    delete[] sort_promise;

    // multi-way merge: repeatedly take the smallest head element among all chunks
    for(size_t i = 0; i < data_len; ++i)
    {
        // thread_num + 1 is a sentinel meaning "no chunk chosen yet";
        // comparing against a chosen element (not numeric_limits::max())
        // keeps the merge correct when the data contains the maximum value
        size_t min_index = thread_num + 1;

        // traverse every chunk (the last one ends at data_len) and find the minimum
        for(size_t j=0; j<=thread_num; ++j)
        {
            const size_t chunk_end = std::min((j+1)*chunk_size, data_len);
            if((index[j] < chunk_end)
               && (min_index > thread_num
                   || in_data[index[j]] < in_data[index[min_index]]))
            {
                min_index = j;
            }
        }

        out_data[i] = in_data[index[min_index]];
        index[min_index]++;
    }

    std::copy(out_data, out_data + data_len, in_data);
    delete[] out_data;
    delete[] index;   // release the per-chunk cursors as well
}

#endif //MULTI_THREADED_SORT_H

Testing

Test code

// main.cpp
#include "multi_threaded_sort.h"
#include <algorithm>
#include <chrono>
#include <iostream>
#include <random>
#include <vector>
using namespace std;

int main() {
    random_device rd;
    default_random_engine rng{rd()};
    {
        cout << "this example check the correctness:";
        short data[] = {2, 5, 5, 3, 93, 43, 3, 0, -3, 43};
        const size_t N = sizeof(data) / sizeof(data[0]);

        cout << "\ninput " << N << " data: ";
        for (size_t i = 0; i < N; ++i) cout << data[i] << ' ';
        multi_threaded_sort(data, N, 1);
        cout << "\n\tsort with 2 threads: ";
        for (size_t i = 0; i < N; ++i) cout << data[i] << ' ';

        // std::random_shuffle was removed in C++17; use std::shuffle instead
        shuffle(data, data + N, rng);

        cout << "\nafter shuffle: ";
        for (size_t i = 0; i < N; ++i) cout << data[i] << ' ';
        multi_threaded_sort(data, N, 2);
        cout << "\n\tsort with 3 threads: ";
        for (size_t i = 0; i < N; ++i) cout << data[i] << ' ';
    }
    // -----------------------------------------------------
    {
        const size_t N = 654321;   // numbers to generate
        const size_t T = 6;        // runs to time: std::sort, then 1..T-1 added threads
        cout << "\n\nthis example check the efficiency:\n"
             << "randomly generate " << N
             << " natural number and sort them...\n";

        // keep the data on the heap: T*N shorts (about 7.5 MB) can overflow the stack
        vector<vector<short>> random_data(T, vector<short>(N));
        uniform_int_distribution<short> dis;   // default range is [0, max]
        for (size_t i = 0; i < N; ++i)
            random_data[0][i] = dis(rng);
        for (size_t i = 1; i < T; ++i)
            random_data[i] = random_data[0];   // every run gets its own unsorted copy

        chrono::duration<double, std::milli> elapsed[T];
        for (size_t i = 0; i < T; ++i) {
            auto start_t = chrono::steady_clock::now();
            if (i)
                multi_threaded_sort(random_data[i].data(), N, i);   // i added threads
            else
                sort(random_data[0].begin(), random_data[0].end());
            auto end_t = chrono::steady_clock::now();
            elapsed[i] = end_t - start_t;
        }

        cout << "Use std::sort() cost: " << elapsed[0].count() << " ms\n";
        for (size_t i = 1; i < T; ++i)
            cout << "Add " << i << " threads   cost: " << elapsed[i].count() << " ms\n";
    }
    return 0;
}

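Output

Sample output from one run; the shuffled order and all timings vary between runs and machines. The timing figures come from a run in which every multi-threaded measurement reused the same array, so all but the first of them sorted input that was already sorted, and each call was given one more thread than its label states. Treat them as a rough indication only.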

this example check the correctness:
input 10 data: 2 5 5 3 93 43 3 0 -3 43 
	sort with 2 threads: -3 0 2 3 3 5 5 43 43 93 
after shuffle: 5 -3 3 5 43 43 3 0 2 93 
	sort with 3 threads: -3 0 2 3 3 5 5 43 43 93 

this example check the efficiency:
randomly generate 654321 natural number and sort them...
Use std::sort() cost: 36.1254 ms
Add 1 threads   cost: 25.5484 ms
Add 2 threads   cost: 9.9246 ms
Add 3 threads   cost: 10.2634 ms
Add 4 threads   cost: 11.2967 ms
Add 5 threads   cost: 14.7214 ms

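Postscript

Creating and destroying threads takes time, so for small inputs multi-threading is not worth it and actually makes the sort slower.
The number of CPU cores is limited, so only a few threads can truly run in parallel, and creating more threads than that does not help. Once the thread count passes a certain (usually small) number, sorting speed slowly drops as more threads are added. The merge step also runs on a single thread and scans the head of every chunk for each output element, so its cost grows with the thread count.

To check both points, readers can change N and T in the test code and run their own measurements.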
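Both observations suggest choosing the thread count from the hardware and the input size rather than hard-coding it. The sketch below is one way to do that, under stated assumptions: the helper name suggest_extra_threads is hypothetical, and the size check only mirrors the (T+1)*(T+1) guard in the implementation. The input size at which threads actually pay off is much larger and has to be measured.

```cpp
#include <cstddef>
#include <thread>

// Hypothetical helper: suggests how many extra threads to request.
// `hw` defaults to std::thread::hardware_concurrency(), which may return 0
// when the value cannot be determined.
inline std::size_t suggest_extra_threads(
    std::size_t data_len,
    std::size_t hw = std::thread::hardware_concurrency())
{
    if (hw <= 1)
        return 0;                   // unknown or single core: stay single-threaded
    std::size_t extra = hw - 1;     // the calling thread also sorts one chunk
    // Shrink until (T+1)*(T+1) <= data_len, the guard used by the implementation.
    while (extra > 0 && (extra + 1) * (extra + 1) > data_len)
        --extra;
    return extra;
}
```

The result could be passed as the thread_num argument, for example multi_threaded_sort(ptr, len, suggest_extra_threads(len)), where ptr and len stand for the caller's own data.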

My knowledge is limited; if you spot any mistakes, please point them out.

End
