Windows C++ Parallel Programming: Writing a parallel_for_each Loop

sului

Published 2024-09-15 00:15:00

Article link: https://blog.csdn.net/m0_72813396/article/details/141730109

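Example 1 shows how to use the concurrency::parallel_for_each algorithm to count, in parallel, the prime numbers in a std::array object. Example 2 shows how to use the concurrency::parallel_transform and concurrency::parallel_reduce algorithms, together with the concurrency::concurrent_unordered_map class, to count word occurrences in files.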

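Example 1

The following example counts the prime numbers in an array twice. It first uses the std::for_each algorithm to compute the count serially, then uses the parallel_for_each algorithm to perform the same task in parallel. The example also prints to the console the time that each computation takes.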

// parallel-count-primes.cpp
// compile with: /EHsc
#include <windows.h>
#include <ppl.h>
#include <iostream>
#include <algorithm>
#include <array>
#include <cmath>   // std::sqrt, used by is_prime

using namespace concurrency;
using namespace std;

// Returns the number of milliseconds that it takes to call the passed in function.
template <class Function>
__int64 time_call(Function&& f)
{
    __int64 begin = GetTickCount();
    f();
    return GetTickCount() - begin;
}

// Determines whether the input is a prime.
bool is_prime(int n)
{
    if (n < 2)
    {
        return false;
    }

    for (int i = 2; i < int(std::sqrt(n)) + 1; ++i)
    {
        if (n % i == 0)
        {
            return false;
        }
    }
    return true;
}

int wmain()
{
    // Create an array object that contains 200000 integers.
    array<int, 200000> a;

    // Initialize the array such that a[i] == i.
    int n = 0;
    generate(begin(a), end(a), [&]
        {
            return n++;
        });

    // Use the for_each algorithm to count, serially, the number
    // of prime numbers in the array.
    LONG prime_count = 0L;
    __int64 elapsed = time_call([&]
        {
            for_each(begin(a), end(a), [&](int n)
            {
                if (is_prime(n))
                {
                    ++prime_count;
                }
            });
        });
    
    wcout << L"serial version: " << endl
        << L"found " << prime_count << L" prime numbers" << endl
        << L"took " << elapsed << L" ms" << endl << endl;

    // Use the parallel_for_each algorithm to count, in parallel, the number
    // of prime numbers in the array.
    prime_count = 0L;
    elapsed = time_call([&]
        {
            parallel_for_each(begin(a), end(a), [&](int n)
                {
                    if (is_prime(n))
                    {
                        InterlockedIncrement(&prime_count);
                    }
                });
        });

    wcout << L"parallel version: " << endl
        << L"found " << prime_count << L" prime numbers" << endl
        << L"took " << elapsed << L" ms" << endl << endl;
}

The following is sample output from a computer with four cores.

serial version:
found 17984 prime numbers
took 125 ms

parallel version:
found 17984 prime numbers
took 63 ms

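Robust Programming

The lambda expression that the example passes to parallel_for_each uses the InterlockedIncrement function so that parallel iterations of the loop can increment the counter at the same time. Using functions such as InterlockedIncrement to synchronize access to shared resources can create a performance bottleneck in your code. You can use a lock-free synchronization mechanism, such as the concurrency::combinable class, to eliminate synchronized access to shared resources.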
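The idea behind combinable can be sketched in portable C++: each worker accumulates into a private counter, and the private counters are summed once at the end, so the hot loop needs no interlocked operation. The sketch below is a stand-in that uses std::thread from the standard library rather than PPL; the names is_prime_sketch and count_primes_local and the striding scheme are assumptions made for illustration, not part of the original example.

```cpp
#include <numeric>
#include <thread>
#include <vector>

// Trial-division primality test (illustrative helper, not from the article).
bool is_prime_sketch(int n)
{
    if (n < 2) return false;
    for (int i = 2; static_cast<long long>(i) * i <= n; ++i)
    {
        if (n % i == 0) return false;
    }
    return true;
}

// Counts the primes in [0, limit) using `workers` threads.
// Each thread writes only to its own slot in `partial`, so no atomic
// or interlocked operation is needed inside the loop.
long count_primes_local(int limit, unsigned workers)
{
    if (workers == 0) workers = 1;
    std::vector<long> partial(workers, 0);
    std::vector<std::thread> pool;

    for (unsigned w = 0; w < workers; ++w)
    {
        pool.emplace_back([&partial, w, workers, limit]
        {
            long local = 0; // private to this thread
            for (int n = static_cast<int>(w); n < limit; n += static_cast<int>(workers))
            {
                if (is_prime_sketch(n)) ++local;
            }
            partial[w] = local; // one write per thread
        });
    }
    for (auto& t : pool) t.join();

    // Combine step: sum the per-thread results once.
    return std::accumulate(partial.begin(), partial.end(), 0L);
}
```

With PPL the same shape would use a combinable<LONG> object: call local() inside the parallel_for_each body to reach the per-thread counter, then call combine(std::plus<LONG>()) after the loop to obtain the total.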

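Example 2

This example shows how to use the concurrency::parallel_transform and concurrency::parallel_reduce algorithms, together with the concurrency::concurrent_unordered_map class, to count word occurrences in files.

A map operation applies a function to each value in a sequence. A reduce operation combines the elements of a sequence into one value. You can use the C++ Standard Library functions std::transform and std::accumulate to perform map and reduce operations. However, to improve performance for many problems, you can use the parallel_transform algorithm to perform the map operation in parallel and the parallel_reduce algorithm to perform the reduce operation in parallel. In some cases, you can use concurrent_unordered_map to perform the map and the reduce in one step.

The following example counts the occurrences of words in files. It uses std::vector to represent the contents of two files. The map operation counts the occurrences of each word in each vector. The reduce operation accumulates the word counts across both vectors.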
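For comparison, a serial version of the same map and reduce steps can be sketched with std::transform and std::accumulate. The names WordCounts, count_words, merge_counts, and serial_word_count are hypothetical and chosen only for this illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <unordered_map>
#include <vector>

using WordCounts = std::unordered_map<std::wstring, std::size_t>;

// Map step: count the words of one file.
WordCounts count_words(const std::vector<std::wstring>& words)
{
    WordCounts counts;
    for (const auto& w : words) ++counts[w];
    return counts;
}

// Reduce step: merge one table of counts into the running total.
WordCounts merge_counts(WordCounts total, const WordCounts& part)
{
    for (const auto& entry : part) total[entry.first] += entry.second;
    return total;
}

// Serial map-reduce over a set of files.
WordCounts serial_word_count(const std::vector<std::vector<std::wstring>>& files)
{
    std::vector<WordCounts> mapped(files.size());
    // Map: one table of counts per file.
    std::transform(files.begin(), files.end(), mapped.begin(), count_words);
    // Reduce: fold all tables into one, starting from an empty table.
    return std::accumulate(mapped.begin(), mapped.end(), WordCounts(), merge_counts);
}
```

The parallel version keeps this structure and swaps std::transform for parallel_transform and std::accumulate for parallel_reduce, with the empty table playing the role of the identity value.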

// parallel-map-reduce.cpp
// compile with: /EHsc
#include <ppl.h>
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
#include <numeric>
#include <unordered_map>
#include <windows.h>

using namespace concurrency;
using namespace std;

class MapFunc 
{ 
public:
    unordered_map<wstring, size_t> operator()(vector<wstring>& elements) const 
    { 
        unordered_map<wstring, size_t> m;
        for_each(begin(elements), end(elements), [&m](const wstring& elem)
        { 
            m[elem]++;
        });
        return m; 
    }
}; 

// std::binary_function was removed in C++17; the functor works without it.
struct ReduceFunc
{
    unordered_map<wstring, size_t> operator() (
        const unordered_map<wstring, size_t>& x, 
        const unordered_map<wstring, size_t>& y) const
    {
        unordered_map<wstring, size_t> ret(x);
        // The map's value_type has a const key; matching it avoids a copy per element.
        for_each(begin(y), end(y), [&ret](const pair<const wstring, size_t>& pr) {
            ret[pr.first] += pr.second;
        });
        return ret; 
    }
}; 

int wmain()
{ 
    // File 1 
    vector<wstring> v1 {
      L"word1", // 1
      L"word1", // 2
      L"word2",
      L"word3",
      L"word4"
    };

    // File 2 
    vector<wstring> v2 {
      L"word5",
      L"word6",
      L"word7",
      L"word8",
      L"word1" // 3
    };

    vector<vector<wstring>> v { v1, v2 };

    vector<unordered_map<wstring, size_t>> map(v.size()); 

    // The Map operation
    parallel_transform(begin(v), end(v), begin(map), MapFunc()); 

    // The Reduce operation 
    unordered_map<wstring, size_t> result = parallel_reduce(
        begin(map), end(map), unordered_map<wstring, size_t>(), ReduceFunc());

    wcout << L"\"word1\" occurs " << result.at(L"word1") << L" times. " << endl;
} 
/* Output:
   "word1" occurs 3 times.
*/

Robust Programming

In this example, you can use the concurrent_unordered_map class, which is defined in concurrent_unordered_map.h, to perform the map and the reduce in one step.

// File 1 
vector<wstring> v1 {
  L"word1", // 1
  L"word1", // 2
  L"word2",
  L"word3",
  L"word4",
};

// File 2 
vector<wstring> v2 {
  L"word5",
  L"word6",
  L"word7",
  L"word8",
  L"word1", // 3
}; 

vector<vector<wstring>> v { v1, v2 };

concurrent_unordered_map<wstring, size_t> result;
for_each(begin(v), end(v), [&result](const vector<wstring>& words) {
    parallel_for_each(begin(words), end(words), [&result](const wstring& word) {
        InterlockedIncrement(&result[word]);
    });
});
            
wcout << L"\"word1\" occurs " << result.at(L"word1") << L" times. " << endl;

/* Output:
   "word1" occurs 3 times.
*/

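Typically, you parallelize only the outer loop or only the inner loop. Parallelize the inner loop if you have relatively few files and each file contains many words. Parallelize the outer loop if you have relatively many files and each file contains few words.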
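As a rough sketch of the outer-loop choice, the following portable code runs one task per file and keeps the per-word loop serial inside each task. It uses std::async from the standard library as a stand-in for PPL, and the names Counts and count_words_per_file are hypothetical. Each task fills a private table, so no shared map is touched concurrently.

```cpp
#include <cstddef>
#include <future>
#include <string>
#include <unordered_map>
#include <vector>

using Counts = std::unordered_map<std::wstring, std::size_t>;

// Outer-loop parallelism: one asynchronous task per file.
// Each task fills a private table; the tables are merged serially afterwards.
Counts count_words_per_file(const std::vector<std::vector<std::wstring>>& files)
{
    std::vector<std::future<Counts>> tasks;
    tasks.reserve(files.size());

    for (const auto& file : files)
    {
        // `file` refers to an element of `files`, which outlives every task
        // because all futures are consumed before this function returns.
        tasks.push_back(std::async(std::launch::async, [&file]
        {
            Counts local;
            for (const auto& word : file) ++local[word]; // serial inner loop
            return local;
        }));
    }

    Counts total;
    for (auto& task : tasks)
    {
        for (const auto& entry : task.get()) total[entry.first] += entry.second;
    }
    return total;
}
```

With many small files this keeps every task busy on its own data. With a few very large files the task count would be too low to occupy all cores, which is the case where parallelizing the inner loop fits better.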