Windows C++ Parallel Programming: Parallel Algorithms (Part 2)


sului

Column: Windows C++ Parallel Programming Techniques. Tags: c++, windows

Article link: https://blog.csdn.net/m0_72813396/article/details/141580546

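The parallel_invoke Algorithm

The concurrency::parallel_invoke algorithm executes a set of tasks in parallel. It does not return until every task has finished. This algorithm is useful when you have several independent tasks that you want to run at the same time.

The parallel_invoke algorithm takes a series of work functions (lambda functions, function objects, or function pointers) as its parameters. It is overloaded to take between two and ten parameters. Every function that you pass to parallel_invoke must take zero parameters.

Like the other parallel algorithms, parallel_invoke does not execute the tasks in any specific order. The topic Task Parallelism explains how the parallel_invoke algorithm relates to tasks and task groups.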

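Example

The following example shows the basic structure of the parallel_invoke algorithm. It calls the twice function on three local variables concurrently and prints the results to the console.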

// parallel-invoke-structure.cpp
// compile with: /EHsc
#include <ppl.h>
#include <string>
#include <iostream>

using namespace concurrency;
using namespace std;

// Returns the result of adding a value to itself.
template <typename T>
T twice(const T& t) {
   return t + t;
}

int wmain()
{
   // Define several values.
   int n = 54;
   double d = 5.6;
   wstring s = L"Hello";

   // Call the twice function on each value concurrently.
   parallel_invoke(
      [&n] { n = twice(n); },
      [&d] { d = twice(d); },
      [&s] { s = twice(s); }
   );

   // Print the values to the console.
   wcout << n << L' ' << d << L' ' << s << endl;
}

Output:
108 11.2 HelloHello

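The parallel_transform and parallel_reduce Algorithms

The concurrency::parallel_transform and concurrency::parallel_reduce algorithms are parallel versions of the C++ Standard Library algorithms std::transform and std::accumulate, respectively. The Concurrency Runtime versions behave like the Standard Library versions, except that the order of operations is not determined, because they execute in parallel. Use these algorithms when you work with a set that is large enough to gain performance and scalability benefits from parallel processing.

The parallel_transform and parallel_reduce algorithms support only random-access, bidirectional, and forward iterators, because these iterators produce stable memory addresses. These iterators must also produce non-const lvalues.

The parallel_transform Algorithm

You can use the parallel_transform algorithm to perform many data-parallel operations. For example, you can:

Adjust the brightness of an image, and perform other image-processing operations.
Sum or take the dot product of two vectors, and perform other numeric calculations on vectors.
Perform 3-D ray tracing, where each iteration refers to one pixel that must be rendered.

The following example shows the basic structure used to call the parallel_transform algorithm. It negates each element of a std::vector object in two ways. The first way uses a lambda expression. The second way uses the std::negate function object, which historically derived from std::unary_function (a base class deprecated in C++11 and removed in C++17).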

// basic-parallel-transform.cpp
// compile with: /EHsc
#include <ppl.h>
#include <algorithm>   // std::generate
#include <functional>  // std::negate
#include <random>
#include <vector>

using namespace concurrency;
using namespace std;

int wmain()
{
    // Create a large vector that contains random integer data.
    vector<int> values(1250000);
    generate(begin(values), end(values), mt19937(42));

    // Create a vector to hold the results.
    // Depending on your requirements, you can also transform the 
    // vector in-place.
    vector<int> results(values.size());

    // Negate each element in parallel.
    parallel_transform(begin(values), end(values), begin(results), [](int n) {
        return -n;
    });

    // Alternatively, use the negate class to perform the operation.
    parallel_transform(begin(values), end(values), begin(values), negate<int>());
}

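// This example demonstrates the basic use of parallel_transform.
// Because the work function does not perform a significant amount of work,
// no significant performance gain is expected in this example.

The parallel_transform algorithm has two overloads. The first overload takes one input range and a unary function. The unary function can be a lambda expression that takes one argument, a function object, or a type that derives from unary_function. The second overload takes two input ranges and a binary function. The binary function can be a lambda expression that takes two arguments, a function object, or a type that derives from std::binary_function. (Both base classes were removed in C++17, so new code normally passes a lambda or a plain function object.) The following example illustrates these differences.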

//
// Demonstrate use of parallel_transform together with a unary function.

// This example uses a lambda expression.
parallel_transform(begin(values), end(values), 
    begin(results), [](int n) { 
        return -n;
    });

// Alternatively, use the negate class:
parallel_transform(begin(values), end(values), 
    begin(results), negate<int>());

//
// Demonstrate use of parallel_transform together with a binary function.

// This example uses a lambda expression.
parallel_transform(begin(values), end(values), begin(results), 
    begin(results), [](int n, int m) {
        return n * m;
    });

// Alternatively, use the multiplies class:
parallel_transform(begin(values), end(values), begin(results), 
    begin(results), multiplies<int>());

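// The output iterator that you supply to parallel_transform must either
// completely overlap the input iterator or not overlap it at all.
// If the input and output iterators partially overlap, the behavior of
// this algorithm is unspecified.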

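The parallel_reduce Algorithm

The parallel_reduce algorithm is useful when you have a sequence of operations that satisfy the associative property. (This algorithm does not require the commutative property.) Here are some of the operations that you can perform with parallel_reduce:

Multiply a sequence of matrices to produce one matrix.
Multiply a vector by a sequence of matrices to produce a vector.
Compute the length of a sequence of strings.
Combine a list of elements, such as strings, into one element.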

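The following basic example shows how to use the parallel_reduce algorithm to combine a sequence of strings into one string. As with the parallel_transform examples, no performance gain is expected in this basic example.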

// basic-parallel-reduce.cpp
// compile with: /EHsc
#include <ppl.h>
#include <iostream>
#include <string>
#include <vector>

using namespace concurrency;
using namespace std;

int wmain()
{
  // Create a vector of strings.
  vector<wstring> words{
      L"Lorem ",
      L"ipsum ",
      L"dolor ",
      L"sit ",
      L"amet, ",
      L"consectetur ",
      L"adipiscing ",
      L"elit."};
      
  // Reduce the vector to one string in parallel.
  wcout << parallel_reduce(begin(words), end(words), wstring()) << endl;
}

/* Output:
   Lorem ipsum dolor sit amet, consectetur adipiscing elit.
*/

In many cases, you can think of parallel_reduce as shorthand for using the parallel_for_each algorithm together with the concurrency::combinable class.

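Example: Performing Map and Reduce in Parallel

A map operation applies a function to each value in a sequence. A reduce operation combines the elements of a sequence into one value. You can use the C++ Standard Library functions std::transform and std::accumulate to perform map and reduce operations. For many problems, however, you can use the parallel_transform algorithm to perform the map operation in parallel and the parallel_reduce algorithm to perform the reduce operation in parallel.

The following example compares the time that it takes to compute the sum of prime numbers serially with the time that it takes to compute it in parallel. The map phase transforms non-prime values to 0, and the reduce phase sums the values.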

// parallel-map-reduce-sum-of-primes.cpp
// compile with: /EHsc
#include <windows.h>
#include <ppl.h>
#include <algorithm>  // std::transform
#include <array>
#include <numeric>
#include <iostream>

using namespace concurrency;
using namespace std;

// Calls the provided work function and returns the number of milliseconds 
// that it takes to call that function.
template <class Function>
__int64 time_call(Function&& f)
{
   __int64 begin = GetTickCount();
   f();
   return GetTickCount() - begin;
}

// Determines whether the input value is prime.
bool is_prime(int n)
{
   if (n < 2)
      return false;
   for (int i = 2; i < n; ++i)
   {
      if ((n % i) == 0)
         return false;
   }
   return true;
}

int wmain()
{   
   // Create an array object that contains 200000 integers.
   array<int, 200000> a;

   // Initialize the array such that a[i] == i.
   iota(begin(a), end(a), 0);

   int prime_sum;
   __int64 elapsed;

   // Compute the sum of the numbers in the array that are prime.
   elapsed = time_call([&] {
       transform(begin(a), end(a), begin(a), [](int i) { 
         return is_prime(i) ? i : 0; 
      });
      prime_sum = accumulate(begin(a), end(a), 0);
   });   
   wcout << prime_sum << endl;   
   wcout << L"serial time: " << elapsed << L" ms" << endl << endl;

   // Now perform the same task in parallel.
   elapsed = time_call([&] {
      parallel_transform(begin(a), end(a), begin(a), [](int i) { 
         return is_prime(i) ? i : 0; 
      });
      prime_sum = parallel_reduce(begin(a), end(a), 0);
   });
   wcout << prime_sum << endl;
   wcout << L"parallel time: " << elapsed << L" ms" << endl << endl;
}
/* Sample output:
   1709600813
   serial time: 7406 ms
   
   1709600813
   parallel time: 1969 ms
*/
