VexCL is a vector expression template library for OpenCL. It has been created to ease OpenCL development with C++. VexCL strives to reduce the amount of boilerplate code needed to develop OpenCL applications. The library provides convenient and intuitive notation for vector arithmetic, reductions, sparse matrix-vector products, etc. Multi-device and even multi-platform computations are supported. The source code of the library is distributed under the permissive MIT license.
The code is available at https://github.com/ddemidov/vexcl.
Doxygen-generated documentation: http://ddemidov.github.io/vexcl.
Slides from VexCL-related talks:
The paper Programming CUDA and OpenCL: A Case Study Using Modern C++ Libraries compares both convenience and performance of several GPGPU libraries, including VexCL.
Table of contents
- Context initialization
- Memory allocation
- Copies between host and devices
- Vector expressions
- Reductions
- Sparse matrix-vector products
- Stencil convolutions
- Fast Fourier Transform
- Multivectors
- Converting generic C++ algorithms to OpenCL
- Custom kernels
- Interoperability with other libraries
- Supported compilers
Context initialization
VexCL can transparently work with multiple compute devices present in the system. A VexCL context is initialized with a device filter, which is simply a functor that takes a reference to a `cl::Device` and returns a `bool`. Several standard filters are provided, but one can easily add a custom functor. Filters may be combined with logical operators. All compute devices that satisfy the provided filter are added to the created context. In the example below, all GPU devices that support double precision arithmetic are selected:
One of the most convenient filters is `vex::Filter::Env`, which selects compute devices based on environment variables. It allows switching compute devices without recompiling the program.
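For instance, assuming the program creates its context with `vex::Filter::Env`, the device can be chosen at launch time; the variable names below are taken from the VexCL documentation (`./my_app` is a placeholder for the user's executable):

```shell
# Select devices whose name contains "Tesla":
OCL_DEVICE=Tesla ./my_app

# Restrict the context to a single device:
OCL_MAX_DEVICES=1 ./my_app
```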
Memory allocation
The `vex::vector<T>` class constructor accepts a const reference to `std::vector<cl::CommandQueue>`. A `vex::Context` instance may be conveniently converted to this type, but it is also possible to initialize the command queues elsewhere, completely eliminating the need to create a `vex::Context`. Each command queue in the list should uniquely identify a single compute device.
The contents of the created vector will be partitioned across all devices present in the queue list. The size of each partition will be proportional to the device bandwidth, which is measured the first time the device is used. All vectors of the same size are guaranteed to be partitioned consistently, which minimizes inter-device communication.
In the example below, three device vectors of the same size are allocated. Vector `A` is copied from host vector `a`, and the other vectors are created uninitialized:
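A sketch of this allocation pattern; the constructor signatures follow the VexCL documentation, and the vector size `n` is chosen arbitrarily here:

```cpp
#include <vector>
#include <vexcl/vexcl.hpp>

int main() {
    vex::Context ctx(vex::Filter::DoublePrecision);

    const size_t n = 1024 * 1024;
    std::vector<double> a(n, 1.0);  // host vector

    vex::vector<double> A(ctx, a);  // device vector, copied from host vector a
    vex::vector<double> B(ctx, n);  // created uninitialized
    vex::vector<double> C(ctx, n);  // created uninitialized
}
```

Passing `ctx` where a `std::vector<cl::CommandQueue>` is expected relies on the implicit conversion mentioned above; a manually built queue list would work the same way.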