Windows C++ Parallel Programming: Using Accelerator Objects (Part 1)

sului

Published 2024-09-05 00:15:00


Article link: https://blog.csdn.net/m0_72813396/article/details/141539229


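You can use the accelerator and accelerator_view classes to specify the device or emulator on which your C++ AMP code runs. A system might have several devices or emulators that differ in amount of memory, shared memory support, debugging support, or double-precision support. C++ Accelerated Massive Parallelism (C++ AMP) provides APIs that you can use to examine the available accelerators, set one of them as the default, specify multiple accelerator_view objects for multiple calls to parallel_for_each, and perform special debugging tasks.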

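Using the Default Accelerator

The C++ AMP runtime picks a default accelerator unless you write code to pick a specific one. The runtime chooses the default accelerator in the following order of preference:

If the app is running in debug mode, an accelerator that supports debugging.
Otherwise, the accelerator specified by the CPPAMP_DEFAULT_ACCELERATOR environment variable, if it is set.
Otherwise, a non-emulated device.
Otherwise, the device that has the greatest amount of available memory.
Otherwise, a device that is not attached to the display.

In addition, the runtime specifies an access_type of access_type_auto for the default accelerator. This means that the default accelerator uses shared memory if it is supported and if its performance characteristics (bandwidth and latency) are known to be the same as those of dedicated (non-shared) memory.

You can determine the properties of the default accelerator by constructing it and examining its properties. The following code example prints the path, amount of accelerator memory, shared memory support, double-precision support, and limited double-precision support of the default accelerator.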

// Assumes #include <amp.h>, #include <iostream>, and using namespace concurrency;
void default_properties() {
    // A default-constructed accelerator refers to the default accelerator.
    accelerator default_acc;
    std::wcout << default_acc.device_path << "\n";
    std::wcout << default_acc.dedicated_memory << "\n";  // in KB
    std::wcout << (default_acc.supports_cpu_shared_memory ?
        "CPU shared memory: true" : "CPU shared memory: false") << "\n";
    std::wcout << (default_acc.supports_double_precision ?
        "double precision: true" : "double precision: false") << "\n";
    std::wcout << (default_acc.supports_limited_double_precision ?
        "limited double precision: true" : "limited double precision: false") << "\n";
}

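The CPPAMP_DEFAULT_ACCELERATOR Environment Variable

You can set the CPPAMP_DEFAULT_ACCELERATOR environment variable to specify the accelerator::device_path of the default accelerator. The path is hardware-dependent. The following code uses the accelerator::get_all function to retrieve a list of the available accelerators and then displays the path and characteristics of each one.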

void list_all_accelerators()
{
    std::vector<accelerator> accs = accelerator::get_all();

    // Print the path and key capabilities of every accelerator the runtime reports.
    for (const accelerator& acc : accs) {
        std::wcout << acc.device_path << "\n";
        std::wcout << acc.dedicated_memory << "\n";  // in KB
        std::wcout << (acc.supports_cpu_shared_memory ?
            "CPU shared memory: true" : "CPU shared memory: false") << "\n";
        std::wcout << (acc.supports_double_precision ?
            "double precision: true" : "double precision: false") << "\n";
        std::wcout << (acc.supports_limited_double_precision ?
            "limited double precision: true" : "limited double precision: false") << "\n";
    }
}
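
Because the device path is hardware-dependent, it can help to confirm that the value stored in CPPAMP_DEFAULT_ACCELERATOR actually names one of the listed devices. The following is a minimal sketch in portable C++, not part of C++ AMP. The helper names match_requested_path and requested_default_accelerator are hypothetical, narrow strings stand in for the wide device_path strings for brevity, and the exact (case-sensitive) comparison is an assumption about how the value should be matched.

```cpp
#include <algorithm>
#include <cstdlib>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper (not a C++ AMP API): returns the requested path if it
// exactly matches one of the available device paths, std::nullopt otherwise.
// `requested` may be null, which is what std::getenv returns for an unset variable.
std::optional<std::string> match_requested_path(
    const char* requested, const std::vector<std::string>& available)
{
    if (requested == nullptr || *requested == '\0') {
        return std::nullopt;  // variable unset or empty
    }
    const std::string path(requested);
    const auto it = std::find(available.begin(), available.end(), path);
    if (it == available.end()) {
        return std::nullopt;  // value does not name a listed device
    }
    return *it;
}

// Hypothetical wrapper that reads the real environment variable.
std::optional<std::string> requested_default_accelerator(
    const std::vector<std::string>& available)
{
    return match_requested_path(std::getenv("CPPAMP_DEFAULT_ACCELERATOR"), available);
}
```

In an app, the available list would be filled from the device_path of each accelerator returned by accelerator::get_all. An empty result then suggests that the variable is unset or mistyped.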

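Selecting an Accelerator

To select an accelerator, use the accelerator::get_all method to retrieve a list of the available accelerators, and then select one based on its properties. This example shows how to pick the accelerator that has the most memory: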

void pick_with_most_memory()
{
    std::vector<accelerator> accs = accelerator::get_all();
    accelerator acc_chosen = accs[0];

    // accs[0] is already the initial choice, so start comparing at index 1.
    for (size_t i = 1; i < accs.size(); i++) {
        if (accs[i].dedicated_memory > acc_chosen.dedicated_memory) {
            acc_chosen = accs[i];
        }
    }

    std::wcout << "The accelerator with the most memory is "
        << acc_chosen.device_path << "\n"
        << acc_chosen.dedicated_memory << ".\n";
}

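One of the accelerators returned by accelerator::get_all is the CPU accelerator. You cannot execute code on the CPU accelerator. To filter out the CPU accelerator, compare the device_path property of each accelerator returned by accelerator::get_all with the value of accelerator::cpu_accelerator.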
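
The filtering step can be combined with the most-memory selection. The following is a minimal sketch in portable C++, so it can be tried without the C++ AMP runtime. AcceleratorInfo is a hypothetical stand-in for the two accelerator properties involved, and kCpuAcceleratorPath is an assumed copy of the accelerator::cpu_accelerator value. Real code should compare against the constant itself.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the two accelerator properties used here.
struct AcceleratorInfo {
    std::wstring device_path;
    std::size_t dedicated_memory;  // in KB
};

// Assumed value of accelerator::cpu_accelerator; use the real constant in C++ AMP code.
const std::wstring kCpuAcceleratorPath = L"cpu";

// Returns the non-CPU entry with the most dedicated memory, or std::nullopt
// when the list contains no non-CPU entry.
std::optional<AcceleratorInfo> pick_most_memory_excluding_cpu(
    const std::vector<AcceleratorInfo>& accs)
{
    std::optional<AcceleratorInfo> best;
    for (const AcceleratorInfo& acc : accs) {
        if (acc.device_path == kCpuAcceleratorPath) {
            continue;  // skip the CPU accelerator, which cannot run C++ AMP code
        }
        if (!best || acc.dedicated_memory > best->dedicated_memory) {
            best = acc;
        }
    }
    return best;
}
```

Returning std::optional rather than indexing accs[0] makes the "nothing usable found" case explicit, which matters once the CPU accelerator has been excluded.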

Shared Memory

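Shared memory is memory that both the CPU and the accelerator can access. Using shared memory eliminates or significantly reduces the overhead of copying data between the CPU and the accelerator. Although the memory is shared, the CPU and the accelerator must not access it concurrently, because doing so causes undefined behavior. The accelerator property supports_cpu_shared_memory returns true if the accelerator supports shared memory, and the default_cpu_access_type property gets the default access_type for memory allocated on the accelerator (for example, arrays associated with the accelerator, or array_view objects accessed on the accelerator).

The C++ AMP runtime automatically chooses the best default access_type for each accelerator. However, the performance characteristics (bandwidth and latency) of shared memory can be worse than those of dedicated (non-shared) accelerator memory when reading from the CPU, writing from the CPU, or both. If shared memory performs as well as dedicated memory for both reading and writing from the CPU, the runtime defaults to access_type_read_write. Otherwise, the runtime chooses a more conservative default access_type and allows the app to override it when the memory access patterns of its compute kernels benefit from a different access_type.

The following code example shows how to determine whether the default accelerator supports shared memory, and then overrides its default access type and creates an accelerator_view from it.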

#include <amp.h>
#include <iostream>

using namespace Concurrency;

int main()
{
    accelerator acc = accelerator(accelerator::default_accelerator);

    // Early out if the default accelerator doesn't support shared memory.
    if (!acc.supports_cpu_shared_memory)
    {
        std::cout << "The default accelerator does not support shared memory" << std::endl;
        return 1;
    }

    // Override the default CPU access type.
    acc.set_default_cpu_access_type(access_type_read_write);

    // Create an accelerator_view from the default accelerator. The
    // accelerator_view reflects the default_cpu_access_type of the
    // accelerator it's associated with.
    accelerator_view acc_v = acc.default_view;
}

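An accelerator_view always reflects the default_cpu_access_type of the accelerator it is associated with, and it provides no interface to override or change its access_type.

Changing the Default Accelerator

You can change the default accelerator by calling the accelerator::set_default method. You can change the default accelerator only once per app execution, and you must change it before any code is executed on the GPU. Any subsequent function calls that try to change the accelerator return false. If you want to use a different accelerator in a call to parallel_for_each, see the "Using Multiple Accelerators" section of this article. The following code example sets the default accelerator to one that is not emulated, is not connected to a display, and supports double precision.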

// Requires <algorithm> for std::find_if.
bool pick_accelerator()
{
    std::vector<accelerator> accs = accelerator::get_all();
    // Default-constructed, so it refers to the current default accelerator
    // and stays that way if no accelerator matches the predicate below.
    accelerator chosen_one;

    auto result = std::find_if(accs.begin(), accs.end(),
        [] (const accelerator& acc) {
            return !acc.is_emulated &&
                acc.supports_double_precision &&
                !acc.has_display;
        });

    if (result != accs.end()) {
        chosen_one = *(result);
    }

    std::wcout << chosen_one.description << std::endl;
    bool success = accelerator::set_default(chosen_one.device_path);
    return success;
}

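Using Multiple Accelerators

There are two ways to use multiple accelerators in an app:

You can pass accelerator_view objects to calls to the parallel_for_each method.
You can construct array objects with a specific accelerator_view object. The C++ AMP runtime picks up the accelerator_view object from the array object captured in the lambda expression.

Special Accelerators

The device paths of three special accelerators are available as properties of the accelerator class:

accelerator::direct3d_ref data member: This single-threaded accelerator uses software on the CPU to emulate a generic graphics card. It is used by default for debugging, but it is not suitable for production because it is slower than hardware accelerators. In addition, it is available only in the DirectX SDK and the Windows SDK, and it is unlikely to be installed on your customers' computers.
accelerator::direct3d_warp data member: This accelerator provides a fallback solution for executing C++ AMP code on multi-core CPUs that use Streaming SIMD Extensions (SSE).
accelerator::cpu_accelerator data member: You can use this accelerator to set up staging arrays. It cannot execute C++ AMP code.

Interoperability

The C++ AMP runtime supports interoperability between the accelerator_view class and the Direct3D ID3D11Device interface. The create_accelerator_view method takes an IUnknown interface and returns an accelerator_view object. The get_device method takes an accelerator_view object and returns an IUnknown interface.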
