Implementing Parallel Matrix Multiplication with oneAPI

Latest recommended article published on 2024-09-17 20:47:58

BUPT_OS

Tags: oneapi, matrix, linear algebra

Article link: https://blog.csdn.net/BUPT_OS/article/details/134732294

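This assignment implements a C++/SYCL program based on oneAPI that performs matrix multiplication. It takes into account the multiplication of large matrices and the data dependencies between threads. The work was done on Intel's oneAPI Developer Cloud service, which requires no additional local installation and gives direct access to the CPU and GPU hardware on the Developer Cloud platform.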

1. Registering for Developer Cloud

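First, go to https://www.intel.com/content/www/cn/zh/secure/forms/devcloud/enrollment.html and register a new Intel user account.

Next, open https://devcloud.intel.com/oneapi/home/ and click "Get Free Access".

Then follow the prompts to complete the remaining steps and activate the account.

After activation completes, the browser redirects automatically to the Developer Cloud page.

Click "Get Started" on the left side of the page to open that section. On the "Get Started" page, click the "Launch JupyterLab*" button under "Connect with Jupyter* Lab" in the bottom-left corner to start the Jupyter service directly. The JupyterLab interface should then appear.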

2. Writing the Code and Debugging the Program

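The task is to multiply five matrices in sequence, computing each product in parallel, and output the final result. The five matrix files are matrix0.txt, matrix1.txt, matrix2.txt, matrix3.txt, and matrix4.txt. All five matrices have fewer than 128 rows and fewer than 128 columns, so storing each one in the top-left corner of a zero-filled 128x128 matrix resolves the mismatch in their dimensions: the zero padding adds nothing to any product.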
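Zero-padding works because the padded rows and columns only add zero terms to each dot product, so the top-left block of the padded product equals the product of the original matrices. The following is a minimal plain C++ sketch of that idea (no SYCL involved). The helpers `pad` and `padded_multiply` and the small padded size `P` (standing in for 128) are hypothetical names introduced only for illustration.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t P = 4; // hypothetical padded size, standing in for 128

using Matrix = std::vector<double>; // row-major P x P storage

// Copy a rows x cols row-major matrix into the top-left corner
// of a zero-filled P x P buffer. Assumes rows <= P and cols <= P.
Matrix pad(const std::vector<double>& src, std::size_t rows, std::size_t cols) {
    Matrix out(P * P, 0.0);
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            out[i * P + j] = src[i * cols + j];
    return out;
}

// Plain triple-loop product of two padded P x P matrices.
Matrix padded_multiply(const Matrix& a, const Matrix& b) {
    Matrix c(P * P, 0.0);
    for (std::size_t i = 0; i < P; ++i) {
        for (std::size_t j = 0; j < P; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < P; ++k)
                sum += a[i * P + k] * b[k * P + j]; // padded entries contribute 0
            c[i * P + j] = sum;
        }
    }
    return c;
}
```

Padding a 2x3 matrix and a 3x2 matrix this way and calling `padded_multiply` yields the ordinary 2x2 product in the top-left corner, with every other entry equal to zero.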

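First, create the five files matrix0.txt, matrix1.txt, matrix2.txt, matrix3.txt, and matrix4.txt in the root directory, and paste the matrix data given in the assignment into the corresponding txt file.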
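The program in this post reads each text line as one matrix row of whitespace-separated numbers. As a rough sketch of that parsing step, the hypothetical helper `parse_rows` below reads from any input stream, so it can be tried on an in-memory string before pointing it at the real files. The function name and the sample values used to exercise it are assumptions, not part of the assignment.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Parse text in which each line is one matrix row of whitespace-separated
// numbers. Values land in the top-left corner of a zero-filled n x n
// row-major buffer; anything beyond n rows or n columns is ignored.
std::vector<double> parse_rows(std::istream& in, std::size_t n) {
    std::vector<double> m(n * n, 0.0);
    std::string line;
    std::size_t row = 0;
    while (row < n && std::getline(in, line)) {
        std::istringstream ss(line);
        double value;
        std::size_t col = 0;
        // Extract first, then store: a failed read never touches the buffer.
        while (col < n && ss >> value) {
            m[row * n + col] = value;
            ++col;
        }
        ++row;
    }
    return m;
}
```

Passing a `std::istringstream` holding two lines such as `1 2 3` and `4 5 6` with `n = 4` fills the first three entries of rows 0 and 1 and leaves the rest at zero. An opened `std::ifstream` can be passed in the same way.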

The parallel computation code is as follows:

#include <sycl/sycl.hpp>
#include <algorithm>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

constexpr size_t N = 128; // padded dimension; every input matrix is smaller than N x N

// Read a matrix from a text file (one row per line, whitespace-separated)
// into the top-left corner of a zero-filled N x N row-major buffer.
void loadMatrix(const std::string& path, std::vector<double>& m) {
    std::ifstream in(path);
    if (!in) {
        std::cerr << "Failed to open file: " << path << "\n";
        std::exit(1);
    }
    std::fill(m.begin(), m.end(), 0.0); // clear values left over from the previous step
    std::string line;
    size_t row = 0;
    while (row < N && std::getline(in, line)) {
        std::stringstream ss(line);
        double value;
        size_t col = 0;
        while (col < N && ss >> value) { // store only values that were actually read
            m[row * N + col] = value;
            ++col;
        }
        ++row;
    }
}

int main() {
    std::vector<double> matrixA(N * N, 0.0);
    std::vector<double> matrixB(N * N, 0.0);
    std::vector<double> matrixC(N * N, 0.0);

    loadMatrix("matrix0.txt", matrixA); // left operand of the first product

    for (int i = 0; i < 4; i++) { // four multiplications chain the five matrices
        loadMatrix("matrix" + std::to_string(i + 1) + ".txt", matrixB);
        try {
            sycl::queue myQueue;
            sycl::range<2> size(N, N);

            sycl::buffer<double, 2> bufferA(matrixA.data(), size);
            sycl::buffer<double, 2> bufferB(matrixB.data(), size);
            sycl::buffer<double, 2> bufferC(matrixC.data(), size);

            myQueue.submit([&](sycl::handler& cgh) {
                auto accessorA = bufferA.get_access<sycl::access::mode::read>(cgh);
                auto accessorB = bufferB.get_access<sycl::access::mode::read>(cgh);
                auto accessorC = bufferC.get_access<sycl::access::mode::write>(cgh);

                // One work-item per output element; elements of a single product
                // are independent of each other, so they can run in parallel.
                cgh.parallel_for<class MatrixMultiply>(size, [=](sycl::id<2> idx) {
                    double sum = 0.0;
                    for (size_t k = 0; k < N; ++k) {
                        sum += accessorA[idx[0]][k] * accessorB[k][idx[1]];
                    }
                    accessorC[idx] = sum;
                });
            });

            myQueue.wait();
            // The buffers are destroyed at the end of this block, which copies
            // the result back into matrixC.
        } catch (sycl::exception const& e) {
            std::cerr << "An exception occurred: " << e.what() << std::endl;
            return 1;
        }
        // The running product becomes the left operand of the next multiplication.
        matrixA = matrixC;
    }

    // Print the result. The 66 x 54 extent is hard-coded for the assignment data.
    for (size_t i = 0; i < 66; ++i) {
        for (size_t j = 0; j < 54; ++j) {
            std::cout << matrixC[i * N + j] << " ";
        }
        std::cout << std::endl;
    }

    return 0;
}

Next, open a Terminal, run vim dpc.cpp, and paste the code above into the cpp file.

Then run icpx -fsycl dpc.cpp -o dpc to compile dpc.cpp into the executable dpc, and run ./dpc to see the final result of the matrix computation.
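One way to gain some confidence in the printed result is to recompute the chained product serially on the CPU and compare it with the parallel output. The sketch below is plain C++ with no SYCL dependency. The names `multiply`, `multiply_chain`, and `nearly_equal`, as well as the default tolerance, are assumptions made for illustration and are not part of the original assignment.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<double>; // row-major n x n storage

// Serial n x n product, intended only as a reference for a parallel kernel.
Matrix multiply(const Matrix& a, const Matrix& b, std::size_t n) {
    Matrix c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    return c;
}

// Left-to-right product of a non-empty chain: ((m0 * m1) * m2) * ...
// Each step depends on the previous result, so the steps run in order.
Matrix multiply_chain(const std::vector<Matrix>& chain, std::size_t n) {
    Matrix acc = chain.at(0);
    for (std::size_t i = 1; i < chain.size(); ++i)
        acc = multiply(acc, chain[i], n);
    return acc;
}

// True when the sizes match and every pair of elements agrees within tol,
// scaled by the magnitude of the reference value.
bool nearly_equal(const Matrix& x, const Matrix& y, double tol = 1e-9) {
    if (x.size() != y.size()) return false;
    for (std::size_t i = 0; i < x.size(); ++i)
        if (std::fabs(x[i] - y[i]) > tol * (1.0 + std::fabs(y[i])))
            return false;
    return true;
}
```

A tolerance-based comparison is used rather than exact equality because a device may accumulate floating-point sums slightly differently from the host.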