CUDA 11 deprecates a number of functions (cusparseScsrmv, cusparseScsr2csc, ...). Below is a detailed record of how to do this with CUDA 11, so I don't have to look it up again later.
Given a sparse matrix A and a dense vector x, compute y = Ax.
A is stored in CSR format
1. #include "cusparse.h"
2. Create the handle
cusparseHandle_t handle = 0;
cusparseCreate(&handle);
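Every one of these cuSPARSE calls returns a cusparseStatus_t. In my own code I wrap them in a small checking macro, roughly like the sketch below (CHECK_CUSPARSE is just my own helper name, similar in spirit to the one used in NVIDIA's samples):
#include <cstdio>
#include <cstdlib>
// My own helper macro (not part of cuSPARSE): print the error string and abort on failure.
#define CHECK_CUSPARSE(call)                                            \
    do {                                                                \
        cusparseStatus_t status_ = (call);                              \
        if (status_ != CUSPARSE_STATUS_SUCCESS) {                       \
            fprintf(stderr, "cuSPARSE error %s at line %d: %s\n",       \
                    cusparseGetErrorString(status_), __LINE__, #call);  \
            exit(EXIT_FAILURE);                                         \
        }                                                               \
    } while (0)
// Usage: CHECK_CUSPARSE( cusparseCreate(&handle) );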
Create the sparse matrix A in CSR format
cusparseSpMatDescr_t matA;
cusparseCreateCsr(&matA, A_num_rows, A_num_cols, A_num_nnz,
dA_csrOffsets, dA_columns, dA_values,
CUSPARSE_INDEX_32I, CUSPARSE_INDEX_32I,
CUSPARSE_INDEX_BASE_ZERO, CUDA_R_32F);
About the cusparseCreateCsr function above: it initializes the sparse matrix descriptor spMatDescr in CSR format.
cusparseStatus_t cusparseCreateCsr(cusparseSpMatDescr_t* spMatDescr,      // out, sparse matrix descriptor
                                   int64_t rows,                          // number of rows of A
                                   int64_t cols,                          // number of columns of A
                                   int64_t nnz,                           // number of non-zero elements of A
                                   void* csrRowOffsets,                   // row-offsets array (rows + 1 elements)
                                   void* csrColInd,                       // column-indices array (nnz elements)
                                   void* csrValues,                       // non-zero values array (nnz elements)
                                   cusparseIndexType_t csrRowOffsetsType, // data type of csrRowOffsets
                                   cusparseIndexType_t csrColIndType,     // data type of csrColInd
                                   cusparseIndexBase_t idxBase,           // whether the row/column indices are 0-based or 1-based
                                   cudaDataType valueType)                // data type of the values of A
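To make these arguments concrete: the device arrays dA_csrOffsets / dA_columns / dA_values passed to cusparseCreateCsr above are prepared beforehand. A toy example with a 4 × 4 matrix (the matrix and the hA_* host names are my own, not part of the API):
// Toy matrix (my own made-up values), just to show the CSR layout:
//     | 1 0 2 3 |
//     | 0 4 0 0 |
//     | 5 0 6 7 |
//     | 0 8 0 9 |
const int A_num_rows = 4, A_num_cols = 4, A_num_nnz = 9;
int   hA_csrOffsets[] = { 0, 3, 4, 7, 9 };              // A_num_rows + 1 entries
int   hA_columns[]    = { 0, 2, 3, 1, 0, 2, 3, 1, 3 };  // A_num_nnz entries
float hA_values[]     = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };  // A_num_nnz entries

// The descriptor expects device pointers, so copy the three arrays to the GPU first.
int *dA_csrOffsets, *dA_columns;
float *dA_values;
cudaMalloc(&dA_csrOffsets, (A_num_rows + 1) * sizeof(int));
cudaMalloc(&dA_columns,    A_num_nnz * sizeof(int));
cudaMalloc(&dA_values,     A_num_nnz * sizeof(float));
cudaMemcpy(dA_csrOffsets, hA_csrOffsets, (A_num_rows + 1) * sizeof(int), cudaMemcpyHostToDevice);
cudaMemcpy(dA_columns,    hA_columns,    A_num_nnz * sizeof(int),   cudaMemcpyHostToDevice);
cudaMemcpy(dA_values,     hA_values,     A_num_nnz * sizeof(float), cudaMemcpyHostToDevice);
Since the index arrays are 32-bit ints, CUSPARSE_INDEX_32I is the matching index type in the cusparseCreateCsr call.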
Create the dense vectors x and y
cusparseDnVecDescr_t vecX,vecY;
cusparseCreateDnVec(&vecX, A_num_cols, dX, CUDA_R_32F);
cusparseCreateDnVec(&vecY, A_num_rows, dY, CUDA_R_32F);
About cusparseCreateDnVec: it initializes the dense vector descriptor dnVecDescr.
cusparseStatus_t cusparseCreateDnVec(cusparseDnVecDescr_t* dnVecDescr, // out
                                     int64_t size,                     // size of the vector
                                     void* values,                     // vector values (device pointer)
                                     cudaDataType valueType)           // data type
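Likewise, dX and dY are device pointers that must be allocated and filled before the cusparseCreateDnVec calls. Continuing the toy example (hX / hY are my own host arrays):
float hX[] = { 1, 1, 1, 1 };   // A_num_cols entries, the input vector x
float hY[] = { 0, 0, 0, 0 };   // A_num_rows entries, will receive y
float *dX, *dY;
cudaMalloc(&dX, A_num_cols * sizeof(float));
cudaMalloc(&dY, A_num_rows * sizeof(float));
cudaMemcpy(dX, hX, A_num_cols * sizeof(float), cudaMemcpyHostToDevice);
cudaMemcpy(dY, hY, A_num_rows * sizeof(float), cudaMemcpyHostToDevice);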
Allocate the extra buffer (don't forget to cudaFree it at the end)
void* dBuffer = NULL;
size_t bufferSize = 0;
float alpha = 1.0f;
float beta = 0.0f;
cusparseSpMV_bufferSize(handle, CUSPARSE_OPERATION_NON_TRANSPOSE,
                        &alpha, matA, vecX, &beta, vecY, CUDA_R_32F,
                        CUSPARSE_MV_ALG_DEFAULT, &bufferSize);
cudaMalloc(&dBuffer, bufferSize);
About cusparseSpMV_bufferSize: to use the new cusparseSpMV function, you first have to query the required buffer size, allocate a buffer of that size manually, and pass it to cusparseSpMV as a parameter.
cusparseSpMV computes:
Y = α ⋅ op(A) ⋅ X + β ⋅ Y
where op(A) is a sparse matrix with dimensions m × k, X is a dense vector of size k, Y is a dense vector of size m, and α and β are scalars. Also, for matrix A:
op(A) = A    if opA == CUSPARSE_OPERATION_NON_TRANSPOSE
        A^T  if opA == CUSPARSE_OPERATION_TRANSPOSE
        A^H  if opA == CUSPARSE_OPERATION_CONJUGATE_TRANSPOSE
cusparseStatus_t cusparseSpMV_bufferSize(cusparseHandle_t handle,
                                         cusparseOperation_t opA,          // the operation, one of the three above
                                         const void* alpha,                // alpha in the formula above
                                         const cusparseSpMatDescr_t matA,  // descriptor of the sparse matrix A
                                         const cusparseDnVecDescr_t vecX,  // descriptor of the vector x
                                         const void* beta,                 // beta in the formula above
                                         const cusparseDnVecDescr_t vecY,  // descriptor of the vector y
                                         cudaDataType computeType,         // data type used for the computation
                                         cusparseSpMVAlg_t alg,            // algorithm for the computation, listed below
                                         size_t* bufferSize)               // out
About cusparseSpMVAlg_t above:
Value | Notes
---|---
CUSPARSE_MV_ALG_DEFAULT | Default algorithm for any sparse matrix format
CUSPARSE_COOMV_ALG | Default algorithm for COO sparse matrix format
CUSPARSE_CSRMV_ALG1 | Default algorithm for CSR sparse matrix format
CUSPARSE_CSRMV_ALG2 | Algorithm 2 for CSR sparse matrix format. May provide better performance for irregular matrices
3. Compute
cusparseSpMV(handle, CUSPARSE_OPERATION_NON_TRANSPOSE,
             &alpha, matA, vecX, &beta, vecY, CUDA_R_32F,
             CUSPARSE_MV_ALG_DEFAULT, dBuffer);
The leading parameters of this function are the same as above; the last parameter is the buffer allocated earlier.
cusparseStatus_t cusparseSpMV(cusparseHandle_t handle,
                              cusparseOperation_t opA,
                              const void* alpha,
                              const cusparseSpMatDescr_t matA,
                              const cusparseDnVecDescr_t vecX,
                              const void* beta,
                              const cusparseDnVecDescr_t vecY,   // out
                              cudaDataType computeType,
                              cusparseSpMVAlg_t alg,
                              void* externalBuffer)
4. Destroy the descriptors and the handle
cusparseDestroySpMat(matA);
cusparseDestroyDnVec(vecX);
cusparseDestroyDnVec(vecY);
cusparseDestroy(handle);
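Destroying the descriptors does not free the underlying device arrays, so the result is still sitting in dY. A sketch of copying it back and releasing the rest (hY_result is my own name, continuing the toy example above):
float hY_result[4];   // A_num_rows entries
cudaMemcpy(hY_result, dY, A_num_rows * sizeof(float), cudaMemcpyDeviceToHost);
// Don't forget the workspace buffer and the data arrays themselves.
cudaFree(dBuffer);
cudaFree(dA_csrOffsets); cudaFree(dA_columns); cudaFree(dA_values);
cudaFree(dX); cudaFree(dY);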
A is stored in CSC format
Use cusparseCsr2cscEx2() to convert it to CSR first. Since the CSC arrays of A have exactly the same layout as the CSR arrays of A^T, they can be passed in as the "CSR input" (with the row and column counts swapped), and the "CSC output" then comes out as the CSR arrays of A. Before the conversion, call cusparseCsr2cscEx2_bufferSize to compute the required buffer size and allocate the buffer manually.
cusparseStatus_t status;
status = cusparseCsr2cscEx2_bufferSize(handle, col, row, nnz,
                                       d_csc_a, d_csc_col, d_csc_row,
                                       d_csr_a, d_csr_row, d_csr_col,
                                       CUDA_R_32F,
                                       CUSPARSE_ACTION_NUMERIC, CUSPARSE_INDEX_BASE_ZERO,
                                       CUSPARSE_CSR2CSC_ALG1, &bufferSize);
float *buffer1;
cudaMalloc(&buffer1, bufferSize);
status = cusparseCsr2cscEx2(handle, col, row, nnz,
                            d_csc_a, d_csc_col, d_csc_row,
                            d_csr_a, d_csr_row, d_csr_col,
                            CUDA_R_32F,
                            CUSPARSE_ACTION_NUMERIC, CUSPARSE_INDEX_BASE_ZERO,
                            CUSPARSE_CSR2CSC_ALG1, buffer1);
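One thing to watch out for: cusparseCsr2cscEx2 writes into caller-allocated output arrays, so d_csr_a, d_csr_row and d_csr_col must already be allocated on the device before the two calls above. A minimal sketch (row = number of rows of A, col = number of columns of A, as in the calls above):
float *d_csr_a;
int *d_csr_row, *d_csr_col;
cudaMalloc(&d_csr_a,   nnz * sizeof(float));      // nnz values
cudaMalloc(&d_csr_row, (row + 1) * sizeof(int));  // row offsets of A: row + 1 entries
cudaMalloc(&d_csr_col, nnz * sizeof(int));        // nnz column indices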
The function signatures:
cusparseStatus_t cusparseCsr2cscEx2_bufferSize(cusparseHandle_t handle,
                                               int m,
                                               int n,
                                               int nnz,
                                               const void* csrVal,
                                               const int* csrRowPtr,
                                               const int* csrColInd,
                                               void* cscVal,
                                               int* cscColPtr,
                                               int* cscRowInd,
                                               cudaDataType valType,
                                               cusparseAction_t copyValues,
                                               cusparseIndexBase_t idxBase,
                                               cusparseCsr2CscAlg_t alg,
                                               size_t* bufferSize)

cusparseStatus_t cusparseCsr2cscEx2(cusparseHandle_t handle,
                                    int m,
                                    int n,
                                    int nnz,
                                    const void* csrVal,
                                    const int* csrRowPtr,
                                    const int* csrColInd,
                                    void* cscVal,
                                    int* cscColPtr,
                                    int* cscRowInd,
                                    cudaDataType valType,
                                    cusparseAction_t copyValues,
                                    cusparseIndexBase_t idxBase,
                                    cusparseCsr2CscAlg_t alg,
                                    void* buffer)
For alg CUSPARSE_CSR2CSC_ALG1: it requires extra storage proportional to the number of nonzero values nnz, and it always produces the same (deterministic) output.
For alg CUSPARSE_CSR2CSC_ALG2: it requires extra storage proportional to the number of rows m. It does not guarantee the same ordering of CSC column indices and values across runs, but it provides better performance than CUSPARSE_CSR2CSC_ALG1 for regular matrices.
Parameter | Description
---|---
handle | handle to the cuSPARSE library context
m | number of rows of the CSR input matrix; number of columns of the CSC output matrix
n | number of columns of the CSR input matrix; number of rows of the CSC output matrix
nnz | number of nonzero elements of the CSR and CSC matrices
csrVal | value array of size nnz of the CSR matrix; of same type as valType
csrRowPtr | integer array of size m + 1 that contains the CSR row offsets
csrColInd | integer array of size nnz that contains the CSR column indices
valType | value type for both CSR and CSC matrices
copyValues | CUSPARSE_ACTION_SYMBOLIC or CUSPARSE_ACTION_NUMERIC (with the former, the converted values array comes back as all zeros)
idxBase | index base: CUSPARSE_INDEX_BASE_ZERO or CUSPARSE_INDEX_BASE_ONE
alg | algorithm implementation; see cusparseCsr2CscAlg_t for possible values
bufferSize | number of bytes of workspace needed by cusparseCsr2cscEx2()
buffer | pointer to the workspace buffer
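After the conversion, the d_csr_* arrays plug straight into the CSR workflow from the first half of this note. A sketch (assuming A has row rows and col columns, matching the conversion calls above):
cusparseSpMatDescr_t matA;
cusparseCreateCsr(&matA, row, col, nnz,
                  d_csr_row, d_csr_col, d_csr_a,
                  CUSPARSE_INDEX_32I, CUSPARSE_INDEX_32I,
                  CUSPARSE_INDEX_BASE_ZERO, CUDA_R_32F);
// ...then cusparseCreateDnVec / cusparseSpMV_bufferSize / cusparseSpMV exactly as in the CSR case...
cudaFree(buffer1);  // the conversion workspace is no longer needed once cusparseCsr2cscEx2 has returned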