Using cuSPARSE with CUDA 11

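This article walks through multiplying a sparse matrix (CSR format) by a dense vector with the cuSPARSE library in CUDA 11: creating the library handle, the sparse matrix descriptor and the dense vector descriptors, and running the computation. It notes that functions such as cusparseScsrmv are no longer available and shows how to use the newer cusparseSpMV instead. It also covers converting a matrix between CSC and CSR storage with cusparseCsr2cscEx2, including how to compute and allocate the required buffer.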

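CUDA 11 drops some older functions (cusparseScsrmv, cusparseScsr2csc, ...). The notes below record in detail how to do the same work with CUDA 11, so I don't have to look it up again later.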

Given a sparse matrix A and a dense vector x, compute y = Ax.

A stored in CSR format

1. #include "cusparse.h"

2. Create the handle

cusparseHandle_t     handle = 0;
cusparseCreate(&handle);

Create the descriptor for sparse matrix A in CSR format

cusparseSpMatDescr_t matA;
cusparseCreateCsr(&matA, A_num_rows, A_num_cols, A_num_nnz,
                                      dA_csrOffsets, dA_columns, dA_values,
                                      CUSPARSE_INDEX_32I, CUSPARSE_INDEX_32I,
                                      CUSPARSE_INDEX_BASE_ZERO, CUDA_R_32F);

About the cusparseCreateCsr function above: it initializes the sparse matrix descriptor spMatDescr in CSR format.

cusparseStatus_t
cusparseCreateCsr(cusparseSpMatDescr_t* spMatDescr,  // out: sparse matrix descriptor
                  int64_t               rows,        // number of rows of A
                  int64_t               cols,        // number of columns of A
                  int64_t               nnz,         // number of nonzero elements of A
                  void*                 csrRowOffsets,// row offsets array (rows + 1 elements)
                  void*                 csrColInd,   // column indices array (nnz elements)
                  void*                 csrValues,   // nonzero values array (nnz elements)
                  cusparseIndexType_t   csrRowOffsetsType,// data type of csrRowOffsets
                  cusparseIndexType_t   csrColIndType,// data type of csrColInd
                  cusparseIndexBase_t   idxBase,      // whether the row/column arrays are 0-based or 1-based
                  cudaDataType          valueType)    // data type of the elements of A
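To make the three CSR arrays concrete, here is a minimal host-side sketch that builds them from a small dense row-major matrix with zero-based indices (matching CUSPARSE_INDEX_BASE_ZERO). The function name dense_to_csr is hypothetical and not part of cuSPARSE; in the workflow above the resulting arrays would still have to be copied to the device (for example with cudaMemcpy) before being passed as dA_csrOffsets, dA_columns and dA_values.

```c
#include <stddef.h>

/* Hypothetical helper (not a cuSPARSE function): convert a dense row-major
 * matrix of size rows x cols into zero-based CSR arrays on the host.
 * row_offsets must hold rows + 1 entries; col_ind and values must hold at
 * least as many entries as the matrix has nonzeros.
 * Returns the number of nonzeros (nnz). */
int dense_to_csr(const float *dense, int rows, int cols,
                 int *row_offsets, int *col_ind, float *values)
{
    int nnz = 0;
    for (int i = 0; i < rows; ++i) {
        row_offsets[i] = nnz;              /* where row i starts */
        for (int j = 0; j < cols; ++j) {
            float v = dense[(size_t)i * cols + j];
            if (v != 0.0f) {
                col_ind[nnz] = j;          /* column of this nonzero */
                values[nnz] = v;           /* value of this nonzero */
                ++nnz;
            }
        }
    }
    row_offsets[rows] = nnz;               /* one past the last row */
    return nnz;
}
```

For the 3 x 4 matrix with rows (1 0 2 0), (0 0 0 0), (0 3 0 4), this gives row offsets 0 2 2 4, column indices 0 2 1 3 and values 1 2 3 4. The empty middle row shows up as two equal consecutive offsets.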

Create the dense vector descriptors for x and y

cusparseDnVecDescr_t vecX,vecY;
cusparseCreateDnVec(&vecX, A_num_cols, dX, CUDA_R_32F);
cusparseCreateDnVec(&vecY, A_num_rows, dY, CUDA_R_32F);

About cusparseCreateDnVec: it initializes the dense vector descriptor dnVecDescr.

cusparseStatus_t
cusparseCreateDnVec(cusparseDnVecDescr_t* dnVecDescr,   // out
                    int64_t               size,         // vector size
                    void*                 values,       // vector data (device pointer)
                    cudaDataType          valueType)    // data type

Allocate the extra buffer (don't forget to cudaFree it at the end)

void*  dBuffer    = NULL;
size_t bufferSize = 0;
float alpha = 1.0f;
float beta  = 0.0f;
cusparseSpMV_bufferSize(handle, CUSPARSE_OPERATION_NON_TRANSPOSE,
                                 &alpha, matA, vecX, &beta, vecY, CUDA_R_32F,
                                 CUSPARSE_MV_ALG_DEFAULT, &bufferSize);
cudaMalloc(&dBuffer, bufferSize);

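About cusparseSpMV_bufferSize: to use the new cusparseSpMV function, you first have to query the required buffer size, allocate a buffer of that size yourself, and pass it to cusparseSpMV as an argument.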

cusparseSpMV computes:

$$Y = \alpha \, \mathrm{op}(A) \cdot X + \beta Y$$

where op(A) is a sparse matrix with dimensions m × k, X is a dense vector of size k, Y is a dense vector of size m, and α and β are scalars. For matrix A, op(A) is selected by the opA argument:

$$\mathrm{op}(A) = \begin{cases} A & \text{if opA == CUSPARSE\_OPERATION\_NON\_TRANSPOSE} \\ A^T & \text{if opA == CUSPARSE\_OPERATION\_TRANSPOSE} \\ A^H & \text{if opA == CUSPARSE\_OPERATION\_CONJUGATE\_TRANSPOSE} \end{cases}$$
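A host-side reference of the same formula can be handy for checking the GPU result on small inputs. The sketch below covers only the non-transposed case (op(A) = A) with zero-based CSR and float values; the name csr_spmv_ref is hypothetical and this is not how cuSPARSE implements the operation internally.

```c
/* Hypothetical host reference (not a cuSPARSE function):
 * y = alpha * A * x + beta * y, with A given as zero-based CSR
 * and op(A) = A (non-transposed). y is read and overwritten. */
void csr_spmv_ref(int rows,
                  const int *row_offsets, const int *col_ind,
                  const float *values,
                  float alpha, const float *x,
                  float beta, float *y)
{
    for (int i = 0; i < rows; ++i) {
        float sum = 0.0f;
        /* Nonzeros of row i live in [row_offsets[i], row_offsets[i + 1]). */
        for (int k = row_offsets[i]; k < row_offsets[i + 1]; ++k) {
            sum += values[k] * x[col_ind[k]];
        }
        y[i] = alpha * sum + beta * y[i];
    }
}
```

One possible use is to copy dY back to the host after cusparseSpMV and compare it element by element against this reference with a small tolerance, since the GPU may sum the terms in a different order.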

cusparseStatus_t
cusparseSpMV_bufferSize(cusparseHandle_t           handle,
                        cusparseOperation_t        opA,  // operation, one of the three above
                        const void*                alpha,// alpha in the formula above
                        const cusparseSpMatDescr_t matA, // descriptor of matrix A
                        const cusparseDnVecDescr_t vecX, // descriptor of vector x
                        const void*                beta, // beta in the formula above
                        const cusparseDnVecDescr_t vecY, // descriptor of vector y
                        cudaDataType               computeType,// data type used for the computation
                        cusparseSpMVAlg_t          alg,// enumerator specifying the algorithm for the computation, listed below
                        size_t*                    bufferSize)  // out

About cusparseSpMVAlg_t above:

Value                      Notes
CUSPARSE_MV_ALG_DEFAULT    Default algorithm for any sparse matrix format
CUSPARSE_COOMV_ALG         Default algorithm for COO sparse matrix format
CUSPARSE_CSRMV_ALG1        Default algorithm for CSR sparse matrix format
CUSPARSE_CSRMV_ALG2        Algorithm 2 for CSR sparse matrix format. May provide better performance for irregular matrices

3. Compute

cusparseSpMV(handle, CUSPARSE_OPERATION_NON_TRANSPOSE,
                                 &alpha, matA, vecX, &beta, vecY, CUDA_R_32F,
                                 CUSPARSE_MV_ALG_DEFAULT, dBuffer);

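The leading parameters of this function are the same as for cusparseSpMV_bufferSize; the last parameter is the buffer that was allocated.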

cusparseStatus_t
cusparseSpMV(cusparseHandle_t           handle,
             cusparseOperation_t        opA,
             const void*                alpha,
             const cusparseSpMatDescr_t matA,
             const cusparseDnVecDescr_t vecX,
             const void*                beta,
             const cusparseDnVecDescr_t vecY,        //out
             cudaDataType               computeType,
             cusparseSpMVAlg_t          alg,
             void*                      externalBuffer)

4. Destroy the descriptors and the handle, and free the buffer

cusparseDestroySpMat(matA);
cusparseDestroyDnVec(vecX);
cusparseDestroyDnVec(vecY);
cusparseDestroy(handle);
cudaFree(dBuffer);

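A stored in CSC format

Use cusparseCsr2cscEx2() to convert A to CSR first. The CSC arrays of A are exactly the CSR arrays of A^T, so the call below passes the CSC arrays as the "CSR" input with the dimensions swapped (col, row); the "CSC" output it produces is then the CSR representation of A. Before calling it, use cusparseCsr2cscEx2_bufferSize to compute the workspace size, and allocate that buffer manually.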

status = cusparseCsr2cscEx2_bufferSize(handle, col, row, nnz,
                                          d_csc_a, d_csc_col, d_csc_row,
                                          d_csr_a,d_csr_row, d_csr_col, 
                                          CUDA_R_32F,
                                          CUSPARSE_ACTION_NUMERIC, CUSPARSE_INDEX_BASE_ZERO, CUSPARSE_CSR2CSC_ALG1, &bufferSize);
void *buffer1 = NULL;  // workspace for the conversion; release it with cudaFree when done
cudaMalloc(&buffer1, bufferSize);
status = cusparseCsr2cscEx2(handle, col, row, nnz,
                                d_csc_a, d_csc_col, d_csc_row,
                                d_csr_a,d_csr_row, d_csr_col, 
                                CUDA_R_32F,
                                CUSPARSE_ACTION_NUMERIC, CUSPARSE_INDEX_BASE_ZERO, CUSPARSE_CSR2CSC_ALG1, buffer1);
                                        

Function signatures:

cusparseStatus_t
cusparseCsr2cscEx2_bufferSize(cusparseHandle_t     handle,
                              int                  m,
                              int                  n,
                              int                  nnz,
                              const void*          csrVal,
                              const int*           csrRowPtr,
                              const int*           csrColInd,
                              void*                cscVal,
                              int*                 cscColPtr,
                              int*                 cscRowInd,
                              cudaDataType         valType,
                              cusparseAction_t     copyValues,
                              cusparseIndexBase_t  idxBase,
                              cusparseCsr2CscAlg_t alg,
                              size_t*              bufferSize)
cusparseStatus_t
cusparseCsr2cscEx2(cusparseHandle_t     handle,
                   int                  m,
                   int                  n,
                   int                  nnz,
                   const void*          csrVal,
                   const int*           csrRowPtr,
                   const int*           csrColInd,
                   void*                cscVal,
                   int*                 cscColPtr,
                   int*                 cscRowInd,
                   cudaDataType         valType,
                   cusparseAction_t     copyValues,
                   cusparseIndexBase_t  idxBase,
                   cusparseCsr2CscAlg_t alg,
                   void*                buffer)

For alg CUSPARSE_CSR2CSC_ALG1: it requires extra storage proportional to the number of nonzero values nnz. Its output is always the same matrix.

For alg CUSPARSE_CSR2CSC_ALG2: it requires extra storage proportional to the number of rows m. It does not always guarantee the same ordering of CSC column indices and values. It also provides better performance than CUSPARSE_CSR2CSC_ALG1 for regular matrices.

Parameter     Description
handle        handle to the cuSPARSE library context
m             number of rows of the CSR input matrix; number of columns of the CSC output matrix
n             number of columns of the CSR input matrix; number of rows of the CSC output matrix
nnz           number of nonzero elements of the CSR and CSC matrices
csrVal        value array of size nnz of the CSR matrix; of same type as valType
csrRowPtr     integer array of size m + 1 that contains the CSR row offsets
csrColInd     integer array of size nnz that contains the CSR column indices
valType       value type for both CSR and CSC matrices
copyValues    CUSPARSE_ACTION_SYMBOLIC or CUSPARSE_ACTION_NUMERIC (with the former, the converted value array comes back as all zeros)
idxBase       index base, CUSPARSE_INDEX_BASE_ZERO or CUSPARSE_INDEX_BASE_ONE
alg           algorithm implementation; see cusparseCsr2CscAlg_t for possible values
bufferSize    number of bytes of workspace needed by cusparseCsr2cscEx2()
buffer        pointer to workspace buffer
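To see what the conversion does to the arrays, here is a small host-side sketch of a CSR to CSC conversion using a counting pass, a prefix sum and a scatter. The name csr_to_csc_ref is hypothetical, the parameter order merely mirrors the signature above, and this is an illustration for checking small cases rather than the algorithm cuSPARSE uses.

```c
#include <stdlib.h>

/* Hypothetical host reference (not a cuSPARSE function): convert an
 * m x n zero-based CSR matrix into CSC. csc_col_ptr must hold n + 1
 * entries; csc_val and csc_row_ind must hold nnz entries.
 * Returns 0 on success, -1 if the temporary allocation fails. */
int csr_to_csc_ref(int m, int n, int nnz,
                   const float *csr_val, const int *csr_row_ptr,
                   const int *csr_col_ind,
                   float *csc_val, int *csc_col_ptr, int *csc_row_ind)
{
    /* next[j] tracks the next free slot of column j during the scatter. */
    int *next = malloc((size_t)(n > 0 ? n : 1) * sizeof *next);
    if (next == NULL) {
        return -1;
    }

    /* 1. Count the nonzeros of each column (shifted by one slot). */
    for (int j = 0; j <= n; ++j) csc_col_ptr[j] = 0;
    for (int k = 0; k < nnz; ++k) csc_col_ptr[csr_col_ind[k] + 1]++;

    /* 2. Prefix sum turns the counts into column start offsets. */
    for (int j = 0; j < n; ++j) csc_col_ptr[j + 1] += csc_col_ptr[j];
    for (int j = 0; j < n; ++j) next[j] = csc_col_ptr[j];

    /* 3. Scatter each CSR entry into its column, visiting rows in order
     *    so row indices end up sorted within each column. */
    for (int i = 0; i < m; ++i) {
        for (int k = csr_row_ptr[i]; k < csr_row_ptr[i + 1]; ++k) {
            int dst = next[csr_col_ind[k]]++;
            csc_row_ind[dst] = i;
            csc_val[dst] = csr_val[k];
        }
    }

    free(next);
    return 0;
}
```

Calling it a second time with m and n swapped, and with the CSC arrays passed in the CSR argument positions, gives back the original CSR arrays. That is the same trick the cusparseCsr2cscEx2 call above relies on to go from CSC to CSR.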
