CUDA 11 deprecates a number of functions (cusparseScsrmv, cusparseScsr2csc, ...). Below is a detailed record of how to do this with CUDA 11, so I don't have to look it up again later.
Given a sparse matrix A and a dense vector x, compute y = Ax.
A is stored in CSR format
1. #include "cusparse.h"
2. Create the handle
cusparseHandle_t handle = 0;
cusparseCreate(&handle);
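Every one of these cuSPARSE calls returns a cusparseStatus_t. In my own code I wrap them in a small checking macro, roughly like the sketch below (CHECK_CUSPARSE is just my own helper name, similar in spirit to the one used in NVIDIA's samples):
#include <cstdio>
#include <cstdlib>
// My own helper macro (not part of cuSPARSE): print the error string and abort on failure.
#define CHECK_CUSPARSE(call)                                            \
    do {                                                                \
        cusparseStatus_t status_ = (call);                              \
        if (status_ != CUSPARSE_STATUS_SUCCESS) {                       \
            fprintf(stderr, "cuSPARSE error %s at line %d: %s\n",       \
                    cusparseGetErrorString(status_), __LINE__, #call);  \
            exit(EXIT_FAILURE);                                         \
        }                                                               \
    } while (0)
// Usage: CHECK_CUSPARSE( cusparseCreate(&handle) );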
Create the sparse matrix A in CSR format
cusparseSpMatDescr_t matA;
cusparseCreateCsr(&matA, A_num_rows, A_num_cols, A_num_nnz,
dA_csrOffsets, dA_columns, dA_values,
CUSPARSE_INDEX_32I, CUSPARSE_INDEX_32I,
CUSPARSE_INDEX_BASE_ZERO, CUDA_R_32F);
About the cusparseCreateCsr function above: it initializes the sparse matrix descriptor spMatDescr in CSR format.
cusparseStatus_t cusparseCreateCsr(cusparseSpMatDescr_t* spMatDescr,      // out, sparse matrix descriptor
                                   int64_t rows,                          // number of rows of A
                                   int64_t cols,                          // number of columns of A
                                   int64_t nnz,                           // number of non-zero elements of A
                                   void* csrRowOffsets,                   // row-offsets array (rows + 1 elements)
                                   void* csrColInd,                       // column-indices array (nnz elements)
                                   void* csrValues,                       // non-zero values array (nnz elements)
                                   cusparseIndexType_t csrRowOffsetsType, // data type of csrRowOffsets
                                   cusparseIndexType_t csrColIndType,     // data type of csrColInd
                                   cusparseIndexBase_t idxBase,           // whether the row/column indices are 0-based or 1-based
                                   cudaDataType valueType)                // data type of the values of A
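To make these arguments concrete: the device arrays dA_csrOffsets / dA_columns / dA_values passed to cusparseCreateCsr above are prepared beforehand. A toy example with a 4 × 4 matrix (the matrix and the hA_* host names are my own, not part of the API):
// Toy matrix (my own made-up values), just to show the CSR layout:
//     | 1 0 2 3 |
//     | 0 4 0 0 |
//     | 5 0 6 7 |
//     | 0 8 0 9 |
const int A_num_rows = 4, A_num_cols = 4, A_num_nnz = 9;
int   hA_csrOffsets[] = { 0, 3, 4, 7, 9 };              // A_num_rows + 1 entries
int   hA_columns[]    = { 0, 2, 3, 1, 0, 2, 3, 1, 3 };  // A_num_nnz entries
float hA_values[]     = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };  // A_num_nnz entries

// The descriptor expects device pointers, so copy the three arrays to the GPU first.
int *dA_csrOffsets, *dA_columns;
float *dA_values;
cudaMalloc(&dA_csrOffsets, (A_num_rows + 1) * sizeof(int));
cudaMalloc(&dA_columns,    A_num_nnz * sizeof(int));
cudaMalloc(&dA_values,     A_num_nnz * sizeof(float));
cudaMemcpy(dA_csrOffsets, hA_csrOffsets, (A_num_rows + 1) * sizeof(int), cudaMemcpyHostToDevice);
cudaMemcpy(dA_columns,    hA_columns,    A_num_nnz * sizeof(int),   cudaMemcpyHostToDevice);
cudaMemcpy(dA_values,     hA_values,     A_num_nnz * sizeof(float), cudaMemcpyHostToDevice);
Since the index arrays are 32-bit ints, CUSPARSE_INDEX_32I is the matching index type in the cusparseCreateCsr call.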
Create the dense vectors x and y
cusparseDnVecDescr_t vecX,vecY;
cusparseCreateDnVec(&vecX, A_num_cols, dX, CUDA_R_32F);
cusparseCreateDnVec(&vecY, A_num_rows, dY, CUDA_R_32F);
About cusparseCreateDnVec: it initializes the dense vector descriptor dnVecDescr.
cusparseStatus_t cusparseCreateDnVec(cusparseDnVecDescr_t* dnVecDescr, // out
                                     int64_t size,                     // size of the vector
                                     void* values,                     // vector values (device pointer)
                                     cudaDataType valueType)           // data type
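Likewise, dX and dY are device pointers that must be allocated and filled before the cusparseCreateDnVec calls. Continuing the toy example (hX / hY are my own host arrays):
float hX[] = { 1, 1, 1, 1 };   // A_num_cols entries, the input vector x
float hY[] = { 0, 0, 0, 0 };   // A_num_rows entries, will receive y
float *dX, *dY;
cudaMalloc(&dX, A_num_cols * sizeof(float));
cudaMalloc(&dY, A_num_rows * sizeof(float));
cudaMemcpy(dX, hX, A_num_cols * sizeof(float), cudaMemcpyHostToDevice);
cudaMemcpy(dY, hY, A_num_rows * sizeof(float), cudaMemcpyHostToDevice);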
Allocate the extra buffer (don't forget to cudaFree it at the end)
void* dBuffer = NULL;
size_t bufferSize = 0;
float alpha = 1.0f;
float beta = 0.0f;
cusparseSpMV_bufferSize(handle, CUSPARSE_OPERATION_NON_TRANSPOSE,
                        &alpha, matA, vecX, &beta, vecY, CUDA_R_32F,
                        CUSPARSE_MV_ALG_DEFAULT, &bufferSize);
cudaMalloc(&dBuffer, bufferSize);
About cusparseSpMV_bufferSize: to use the new cusparseSpMV function, you first have to query the required buffer size, allocate a buffer of that size manually, and pass it to cusparseSpMV as a parameter.
cusparseSpMV computes:
Y = α ⋅ op(A) ⋅ X + β ⋅ Y
where op(A) is a sparse matrix with dimensions m × k, X is a dense vector of size k, Y is a dense vector of size m, and α and β are scalars. Also, for matrix A:
op(A) = A    if opA == CUSPARSE_OPERATION_NON_TRANSPOSE
        A^T  if opA == CUSPARSE_OPERATION_TRANSPOSE
        A^H  if opA == CUSPARSE_OPERATION_CONJUGATE_TRANSPOSE
cusparseStatus_t cusparseSpMV_bufferSize(cusparseHandle_t handle,
                                         cusparseOperation_t opA,          // the operation, one of the three above
                                         const void* alpha,                // alpha in the formula above
                                         const cusparseSpMatDescr_t matA,  // descriptor of the sparse matrix A
                                         const cusparseDnVecDescr_t vecX,  // descriptor of the vector x
                                         const void* beta,                 // beta in the formula above
                                         const cusparseDnVecDescr_t vecY,  // descriptor of the vector y
                                         cudaDataType computeType,         // data type used for the computation
                                         cusparseSpMVAlg_t alg,            // algorithm for the computation, listed below
                                         size_t* bufferSize)               // out
About cusparseSpMVAlg_t above:
Value | Notes
---|---
CUSPARSE_MV_ALG_DEFAULT | Default algorithm for any sparse matrix format
CUSPARSE_COOMV_ALG | Default algorithm for COO sparse matrix format
CUSPARSE_CSRMV_ALG1 | Default algorithm for CSR sparse matrix format
CUSPARSE_CSRMV_ALG2 | Algorithm 2 for CSR sparse matrix format. May provide better performance for irregular matrices
3. Compute
cusparseSpMV(handle, CUSPARSE_OPERATION_NON_TRANSPOSE,
             &alpha, matA, vecX, &beta, vecY, CUDA_R_32F,
             CUSPARSE_MV_ALG_DEFAULT, dBuffer);
The leading parameters of this function are the same as above; the last parameter is the buffer allocated earlier.
cusparseStatus_t cusparseSpMV(cusparseHandle_t handle,
                              cusparseOperation_t opA,
                              const void* alpha,
                              const cusparseSpMatDescr_t matA,
                              const cusparseDnVecDescr_t vecX,
                              const void* beta,
                              const cusparseDnVecDescr_t vecY,   // out
                              cudaDataType computeType,
                              cusparseSpMVAlg_t alg,
                              void* externalBuffer)
4. Destroy the descriptors and the handle
cusparseDestroySpMat(matA);
cusparseDestroyDnVec(vecX);
cusparseDestroyDnVec(vecY);
cusparseDestroy(handle);
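Destroying the descriptors does not free the underlying device arrays, so the result is still sitting in dY. A sketch of copying it back and releasing the rest (hY_result is my own name, continuing the toy example above):
float hY_result[4];   // A_num_rows entries
cudaMemcpy(hY_result, dY, A_num_rows * sizeof(float), cudaMemcpyDeviceToHost);
// Don't forget the workspace buffer and the data arrays themselves.
cudaFree(dBuffer);
cudaFree(dA_csrOffsets); cudaFree(dA_columns); cudaFree(dA_values);
cudaFree(dX); cudaFree(dY);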
A is stored in CSC format
Use cusparseCsr2cscEx2() to convert it to CSR first. Since the CSC arrays of A have exactly the same layout as the CSR arrays of A^T, they can be passed in as the "CSR input" (with the row and column counts swapped), and the "CSC output" then comes out as the CSR arrays of A. Before the conversion, call cusparseCsr2cscEx2_bufferSize to compute the required buffer size and allocate the buffer manually.
cusparseStatus_t status;
status = cusparseCsr2cscEx2_bufferSize(handle, col, row, nnz,
                                       d_csc_a, d_csc_col, d_csc_row,
                                       d_csr_a, d_csr_row, d_csr_col,
                                       CUDA_R_32F,
                                       CUSPARSE_ACTION_NUMERIC, CUSPARSE_INDEX_BASE_ZERO,
                                       CUSPARSE_CSR2CSC_ALG1, &bufferSize);
float *buffer1;
cudaMalloc(&buffer1, bufferSize);
status = cusparseCsr2cscEx2(handle, col, row, nnz,
                            d_csc_a, d_csc_col, d_csc_row,
                            d_csr_a, d_csr_row, d_csr_col,
                            CUDA_R_32F,
                            CUSPARSE_ACTION_NUMERIC, CUSPARSE_INDEX_BASE_ZERO,
                            CUSPARSE_CSR2CSC_ALG1, buffer1);
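One thing to watch out for: cusparseCsr2cscEx2 writes into caller-allocated output arrays, so d_csr_a, d_csr_row and d_csr_col must already be allocated on the device before the two calls above. A minimal sketch (row = number of rows of A, col = number of columns of A, as in the calls above):
float *d_csr_a;
int *d_csr_row, *d_csr_col;
cudaMalloc(&d_csr_a,   nnz * sizeof(float));      // nnz values
cudaMalloc(&d_csr_row, (row + 1) * sizeof(int));  // row offsets of A: row + 1 entries
cudaMalloc(&d_csr_col, nnz * sizeof(int));        // nnz column indices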
The function signatures:
cusparseStatus_t cusparseCsr2cscEx2_bufferSize(cusparseHandle_t handle,
                                               int m,
                                               int n,
                                               int nnz,
                                               const void* csrVal,
                                               const int* csrRowPtr,
                                               const int* csrColInd,
                                               void* cscVal,
                                               int* cscColPtr,
                                               int* cscRowInd,
                                               cudaDataType valType,
                                               cusparseAction_t copyValues,
                                               cusparseIndexBase_t idxBase,
                                               cusparseCsr2CscAlg_t alg,
                                               size_t* bufferSize)

cusparseStatus_t cusparseCsr2cscEx2(cusparseHandle_t handle,
                                    int m,
                                    int n,
                                    int nnz,
                                    const void* csrVal,
                                    const int* csrRowPtr,
                                    const int* csrColInd,
                                    void* cscVal,
                                    int* cscColPtr,
                                    int* cscRowInd,
                                    cudaDataType valType,
                                    cusparseAction_t copyValues,
                                    cusparseIndexBase_t idxBase,
                                    cusparseCsr2CscAlg_t alg,
                                    void* buffer)
For alg CUSPARSE_CSR2CSC_ALG1: it requires extra storage proportional to the number of nonzero values nnz, and it always produces the same (deterministic) output.
For alg CUSPARSE_CSR2CSC_ALG2: it requires extra storage proportional to the number of rows m. It does not guarantee the same ordering of CSC column indices and values across runs, but it provides better performance than CUSPARSE_CSR2CSC_ALG1 for regular matrices.
Parameter | Description
---|---
handle | handle to the cuSPARSE library context
m | number of rows of the CSR input matrix; number of columns of the CSC output matrix
n | number of columns of the CSR input matrix; number of rows of the CSC output matrix
nnz | number of nonzero elements of the CSR and CSC matrices
csrVal | value array of size nnz of the CSR matrix; of same type as valType
csrRowPtr | integer array of size m + 1 that contains the CSR row offsets
csrColInd | integer array of size nnz that contains the CSR column indices
valType | value type for both CSR and CSC matrices
copyValues | CUSPARSE_ACTION_SYMBOLIC or CUSPARSE_ACTION_NUMERIC (with the former, the converted values array comes back as all zeros)
idxBase | index base: CUSPARSE_INDEX_BASE_ZERO or CUSPARSE_INDEX_BASE_ONE
alg | algorithm implementation; see cusparseCsr2CscAlg_t for possible values
bufferSize | number of bytes of workspace needed by cusparseCsr2cscEx2()
buffer | pointer to the workspace buffer
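After the conversion, the d_csr_* arrays plug straight into the CSR workflow from the first half of this note. A sketch (assuming A has row rows and col columns, matching the conversion calls above):
cusparseSpMatDescr_t matA;
cusparseCreateCsr(&matA, row, col, nnz,
                  d_csr_row, d_csr_col, d_csr_a,
                  CUSPARSE_INDEX_32I, CUSPARSE_INDEX_32I,
                  CUSPARSE_INDEX_BASE_ZERO, CUDA_R_32F);
// ...then cusparseCreateDnVec / cusparseSpMV_bufferSize / cusparseSpMV exactly as in the CSR case...
cudaFree(buffer1);  // the conversion workspace is no longer needed once cusparseCsr2cscEx2 has returned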