cusparseStatus_t cusparseSnnz(cusparseHandle_t handle, cusparseDirection_t dirA, int m, int n, const cusparseMatDescr_t descrA, const float *A, int lda, int *nnzPerRowColumn, int *nnzTotalDevHostPtr)
This function computes the number of nonzero elements in each row or each column of a dense matrix, as well as the total number of nonzero elements in the matrix.
This function requires no extra storage. It executes asynchronously with respect to the host and may return control to the application before the result is ready, so synchronize (for example with cudaDeviceSynchronize()) before reading device or managed-memory results on the host.
Input:
handle | handle to the cuSPARSE library context. |
dirA | direction that specifies whether to count nonzero elements by row (CUSPARSE_DIRECTION_ROW) or by column (CUSPARSE_DIRECTION_COLUMN). |
m | number of rows of matrix A. |
n | number of columns of matrix A. |
descrA | the descriptor of matrix A. The supported matrix type is CUSPARSE_MATRIX_TYPE_GENERAL; the supported index bases are CUSPARSE_INDEX_BASE_ZERO and CUSPARSE_INDEX_BASE_ONE. |
A | device array of dimensions (lda, n) holding the dense matrix A in column-major order. |
lda | leading dimension of the dense array A; it must be at least m. For a tightly packed matrix, lda = m. |
Output:
nnzPerRowColumn | device array of size m or n containing the number of nonzero elements per row or per column, respectively. |
nnzTotalDevHostPtr | total number of nonzero elements, written to host or device memory depending on the pointer mode set with cusparseSetPointerMode. |
Status returned:
CUSPARSE_STATUS_SUCCESS | the operation completed successfully. |
CUSPARSE_STATUS_NOT_INITIALIZED | the library was not initialized. |
CUSPARSE_STATUS_ALLOC_FAILED | the resources could not be allocated. |
CUSPARSE_STATUS_INVALID_VALUE | invalid parameters were passed (m < 0 or n < 0). |
CUSPARSE_STATUS_ARCH_MISMATCH | the device does not support double precision. |
CUSPARSE_STATUS_EXECUTION_FAILED | the function failed to launch on the GPU. |
CUSPARSE_STATUS_INTERNAL_ERROR | an internal operation failed. |
CUSPARSE_STATUS_MATRIX_TYPE_NOT_SUPPORTED | the matrix type is not supported. |
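The parameters above fit together as follows for an m × n matrix stored column-major with leading dimension lda. The symbols $a_{ij}$, nnzPerRow and nnzPerColumn are notation introduced here (using 0-based i and j), not cuSPARSE identifiers; nnzPerRowColumn holds one of the two count vectors depending on dirA.

```latex
\begin{aligned}
a_{ij} &= A[\,i + j \cdot \mathrm{lda}\,], \qquad 0 \le i < m,\; 0 \le j < n,\; \mathrm{lda} \ge m \\
\mathrm{nnzPerRow}_i &= \bigl|\{\, j : a_{ij} \ne 0 \,\}\bigr| \qquad (\texttt{CUSPARSE\_DIRECTION\_ROW}) \\
\mathrm{nnzPerColumn}_j &= \bigl|\{\, i : a_{ij} \ne 0 \,\}\bigr| \qquad (\texttt{CUSPARSE\_DIRECTION\_COLUMN}) \\
\mathrm{nnzTotal} &= \sum_{i=0}^{m-1} \mathrm{nnzPerRow}_i = \sum_{j=0}^{n-1} \mathrm{nnzPerColumn}_j
\end{aligned}
```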
Test code:
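#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <cusparse.h>
#include <iostream>
using namespace std;

int main() {
    cusparseStatus_t status;
    cusparseHandle_t handle = 0;
    cusparseMatDescr_t descr = 0;

    // Create the library context and a general, zero-based matrix descriptor.
    status = cusparseCreate(&handle);
    if (status != CUSPARSE_STATUS_SUCCESS) {
        cout << "CUSPARSE Library initialization failed" << endl;
        return EXIT_FAILURE;
    }
    status = cusparseCreateMatDescr(&descr);
    if (status != CUSPARSE_STATUS_SUCCESS) {
        cout << "Matrix descriptor initialization failed" << endl;
        return EXIT_FAILURE;
    }
    status = cusparseSetMatType(descr, CUSPARSE_MATRIX_TYPE_GENERAL);
    if (status != CUSPARSE_STATUS_SUCCESS) {
        cout << "cusparseSetMatType failed" << endl;
        return EXIT_FAILURE;
    }
    status = cusparseSetMatIndexBase(descr, CUSPARSE_INDEX_BASE_ZERO);
    if (status != CUSPARSE_STATUS_SUCCESS) {
        cout << "cusparseSetMatIndexBase failed" << endl;
        return EXIT_FAILURE;
    }

    int *nnzPerRow = 0;
    int nnzTotal = 0;   // host variable: the default pointer mode is CUSPARSE_POINTER_MODE_HOST
    float *d_Temp = 0;

    // Allocate managed storage for a 2 x 3 matrix (6 floats) and initialize it.
    // cuSPARSE reads it column-major with lda = 2: columns {1, 0}, {2, 3}, {0, 2}.
    if (cudaMallocManaged(&d_Temp, sizeof(float) * 6) != cudaSuccess) {
        cout << "cudaMallocManaged failed" << endl;
        return EXIT_FAILURE;
    }
    d_Temp[0] = 1.0;
    d_Temp[1] = 0.0;
    d_Temp[2] = 2.0;
    d_Temp[3] = 3.0;
    d_Temp[4] = 0.0;
    d_Temp[5] = 2.0;

    // One count per row (m = 2 entries) because we count by CUSPARSE_DIRECTION_ROW.
    if (cudaMallocManaged(&nnzPerRow, sizeof(int) * 2) != cudaSuccess) {
        cout << "cudaMallocManaged failed" << endl;
        return EXIT_FAILURE;
    }

    // m = 2, n = 3, lda = 2
    status = cusparseSnnz(handle, CUSPARSE_DIRECTION_ROW, 2, 3, descr, d_Temp, 2, nnzPerRow, &nnzTotal);
    if (status != CUSPARSE_STATUS_SUCCESS) {
        cout << "nnz calculation failed" << endl;
        cout << "status = " << status << endl;
        return EXIT_FAILURE;
    }
    // The call is asynchronous: wait for the GPU before reading managed memory on the host.
    cudaDeviceSynchronize();

    cout << "nnzTotal = " << nnzTotal << endl;
    cout << "nnzPerRow[0] = " << nnzPerRow[0] << endl;
    cout << "nnzPerRow[1] = " << nnzPerRow[1] << endl;

    // Release device memory and cuSPARSE objects.
    cudaFree(d_Temp);
    cudaFree(nnzPerRow);
    cusparseDestroyMatDescr(descr);
    cusparseDestroy(handle);
    return 0;
}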
Input array:
1.0 0.0 2.0 3.0 0.0 2.0
Matrix as cuSPARSE interprets it (column-major, lda = 2):
1.0 2.0 0.0
0.0 3.0 2.0
Output:
nnzTotal = 4
nnzPerRow[0] = 2
nnzPerRow[1] = 2