cublasSetVector/cublasSetMatrix parameter notes

Article link: https://blog.csdn.net/xiaobaiqing1983/article/details/118145678

cublasStatus_t
cublasSetVector(int n, int elemSize,
                const void *x, int incx, void *y, int incy)

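The n and elemSize parameters are straightforward: n is the number of elements to copy, and elemSize is the size of each element in bytes.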

For incx/incy, the documentation says: "The storage spacing between consecutive elements is given by incx for the source vector x and by incy for the destination vector y."

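incx describes how the n elements being copied to y are laid out in x. If x is a contiguous C/C++ array, consecutive elements are 1 apart, so incx = 1. Sometimes, however, x holds a matrix and only one of its columns is wanted as the vector. C/C++ stores 2D arrays in row-major order (whereas cuBLAS assumes column-major storage), so when a column of x is copied to the GPU, consecutive elements of that column are separated in memory by the number of columns, and incx = number of columns.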

Example: for float x[5 * 6] holding a 5-row, 6-column matrix in row-major order, copying the first column to y uses n = 5, elemSize = sizeof(float), incx = 6.

Indexing starts at 0: the i-th element copied (i = 0, ..., n - 1) is read from x at index 0 + i * incx.

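incy plays the same role on the destination side: it is the spacing, counted in elements, between consecutive elements of y in GPU memory.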
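The addressing described above can be sketched on the host alone. The function below is a hypothetical model written for illustration (strided_copy_model and copy_column_demo are not cuBLAS functions, and this is not how cuBLAS is implemented); it only mimics which element indices a copy with given incx and incy would touch, assuming positive increments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical host-only model of the strided copy (not the cuBLAS code):
 * element i is read from x at index i * incx and written to y at i * incy. */
void strided_copy_model(int n, int elemSize,
                        const void *x, int incx,
                        void *y, int incy)
{
    assert(n >= 0 && elemSize > 0 && incx > 0 && incy > 0);
    const unsigned char *src = (const unsigned char *)x;
    unsigned char *dst = (unsigned char *)y;
    for (int i = 0; i < n; ++i) {
        memcpy(dst + (size_t)i * incy * elemSize,
               src + (size_t)i * incx * elemSize,
               (size_t)elemSize);
    }
}

/* Copy column `col` of a 5 x 6 row-major array into a contiguous vector:
 * n = 5, incx = 6 (the number of columns), incy = 1. */
void copy_column_demo(const float x[5 * 6], int col, float y[5])
{
    assert(col >= 0 && col < 6);
    strided_copy_model(5, (int)sizeof(float), x + col, 6, y, 1);
}
```

With a real device pointer, the same n, elemSize, incx, and incy values would be passed to cublasSetVector in place of the model function.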

cublasSetMatrix(int rows, int cols, int elemSize,
                const void *A, int lda, void *B, int ldb)

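The rows, cols, and elemSize parameters are straightforward: the number of rows and columns to copy, and the size of each element in bytes.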

For lda/ldb, the documentation says: "with the leading dimension of the source matrix A and destination matrix B given in lda and ldb, respectively. The leading dimension indicates the number of rows of the allocated matrix, even if only a submatrix of it is being used."

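The term "leading dimension" comes from Fortran. In column-major storage, lda is the distance in memory, counted in elements, between two adjacent elements of the same row. cuBLAS assumes column-major storage while C/C++ arrays are row-major, so before A is copied to B it is usually repacked into column-major order. In the common case where the whole matrix is copied, lda = rows.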
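To make the role of the leading dimension concrete, here is a minimal host-only sketch. Both colmajor_index and set_matrix_model are hypothetical names introduced for illustration, not cuBLAS functions, and the model only mirrors the column-major addressing described above: element (i, j) lives at index i + j * ld, and each of the cols columns contributes rows contiguous elements.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Column-major index of element (i, j) when the leading dimension is ld. */
size_t colmajor_index(int i, int j, int ld)
{
    assert(i >= 0 && j >= 0 && ld > 0);
    return (size_t)i + (size_t)j * (size_t)ld;
}

/* Hypothetical host-only model of the addressing (not the cuBLAS code):
 * copy a rows x cols tile from A (leading dimension lda)
 * to B (leading dimension ldb), one column at a time. */
void set_matrix_model(int rows, int cols, int elemSize,
                      const void *A, int lda, void *B, int ldb)
{
    assert(rows >= 0 && cols >= 0 && elemSize > 0);
    assert(lda >= rows && ldb >= rows);
    const unsigned char *src = (const unsigned char *)A;
    unsigned char *dst = (unsigned char *)B;
    for (int j = 0; j < cols; ++j) {
        /* Column j starts j * ld elements after the start of the matrix. */
        memcpy(dst + colmajor_index(0, j, ldb) * (size_t)elemSize,
               src + colmajor_index(0, j, lda) * (size_t)elemSize,
               (size_t)rows * (size_t)elemSize);
    }
}
```

This also shows why lda can exceed rows: when only a submatrix is copied, lda stays equal to the row count of the allocated matrix, while rows and cols describe the tile.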

For example:

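Suppose C++ allocates a 2D array float A[5][6] that has to be copied into GPU memory. Because cuBLAS treats matrices as column-major, the contents of A must first be converted from row-major to column-major order, giving a buffer At. At is still a 5 * 6 matrix (note: not 6 * 5; reading it as 6 * 5 would amount to treating At as if it were still row-major). The call is then: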

cublasSetMatrix(5, 6, sizeof(float), At, 5, B, 5);
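The repacking step that produces At is not spelled out above, so here is one possible sketch of it. The helper row_to_col_major is a hypothetical name introduced for illustration; it assumes float elements and a destination buffer with leading dimension equal to rows.

```c
#include <assert.h>
#include <stddef.h>

/* Repack a rows x cols row-major array into column-major order.
 * The destination has leading dimension `rows`, so element (i, j)
 * moves from index i * cols + j to index j * rows + i. */
void row_to_col_major(int rows, int cols,
                      const float *rowMajor, float *colMajor)
{
    assert(rows > 0 && cols > 0);
    for (int i = 0; i < rows; ++i) {
        for (int j = 0; j < cols; ++j) {
            colMajor[(size_t)j * rows + i] = rowMajor[(size_t)i * cols + j];
        }
    }
}
```

For float A[5][6], calling row_to_col_major(5, 6, &A[0][0], At) with a 30-element At gives a buffer that can be handed to cublasSetMatrix with rows = 5, cols = 6, and lda = 5.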