The cuBLAS Level-1 functions nrm2() and rot()
2.5.7. cublas<t>nrm2()
cublasStatus_t cublasSnrm2(cublasHandle_t handle, int n,
const float *x, int incx, float *result)
cublasStatus_t cublasDnrm2(cublasHandle_t handle, int n,
const double *x, int incx, double *result)
cublasStatus_t cublasScnrm2(cublasHandle_t handle, int n,
const cuComplex *x, int incx, float *result)
cublasStatus_t cublasDznrm2(cublasHandle_t handle, int n,
const cuDoubleComplex *x, int incx, double *result)
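This function computes the Euclidean norm of the vector x. The code uses a multiphase model of accumulation to avoid intermediate underflow and overflow, with the result being equivalent to $\sqrt{\sum_{i=1}^{n} \left( x[j] \times x[j] \right)}$ in exact arithmetic, where $j = 1 + (i - 1) \times \mathrm{incx}$. For the complex variants, $x[j] \times x[j]$ stands for the squared modulus $|x[j]|^2$. Note that the last equation reflects the 1-based indexing used for compatibility with Fortran.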
Param. | Memory | In/out | Meaning |
---|---|---|---|
handle | | input | handle to the cuBLAS library context. |
n | | input | number of elements in the vector x. |
x | device | input | <type> vector with n elements. |
incx | | input | stride between consecutive elements of x. |
result | host or device | output | the resulting norm, which is 0.0 if n<=0. |
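The possible error values returned by this function and their meanings are listed below.
Error Value | Meaning |
---|---|
CUBLAS_STATUS_SUCCESS | the operation completed successfully. |
CUBLAS_STATUS_NOT_INITIALIZED | the library was not initialized. |
CUBLAS_STATUS_ALLOC_FAILED | the reduction buffer could not be allocated. |
CUBLAS_STATUS_EXECUTION_FAILED | the function failed to launch on the GPU. |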
2.5.8. cublas<t>rot()
cublasStatus_t cublasSrot(cublasHandle_t handle, int n,
float *x, int incx,
float *y, int incy,
const float *c, const float *s)
cublasStatus_t cublasDrot(cublasHandle_t handle, int n,
double *x, int incx,
double *y, int incy,
const double *c, const double *s)
cublasStatus_t cublasCrot(cublasHandle_t handle, int n,
cuComplex *x, int incx,
cuComplex *y, int incy,
const float *c, const cuComplex *s)
cublasStatus_t cublasCsrot(cublasHandle_t handle, int n,
cuComplex *x, int incx,
cuComplex *y, int incy,
const float *c, const float *s)
cublasStatus_t cublasZrot(cublasHandle_t handle, int n,
cuDoubleComplex *x, int incx,
cuDoubleComplex *y, int incy,
const double *c, const cuDoubleComplex *s)
cublasStatus_t cublasZdrot(cublasHandle_t handle, int n,
cuDoubleComplex *x, int incx,
cuDoubleComplex *y, int incy,
const double *c, const double *s)
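This function applies the Givens rotation matrix (i.e., a counter-clockwise rotation in the x,y plane by the angle defined by cos(alpha)=c, sin(alpha)=s):
$$G = \begin{pmatrix} c & s \\ -s & c \end{pmatrix}$$
to the vectors x and y.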
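Hence, the result is $x[k] = c \times x[k] + s \times y[j]$ and $y[j] = -s \times x[k] + c \times y[j]$, where $k = 1 + (i - 1) \times \mathrm{incx}$ and $j = 1 + (i - 1) \times \mathrm{incy}$. Both right-hand sides use the values of $x[k]$ and $y[j]$ from before the rotation. Note that the last two equations reflect the 1-based indexing used for compatibility with Fortran.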
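The update rule above can be sketched as a host-side loop in plain C. This is an illustrative reference only, not how cuBLAS implements the operation; the name `ref_srot` is hypothetical, indices are 0-based, c and s are passed by value rather than by pointer, and the sketch assumes positive strides. The point it shows is that both outputs must be computed from the old values of x[k] and y[j], which is why they are copied into temporaries first.

```c
#include <stddef.h>

/* Hypothetical host-side reference, not a cuBLAS function.
 * Applies the rotation G = [c s; -s c] to each pair (x[i*incx], y[i*incy]).
 * Assumption: incx > 0 and incy > 0. */
void ref_srot(int n, float *x, int incx, float *y, int incy, float c, float s)
{
    if (n <= 0) {
        return; /* nothing to rotate */
    }
    for (int i = 0; i < n; ++i) {
        size_t k = (size_t)i * (size_t)incx;
        size_t j = (size_t)i * (size_t)incy;

        /* Save the old values so the update of y does not see the new x. */
        float xk = x[k];
        float yj = y[j];

        x[k] = c * xk + s * yj;
        y[j] = -s * xk + c * yj;
    }
}
```

Because c and s are a cosine and sine pair, the rotation preserves the length of each (x[k], y[j]) pair: with c = 0.6 and s = 0.8, the pair (1, 2) becomes (2.2, 0.4), and both have squared length 5.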
Param. | Memory | In/out | Meaning |
---|---|---|---|
handle | | input | handle to the cuBLAS library context. |
n | | input | number of elements in the vectors x and y. |
x | device | in/out | <type> vector with n elements. |
incx | | input | stride between consecutive elements of x. |
y | device | in/out | <type> vector with n elements. |
incy | | input | stride between consecutive elements of y. |
c | host or device | input | cosine element of the rotation matrix. |
s | host or device | input | sine element of the rotation matrix. |
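The possible error values returned by this function and their meanings are listed below.
Error Value | Meaning |
---|---|
CUBLAS_STATUS_SUCCESS | the operation completed successfully. |
CUBLAS_STATUS_NOT_INITIALIZED | the library was not initialized. |
CUBLAS_STATUS_EXECUTION_FAILED | the function failed to launch on the GPU. |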