Using cuBLAS (5): cuBLASXt

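The cuBLASXt API of cuBLAS exposes a multi-GPU capable host interface: when using this API, the application only needs to allocate the required matrices in host memory space. There is no restriction on the size of the matrices as long as they fit into host memory. The cuBLASXt API takes care of allocating memory across the designated GPUs, dispatching the workload among them, and finally retrieving the results back to the host. The cuBLASXt API supports only the compute-intensive BLAS3 routines (for example, matrix-matrix operations), where the PCI transfers to and from the GPU can be amortized. The cuBLASXt API has its own header file, cublasXt.h.
Starting with release 8.0, the cuBLASXt API allows any of the matrices to be located on a GPU device.
Note: the cuBLASXt API is only supported on 64-bit platforms.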

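Tiling design approach

To share the workload between multiple GPUs, the cuBLASXt API uses a tiling strategy: every matrix is divided into square tiles of user-controllable dimension BlockDim x BlockDim. The resulting matrix tiling defines the static scheduling policy: each resulting tile is assigned to a GPU in a round-robin fashion. One CPU thread is created per GPU and is responsible for performing the memory transfers and cuBLAS operations needed to compute all the tiles it owns. From a performance point of view, because of this static scheduling strategy, it is better that the compute capability and the PCI bandwidth are the same for every GPU. The following figure illustrates the tile distribution between 3 GPUs. To compute the first tile G0 of C, CPU thread 0, which is responsible for GPU0, has to load 3 tiles from the first row of A and tiles from the first column of B in a pipelined fashion, so that memory transfers and computations overlap, and sum the results into the first tile G0 of C before moving on to the next tile G0.

When the tile dimension is not an exact multiple of the dimensions of C, some tiles are partially filled on the right border and/or the bottom border. The current implementation does not pad the incomplete tiles but keeps track of them by performing the right reduced cuBLAS operations: this way, no extra computation is done. However, it can still lead to some load imbalance when the GPUs do not all have the same number of incomplete tiles to work on.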
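The tiling arithmetic described above can be sketched in plain C without any CUDA dependency. The function names are hypothetical, and the column-by-column numbering of tiles used for the round-robin assignment is an assumption: the text only states that tiles are assigned to GPUs in a round-robin fashion, not in which order they are enumerated.

```c
#include <assert.h>
#include <stddef.h>

/* Number of tiles needed to cover `extent` elements with tiles of size block_dim. */
size_t tile_count(size_t extent, size_t block_dim)
{
    assert(block_dim > 0);
    return (extent + block_dim - 1) / block_dim;
}

/* Real extent of tile `index` along one dimension: block_dim for a full
   tile, fewer elements for an incomplete tile on the right/bottom border. */
size_t tile_extent(size_t extent, size_t block_dim, size_t index)
{
    size_t start = index * block_dim;
    assert(start < extent);
    size_t remaining = extent - start;
    return remaining < block_dim ? remaining : block_dim;
}

/* Hypothetical round-robin owner of a tile of C. Tiles are numbered
   column by column here, which is an assumption of this sketch. */
int tile_owner(size_t tile_row, size_t tile_col, size_t tile_rows, int gpu_count)
{
    assert(gpu_count > 0);
    size_t linear = tile_col * tile_rows + tile_row;
    return (int)(linear % (size_t)gpu_count);
}
```

For a dimension of 10 and a block dimension of 4, the sketch yields three tiles, the last of which covers only 2 elements. Such border tiles are the incomplete tiles that can cause load imbalance.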
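When one or more matrices are located on some GPU devices, the same tiling approach and workload sharing are applied. In that case, the memory transfers are done between devices. However, when the computation of a tile and some of its data are located on the same GPU device, the memory transfer of the local data to or from the tile is bypassed and the GPU operates directly on the local data. This can significantly increase performance, especially when only one GPU is used for the computation.
The matrices can be located on any GPU device and do not need to be on the same GPU device. Furthermore, the matrices can even be located on a GPU device that does not participate in the computation.
In contrast to the cuBLAS API, even if all matrices are located on the same device, the cuBLASXt API remains a blocking API from the host point of view: the results, wherever they are located, are valid when the call returns, and no device synchronization is required.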

Hybrid CPU-GPU computation

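For very large problems, the cuBLASXt API can offload part of the computation to the host CPU. This feature is set up with the routines cublasXtSetCpuRoutine() and cublasXtSetCpuRatio(). The workload assigned to the CPU is set aside: it is simply a percentage of the resulting matrix taken from the bottom and the right side, whichever dimension is bigger. The GPU tiling is done afterwards on the reduced resulting matrix.
If any of the matrices is located on a GPU device, the feature is ignored and all computation is done on the GPUs only.
This feature should be used with caution because it could interfere with the CPU threads responsible for feeding the GPUs.
Currently, only the routine cublasXt<t>gemm() supports this feature.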
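One way to read the splitting rule is sketched below in plain C. Several points are assumptions of this sketch rather than documented behavior: the ratio is treated as a fraction between 0 and 1, the CPU strip is cut along the larger dimension of the result (from the bottom when there are more rows, from the right when there are more columns), and the strip size is truncated to an integer. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t gpu_rows, gpu_cols;  /* reduced result matrix that is tiled across the GPUs */
    size_t cpu_rows, cpu_cols;  /* strip of the result handed to the CPU routine */
} hybrid_split;

/* Split an m x n result matrix between CPU and GPUs (assumed interpretation). */
hybrid_split split_result(size_t m, size_t n, float cpu_ratio)
{
    assert(cpu_ratio >= 0.0f && cpu_ratio <= 1.0f);
    hybrid_split s;
    if (m >= n) {
        /* rows are the larger dimension: the CPU takes rows from the bottom */
        s.cpu_rows = (size_t)((double)m * cpu_ratio);
        s.cpu_cols = n;
        s.gpu_rows = m - s.cpu_rows;
        s.gpu_cols = n;
    } else {
        /* columns are the larger dimension: the CPU takes columns from the right */
        s.cpu_cols = (size_t)((double)n * cpu_ratio);
        s.cpu_rows = m;
        s.gpu_rows = m;
        s.gpu_cols = n - s.cpu_cols;
    }
    return s;
}
```

With a ratio of 0.25 and a 1000 x 400 result, this sketch leaves a 750 x 400 matrix for the GPU tiling and a 250 x 400 strip for the CPU.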

Results reproducibility

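Currently, all cuBLASXt API routines from a given toolkit version generate the same bit-wise results when the following conditions are respected:

  • All GPUs participating in the computation have the same compute capability and the same number of SMs.
  • The tile size is kept the same between runs.
  • Either the CPU hybrid computation is not used, or the CPU BLAS provided is also guaranteed to produce reproducible results.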
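To check bit-wise reproducibility between two runs, the result buffers have to be compared byte for byte rather than with floating-point equality, which treats 0.0 and -0.0 as equal and never matches NaN with itself. A minimal sketch, with a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 when both result buffers contain exactly the same bytes, 0 otherwise. */
int bitwise_equal(const void *run_a, const void *run_b, size_t bytes)
{
    assert(run_a != NULL && run_b != NULL);
    return memcmp(run_a, run_b, bytes) == 0;
}
```

The two buffers would be the C matrices produced by two runs with the same GPUs and the same tile size.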

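cuBLASXt API Datatypes Reference

cublasXtHandle_t

The cublasXtHandle_t type is a pointer type to an opaque structure holding the cuBLASXt API context. The context must be initialized using cublasXtCreate(), and the returned handle must be passed to all subsequent cuBLASXt API function calls. The context should be destroyed at the end using cublasXtDestroy().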

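cublasXtOpType_t

The cublasXtOpType_t type enumerates the four possible data types. This enum is used as a parameter of the routines cublasXtSetCpuRoutine and cublasXtSetCpuRatio to set up the hybrid configuration.

Value | Meaning
CUBLASXT_FLOAT | float or single-precision type
CUBLASXT_DOUBLE | double-precision type
CUBLASXT_COMPLEX | single-precision complex
CUBLASXT_DOUBLECOMPLEX | double-precision complex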

cublasXtBlasOp_t

The cublasXtBlasOp_t type enumerates the BLAS3 or BLAS3-like routines supported by the cuBLASXt API. This enum is used as a parameter of the routines cublasXtSetCpuRoutine and cublasXtSetCpuRatio to set up the hybrid configuration.

Value | Meaning
CUBLASXT_GEMM | GEMM routine
CUBLASXT_SYRK | SYRK routine
CUBLASXT_HERK | HERK routine
CUBLASXT_SYMM | SYMM routine
CUBLASXT_HEMM | HEMM routine
CUBLASXT_TRSM | TRSM routine
CUBLASXT_SYR2K | SYR2K routine
CUBLASXT_HER2K | HER2K routine
CUBLASXT_SPMM | SPMM routine
CUBLASXT_SYRKX | SYRKX routine
CUBLASXT_HERKX | HERKX routine

cublasXtPinningMemMode_t

This type is used to enable or disable the pinning memory mode through the routine cublasXtSetPinningMemMode.

Value | Meaning
CUBLASXT_PINNING_DISABLED | the pinning memory mode is disabled
CUBLASXT_PINNING_ENABLED | the pinning memory mode is enabled

cuBLASXt API Helper Function Reference

cublasXtCreate()

cublasStatus_t
cublasXtCreate(cublasXtHandle_t *handle)

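This function initializes the cuBLASXt API and creates a handle to an opaque structure holding the cuBLASXt API context. It allocates hardware resources on the host and device and must be called prior to making any other cuBLASXt API calls.

Return Value | Meaning
CUBLAS_STATUS_SUCCESS | the initialization succeeded
CUBLAS_STATUS_ALLOC_FAILED | the resources could not be allocated
CUBLAS_STATUS_NOT_SUPPORTED | the cuBLASXt API is only supported on 64-bit platforms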

 cublasXtDestroy()

cublasStatus_t
cublasXtDestroy(cublasXtHandle_t handle)

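This function releases the hardware resources used by the cuBLASXt API context. The release of GPU resources may be deferred until the application exits. This function is usually the last call made to the cuBLASXt API with a particular handle.

Return Value | Meaning
CUBLAS_STATUS_SUCCESS | the shutdown succeeded
CUBLAS_STATUS_NOT_INITIALIZED | the library was not initialized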

 cublasXtDeviceSelect()

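cublasStatus_t
cublasXtDeviceSelect(cublasXtHandle_t handle, int nbDevices, int deviceId[])

This function allows the user to provide the number of GPU devices, and their respective IDs, that will participate in subsequent cuBLASXt API math function calls. It creates a cuBLAS context for every GPU provided in that list. Currently the device configuration is static and cannot be changed between math function calls. In that regard, this function should be called only once after cublasXtCreate. To be able to run multiple configurations, multiple cuBLASXt API contexts should be created.

Return Value | Meaning
CUBLAS_STATUS_SUCCESS | the user call was successful
CUBLAS_STATUS_INVALID_VALUE | at least one of the devices could not be accessed, or a cuBLAS context could not be created on at least one of the devices
CUBLAS_STATUS_ALLOC_FAILED | some resources could not be allocated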

 cublasXtSetBlockDim()

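cublasStatus_t
cublasXtSetBlockDim(cublasXtHandle_t handle, int blockDim)

This function allows the user to set the block dimension used for the tiling of the matrices in subsequent math function calls. Matrices are split into square tiles of dimension blockDim x blockDim. This function can be called at any time and takes effect for the following math function calls. The block dimension should be chosen so as to optimize the math operations and to ensure that the PCI transfers are well overlapped with the computations.

Return Value | Meaning
CUBLAS_STATUS_SUCCESS | the call has been successful
CUBLAS_STATUS_INVALID_VALUE | blockDim <= 0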
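Why the block dimension affects the overlap of transfers and computation can be illustrated with a rough back-of-the-envelope model. The sketch below is a simplification, not a description of the library's internals: it considers one real-valued tile product C_tile += A_tile * B_tile, counts 2 * blockDim^3 floating-point operations, and counts only the bytes of the A and B tiles as transfer cost. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Floating-point operations for one block_dim^3 real tile product. */
double tile_flops(size_t block_dim)
{
    double b = (double)block_dim;
    return 2.0 * b * b * b;
}

/* Bytes moved to bring one tile of A and one tile of B to the GPU. */
double tile_input_bytes(size_t block_dim, size_t elem_size)
{
    double b = (double)block_dim;
    return 2.0 * b * b * (double)elem_size;
}

/* Flops per transferred byte; in this model it equals block_dim / elem_size. */
double tile_intensity(size_t block_dim, size_t elem_size)
{
    assert(block_dim > 0 && elem_size > 0);
    return tile_flops(block_dim) / tile_input_bytes(block_dim, elem_size);
}
```

In this model the amount of computation available to hide each transferred byte grows linearly with the block dimension. Larger tiles are therefore easier to overlap, at the price of coarser load balancing between GPUs.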

cublasXtGetBlockDim()

cublasStatus_t
cublasXtGetBlockDim(cublasXtHandle_t handle, int *blockDim)

This function allows the user to query the block dimension used for the tiling of the matrices.

Return Value | Meaning
CUBLAS_STATUS_SUCCESS | the call has been successful

cublasXtSetCpuRoutine()

cublasStatus_t
cublasXtSetCpuRoutine(cublasXtHandle_t handle, cublasXtBlasOp_t blasOp, cublasXtOpType_t type, void *blasFunctor)

This function allows the user to provide a CPU implementation of the corresponding BLAS routine. It can be used together with cublasXtSetCpuRatio() to define a hybrid computation between the CPU and the GPUs. Currently, the hybrid feature is only supported for the xGEMM routines.

Return Value | Meaning
CUBLAS_STATUS_SUCCESS | the call has been successful
CUBLAS_STATUS_INVALID_VALUE | blasOp or type define an invalid combination
CUBLAS_STATUS_NOT_SUPPORTED | CPU-GPU hybridization for that routine is not supported

cublasXtSetCpuRatio()

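cublasStatus_t
cublasXtSetCpuRatio(cublasXtHandle_t handle, cublasXtBlasOp_t blasOp, cublasXtOpType_t type, float ratio)

This function allows the user to define the percentage of the workload that should be done on the CPU in the context of a hybrid computation. It can be used together with cublasXtSetCpuRoutine() to define a hybrid computation between the CPU and the GPUs. Currently, the hybrid feature is only supported for the xGEMM routines.

Return Value | Meaning
CUBLAS_STATUS_SUCCESS | the call has been successful
CUBLAS_STATUS_INVALID_VALUE | blasOp or type define an invalid combination
CUBLAS_STATUS_NOT_SUPPORTED | CPU-GPU hybridization for that routine is not supported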

 cublasXtSetPinningMemMode()

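cublasStatus_t
cublasXtSetPinningMemMode(cublasXtHandle_t handle,
                          cublasXtPinningMemMode_t mode)

This function allows the user to enable or disable the pinning memory mode. When enabled, the matrices passed in subsequent cuBLASXt API calls are pinned and unpinned with the CUDART routines cudaHostRegister and cudaHostUnregister respectively, if the matrices are not already pinned. If a matrix happens to be partially pinned, it is also not pinned. Pinning the memory improves PCI transfer performance and allows PCI memory transfers to overlap with computation. However, pinning and unpinning the memory takes some time that might not be amortized. It is advised that the user pin the memory on their own with cudaMallocHost or cudaHostRegister and unpin it when the computation sequence is completed. By default, the pinning memory mode is disabled.

The pinning memory mode should not be enabled when matrices used for different calls to the cuBLASXt API overlap. cuBLASXt decides whether a matrix is pinned by checking, with cudaHostGetFlags, whether the first address of that matrix is pinned, and therefore cannot know whether the matrix is already partially pinned. This is especially true in multi-threaded applications, where memory could be partially or totally pinned or unpinned while another thread is accessing it.

Return Value | Meaning
CUBLAS_STATUS_SUCCESS | the call has been successful
CUBLAS_STATUS_INVALID_VALUE | the mode value is different from CUBLASXT_PINNING_DISABLED and CUBLASXT_PINNING_ENABLED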
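Since the pinning memory mode should not be enabled when matrices overlap, an application may want to check its own host buffers before enabling it. The following plain C sketch is one way to do that, under the assumption that the matrices are stored in column-major format with a leading dimension. The helper names are hypothetical and are not part of the cuBLASXt API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes spanned by a rows x cols column-major matrix with leading dimension ld. */
size_t matrix_span_bytes(size_t rows, size_t cols, size_t ld, size_t elem_size)
{
    assert(ld >= rows);
    if (rows == 0 || cols == 0)
        return 0;
    return (ld * (cols - 1) + rows) * elem_size;
}

/* Returns 1 when the two byte ranges share at least one byte, 0 otherwise. */
int ranges_overlap(const void *a, size_t a_bytes, const void *b, size_t b_bytes)
{
    uintptr_t a0 = (uintptr_t)a;
    uintptr_t b0 = (uintptr_t)b;
    return a0 < b0 + b_bytes && b0 < a0 + a_bytes;
}
```

Two sub-matrices carved out of the same allocation can be tested this way before the application decides whether to enable the pinning memory mode or to pin the whole allocation itself.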

 cublasXtGetPinningMemMode()

cublasStatus_t
cublasXtGetPinningMemMode(cublasXtHandle_t handle,
                          cublasXtPinningMemMode_t *mode)

This function allows the user to query the pinning memory mode. By default, the pinning memory mode is disabled.

Return Value | Meaning
CUBLAS_STATUS_SUCCESS | the call has been successful

cuBLASXt API Math Functions Reference

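In this chapter we describe the actual linear algebra routines that the cuBLASXt API supports. We use the abbreviation <type> for the type and <t> for the corresponding short type to make the presentation of the implemented functions more concise and clear. Unless otherwise specified, <type> and <t> have the following meanings:

<type> | <t> | Meaning
float | 's' or 'S' | real single-precision
double | 'd' or 'D' | real double-precision
cuComplex | 'c' or 'C' | complex single-precision
cuDoubleComplex | 'z' or 'Z' | complex double-precision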

 cublasXt<t>gemm()

cublasStatus_t cublasXtSgemm(cublasXtHandle_t handle,
                           cublasOperation_t transa, cublasOperation_t transb,
                           size_t m, size_t n, size_t k,
                           const float           *alpha,
                           const float           *A, int lda,
                           const float           *B, int ldb,
                           const float           *beta,
                           float           *C, int ldc)
cublasStatus_t cublasXtDgemm(cublasXtHandle_t handle,
                           cublasOperation_t transa, cublasOperation_t transb,
                           int m, int n, int k,
                           const double          *alpha,
                           const double          *A, int lda,
                           const double          *B, int ldb,
                           const double          *beta,
                           double          *C, int ldc)
cublasStatus_t cublasXtCgemm(cublasXtHandle_t handle,
                           cublasOperation_t transa, cublasOperation_t transb,
                           int m, int n, int k,
                           const cuComplex       *alpha,
                           const cuComplex       *A, int lda,
                           const cuComplex       *B, int ldb,
                           const cuComplex       *beta,
                           cuComplex       *C, int ldc)
cublasStatus_t cublasXtZgemm(cublasXtHandle_t handle,
                           cublasOperation_t transa, cublasOperation_t transb,
                           int m, int n, int k,
                           const cuDoubleComplex *alpha,
                           const cuDoubleComplex *A, int lda,
                           const cuDoubleComplex *B, int ldb,
                           const cuDoubleComplex *beta,
                           cuDoubleComplex *C, int ldc)

This function performs the matrix-matrix multiplication

C = alpha * op(A) * op(B) + beta * C

where alpha and beta are scalars, and A, B and C are matrices stored in column-major format with dimensions op(A) m x k, op(B) k x n and C m x n. op(A) is A if transa == CUBLAS_OP_N, the transpose of A if transa == CUBLAS_OP_T, and the conjugate transpose of A if transa == CUBLAS_OP_C; op(B) is defined in the same way from transb.

Param. | Memory | In/out | Meaning
handle | | input | handle to the cuBLASXt API context.
transa | | input | operation op(A) that is non- or (conj.) transpose.
transb | | input | operation op(B) that is non- or (conj.) transpose.
m | | input | number of rows of matrix op(A) and C.
n | | input | number of columns of matrix op(B) and C.
k | | input | number of columns of op(A) and rows of op(B).
alpha | host | input | <type> scalar used for multiplication.
A | host or device | input | <type> array of dimensions lda x k with lda>=max(1,m) if transa == CUBLAS_OP_N and lda x m with lda>=max(1,k) otherwise.
lda | | input | leading dimension of two-dimensional array used to store matrix A.
B | host or device | input | <type> array of dimensions ldb x n with ldb>=max(1,k) if transb == CUBLAS_OP_N and ldb x k with ldb>=max(1,n) otherwise.
ldb | | input | leading dimension of two-dimensional array used to store matrix B.
beta | host | input | <type> scalar used for multiplication. If beta==0, C does not have to be a valid input.
C | host or device | in/out | <type> array of dimensions ldc x n with ldc>=max(1,m).
ldc | | input | leading dimension of two-dimensional array used to store matrix C.

The possible error values returned by this function and their meanings are listed below.

Error Value | Meaning
CUBLAS_STATUS_SUCCESS | the operation completed successfully
CUBLAS_STATUS_NOT_INITIALIZED | the library was not initialized
CUBLAS_STATUS_INVALID_VALUE | the parameters m,n,k<0
CUBLAS_STATUS_EXECUTION_FAILED | the function failed to launch on the GPU
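The roles of the leading dimensions and of the transpose flags are easiest to see in a naive CPU reference. The sketch below is a plain C illustration of the GEMM semantics for real single precision, not the cuBLASXt implementation. The name ref_sgemm and the use of plain int flags in place of cublasOperation_t are choices made for this example only.

```c
#include <assert.h>
#include <stddef.h>

/* Element (row, col) of op(M) for a column-major matrix M with leading
   dimension ld; a nonzero `transposed` reads M as its transpose. */
static float op_at(const float *M, size_t ld, int transposed, size_t row, size_t col)
{
    return transposed ? M[row * ld + col] : M[col * ld + row];
}

/* C = alpha * op(A) * op(B) + beta * C, with C of size m x n. */
void ref_sgemm(int trans_a, int trans_b, size_t m, size_t n, size_t k,
               float alpha, const float *A, size_t lda,
               const float *B, size_t ldb,
               float beta, float *C, size_t ldc)
{
    assert(ldc >= (m > 1 ? m : 1));
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p)
                acc += op_at(A, lda, trans_a, i, p) * op_at(B, ldb, trans_b, p, j);
            /* beta == 0 makes C write-only, so its previous content is ignored */
            C[j * ldc + i] = alpha * acc + (beta == 0.0f ? 0.0f : beta * C[j * ldc + i]);
        }
    }
}
```

A reference of this kind can serve to validate the results of a multi-GPU run on small matrices before scaling up.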

cublasXt<t>hemm()

cublasStatus_t cublasXtChemm(cublasXtHandle_t handle,
                           cublasSideMode_t side, cublasFillMode_t uplo,
                           size_t m, size_t n,
                           const cuComplex       *alpha,
                           const cuComplex       *A, size_t lda,
                           const cuComplex       *B, size_t ldb,
                           const cuComplex       *beta,
                           cuComplex       *C, size_t ldc)
cublasStatus_t cublasXtZhemm(cublasXtHandle_t handle,
                           cublasSideMode_t side, cublasFillMode_t uplo,
                           size_t m, size_t n,
                           const cuDoubleComplex *alpha,
                           const cuDoubleComplex *A, size_t lda,
                           const cuDoubleComplex *B, size_t ldb,
                           const cuDoubleComplex *beta,
                           cuDoubleComplex *C, size_t ldc)

This function performs the Hermitian matrix-matrix multiplication

C = alpha * A * B + beta * C if side == CUBLAS_SIDE_LEFT
C = alpha * B * A + beta * C if side == CUBLAS_SIDE_RIGHT

where A is a Hermitian matrix stored in lower or upper mode, B and C are m x n matrices, and alpha and beta are scalars.

Param. | Memory | In/out | Meaning
handle | | input | handle to the cuBLASXt API context.
side | | input | indicates if matrix A is on the left or right of B.
uplo | | input | indicates if matrix A lower or upper part is stored; the other Hermitian part is not referenced and is inferred from the stored elements.
m | | input | number of rows of matrix C and B, with matrix A sized accordingly.
n | | input | number of columns of matrix C and B, with matrix A sized accordingly.
alpha | host | input | <type> scalar used for multiplication.
A | host or device | input | <type> array of dimension lda x m with lda>=max(1,m) if side == CUBLAS_SIDE_LEFT and lda x n with lda>=max(1,n) otherwise. The imaginary parts of the diagonal elements are assumed to be zero.
lda | | input | leading dimension of two-dimensional array used to store matrix A.
B | host or device | input | <type> array of dimension ldb x n with ldb>=max(1,m).
ldb | | input | leading dimension of two-dimensional array used to store matrix B.
beta | host | input | <type> scalar used for multiplication. If beta==0, C does not have to be a valid input.
C | host or device | in/out | <type> array of dimensions ldc x n with ldc>=max(1,m).
ldc | | input | leading dimension of two-dimensional array used to store matrix C.

The possible error values returned by this function and their meanings are listed below.

Error Value | Meaning
CUBLAS_STATUS_SUCCESS | the operation completed successfully
CUBLAS_STATUS_NOT_INITIALIZED | the library was not initialized
CUBLAS_STATUS_INVALID_VALUE | the parameters m,n<0
CUBLAS_STATUS_EXECUTION_FAILED | the function failed to launch on the GPU
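How the uplo parameter lets the routine ignore one triangle of A can be illustrated with a small CPU reference using C99 complex numbers. This is a sketch of the HEMM semantics for the left-side case only. The names herm_at and ref_chemm_left are hypothetical, and float complex stands in for cuComplex.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Element (i, j) of a Hermitian matrix of which only one triangle is stored.
   Nonzero `upper` means the upper triangle is stored. The imaginary part of
   a diagonal element is treated as zero. */
static float complex herm_at(const float complex *A, size_t lda, int upper,
                             size_t i, size_t j)
{
    if (i == j)
        return crealf(A[j * lda + i]);
    int stored = upper ? (i < j) : (i > j);
    return stored ? A[j * lda + i] : conjf(A[i * lda + j]);
}

/* C = alpha * A * B + beta * C for the left side: A is m x m, B and C are m x n. */
void ref_chemm_left(int upper, size_t m, size_t n,
                    float complex alpha, const float complex *A, size_t lda,
                    const float complex *B, size_t ldb,
                    float complex beta, float complex *C, size_t ldc)
{
    assert(ldc >= (m > 1 ? m : 1));
    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i < m; ++i) {
            float complex acc = 0.0f;
            for (size_t p = 0; p < m; ++p)
                acc += herm_at(A, lda, upper, i, p) * B[j * ldb + p];
            C[j * ldc + i] = alpha * acc + (beta == 0.0f ? 0.0f : beta * C[j * ldc + i]);
        }
}
```

The unreferenced triangle may hold arbitrary values: the sketch never reads them and reconstructs the missing elements by conjugating the stored ones.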

 cublasXt<t>symm()

cublasStatus_t cublasXtSsymm(cublasXtHandle_t handle,
                           cublasSideMode_t side, cublasFillMode_t uplo,
                           size_t m, size_t n,
                           const float           *alpha,
                           const float           *A, size_t lda,
                           const float           *B, size_t ldb,
                           const float           *beta,
                           float           *C, size_t ldc)
cublasStatus_t cublasXtDsymm(cublasXtHandle_t handle,
                           cublasSideMode_t side, cublasFillMode_t uplo,
                           size_t m, size_t n,
                           const double          *alpha,
                           const double          *A, size_t lda,
                           const double          *B, size_t ldb,
                           const double          *beta,
                           double          *C, size_t ldc)
cublasStatus_t cublasXtCsymm(cublasXtHandle_t handle,
                           cublasSideMode_t side, cublasFillMode_t uplo,
                           size_t m, size_t n,
                           const cuComplex       *alpha,
                           const cuComplex       *A, size_t lda,
                           const cuComplex       *B, size_t ldb,
                           const cuComplex       *beta,
                           cuComplex       *C, size_t ldc)
cublasStatus_t cublasXtZsymm(cublasXtHandle_t handle,
                           cublasSideMode_t side, cublasFillMode_t uplo,
                           size_t m, size_t n,
                           const cuDoubleComplex *alpha,
                           const cuDoubleComplex *A, size_t lda,
                           const cuDoubleComplex *B, size_t ldb,
                           const cuDoubleComplex *beta,
                           cuDoubleComplex *C, size_t ldc)

This function performs the symmetric matrix-matrix multiplication

C = alpha * A * B + beta * C if side == CUBLAS_SIDE_LEFT
C = alpha * B * A + beta * C if side == CUBLAS_SIDE_RIGHT

where A is a symmetric matrix stored in lower or upper mode, B and C are m x n matrices, and alpha and beta are scalars.

Param. | Memory | In/out | Meaning
handle | | input | handle to the cuBLASXt API context.
side | | input | indicates if matrix A is on the left or right of B.
uplo | | input | indicates if matrix A lower or upper part is stored; the other symmetric part is not referenced and is inferred from the stored elements.
m | | input | number of rows of matrix C and B, with matrix A sized accordingly.
n | | input | number of columns of matrix C and B, with matrix A sized accordingly.
alpha | host | input | <type> scalar used for multiplication.
A | host or device | input | <type> array of dimension lda x m with lda>=max(1,m) if side == CUBLAS_SIDE_LEFT and lda x n with lda>=max(1,n) otherwise.
lda | | input | leading dimension of two-dimensional array used to store matrix A.
B | host or device | input | <type> array of dimension ldb x n with ldb>=max(1,m).
ldb | | input | leading dimension of two-dimensional array used to store matrix B.
beta | host | input | <type> scalar used for multiplication. If beta==0, C does not have to be a valid input.
C | host or device | in/out | <type> array of dimension ldc x n with ldc>=max(1,m).
ldc | | input | leading dimension of two-dimensional array used to store matrix C.

The possible error values returned by this function and their meanings are listed below.

Error Value | Meaning
CUBLAS_STATUS_SUCCESS | the operation completed successfully
CUBLAS_STATUS_NOT_INITIALIZED | the library was not initialized
CUBLAS_STATUS_INVALID_VALUE | the parameters m,n<0
CUBLAS_STATUS_EXECUTION_FAILED | the function failed to launch on the GPU
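The effect of the side parameter on the shape of A can be checked with a naive CPU reference. The sketch below illustrates the SYMM semantics in double precision. The names sym_at and ref_dsymm are hypothetical, and plain int flags replace cublasSideMode_t and cublasFillMode_t.

```c
#include <assert.h>
#include <stddef.h>

/* Element (i, j) of a symmetric matrix of which only one triangle is stored. */
static double sym_at(const double *A, size_t lda, int upper, size_t i, size_t j)
{
    int stored = upper ? (i <= j) : (i >= j);
    return stored ? A[j * lda + i] : A[i * lda + j];
}

/* left != 0: C = alpha * A * B + beta * C with A of size m x m.
   left == 0: C = alpha * B * A + beta * C with A of size n x n. */
void ref_dsymm(int left, int upper, size_t m, size_t n,
               double alpha, const double *A, size_t lda,
               const double *B, size_t ldb,
               double beta, double *C, size_t ldc)
{
    assert(ldc >= (m > 1 ? m : 1));
    size_t inner = left ? m : n;
    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i < m; ++i) {
            double acc = 0.0;
            for (size_t p = 0; p < inner; ++p)
                acc += left ? sym_at(A, lda, upper, i, p) * B[j * ldb + p]
                            : B[p * ldb + i] * sym_at(A, lda, upper, p, j);
            C[j * ldc + i] = alpha * acc + (beta == 0.0 ? 0.0 : beta * C[j * ldc + i]);
        }
}
```

In both cases B and C keep the same m x n shape. Only the order of A changes with the side, which matches the dimension rule given for A in the table.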

cublasXt<t>syrk()

cublasStatus_t cublasXtSsyrk(cublasXtHandle_t handle,
                           cublasFillMode_t uplo, cublasOperation_t trans,
                           int n, int k,
                           const float           *alpha,
                           const float           *A, int lda,
                           const float           *beta,
                           float           *C, int ldc)
cublasStatus_t cublasXtDsyrk(cublasXtHandle_t handle,
                           cublasFillMode_t uplo, cublasOperation_t trans,
                           int n, int k,
                           const double          *alpha,
                           const double          *A, int lda,
                           const double          *beta,
                           double          *C, int ldc)
cublasStatus_t cublasXtCsyrk(cublasXtHandle_t handle,
                           cublasFillMode_t uplo, cublasOperation_t trans,
                           int n, int k,
                           const cuComplex       *alpha,
                           const cuComplex       *A, int lda,
                           const cuComplex       *beta,
                           cuComplex       *C, int ldc)
cublasStatus_t cublasXtZsyrk(cublasXtHandle_t handle,
                           cublasFillMode_t uplo, cublasOperation_t trans,
                           int n, int k,
                           const cuDoubleComplex *alpha,
                           const cuDoubleComplex *A, int lda,
                           const cuDoubleComplex *beta,
                           cuDoubleComplex *C, int ldc)

This function performs the symmetric rank-k update

C = alpha * op(A) * op(A)^T + beta * C

where alpha and beta are scalars, C is a symmetric matrix stored in lower or upper mode, and A is a matrix with dimensions op(A) n x k. op(A) is A if trans == CUBLAS_OP_N and the transpose of A if trans == CUBLAS_OP_T.

Param. | Memory | In/out | Meaning
handle | | input | handle to the cuBLASXt API context.
uplo | | input | indicates if matrix C lower or upper part is stored; the other symmetric part is not referenced and is inferred from the stored elements.
trans | | input | operation op(A) that is non- or transpose.
n | | input | number of rows of matrix op(A) and C.
k | | input | number of columns of matrix op(A).
alpha | host | input | <type> scalar used for multiplication.
A | host or device | input | <type> array of dimension lda x k with lda>=max(1,n) if trans == CUBLAS_OP_N and lda x n with lda>=max(1,k) otherwise.
lda | | input | leading dimension of two-dimensional array used to store matrix A.
beta | host | input | <type> scalar used for multiplication. If beta==0, C does not have to be a valid input.
C | host or device | in/out | <type> array of dimension ldc x n, with ldc>=max(1,n).
ldc | | input | leading dimension of two-dimensional array used to store matrix C.

The possible error values returned by this function and their meanings are listed below.

Error Value | Meaning
CUBLAS_STATUS_SUCCESS | the operation completed successfully
CUBLAS_STATUS_NOT_INITIALIZED | the library was not initialized
CUBLAS_STATUS_INVALID_VALUE | the parameters n,k<0
CUBLAS_STATUS_EXECUTION_FAILED | the function failed to launch on the GPU
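A property of the rank-k update that is easy to overlook is that only one triangle of C is written. The following plain C sketch makes this visible. It is a naive reference for illustration, with the hypothetical name ref_ssyrk and int flags in place of the cuBLAS enums.

```c
#include <assert.h>
#include <stddef.h>

/* C = alpha * op(A) * op(A)^T + beta * C, touching only the triangle
   selected by `upper`. trans == 0: op(A) = A, stored n x k.
   trans != 0: op(A) = A^T, with A stored k x n. */
void ref_ssyrk(int upper, int trans, size_t n, size_t k,
               float alpha, const float *A, size_t lda,
               float beta, float *C, size_t ldc)
{
    assert(ldc >= (n > 1 ? n : 1));
    for (size_t j = 0; j < n; ++j) {
        size_t first = upper ? 0 : j;
        size_t last = upper ? j : n - 1;
        for (size_t i = first; i <= last; ++i) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p) {
                float a_ip = trans ? A[i * lda + p] : A[p * lda + i];
                float a_jp = trans ? A[j * lda + p] : A[p * lda + j];
                acc += a_ip * a_jp;
            }
            C[j * ldc + i] = alpha * acc + (beta == 0.0f ? 0.0f : beta * C[j * ldc + i]);
        }
    }
}
```

Elements in the other triangle keep whatever value they had before the call, so a caller that needs the full symmetric matrix has to mirror the result afterwards.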

 cublasXt<t>syr2k()

cublasStatus_t cublasXtSsyr2k(cublasXtHandle_t handle,
                            cublasFillMode_t uplo, cublasOperation_t trans,
                            size_t n, size_t k,
                            const float           *alpha,
                            const float           *A, size_t lda,
                            const float           *B, size_t ldb,
                            const float           *beta,
                            float           *C, size_t ldc)
cublasStatus_t cublasXtDsyr2k(cublasXtHandle_t handle,
                            cublasFillMode_t uplo, cublasOperation_t trans,
                            size_t n, size_t k,
                            const double          *alpha,
                            const double          *A, size_t lda,
                            const double          *B, size_t ldb,
                            const double          *beta,
                            double          *C, size_t ldc)
cublasStatus_t cublasXtCsyr2k(cublasXtHandle_t handle,
                            cublasFillMode_t uplo, cublasOperation_t trans,
                            size_t n, size_t k,
                            const cuComplex       *alpha,
                            const cuComplex       *A, size_t lda,
                            const cuComplex       *B, size_t ldb,
                            const cuComplex       *beta,
                            cuComplex       *C, size_t ldc)
cublasStatus_t cublasXtZsyr2k(cublasXtHandle_t handle,
                            cublasFillMode_t uplo, cublasOperation_t trans,
                            size_t n, size_t k,
                            const cuDoubleComplex *alpha,
                            const cuDoubleComplex *A, size_t lda,
                            const cuDoubleComplex *B, size_t ldb,
                            const cuDoubleComplex *beta,
                            cuDoubleComplex *C, size_t ldc)

This function performs the symmetric rank-2k update

C = alpha * (op(A) * op(B)^T + op(B) * op(A)^T) + beta * C

where alpha and beta are scalars, C is a symmetric matrix stored in lower or upper mode, and A and B are matrices with dimensions op(A) n x k and op(B) n x k. op(A) and op(B) are A and B if trans == CUBLAS_OP_N, and their transposes if trans == CUBLAS_OP_T.

Param. | Memory | In/out | Meaning
handle | | input | handle to the cuBLASXt API context.
uplo | | input | indicates if matrix C lower or upper part is stored; the other symmetric part is not referenced and is inferred from the stored elements.
trans | | input | operation op(A) that is non- or transpose.
n | | input | number of rows of matrix op(A), op(B) and C.
k | | input | number of columns of matrix op(A) and op(B).
alpha | host | input | <type> scalar used for multiplication.
A | host or device | input | <type> array of dimension lda x k with lda>=max(1,n) if trans == CUBLAS_OP_N and lda x n with lda>=max(1,k) otherwise.
lda | | input | leading dimension of two-dimensional array used to store matrix A.
B | host or device | input | <type> array of dimensions ldb x k with ldb>=max(1,n) if trans == CUBLAS_OP_N and ldb x n with ldb>=max(1,k) otherwise.
ldb | | input | leading dimension of two-dimensional array used to store matrix B.
beta | host | input | <type> scalar used for multiplication. If beta==0, C does not have to be a valid input.
C | host or device | in/out | <type> array of dimensions ldc x n with ldc>=max(1,n).
ldc | | input | leading dimension of two-dimensional array used to store matrix C.

The possible error values returned by this function and their meanings are listed below.

Error Value | Meaning
CUBLAS_STATUS_SUCCESS | the operation completed successfully
CUBLAS_STATUS_NOT_INITIALIZED | the library was not initialized
CUBLAS_STATUS_INVALID_VALUE | the parameters n,k<0
CUBLAS_STATUS_EXECUTION_FAILED | the function failed to launch on the GPU
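The rank-2k update adds two mirrored products, which guarantees a symmetric result for any A and B. A naive CPU reference in plain C, given here as an illustration with hypothetical names (elem_at, ref_dsyr2k) and int flags instead of the cuBLAS enums, could look like this:

```c
#include <assert.h>
#include <stddef.h>

/* Element (row, col) of op(M) for a column-major matrix with leading dimension ld. */
static double elem_at(const double *M, size_t ld, int trans, size_t row, size_t col)
{
    return trans ? M[row * ld + col] : M[col * ld + row];
}

/* C = alpha * (op(A) * op(B)^T + op(B) * op(A)^T) + beta * C,
   writing only the triangle selected by `upper`. */
void ref_dsyr2k(int upper, int trans, size_t n, size_t k,
                double alpha, const double *A, size_t lda,
                const double *B, size_t ldb,
                double beta, double *C, size_t ldc)
{
    assert(ldc >= (n > 1 ? n : 1));
    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i < n; ++i) {
            if (upper ? i > j : i < j)
                continue;  /* the other triangle is never referenced */
            double acc = 0.0;
            for (size_t p = 0; p < k; ++p)
                acc += elem_at(A, lda, trans, i, p) * elem_at(B, ldb, trans, j, p)
                     + elem_at(B, ldb, trans, i, p) * elem_at(A, lda, trans, j, p);
            C[j * ldc + i] = alpha * acc + (beta == 0.0 ? 0.0 : beta * C[j * ldc + i]);
        }
}
```

The same trans flag applies to both A and B, which is why the routine takes a single trans parameter.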

 cublasXt<t>syrkx()

cublasStatus_t cublasXtSsyrkx(cublasXtHandle_t handle,
                            cublasFillMode_t uplo, cublasOperation_t trans,
                            size_t n, size_t k,
                            const float           *alpha,
                            const float           *A, size_t lda,
                            const float           *B, size_t ldb,
                            const float           *beta,
                            float           *C, size_t ldc)
cublasStatus_t cublasXtDsyrkx(cublasXtHandle_t handle,
                            cublasFillMode_t uplo, cublasOperation_t trans,
                            size_t n, size_t k,
                            const double          *alpha,
                            const double          *A, size_t lda,
                            const double          *B, size_t ldb,
                            const double          *beta,
                            double          *C, size_t ldc)
cublasStatus_t cublasXtCsyrkx(cublasXtHandle_t handle,
                            cublasFillMode_t uplo, cublasOperation_t trans,
                            size_t n, size_t k,
                            const cuComplex       *alpha,
                            const cuComplex       *A, size_t lda,
                            const cuComplex       *B, size_t ldb,
                            const cuComplex       *beta,
                            cuComplex       *C, size_t ldc)
cublasStatus_t cublasXtZsyrkx(cublasXtHandle_t handle,
                            cublasFillMode_t uplo, cublasOperation_t trans,
                            size_t n, size_t k,
                            const cuDoubleComplex *alpha,
                            const cuDoubleComplex *A, size_t lda,
                            const cuDoubleComplex *B, size_t ldb,
                            const cuDoubleComplex *beta,
                            cuDoubleComplex *C, size_t ldc)

This function performs a variation of the symmetric rank-k update

C = alpha * op(A) * op(B)^T + beta * C

where alpha and beta are scalars, C is a symmetric matrix stored in lower or upper mode, and A and B are matrices with dimensions op(A) n x k and op(B) n x k. op(A) and op(B) are A and B if trans == CUBLAS_OP_N, and their transposes if trans == CUBLAS_OP_T.

Param. | Memory | In/out | Meaning
handle | | input | handle to the cuBLASXt API context.
uplo | | input | indicates if matrix C lower or upper part is stored; the other symmetric part is not referenced and is inferred from the stored elements.
trans | | input | operation op(A) that is non- or transpose.
n | | input | number of rows of matrix op(A), op(B) and C.
k | | input | number of columns of matrix op(A) and op(B).
alpha | host | input | <type> scalar used for multiplication.
A | host or device | input | <type> array of dimension lda x k with lda>=max(1,n) if trans == CUBLAS_OP_N and lda x n with lda>=max(1,k) otherwise.
lda | | input | leading dimension of two-dimensional array used to store matrix A.
B | host or device | input | <type> array of dimensions ldb x k with ldb>=max(1,n) if trans == CUBLAS_OP_N and ldb x n with ldb>=max(1,k) otherwise.
ldb | | input | leading dimension of two-dimensional array used to store matrix B.
beta | host | input | <type> scalar used for multiplication. If beta==0, C does not have to be a valid input.
C | host or device | in/out | <type> array of dimensions ldc x n with ldc>=max(1,n).
ldc | | input | leading dimension of two-dimensional array used to store matrix C.

The possible error values returned by this function and their meanings are listed below.

Error Value | Meaning
CUBLAS_STATUS_SUCCESS | the operation completed successfully
CUBLAS_STATUS_NOT_INITIALIZED | the library was not initialized
CUBLAS_STATUS_INVALID_VALUE | the parameters n,k<0
CUBLAS_STATUS_EXECUTION_FAILED | the function failed to launch on the GPU
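Unlike syr2k, this variation only produces a symmetric C when the product op(A) * op(B)^T happens to be symmetric. A common case is B being A with its columns scaled, that is B = A * D with D diagonal. The plain C sketch below illustrates that case for the non-transposed operation. The names scale_columns and ref_ssyrkx are hypothetical and the code is a naive reference, not the library implementation.

```c
#include <assert.h>
#include <stddef.h>

/* B = A * D, where D is the k x k diagonal matrix with entries d[0..k-1];
   A and B are n x k, column-major. */
void scale_columns(size_t n, size_t k, const float *A, size_t lda,
                   const float *d, float *B, size_t ldb)
{
    for (size_t p = 0; p < k; ++p)
        for (size_t i = 0; i < n; ++i)
            B[p * ldb + i] = A[p * lda + i] * d[p];
}

/* C = alpha * A * B^T + beta * C for the non-transposed case,
   writing only the triangle selected by `upper`. */
void ref_ssyrkx(int upper, size_t n, size_t k,
                float alpha, const float *A, size_t lda,
                const float *B, size_t ldb,
                float beta, float *C, size_t ldc)
{
    assert(ldc >= (n > 1 ? n : 1));
    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i < n; ++i) {
            if (upper ? i > j : i < j)
                continue;
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p)
                acc += A[p * lda + i] * B[p * ldb + j];
            C[j * ldc + i] = alpha * acc + (beta == 0.0f ? 0.0f : beta * C[j * ldc + i]);
        }
}
```

With B built by scale_columns, the product equals A * D * A^T, which is symmetric, so computing a single triangle loses no information.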

cublasXt<t>herk()

cublasStatus_t cublasXtCherk(cublasXtHandle_t handle,
                           cublasFillMode_t uplo, cublasOperation_t trans,
                           int n, int k,
                           const float  *alpha,
                           const cuComplex       *A, int lda,
                           const float  *beta,
                           cuComplex       *C, int ldc)
cublasStatus_t cublasXtZherk(cublasXtHandle_t handle,
                           cublasFillMode_t uplo, cublasOperation_t trans,
                           int n, int k,
                           const double *alpha,
                           const cuDoubleComplex *A, int lda,
                           const double *beta,
                           cuDoubleComplex *C, int ldc)

This function performs the Hermitian rank-k update

C = alpha * op(A) * op(A)^H + beta * C

where alpha and beta are real scalars, C is a Hermitian matrix stored in lower or upper mode, and A is a matrix with dimensions op(A) n x k. op(A) is A if trans == CUBLAS_OP_N and the conjugate transpose of A if trans == CUBLAS_OP_C.

Param. | Memory | In/out | Meaning
handle | | input | handle to the cuBLASXt API context.
uplo | | input | indicates if matrix C lower or upper part is stored; the other Hermitian part is not referenced and is inferred from the stored elements.
trans | | input | operation op(A) that is non- or (conj.) transpose.
n | | input | number of rows of matrix op(A) and C.
k | | input | number of columns of matrix op(A).
alpha | host | input | real scalar used for multiplication.
A | host or device | input | <type> array of dimension lda x k with lda>=max(1,n) if trans == CUBLAS_OP_N and lda x n with lda>=max(1,k) otherwise.
lda | | input | leading dimension of two-dimensional array used to store matrix A.
beta | host | input | real scalar used for multiplication. If beta==0, C does not have to be a valid input.
C | host or device | in/out | <type> array of dimension ldc x n, with ldc>=max(1,n). The imaginary parts of the diagonal elements are assumed and set to zero.
ldc | | input | leading dimension of two-dimensional array used to store matrix C.

The possible error values returned by this function and their meanings are listed below.

Error Value | Meaning
CUBLAS_STATUS_SUCCESS | the operation completed successfully
CUBLAS_STATUS_NOT_INITIALIZED | the library was not initialized
CUBLAS_STATUS_INVALID_VALUE | the parameters n,k<0
CUBLAS_STATUS_EXECUTION_FAILED | the function failed to launch on the GPU
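Two details of the Hermitian rank-k update stand out in the table above: alpha and beta are real, and the imaginary parts of the diagonal of C are set to zero. The plain C sketch below illustrates both with C99 complex numbers. The names opa_h and ref_cherk are hypothetical, float complex stands in for cuComplex, and the code is a naive reference, not the library implementation.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Element (row, col) of op(A): A itself when trans == 0,
   the conjugate transpose of the stored matrix otherwise. */
static float complex opa_h(const float complex *A, size_t lda, int trans,
                           size_t row, size_t col)
{
    return trans ? conjf(A[row * lda + col]) : A[col * lda + row];
}

/* C = alpha * op(A) * op(A)^H + beta * C with real alpha and beta,
   writing only the triangle selected by `upper`. */
void ref_cherk(int upper, int trans, size_t n, size_t k,
               float alpha, const float complex *A, size_t lda,
               float beta, float complex *C, size_t ldc)
{
    assert(ldc >= (n > 1 ? n : 1));
    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i < n; ++i) {
            if (upper ? i > j : i < j)
                continue;
            float complex acc = 0.0f;
            for (size_t p = 0; p < k; ++p)
                acc += opa_h(A, lda, trans, i, p) * conjf(opa_h(A, lda, trans, j, p));
            float complex old = (beta == 0.0f) ? 0.0f : beta * C[j * ldc + i];
            float complex val = alpha * acc + old;
            /* diagonal elements of a Hermitian matrix are real */
            C[j * ldc + i] = (i == j) ? crealf(val) : val;
        }
}
```

Forcing the diagonal to be real removes any imaginary residue that the input C might have carried on its diagonal.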

cublasXt<t>her2k()

cublasStatus_t cublasXtCher2k(cublasXtHandle_t handle,
                            cublasFillMode_t uplo, cublasOperation_t trans,
                            size_t n, size_t k,
                            const cuComplex       *alpha,
                            const cuComplex       *A, size_t lda,
                            const cuComplex       *B, size_t ldb,
                            const float  *beta,
                            cuComplex       *C, size_t ldc)
cublasStatus_t cublasXtZher2k(cublasXtHandle_t handle,
                            cublasFillMode_t uplo, cublasOperation_t trans,
                            size_t n, size_t k,
                            const cuDoubleComplex *alpha,
                            const cuDoubleComplex *A, size_t lda,
                            const cuDoubleComplex *B, size_t ldb,
                            const double *beta,
                            cuDoubleComplex *C, size_t ldc)

This function performs the Hermitian rank-2k update

C = alpha * op(A) * op(B)^H + conj(alpha) * op(B) * op(A)^H + beta * C

where alpha is a complex scalar, beta is a real scalar, C is a Hermitian matrix stored in lower or upper mode, and A and B are matrices with dimensions op(A) n x k and op(B) n x k. op(A) and op(B) are A and B if trans == CUBLAS_OP_N, and their conjugate transposes if trans == CUBLAS_OP_C.

Param. | Memory | In/out | Meaning
handle | | input | handle to the cuBLASXt API context.
uplo | | input | indicates if matrix C lower or upper part is stored; the other Hermitian part is not referenced and is inferred from the stored elements.
trans | | input | operation op(A) that is non- or (conj.) transpose.
n | | input | number of rows of matrix op(A), op(B) and C.
k | | input | number of columns of matrix op(A) and op(B).
alpha | host | input | <type> scalar used for multiplication.
A | host or device | input | <type> array of dimension lda x k with lda>=max(1,n) if trans == CUBLAS_OP_N and lda x n with lda>=max(1,k) otherwise.
lda | | input | leading dimension of two-dimensional array used to store matrix A.
B | host or device | input | <type> array of dimension ldb x k with ldb>=max(1,n) if trans == CUBLAS_OP_N and ldb x n with ldb>=max(1,k) otherwise.
ldb | | input | leading dimension of two-dimensional array used to store matrix B.
beta | host | input | real scalar used for multiplication. If beta==0, C does not have to be a valid input.
C | host or device | in/out | <type> array of dimension ldc x n, with ldc>=max(1,n). The imaginary parts of the diagonal elements are assumed and set to zero.
ldc | | input | leading dimension of two-dimensional array used to store matrix C.

The possible error values returned by this function and their meanings are listed below.

Error Value | Meaning
CUBLAS_STATUS_SUCCESS | the operation completed successfully
CUBLAS_STATUS_NOT_INITIALIZED | the library was not initialized
CUBLAS_STATUS_INVALID_VALUE | the parameters n,k<0
CUBLAS_STATUS_EXECUTION_FAILED | the function failed to launch on the GPU

 cublasXt<t>herkx()

cublasStatus_t cublasXtCherkx(cublasXtHandle_t handle,
                            cublasFillMode_t uplo, cublasOperation_t trans,
                            size_t n, size_t k,
                            const cuComplex       *alpha,
                            const cuComplex       *A, size_t lda,
                            const cuComplex       *B, size_t ldb,
                            const float  *beta,
                            cuComplex       *C, size_t ldc)
cublasStatus_t cublasXtZherkx(cublasXtHandle_t handle,
                            cublasFillMode_t uplo, cublasOperation_t trans,
                            size_t n, size_t k,
                            const cuDoubleComplex *alpha,
                            const cuDoubleComplex *A, size_t lda,
                            const cuDoubleComplex *B, size_t ldb,
                            const double *beta,
                            cuDoubleComplex *C, size_t ldc)

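This function performs a variation of the Hermitian rank-k update

C = alpha * op(A) * op(B)^H + beta * C

where alpha and beta are scalars, C is an n x n Hermitian matrix stored in lower or upper mode, and op(A) and op(B) are n x k matrices. op(A) and op(B) are A and B if trans == CUBLAS_OP_N, and A^H and B^H if trans == CUBLAS_OP_C. The routine applies when B is such that the result is guaranteed to be Hermitian, for example when B is a scaled form of A.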

Param.

Memory

In/out

Meaning

handle

input

handle to the cuBLASXt API context.

uplo

input

indicates whether the lower or upper part of matrix C is stored; the other Hermitian part is not referenced and is inferred from the stored elements.

trans

input

operation op(A) that is non- or (conj.) transpose.

n

input

number of rows of matrix op(A), op(B) and C.

k

input

number of columns of matrix op(A) and op(B).

alpha

host

input

<type> scalar used for multiplication.

A

host or device

input

<type> array of dimension lda x k with lda>=max(1,n) if trans == CUBLAS_OP_N, and lda x n with lda>=max(1,k) otherwise.

lda

input

leading dimension of two-dimensional array used to store matrix A.

B

host or device

input

<type> array of dimension ldb x k with ldb>=max(1,n) if trans == CUBLAS_OP_N, and ldb x n with ldb>=max(1,k) otherwise.

ldb

input

leading dimension of two-dimensional array used to store matrix B.

beta

host

input

real scalar used for multiplication, if beta==0 then C does not have to be a valid input.

C

host or device

in/out

<type> array of dimension ldc x n, with ldc>=max(1,n). The imaginary parts of the diagonal elements are assumed and set to zero.

ldc

input

leading dimension of two-dimensional array used to store matrix C.

The possible error values returned by this function and their meanings are listed below.

Error Value

Meaning

CUBLAS_STATUS_SUCCESS

the operation completed successfully

CUBLAS_STATUS_NOT_INITIALIZED

the library was not initialized

CUBLAS_STATUS_INVALID_VALUE

the parameters n,k<0

CUBLAS_STATUS_EXECUTION_FAILED

the function failed to launch on the GPU
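To make the herkx semantics concrete, here is a small CPU reference sketch of C = alpha * A * B^H + beta * C for the case trans == CUBLAS_OP_N with the lower triangle stored. It is not the cuBLASXt implementation: the function name herkx_lower_n_ref is hypothetical, and it uses C99 float complex instead of cuComplex so that it compiles without the CUDA toolkit.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* CPU reference sketch (not the cuBLASXt implementation) of
 *   C = alpha * A * B^H + beta * C
 * for trans == CUBLAS_OP_N and uplo == CUBLAS_FILL_MODE_LOWER.
 * A and B are n x k, C is n x n, all column-major.
 * Only the lower triangle of C is read and written. */
static void herkx_lower_n_ref(size_t n, size_t k,
                              float complex alpha,
                              const float complex *A, size_t lda,
                              const float complex *B, size_t ldb,
                              float beta,
                              float complex *C, size_t ldc)
{
    assert(lda >= n && ldb >= n && ldc >= n);
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = j; i < n; ++i) {
            float complex acc = 0.0f;
            for (size_t p = 0; p < k; ++p)
                acc += A[i + p * lda] * conjf(B[j + p * ldb]);
            /* beta == 0 means C is not read (it need not be valid input). */
            float complex c = (beta == 0.0f) ? 0.0f : beta * C[i + j * ldc];
            c += alpha * acc;
            if (i == j)
                c = crealf(c); /* diagonal imaginary part is set to zero */
            C[i + j * ldc] = c;
        }
    }
}
```

A reference like this is mainly useful for checking small results against the library; with B equal to a real multiple of A and a real alpha, the product stays Hermitian, which is the situation the routine is meant for.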

 cublasXt<t>trsm()

cublasStatus_t cublasXtStrsm(cublasXtHandle_t handle,
                           cublasSideMode_t side, cublasFillMode_t uplo,
                           cublasOperation_t trans, cublasDiagType_t diag,
                           size_t m, size_t n,
                           const float           *alpha,
                           const float           *A, size_t lda,
                           float           *B, size_t ldb)
cublasStatus_t cublasXtDtrsm(cublasXtHandle_t handle,
                           cublasSideMode_t side, cublasFillMode_t uplo,
                           cublasOperation_t trans, cublasDiagType_t diag,
                           size_t m, size_t n,
                           const double          *alpha,
                           const double          *A, size_t lda,
                           double          *B, size_t ldb)
cublasStatus_t cublasXtCtrsm(cublasXtHandle_t handle,
                           cublasSideMode_t side, cublasFillMode_t uplo,
                           cublasOperation_t trans, cublasDiagType_t diag,
                           size_t m, size_t n,
                           const cuComplex       *alpha,
                           const cuComplex       *A, size_t lda,
                           cuComplex       *B, size_t ldb)
cublasStatus_t cublasXtZtrsm(cublasXtHandle_t handle,
                           cublasSideMode_t side, cublasFillMode_t uplo,
                           cublasOperation_t trans, cublasDiagType_t diag,
                           size_t m, size_t n,
                           const cuDoubleComplex *alpha,
                           const cuDoubleComplex *A, size_t lda,
                           cuDoubleComplex *B, size_t ldb)

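This function solves the triangular linear system with multiple right-hand sides

op(A) * X = alpha * B   if side == CUBLAS_SIDE_LEFT
X * op(A) = alpha * B   if side == CUBLAS_SIDE_RIGHT

where A is a triangular matrix stored in lower or upper mode, with or without the main diagonal, X and B are m x n matrices, and alpha is a scalar. The solution X overwrites the right-hand sides B on exit.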

Param.

Memory

In/out

Meaning

handle

input

handle to the cuBLASXt API context.

side

input

indicates if matrix A is on the left or right of X.

uplo

input

indicates whether the lower or upper part of matrix A is stored; the other part is not referenced.

trans

input

operation op(A) that is non- or (conj.) transpose.

diag

input

indicates if the elements on the main diagonal of matrix A are unity and should not be accessed.

m

input

number of rows of matrix B, with matrix A sized accordingly.

n

input

number of columns of matrix B, with matrix A sized accordingly.

alpha

host

input

<type> scalar used for multiplication, if alpha==0 then A is not referenced and B does not have to be a valid input.

A

host or device

input

<type> array of dimension lda x m with lda>=max(1,m) if side == CUBLAS_SIDE_LEFT and lda x n with lda>=max(1,n) otherwise.

lda

input

leading dimension of two-dimensional array used to store matrix A.

B

host or device

in/out

<type> array. It has dimensions ldb x n with ldb>=max(1,m).

ldb

input

leading dimension of two-dimensional array used to store matrix B.

The possible error values returned by this function and their meanings are listed below.

Error Value

Meaning

CUBLAS_STATUS_SUCCESS

the operation completed successfully

CUBLAS_STATUS_NOT_INITIALIZED

the library was not initialized

CUBLAS_STATUS_INVALID_VALUE

the parameters m,n<0

CUBLAS_STATUS_EXECUTION_FAILED

the function failed to launch on the GPU
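As an illustration of what trsm computes, the following CPU reference sketch solves A * X = alpha * B by forward substitution for the case side == CUBLAS_SIDE_LEFT, uplo == CUBLAS_FILL_MODE_LOWER and trans == CUBLAS_OP_N, overwriting B with X. The function name trsm_left_lower_n_ref and the unit_diag flag are hypothetical stand-ins, and the code is not how cuBLASXt implements the routine.

```c
#include <assert.h>
#include <stddef.h>

/* CPU reference sketch (not the cuBLASXt implementation): solves
 *   A * X = alpha * B
 * for side == LEFT, uplo == LOWER, trans == CUBLAS_OP_N, overwriting B
 * with X. A is m x m lower triangular, B is m x n, both column-major.
 * unit_diag != 0 mirrors CUBLAS_DIAG_UNIT: the diagonal of A is taken
 * to be 1 and is never read. */
static void trsm_left_lower_n_ref(size_t m, size_t n, double alpha,
                                  const double *A, size_t lda,
                                  double *B, size_t ldb, int unit_diag)
{
    assert(lda >= m && ldb >= m);
    if (alpha == 0.0) {
        /* alpha == 0: A is not referenced and the result is zero. */
        for (size_t j = 0; j < n; ++j)
            for (size_t i = 0; i < m; ++i)
                B[i + j * ldb] = 0.0;
        return;
    }
    for (size_t j = 0; j < n; ++j) {          /* each right-hand side */
        double *b = B + j * ldb;
        for (size_t i = 0; i < m; ++i) {      /* forward substitution */
            double s = alpha * b[i];
            for (size_t p = 0; p < i; ++p)
                s -= A[i + p * lda] * b[p];   /* b[p] already holds x[p] */
            b[i] = unit_diag ? s : s / A[i + i * lda];
        }
    }
}
```

The strictly upper part of A is never touched, which matches the uplo description: only the selected triangle is referenced.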

 cublasXt<t>trmm()

cublasStatus_t cublasXtStrmm(cublasXtHandle_t handle,
                           cublasSideMode_t side, cublasFillMode_t uplo,
                           cublasOperation_t trans, cublasDiagType_t diag,
                           size_t m, size_t n,
                           const float           *alpha,
                           const float           *A, size_t lda,
                           const float           *B, size_t ldb,
                           float                 *C, size_t ldc)
cublasStatus_t cublasXtDtrmm(cublasXtHandle_t handle,
                           cublasSideMode_t side, cublasFillMode_t uplo,
                           cublasOperation_t trans, cublasDiagType_t diag,
                           size_t m, size_t n,
                           const double          *alpha,
                           const double          *A, size_t lda,
                           const double          *B, size_t ldb,
                           double                *C, size_t ldc)
cublasStatus_t cublasXtCtrmm(cublasXtHandle_t handle,
                           cublasSideMode_t side, cublasFillMode_t uplo,
                           cublasOperation_t trans, cublasDiagType_t diag,
                           size_t m, size_t n,
                           const cuComplex       *alpha,
                           const cuComplex       *A, size_t lda,
                           const cuComplex       *B, size_t ldb,
                           cuComplex             *C, size_t ldc)
cublasStatus_t cublasXtZtrmm(cublasXtHandle_t handle,
                           cublasSideMode_t side, cublasFillMode_t uplo,
                           cublasOperation_t trans, cublasDiagType_t diag,
                           size_t m, size_t n,
                           const cuDoubleComplex *alpha,
                           const cuDoubleComplex *A, size_t lda,
                           const cuDoubleComplex *B, size_t ldb,
                           cuDoubleComplex       *C, size_t ldc)

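This function performs the triangular matrix-matrix multiplication

C = alpha * op(A) * B   if side == CUBLAS_SIDE_LEFT
C = alpha * B * op(A)   if side == CUBLAS_SIDE_RIGHT

where A is a triangular matrix stored in lower or upper mode, with or without the main diagonal, B and C are m x n matrices, and alpha is a scalar.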

Param.

Memory

In/out

Meaning

handle

input

handle to the cuBLASXt API context.

side

input

indicates if matrix A is on the left or right of B.

uplo

input

indicates whether the lower or upper part of matrix A is stored; the other part is not referenced.

trans

input

operation op(A) that is non- or (conj.) transpose.

diag

input

indicates if the elements on the main diagonal of matrix A are unity and should not be accessed.

m

input

number of rows of matrix B, with matrix A sized accordingly.

n

input

number of columns of matrix B, with matrix A sized accordingly.

alpha

host

input

<type> scalar used for multiplication, if alpha==0 then A is not referenced and B does not have to be a valid input.

A

host or device

input

<type> array of dimension lda x m with lda>=max(1,m) if side == CUBLAS_SIDE_LEFT and lda x n with lda>=max(1,n) otherwise.

lda

input

leading dimension of two-dimensional array used to store matrix A.

B

host or device

input

<type> array of dimension ldb x n with ldb>=max(1,m).

ldb

input

leading dimension of two-dimensional array used to store matrix B.

C

host or device

in/out

<type> array of dimension ldc x n with ldc>=max(1,m).

ldc

input

leading dimension of two-dimensional array used to store matrix C.

The possible error values returned by this function and their meanings are listed below.

Error Value

Meaning

CUBLAS_STATUS_SUCCESS

the operation completed successfully

CUBLAS_STATUS_NOT_INITIALIZED

the library was not initialized

CUBLAS_STATUS_INVALID_VALUE

the parameters m,n<0

CUBLAS_STATUS_EXECUTION_FAILED

the function failed to launch on the GPU
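Unlike the classic BLAS trmm, the signature above writes the product into a separate matrix C. A CPU reference sketch of C = alpha * A * B for side == CUBLAS_SIDE_LEFT, uplo == CUBLAS_FILL_MODE_UPPER and trans == CUBLAS_OP_N could look as follows; the function name trmm_left_upper_n_ref and the unit_diag flag are hypothetical, and this is not the cuBLASXt implementation.

```c
#include <assert.h>
#include <stddef.h>

/* CPU reference sketch (not the cuBLASXt implementation) of
 *   C = alpha * A * B
 * for side == LEFT, uplo == UPPER, trans == CUBLAS_OP_N.
 * A is m x m upper triangular, B and C are m x n, all column-major.
 * unit_diag != 0 mirrors CUBLAS_DIAG_UNIT: the diagonal of A is taken
 * to be 1 and is never read. */
static void trmm_left_upper_n_ref(size_t m, size_t n, double alpha,
                                  const double *A, size_t lda,
                                  const double *B, size_t ldb,
                                  double *C, size_t ldc, int unit_diag)
{
    assert(lda >= m && ldb >= m && ldc >= m);
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            /* Row i of an upper triangular A is nonzero only for p >= i. */
            double s = unit_diag ? B[i + j * ldb]
                                 : A[i + i * lda] * B[i + j * ldb];
            for (size_t p = i + 1; p < m; ++p)
                s += A[i + p * lda] * B[p + j * ldb];
            C[i + j * ldc] = alpha * s;
        }
    }
}
```

Because B is only read and C is only written, the same input can be reused across calls, for example to compare the unit and non-unit diagonal variants.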

cublasXt<t>spmm()

cublasStatus_t cublasXtSspmm( cublasXtHandle_t handle,
                                cublasSideMode_t side,
                                cublasFillMode_t uplo,
                                size_t m,
                                size_t n,
                                const float *alpha,
                                const float *AP,
                                const float *B,
                                size_t ldb,
                                const float *beta,
                                float *C,
                                size_t ldc );

cublasStatus_t cublasXtDspmm( cublasXtHandle_t handle,
                                cublasSideMode_t side,
                                cublasFillMode_t uplo,
                                size_t m,
                                size_t n,
                                const double *alpha,
                                const double *AP,
                                const double *B,
                                size_t ldb,
                                const double *beta,
                                double *C,
                                size_t ldc );

cublasStatus_t cublasXtCspmm( cublasXtHandle_t handle,
                                cublasSideMode_t side,
                                cublasFillMode_t uplo,
                                size_t m,
                                size_t n,
                                const cuComplex *alpha,
                                const cuComplex *AP,
                                const cuComplex *B,
                                size_t ldb,
                                const cuComplex *beta,
                                cuComplex *C,
                                size_t ldc );

cublasStatus_t cublasXtZspmm( cublasXtHandle_t handle,
                                cublasSideMode_t side,
                                cublasFillMode_t uplo,
                                size_t m,
                                size_t n,
                                const cuDoubleComplex *alpha,
                                const cuDoubleComplex *AP,
                                const cuDoubleComplex *B,
                                size_t ldb,
                                const cuDoubleComplex *beta,
                                cuDoubleComplex *C,
                                size_t ldc );
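This function performs the symmetric packed matrix-matrix multiplication

C = alpha * A * B + beta * C   if side == CUBLAS_SIDE_LEFT
C = alpha * B * A + beta * C   if side == CUBLAS_SIDE_RIGHT

where A is a symmetric matrix stored in packed format, B and C are m x n matrices, and alpha and beta are scalars.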

Param.

Memory

In/out

Meaning

handle

input

handle to the cuBLASXt API context.

side

input

indicates if matrix A is on the left or right of B.

uplo

input

indicates if matrix A lower or upper part is stored, the other symmetric part is not referenced and is inferred from the stored elements.

m

input

number of rows of matrices B and C, with matrix A sized accordingly.

n

input

number of columns of matrices B and C, with matrix A sized accordingly.

alpha

host

input

<type> scalar used for multiplication.

AP

host

input

<type> array with A stored in packed format.

B

host or device

input

<type> array of dimension ldb x n with ldb>=max(1,m).

ldb

input

leading dimension of two-dimensional array used to store matrix B.

beta

host

input

<type> scalar used for multiplication, if beta == 0 then C does not have to be a valid input.

C

host or device

in/out

<type> array of dimension ldc x n with ldc>=max(1,m).

ldc

input

leading dimension of two-dimensional array used to store matrix C.

The possible error values returned by this function and their meanings are listed below.

Error Value

Meaning

CUBLAS_STATUS_SUCCESS

the operation completed successfully

CUBLAS_STATUS_NOT_INITIALIZED

the library was not initialized

CUBLAS_STATUS_INVALID_VALUE

the parameters m,n<0

CUBLAS_STATUS_NOT_SUPPORTED

the matrix AP is located on a GPU device

CUBLAS_STATUS_EXECUTION_FAILED

the function failed to launch on the GPU
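The AP argument holds one triangle of the symmetric matrix A, packed column by column without gaps. The sketch below shows one way to index such an array with zero-based indices and uses it in a CPU reference of C = alpha * A * B + beta * C for side == CUBLAS_SIDE_LEFT. The names packed_index and spmm_left_ref are hypothetical, and the code illustrates the storage scheme rather than the cuBLASXt implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: zero-based offset of A(i, j) inside a packed
 * column-major array AP holding one triangle of a dim x dim symmetric
 * matrix. Indices outside the stored triangle are mirrored. */
static size_t packed_index(size_t i, size_t j, size_t dim, int lower)
{
    assert(i < dim && j < dim);
    if (lower) {
        if (i < j) { size_t t = i; i = j; j = t; }   /* mirror into i >= j */
        return i + j * (2 * dim - j - 1) / 2;
    }
    if (i > j) { size_t t = i; i = j; j = t; }       /* mirror into i <= j */
    return i + j * (j + 1) / 2;
}

/* CPU reference sketch of C = alpha * A * B + beta * C for side == LEFT.
 * A is m x m symmetric (packed in AP), B and C are m x n column-major. */
static void spmm_left_ref(size_t m, size_t n, int lower, double alpha,
                          const double *AP,
                          const double *B, size_t ldb,
                          double beta, double *C, size_t ldc)
{
    assert(ldb >= m && ldc >= m);
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            double s = 0.0;
            for (size_t p = 0; p < m; ++p)
                s += AP[packed_index(i, p, m, lower)] * B[p + j * ldb];
            /* beta == 0 means C is not read (it need not be valid input). */
            double c = (beta == 0.0) ? 0.0 : beta * C[i + j * ldc];
            C[i + j * ldc] = alpha * s + c;
        }
    }
}
```

A packed triangle of an m x m matrix occupies m * (m + 1) / 2 elements, so a 3 x 3 symmetric matrix needs 6 values in AP instead of 9.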
