CUDA 编程:基础与实践的阅读note和program理解 chapter eighteenth cuSolver 库
《CUDA 编程:基础与实践》(樊哲勇)【简介_书评_在线阅读】 - 当当图书 (dangdang.com)
ZouJiu1/CUDA-Programming: Sample codes for my CUDA programming book (github.com)
cuSolver库主要是用来运算更加复杂的线性代数运算,像矩阵求逆,求矩阵的本征值,矩阵对角化或者矩阵分解这些的
厄密矩阵:是指对称的复数矩阵,若是实数矩阵,则退化到了对称矩阵
cusolver.cu 档案内用的是复数矩阵
A = [ 0 − i i 0 ] A=\left[\begin{matrix}0&-i\\i&0\end{matrix}\right] A=[0i−i0]
求本征值也就是特征值,要满足 A x = λ x Ax=\lambda x Ax=λx, ( A − λ E ) x = 0 (A-\lambda E)x=0 (A−λE)x=0 也就是,
∣ A − λ E ∣ = ∣ − λ − i i − λ ∣ = λ 2 + i 2 = λ 2 − 1 |A-\lambda E|=\left|\begin{matrix}-\lambda&-i\\i&-\lambda\end{matrix}\right|=\lambda ^2+i^2=\lambda ^2 - 1 ∣A−λE∣= −λi−i−λ =λ2+i2=λ2−1
所以本征值就是: 1 或者 -1
int N = 2;
int N2 = N * N;
cuDoubleComplex *A_cpu = (cuDoubleComplex *)
malloc(sizeof(cuDoubleComplex) * N2); // 分配复数矩阵内存
for (int n = 0; n < N2; ++n) //对复数矩阵赋值
{
A_cpu[0].x = 0;
A_cpu[1].x = 0;
A_cpu[2].x = 0;
A_cpu[3].x = 0;
A_cpu[0].y = 0;
A_cpu[1].y = 1;
A_cpu[2].y = -1;
A_cpu[3].y = 0;
}
cuDoubleComplex *A; // 定义device的矩阵
CHECK(cudaMalloc((void**)&A, sizeof(cuDoubleComplex) * N2)); // 分配显存的
CHECK(cudaMemcpy(A, A_cpu, sizeof(cuDoubleComplex) * N2, // 数据传输的,host 到 device
cudaMemcpyHostToDevice));
double *W_cpu = (double*) malloc(sizeof(double) * N); // 分配本征值的内存
double *W;
CHECK(cudaMalloc((void**)&W, sizeof(double) * N)); // 分配本征值的显存
cusolverDnHandle_t handle = NULL;
cusolverDnCreate(&handle);
cusolverEigMode_t jobz = CUSOLVER_EIG_MODE_NOVECTOR;
cublasFillMode_t uplo = CUBLAS_FILL_MODE_LOWER;
int lwork = 0;
// 确定运算需要多少缓冲空间
cusolverDnZheevd_bufferSize(handle, jobz, uplo,
N, A, N, W, &lwork);
cuDoubleComplex* work;
CHECK(cudaMalloc((void**)&work,
sizeof(cuDoubleComplex) * lwork)); // 分配缓冲空间显存
int* info;
CHECK(cudaMalloc((void**)&info, sizeof(int))); // 返回值
cusolverDnZheevd(handle, jobz, uplo, N, A, N, W,
work, lwork, info); // 算出本征值的
cudaMemcpy(W_cpu, W, sizeof(double) * N,
cudaMemcpyDeviceToHost);
printf("Eigenvalues are:\n");
for (int n = 0; n < N; ++n)
{
printf("%g\n", W_cpu[n]);
}
cusolverDnDestroy(handle);
free(A_cpu);
free(W_cpu);
CHECK(cudaFree(A));
CHECK(cudaFree(W));
CHECK(cudaFree(work));
CHECK(cudaFree(info));
return 0;