GPU-Accelerated Computing: Tool Overview

  I mainly use GPU-accelerated computing in R and Python, relying on mature tools to improve my productivity. Below is a brief summary of some of these methods.

GPU-accelerated computing in R: gputools

  1) gputools: an R package for GPU-accelerated computing that covers common operations.
  https://cran.r-project.org/web/packages/gputools/
  2) iFes: an Incremental Feature Selection algorithm accelerated by GPU.

GPU-accelerated computing in Python: cudamat and gnumpy

  1) cudamat: most routine GPU operations in Python are covered by this package, and it is sufficient for general GPU work.
  Test examples:
  https://github.com/cudamat/cudamat/blob/master/test/test_cudamat.py
  2) gnumpy: another package for GPU computing; I tried it but ran into installation problems.
  3) numba: a package for GPU programming in Python (a minimal kernel sketch follows this list).
  numba source on GitHub: https://github.com/numba/numba
  4) pycuda: a package for GPU programming in Python; you can write your own kernels with it, but for basic arithmetic it is less convenient than ready-made libraries.
  pycuda home: https://developer.nvidia.com/pycuda
  (or on PyPI: https://pypi.python.org/pypi/pycuda/2015.1.3)
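
  To give a flavor of numba, here is a minimal sketch of a CUDA kernel written with numba.cuda. This example is my own illustration, not from the packages above; it assumes a CUDA-capable GPU and a numba build with CUDA support.

from numba import cuda
import numpy as np

@cuda.jit
def add_kernel(x, y, out):
    i = cuda.grid(1)               # global thread index
    if i < x.size:                 # guard against out-of-range threads
        out[i] = x[i] + y[i]

n = 1 << 20
x = np.ones(n, dtype=np.float32)
y = 2 * np.ones(n, dtype=np.float32)
out = np.empty_like(x)
threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
add_kernel[blocks, threads_per_block](x, y, out)  # numba copies the arrays to and from the GPU
print(out[:4])  # expect [3. 3. 3. 3.]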

GPU-backed algorithms in Python

  Search PyPI for "gpu" and you will find plenty. Known examples include cudatree 0.6 (random forests) and dpmix 0.3 (Gaussian mixtures); a hedged usage sketch for cudatree follows. You can also search Baidu/Google for "Python GPU algorithm".
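
  As a pointer to how such packages are typically used, here is a sketch of cudatree based on the usage shown in its README at the time; treat the exact names and parameters (load_data, the "digits" dataset) as assumptions rather than a verified API.

from cudatree import load_data, RandomForestClassifier  # API per the cudatree README (assumed)
x_train, y_train = load_data("digits")                  # bundled demo dataset (assumed)
forest = RandomForestClassifier(n_estimators = 50, bootstrap = False)
forest.fit(x_train, y_train)                            # training runs on the GPU
pred = forest.predict(x_train)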

Supplementary Code

1. R gputools test script
## Objective: test the GPU computing functions in R ##
## date: 2015.08.27
## author: yjm
## root@~# nvidia-smi  ## check the GPU information
library(gputools)
help(package = 'gputools')
## the GPU computing functions are listed there ##
## chooseGpu; default deviceId = 0
chooseGpu(deviceId=0)
############################################
## cpuMatMult (matrix multiplication on the CPU, for comparison);
matA <- matrix(runif(2000*3000), 2000, 3000)
matB <- matrix(runif(3000*4000), 3000, 4000)
t1 = Sys.time()
y1 = cpuMatMult(matA, matB)
t2 = Sys.time()
t2-t1
#### base R equivalents, also on the CPU
y2 = matA %*% matB
t3 = Sys.time()
t3-t2
y3 = crossprod(t(matA), matB)
t4 = Sys.time()
t4-t3
#############################################
## getGpuId; 
?getGpuId
getGpuId()
## default is device 0
## gpuCor (correlation coefficients, "pearson" & "kendall"); returns a matrix of coefficients
numAvars <- 5
numBvars <- 10
numSamples <- 30
A <- matrix(runif(numAvars*numSamples), numSamples, numAvars)
B <- matrix(runif(numBvars*numSamples), numSamples, numBvars)
gpuCor(A, B, method="pearson")
gpuCor(A, B, method="kendall")
A[3,2] <- NA
gpuCor(A, B, use="pairwise.complete.obs", method="pearson")
#### without GPU
cor.test(A[,1], B[,1], method='pearson')
#########################################################
## gpuCrossprod (cross-product)
matA <- matrix(runif(3000*2000), 3000, 2000)
matB <- matrix(runif(3000*4000), 3000, 4000)
t1 = Sys.time()
y1 = gpuCrossprod(matA, matB)
t2 = Sys.time()
t2-t1
#### without GPU
y2 = t(matA) %*% matB
t3 = Sys.time()
t3-t2
y3 = crossprod(matA, matB)
t4 = Sys.time()
t4-t3
#############################################################
## gpuDist(matrix, method) computes distances between vectors; each row is a vector
numVectors <- 500
dimension <- 1000
Vectors <- matrix(runif(numVectors*dimension), numVectors, dimension)
t1 = Sys.time()
y1 = gpuDist(Vectors, "euclidean")
t2 = Sys.time()
t2-t1
#### without GPU
y2 = dist(Vectors, "euclidean")
t3 = Sys.time()
t3-t2
#gpuDist(Vectors, "maximum")
#gpuDist(Vectors, "manhattan")
#gpuDist(Vectors, "minkowski", 4)
###################################################
## gpuDistClust
## gpuGlm (glm with gpu)
## gpuGranger (granger causality tests)
## gpuHclust (Hierarchical Clustering)
## gpuLm (lm with gpu)
## gpuLm.defaultTol
## gpuLm.fit 
## gpuLsfit (least squares fit)
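## none of the model-fitting helpers above are exercised in this script; a
## minimal hedged sketch, assuming gpuLm and gpuGlm mirror the interfaces of
## base R's lm() and glm() (not run here):
# df <- data.frame(y = runif(100), x = runif(100))
# fitLm  <- gpuLm(y ~ x, data = df)
# fitGlm <- gpuGlm(y ~ x, data = df, family = gaussian())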
## gpuMatMult (matrix multiplication with GPU)
matA <- matrix(runif(2000*3000), 2000, 3000)
matB <- matrix(runif(3000*4000), 3000, 4000)
t1 = Sys.time()
y1 = gpuMatMult(matA, matB)
t2 = Sys.time()
t2-t1
#### without GPU
y2 = matA %*% matB
t3 = Sys.time()
t3-t2
y3 = crossprod(t(matA), matB)
t4 = Sys.time()
t4-t3
###############################################
## gpuMi() computes mutual information; each column represents a random variable
x <- matrix(runif(60), 20, 3)
y <- matrix(runif(60), 20, 3)
# do something interesting
y[,2] <- 3.0 * (x[,1] + x[,3])
z <- gpuMi(x, y, bins = 10, splineOrder = 3)
print(z)
## gpuQr (QR decomposition)
## gpuSolve 
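## a hedged sketch for gpuQr, assuming it mirrors base R's qr() (not run here):
# qrRes <- gpuQr(matrix(runif(100), 10, 10))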
## gpuTcrossprod (matrix Transposed Cross-product with GPU)
matA <- matrix(runif(2000*3000), 2000, 3000)
matB <- matrix(runif(4000*3000), 4000, 3000)
t1 = Sys.time()
y1 = gpuTcrossprod(matA, matB)
t2 = Sys.time()
t2 - t1
#### without GPU
y2 = matA %*% t(matB)
t3 = Sys.time()
t3 - t2
y3 = tcrossprod(matA, matB)
t4 = Sys.time()
t4 - t3
## gpuTtest (T-test with gpu)
quit()

## conclusion:
## once the matrix dimensions reach roughly 1000, the GPU versions were far
## faster than their CPU equivalents in these tests (over 100x).
2. Python cudamat test script

# coding: utf-8

# In[179]:
## Test of the cudamat package
## date: 2015.09.07
## author: yjm
## cudamat.py contains the author's docstrings, which are worth reading

#import pycuda as cuda
#import pycuda.autoinit
#from pycuda.compiler import SourceModule
import cudamat as cm
import nose
import numpy as np

# 1)
cm.cublas_init()
#print(cm.CUDAMatrix.ones.shape)
# 2)
#cm.cublas_shutdown() ## this zeroes out cm.CUDAMatrix.ones

# In[180]:

## reshape on the GPU ##
m = 256
n = 128
cm1 = np.array(np.random.rand(n, m)*10, dtype = np.float32, order = 'F')
cm2 = np.array(np.random.rand(m, 1)*10, dtype = np.float32, order = 'F')
gm1 = cm.CUDAMatrix(cm1)
gm2 = cm.CUDAMatrix(cm2)
# CUDAMatrix construction can fail here if GPU memory is insufficient
# (one suggested fix: raise the "gpu_memory" value in your .pbtxt config to "2G" or "2.5G")
print('---- shapes of the CPU (numpy) arrays ----')
print(cm1.shape)
print(cm2.shape)
print('---- shapes of the GPU matrices ----')
print(gm1.shape)
print(gm2.shape)
print('test reshape on the GPU')
gm1.reshape((m, n))
print(gm1.shape)
print('test assign')
#gm2.assign(gm1)
gm1.reshape((n,m))
print(gm1.shape)

# In[81]:

## dot product on the GPU ## and transposes of GPU matrices ##
m = 256
n = 128
cm1 = np.array(np.random.rand(n, m)*10, dtype = np.float32, order = 'F')
cm2 = np.array(np.random.rand(m, n)*10, dtype = np.float32, order = 'F')
gm1 = cm.CUDAMatrix(cm1)
gm2 = cm.CUDAMatrix(cm2)
print(gm1.shape)
print(gm2.shape)
gm = cm.dot(gm1, gm2) ## here is dot on GPU ##
print(gm.shape)
gm = cm.dot(gm2.T, gm1.T) ## here is transpose on GPU ##
print(gm.shape)

# In[63]:

## assign: copy the values of one matrix into another ##
cm2 = np.array(np.random.rand(n, m)*10, dtype=np.float32, order='F')
gm2 = cm.CUDAMatrix(cm2)
print('----cm2-----')
print(cm2[1:5, 1:5])
print('----cm1-----')
print(cm1[1:5, 1:5])
gm1.assign(gm2)
print('----after gm1.assign(gm2)----')
gm1.copy_to_host()
print(gm1.numpy_array[1:5, 1:5])

# In[64]:

## assign a scalar to every element ##
a = np.array(np.random.rand(m, n)*10, dtype=np.float32, order = 'F')
m1 = cm.CUDAMatrix(a)
m1.assign(np.pi)
m1.copy_to_host()
print(m1.numpy_array[1:5, 1:5])

# In[74]:

## row slices, taken directly on the GPU ##
m = 256
n = 128
start = 11
end = 15

a = np.array(np.random.rand(m, n)*10, dtype=np.float32, order='F')
b = np.array(np.random.rand(end-start, n)*10, dtype=np.float32, order='F')
c = np.array(a[start:end,:], order='F')

m1 = cm.CUDAMatrix(a)
m2 = cm.CUDAMatrix(b)
print(m1.shape)
print(m2.shape)

m1.get_row_slice(start, end, target = m2)
m3 = m1.get_row_slice(start, end)
m1.copy_to_host()
m2.copy_to_host()
m3.copy_to_host()
print('--after m1.get_row_slice(start, end, target = m2)---')
print(m1.shape)
print(m2.shape)
print(m3.shape)
print(m1.numpy_array[start:end, 1:5])
print(m2.numpy_array[:, 1:5])
print(m3.numpy_array[:, 1:5])

# In[95]:

## add_col_vec adds a column vector elementwise to every column of a matrix ##
## example: with b = [1,1,1; 2,2,2; 3,3,3] and column vector a = [1; 2; 3],
## b.add_col_vec(a) = [2,2,2; 4,4,4; 6,6,6] ##
## add_row_vec likewise adds a row vector to every row ##
m = 256
n = 128
a = np.array(np.random.rand(m, n)*10, dtype=np.float32, order='F')
b = np.array(np.random.rand(m, 1)*10, dtype=np.float32, order='F')
c = a + b
m1 = cm.CUDAMatrix(a)
m2 = cm.CUDAMatrix(b)
print('---a-m1[1:5, 1:5]---')
print(a[1:5, 1:5])
print('---b-m2[1:5, :]---')
print(b[1:5, :])
print('---c-m1+m2---')
print(c[1:5, 1:5])

print('----after m1.add_col_vec(m2)----')
#m1.add_col_vec(m2, target = m3)
## the optional target receives the result instead of overwriting m1 ##
m1.add_col_vec(m2)  ## adds the column vector m2 to every column of m1, in place ##
m1.copy_to_host()
print(m1.numpy_array[1:5, 1:5])

# In[103]:

## add_col_mult adds (column vector * scalar) to every column of the GPU matrix ##
m = 256
n = 128
a = np.array(np.random.rand(m, n)*10, dtype=np.float32, order='F')
b = np.array(np.random.rand(m, 1)*10, dtype=np.float32, order='F')
m1 = cm.CUDAMatrix(a)
m2 = cm.CUDAMatrix(b)
print('---a--m1[1:5, 1:5]----')
print(a[1:5, 1:5])
print('---b--m2[1:5, :]----')
print(b[1:5, :])
m1.add_col_mult(m2, np.pi)
m1.copy_to_host()
print(m1.numpy_array[1:5, 1:5])

# In[104]:

## mult_by_col; mult_by_row; div_by_col; div_by_row ## elementwise scale each column/row of a matrix by a vector ##
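
# A quick sanity check of mult_by_col (my addition; the other three follow the
# same pattern): each column of the matrix is multiplied elementwise by a vector.
a = np.array(np.random.rand(4, 3), dtype=np.float32, order='F')
v = np.array(np.random.rand(4, 1), dtype=np.float32, order='F')
g = cm.CUDAMatrix(a)
gv = cm.CUDAMatrix(v)
g.mult_by_col(gv)   # in place: column j becomes a[:, j] * v[:, 0]
g.copy_to_host()
print(np.allclose(g.numpy_array, a * v))   # expect True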

# In[175]:

## sum along an axis ##
m = 256
n = 128
a = np.array(np.random.rand(m, n)*10, dtype=np.float32, order='F')
rowSumRes = np.array(np.random.rand(m, 1)*10, dtype=np.float32, order='F') ## holds the row sums ##
colSumRes = np.array(np.random.rand(1, n)*10, dtype=np.float32, order='F') ## holds the column sums ##
m1 = cm.CUDAMatrix(a)
growSumRes = cm.CUDAMatrix(rowSumRes)
gcolSumRes = cm.CUDAMatrix(colSumRes)
print('---a--m1[1:5, 1:5]----')
print(a[1:5, 1:5])
mult = 1 ## optional scale factor applied to the result ##
m1.sum(axis = 1, target = growSumRes, mult = mult)
m1.sum(axis = 0, target = gcolSumRes, mult = mult)
growSumRes.copy_to_host()
gcolSumRes.copy_to_host()  ## both targets must be copied back before reading numpy_array
print(growSumRes.numpy_array[1:5,:])
print(gcolSumRes.numpy_array[:,1:5])

# In[177]:

## mean along an axis ##
m = 256
n = 128
a = np.array(np.random.rand(m, n)*10, dtype=np.float32, order='F')
rowMeaRes = np.array(np.random.rand(m, 1)*10, dtype=np.float32, order='F') ## holds the row means ##
colMeaRes = np.array(np.random.rand(1, n)*10, dtype=np.float32, order='F') ## holds the column means ##
m1 = cm.CUDAMatrix(a)
growMeaRes = cm.CUDAMatrix(rowMeaRes)
gcolMeaRes = cm.CUDAMatrix(colMeaRes)
print('---a--m1[1:5, 1:5]----')
print(a[1:5, 1:5])
m1.mean(axis = 1, target = growMeaRes)
m1.mean(axis = 0, target = gcolMeaRes)
growMeaRes.copy_to_host()
gcolMeaRes.copy_to_host()
print(growMeaRes.numpy_array[1:5, :])
print(gcolMeaRes.numpy_array[:, 1:5])

# In[181]:

## max / min along an axis ##
## max(axis, target = None)
## min(axis, target = None)
## same usage pattern as sum and mean above; a short demo follows.
## sign(target = None) applies the elementwise sign function ##
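
# A short demo of max and sign (my addition), following the sum/mean pattern:
a = np.array(np.random.randn(4, 3), dtype=np.float32, order='F')
g = cm.CUDAMatrix(a)
gmax = g.max(axis = 0)   # 1 x n matrix of column maxima
gmax.copy_to_host()
print(gmax.numpy_array)
gsign = g.sign()         # elementwise sign, returned as a new matrix
gsign.copy_to_host()
print(gsign.numpy_array)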

# In[185]:

## sigmoid ## apply_sigmoid(target = None)
m = 256
n = 128
a = np.array(np.random.randn(m, n)*10, dtype=np.float32, order='F')
b = np.array(np.random.randn(m, n)*10, dtype=np.float32, order='F')

c = 1. / (1. + np.exp(-a))

m1 = cm.CUDAMatrix(a)
m2 = cm.CUDAMatrix(b)
m1.apply_sigmoid(target = m2)  ## result written to m2; m1 unchanged
m1.apply_sigmoid()             ## in place

m1.copy_to_host()
m2.copy_to_host()
print(m1.numpy_array[1:5, 1:5])
print(m2.numpy_array[1:5, 1:5])
print(c[1:5, 1:5])

## hyperbolic tangent ## tanh ##
## gm.apply_tanh(target = gm2)
## gm.apply_tanh()
## soft thresholding ## soft_threshold ##

## log ## exp ## sqrt ## pow ## where ##
## cm.log(mat, target = None)
## cm.exp(mat, target = None)
## cm.pow(mat, p, target = None)
## cm.where(condition_mat, if_mat, else_mat, target = None)
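
# A quick check of cm.log (my addition). Per the cudamat source, these module-level
# functions operate in place when no target is given:
a = np.array(np.random.rand(4, 3) + 1.0, dtype=np.float32, order='F')
g = cm.CUDAMatrix(a)
cm.log(g)            # natural log, applied in place
g.copy_to_host()
print(np.allclose(g.numpy_array, np.log(a)))   # expect True (float32 precision)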