FFT快速卷积GPU加速----pycuda

最新推荐文章于 2024-05-14 09:32:01 发布

Yellow Yi

最新推荐文章于 2024-05-14 09:32:01 发布

阅读量2.8k

点赞数 1

分类专栏： pycuda 文章标签： fft gpu numpy 卷积定理 pucuda

本文链接：https://blog.csdn.net/fupotui7870/article/details/77989008

版权

pycuda 专栏收录该内容

6 篇文章 1 订阅

订阅专栏

二维卷积定理的实现

1.读取一幅图f(x,y)大小A*B ,一个卷积核h(x,y)大小为C*D。
2.对f,h,进行填充，填充成Q*P，P>=A+C-1,Q>=B+D-1。（一定要进行填充解决做TTF时由周期性引起的缠绕问题的数据错误及数据丢失）
3.分别对f,h做傅里叶变换（FFT2）
4.G=F*H（注意是乘法不是点乘）
5.对G做IFFT2
6.对5中的结果进行剪切得到原大小的图片。

直接看代码吧！

#-*- coding: utf-8 -*-
import pycuda.autoinit
import pycuda.driver as drv
from pycuda.compiler import SourceModule
from jinja2 import Template
from reikna.fft import FFT
import reikna.cluda as cluda
import numpy as np
import os
import matplotlib.pyplot as plt
import pylab
from scipy import signal
import time



# ==========================================================
def Paddedsize(arr,A,B,P,Q):
    pad = np.empty(P*Q).reshape(P,Q)
    pad = np.complex128(pad)
    i = 0
    j = 0
    for i in range(P):
        for j in range(Q):
            if (i<=A-1)and(j<=B-1):
                pad[i][j] = arr[i][j]
            else:
                pad[i][j] = 0

    return pad
# ==========================================================
def reikna_fft(arr, P, Q):
    api = cluda.cuda_api()
    thr = api.Thread.create()    
    x = thr.to_device(arr)
    X = thr.array((P,Q), dtype=np.complex128)
    fft = FFT(x)
    fftc = fft.compile(thr)
    fftc(X, x, 0)
    fft = X.get()
    thr.release()
    return fft
# ==========================================================
def test_convolution_cuda():
    path = os.getcwd()
    img = plt.imread(path + "/rose.tif")
    plt.imshow(img)
    pylab.show()
    mats_1 = np.complex128(img)
    mats_2 = np.array((
                         [[1,0,-1], 
                          [2,0,-2],
                          [1,0,-1] 
                      ]), dtype=np.complex128)

    R1 = mats_1.shape[0]
    R2 = mats_2.shape[0]
    L1 = mats_1.shape[1]
    L2 = mats_2.shape[1]
    P = R1 + R2 
    Q = L1 + L2

    start = time.time()
    mats_1 = Paddedsize(mats_1, R1, L1, P, Q)
    mats_2 = Paddedsize(mats_2, R2, L2, P, Q)
    end = time.time()
    print (end- start)

    start = time.time()
    xfft1 = reikna_fft(mats_1,P,Q)
    xfft2 = reikna_fft(mats_2,P,Q)
    end = time.time()
    print (end- start)

    start = time.time()
    result =  xfft1*xfft2
    end = time.time()
    print (end- start)

    aa = result.conjugate() 
    xifft = reikna_fft(aa,P,Q)
    xifft = xifft/(P*Q)
    out = xifft.real
    out = out[1:R1+1,1:L1+1]   
    plt.imshow(out)
    pylab.show()

# ==========================================================
if __name__ == '__main__':
    start = time.time()
    test_convolution_cuda()
    end = time.time()
    print (end- start)
# ==========================================================

这里的知识点我在前几个帖子里介绍过，有兴趣的可以看一看瞧一瞧。
看看效果图:
原图：
这里写图片描述
卷积之后的图：

总结

从效果上来看，用reikna做的卷积还是不错的，但是从时间上来看是失败的，就上面的程序耗时间需要5S。具体就是reikna的TTF耗时比较久，看来还是需要自己写FFT程序，不能偷懒。
有大神看到希望给出点建议，谢谢！

Yellow Yi

关注

1
点赞
踩
9

收藏

觉得还不错? 一键收藏
0
评论
FFT快速卷积GPU加速----pycuda

二维卷积定理的实现1.读取一幅图f(x,y)大小A*B ,一个卷积核h(x,y)大小为C*D。 2.对f,h,进行填充，填充成Q*P，P>=A+C-1,Q>=B+D-1。 3.分别对f,h做傅里叶变换（FFT2） 4.G=F*H（注意是乘法不是点乘） 5.对G做IFFT2 6.对5中的结果进行剪切得到原大小的图片。CPU下numpy实现直接看代码，其中填充是手动填充的。#-*- codin
复制链接

扫一扫