Why can the FFT accelerate convolution?

I'll assume this is being done on a conventional CPU, one core, executing one simple thread, no fancy hardware. If there is more than that going on, it can probably be accounted for with adjustments to the reasoning for a simpler system. Not much more can be said without either a specific system to discuss, or a whole textbook or research paper to cover a range of possibilities.

I wouldn't worry about power-of-two sizes. It doesn't matter. FFT algorithms with the butterfly units and all that exist for factors of 3, or any small number, not just 2. There are clever algorithms for prime-sized data series, too. I don't like quoting Wikipedia on this due to its impermanent nature, but anyway:

there are FFTs with O(N log N) complexity for all N, even for prime N

Implementations of FFTs for arbitrary N can be found in the GPL'd library FFTW.

The only trustworthy way in terms of serious engineering is to build and measure, but we certainly can get an idea from theory, to see relationships between variables. We need estimates of how many arithmetic operations are involved for each method.

Multiplication is still slower than addition on most CPUs, even if the difference has shrunk tremendously over the years, so let's just count multiplications. Accounting for additions as well would take more bookkeeping.

A straightforward convolution, actually multiplying and adding using the convolution kernel, repeating for each output pixel, needs W²·K² multiplications, where W is the number of pixels along one side of the image (assuming square for simplicity), and K is the size of the convolution kernel, as pixels along one side. It takes K² multiplications to compute one output pixel using the kernel and same-size portion of the input image. Repeat for all output pixels, which number the same as in the input image.

(Nmults)direct = W²·K²
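As a sanity check, that count can be reproduced with a short sketch. The `direct_convolve_same` helper below is hypothetical (not from the original text) and assumes a square image, a square kernel, and a "same"-size zero-padded output, so there is one result pixel per input pixel:

```python
import numpy as np

def direct_convolve_same(image, kernel):
    """Naive "same"-size convolution that counts its own multiplications."""
    W = image.shape[0]
    K = kernel.shape[0]
    flipped = kernel[::-1, ::-1]       # convolution flips the kernel
    padded = np.pad(image, K // 2)     # zero-pad so the output matches the input size
    out = np.empty((W, W))
    mults = 0
    for i in range(W):
        for j in range(W):
            # K^2 multiplications for each of the W^2 output pixels
            out[i, j] = np.sum(padded[i:i + K, j:j + K] * flipped)
            mults += K * K
    return out, mults

img = np.random.rand(32, 32)
ker = np.random.rand(5, 5)
_, mults = direct_convolve_same(img, ker)
assert mults == 32**2 * 5**2    # W^2 * K^2 = 25,600
```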

To do the job in Fourier space, we must Fourier transform the image. This is done by applying an FFT to each column separately, and then to each row. The FFT for N data points takes about 2N·log(N) multiplications; we want N to be W, the length of one column or row. All logarithms here are base two.

There are W rows and W columns, so after all the FFTs are done, we have done 2W·(2W·log(W)) multiplications. Double that, because after we multiply by the Fourier transform of the kernel, we have to inverse-transform the data to get back to a sensible image. That's 8W²·log(W). Of course, multiplying by the Fourier transform of the kernel must also be done: another W² multiplications. (Done once, not once per output pixel, row, or anything.) These are complex multiplications, so that's 4W² real multiplications.

So, unless I goofed up (and I probably did), we have

(Nmults)Fourier = 4W²·(2·log(W) + 1)
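Under these assumptions the frequency-space route is just the forward FFTs, one pointwise product, and one inverse FFT. A minimal NumPy sketch (this computes a circular convolution, so periodic boundary conditions are assumed; the reference value checks one pixel against the direct definition):

```python
import numpy as np

W = 64
img = np.random.rand(W, W)
ker = np.zeros((W, W))
ker[:5, :5] = np.random.rand(5, 5)   # small kernel zero-padded to W x W

# forward FFTs, one pointwise product in frequency space, one inverse FFT
out_fft = np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(ker)).real

# reference: the direct circular-convolution sum for a single output pixel
i, j = 10, 20
ref = sum(img[(i - a) % W, (j - b) % W] * ker[a, b]
          for a in range(5) for b in range(5))
assert np.allclose(out_fft[i, j], ref)
```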

When do we want to do things the direct way? When K is small enough to make W²·K² smaller than 4W²·(2·log(W) + 1). Factoring out the common W² leaves K² < 8·log(W) + 4. We can probably drop the constant, since we're dealing with idealized estimates: it is likely lost in the errors relative to an actual implementation, from not counting additions, loop overheads and so on. That leaves:

K² < 8·log(W)

This is the approximate condition for choosing a direct approach over a frequency space approach.
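Plugging numbers into the two estimates above makes the cutoff concrete (approximate, of course, for all the reasons already mentioned):

```python
import math

def direct_mults(W, K):
    return W**2 * K**2                        # (Nmults)direct

def fourier_mults(W):
    return 4 * W**2 * (2 * math.log2(W) + 1)  # (Nmults)Fourier

# For a 1024 x 1024 image, 8*log2(W) = 80: direct wins for small kernels,
# Fourier wins once K^2 passes roughly that bound.
assert direct_mults(1024, 3) < fourier_mults(1024)    # 3x3 kernel: go direct
assert direct_mults(1024, 15) > fourier_mults(1024)   # 15x15 kernel: go Fourier
```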

Note that correlation of two same-size images is just like convolving with a kernel of size K = W. Fourier space is always the way to do it.
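For completeness, same-size cross-correlation through the frequency domain uses the conjugate of one transform. A sketch, again assuming periodic boundaries (circular correlation):

```python
import numpy as np

W = 32
f = np.random.rand(W, W)
g = np.random.rand(W, W)

# circular cross-correlation: ifft(fft(f) * conj(fft(g)))
corr = np.fft.ifft2(np.fft.fft2(f) * np.conj(np.fft.fft2(g))).real

# reference value at one shift (di, dj), computed directly:
# O(W^2) per shift, O(W^4) for all shifts, vs. O(W^2 log W) via the FFT
di, dj = 3, 7
ref = sum(f[(i + di) % W, (j + dj) % W] * g[i, j]
          for i in range(W) for j in range(W))
assert np.allclose(corr[di, dj], ref)
```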

This can be refined and argued over to account for overhead, opcode pipelining, and float vs. fixed-point arithmetic, and thrown out the window entirely once GPGPU and specialized hardware enter the picture.
