Blog: blog.shinelee.me | cnblogs | CSDN
Preface
A quick look through popular inference frameworks (accelerators) such as NCNN and NNPACK shows that, for convolution layers, they have all converged on the Winograd fast convolution algorithm, which comes from a CVPR 2016 paper: Fast Algorithms for Convolutional Neural Networks.
This article attempts to demystify the Winograd algorithm.
Problem Definition
Define a 1-D convolution as $F(m, r)$, where $m$ is the output size and $r$ the filter size, so the input signal has length $m+r-1$. Convolution multiplies corresponding positions and sums the products; every input position must participate in at least one multiplication, so the minimum number of multiplications equals the input length:

$$\mu(F(m, r)) = m + r - 1$$
Performing 1-D convolutions along rows and columns yields a 2-D convolution, written $F(m\times n, r\times s)$: the output is $m\times n$, the filter is $r\times s$, and the input signal has $(m+r-1)(n+s-1)$ elements, so the minimum number of multiplications is

$$\begin{aligned} \mu(F(m \times n, r \times s)) &= \mu(F(m, r))\,\mu(F(n, s)) \\ &= (m+r-1)(n+s-1) \end{aligned}$$
Computing the convolution directly with a sliding window instead requires $m \times r$ multiplications in 1-D and $m \times n \times r \times s$ multiplications in 2-D, far more than the minimum counts above.
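To make the direct-method count concrete, here is a minimal sketch of sliding-window 1-D convolution that tallies its scalar multiplications; the function name and the sample numbers are illustrative, not from the original post.

```python
import numpy as np

def direct_conv1d(d, g):
    """Direct sliding-window 1-D convolution (correlation form).

    Returns the output and the number of scalar multiplications used.
    """
    r = len(g)
    m = len(d) - r + 1          # output size; input length is m + r - 1
    out = np.zeros(m)
    muls = 0
    for i in range(m):
        for j in range(r):
            out[i] += d[i + j] * g[j]   # one multiplication per (i, j) pair
            muls += 1
    return out, muls

# Hypothetical example values for F(2, 3): input length 4, filter length 3
d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.5, 1.0, -1.0])
out, muls = direct_conv1d(d, g)
m, r = len(out), len(g)
print(muls)       # m * r = 2 * 3 = 6 multiplications
print(m + r - 1)  # Winograd lower bound: 4
```

The nested loops use $m \times r$ multiplications, while the lower bound from the previous section is only $m + r - 1$; the gap is what Winograd's algorithm closes.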
Where does Winograd convolution get its speed? In a word: it reduces the number of multiplications, down to $m+r-1$ in 1-D or $(m+r-1)(n+s-1)$ in 2-D.
How? Consider the example below.
An Example: F(2, 3)
Start with the 1-D case. Let the input signal be $d=\left[\begin{array}{llll}d_{0} & d_{1} & d_{2} & d_{3}\end{array}\right]^{T}$ and the filter $g=\left[\begin{array}{lll}g_{0} & g_{1} & g_{2}\end{array}\right]^{T}$; the convolution can then be written as the matrix product

$$F(2, 3) = \left[\begin{array}{lll}d_{0} & d_{1} & d_{2} \\ d_{1} & d_{2} & d_{3}\end{array}\right] \left[\begin{array}{l}g_{0} \\ g_{1} \\ g_{2}\end{array}\right]=\left[\begin{array}{c}r_{0} \\ r_{1}\end{array}\right]$$
Computed as an ordinary matrix product, this takes 6 multiplications and 4 additions:
$$\begin{array}{l}{r_{0}=\left(d_{0} \cdot g_{0}\right)+\left(d_{1} \cdot g_{1}\right)+\left(d_{2} \cdot g_{2}\right)} \\ {r_{1}=\left(d_{1} \cdot g_{0}\right)+\left(d_{2} \cdot g_{1}\right)+\left(d_{3} \cdot g_{2}\right)}\end{array}$$
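The two formulas above can be checked numerically; this small sketch (with made-up sample values, not from the original post) computes $r_0$ and $r_1$ term by term and confirms they match the stacked matrix form:

```python
import numpy as np

# Hypothetical sample values for the input d and filter g
d0, d1, d2, d3 = 1.0, 2.0, 3.0, 4.0
g0, g1, g2 = 0.5, 1.0, -1.0

# Direct evaluation: 6 multiplications and 4 additions, as stated above
r0 = (d0 * g0) + (d1 * g1) + (d2 * g2)
r1 = (d1 * g0) + (d2 * g1) + (d3 * g2)

# Matrix form: rows of D are the two sliding windows of the input
D = np.array([[d0, d1, d2],
              [d1, d2, d3]])
g = np.array([g0, g1, g2])
result = D @ g   # equals [r0, r1]
print(result)
```

Both paths give the same output; Winograd's trick, shown next in the paper, is to evaluate this same matrix product with only 4 multiplications.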