The Winograd Fast Convolution Algorithm in Convolutional Neural Networks

The Winograd fast convolution algorithm reduces the number of multiplications in convolutional neural networks, improving computational efficiency. Using transform matrices, the 1D F(2, 3) convolution is converted into an operation with fewer multiplications, and the extension from 1D to 2D reduces computational complexity further. In practice, the Winograd algorithm is typically applied to small convolution kernels and is adopted by inference frameworks such as NCNN and FeatherCNN.

Blog: blog.shinelee.me | 博客园 | CSDN

Preface

A quick look at popular inference frameworks (accelerators) such as NCNN and NNPACK shows that, for convolution layers, they have all converged on the Winograd fast convolution algorithm, which comes from a CVPR 2016 paper: Fast Algorithms for Convolutional Neural Networks.

This article attempts to demystify the Winograd algorithm.

Problem Definition

Define the one-dimensional convolution as $F(m, r)$, where $m$ is the output size and $r$ is the filter size; the input signal then has length $m+r-1$. Convolution multiplies corresponding positions and sums the products, and every input position must participate in at least one multiplication, so the minimum number of multiplications equals the input length:

$$\mu(F(m, r)) = m + r - 1$$

Performing 1D convolutions along the rows and columns yields a 2D convolution, written $F(m \times n, r \times s)$, with output size $m \times n$ and kernel size $r \times s$; the input signal then has $(m+r-1)(n+s-1)$ elements, and the minimum number of multiplications is

$$\begin{aligned} \mu(F(m \times n, r \times s)) &= \mu(F(m, r))\,\mu(F(n, s)) \\ &= (m+r-1)(n+s-1) \end{aligned}$$

Computing the convolution directly with a sliding window takes $m \times r$ multiplications in 1D and $m \times n \times r \times s$ in 2D, far more than the minimum counts above.
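As a sanity check, here is a small Python sketch (function name hypothetical) of the sliding-window 1D convolution, counting multiplications as it goes:

```python
def conv1d_naive(d, g):
    """Naive 1D valid convolution (sliding window), also counting multiplications."""
    m = len(d) - len(g) + 1  # output size
    mults = 0
    out = []
    for i in range(m):
        acc = 0.0
        for j, gj in enumerate(g):
            acc += d[i + j] * gj  # one multiplication per (output, tap) pair
            mults += 1
        out.append(acc)
    return out, mults

# F(2, 3): input length m + r - 1 = 4
out, mults = conv1d_naive([1.0, 2.0, 3.0, 4.0], [1.0, 0.5, 0.25])
print(out, mults)  # m * r = 2 * 3 = 6 multiplications, vs. the lower bound m + r - 1 = 4
```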

Where does Winograd's speedup come from? In a word: it reduces the number of multiplications, down to $m+r-1$ in 1D and $(m+r-1)(n+s-1)$ in 2D.
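Plugging in the $F(2 \times 2, 3 \times 3)$ case common for 3×3 kernels, the saving from these two formulas works out to 2.25×:

```python
# Multiplication counts for F(2x2, 3x3), from the formulas above
m = n = 2  # output tile size
r = s = 3  # kernel size
naive = m * n * r * s                 # sliding window: 36
winograd = (m + r - 1) * (n + s - 1)  # minimal: 16
print(naive, winograd, naive / winograd)  # 36 16 2.25
```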

How are they reduced? Consider the example below.

An Example: F(2, 3)

Start with the 1D case. Let the input signal be $d = [d_0\ d_1\ d_2\ d_3]^T$ and the filter be $g = [g_0\ g_1\ g_2]^T$; the convolution can then be written as the matrix multiplication:

$$F(2, 3) = \begin{bmatrix} d_0 & d_1 & d_2 \\ d_1 & d_2 & d_3 \end{bmatrix} \begin{bmatrix} g_0 \\ g_1 \\ g_2 \end{bmatrix} = \begin{bmatrix} r_0 \\ r_1 \end{bmatrix}$$

Evaluated as an ordinary matrix multiplication, this takes 6 multiplications and 4 additions:

$$\begin{aligned} r_0 &= (d_0 \cdot g_0) + (d_1 \cdot g_1) + (d_2 \cdot g_2) \\ r_1 &= (d_1 \cdot g_0) + (d_2 \cdot g_1) + (d_3 \cdot g_2) \end{aligned}$$
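Winograd's trick computes the same two outputs with only 4 multiplications, at the cost of a few extra additions. A sketch following the standard F(2, 3) formulas from the cited paper (function name hypothetical):

```python
def winograd_f23(d, g):
    """Winograd F(2, 3): 2 outputs from a length-4 input and a length-3 filter,
    using 4 multiplications (the lines marked *) instead of 6."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    m1 = (d0 - d2) * g0                  # *
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2  # *
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2  # *
    m4 = (d1 - d3) * g2                  # *
    return [m1 + m2 + m3, m2 - m3 - m4]  # [r0, r1]

print(winograd_f23([1.0, 2.0, 3.0, 4.0], [1.0, 0.5, 0.25]))  # matches the direct convolution
```

Note that the filter-side combinations such as $g_0 + g_1 + g_2$ depend only on $g$, so in a CNN they can be precomputed once per kernel and amortized over every input tile.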
