Eyeriss v1v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices

Article link: https://blog.csdn.net/m0_62789066/article/details/120732569

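Eyeriss v2 is a deep neural network accelerator for mobile devices proposed by MIT, designed to cope with widely varying layer shapes and sizes. Compared with Eyeriss v1, v2 in a 65nm CMOS process is 12.6x faster at inference and 2.5x more energy efficient. The main improvements are a hierarchical mesh structure and a stronger data reuse strategy that improves the reuse of weights, inputs, and psums, especially for small networks such as MobileNet. The article also discusses data reuse, bandwidth requirements, and the role of different delivery modes when processing convolutional and fully connected layers.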


First, here is a fairly complete publication that summarizes the papers in this series: Architecture design for highly flexible and energy-efficient deep neural network accelerators (dspace.mit.edu)

Eyeriss v1 and Eyeriss v2 are neural network accelerator chip designs proposed by MIT. Let's look at the background against which Eyeriss v2 was proposed:

In compact DNN designs, filters are decomposed as shown below:

Current acceleration methods do not match this trend efficiently.

Small networks are more compact: a large convolution is split into several smaller sub-kernels.
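As a rough illustration of why such a decomposition makes a network more compact, the sketch below compares one 5x5 convolution with two stacked 3x3 convolutions. This particular decomposition and the channel count of 64 are assumptions chosen for illustration, not values read from the figure; the helper names are hypothetical.

```python
def conv_weights(kernel, c_in, c_out):
    """Weight count of a kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def receptive_field(kernels):
    """Receptive field of stacked stride-1 convolutions."""
    return 1 + sum(k - 1 for k in kernels)


channels = 64  # assumed channel count, for illustration only

single_5x5 = conv_weights(5, channels, channels)
stacked_3x3 = 2 * conv_weights(3, channels, channels)

print("receptive field of two 3x3 layers:", receptive_field([3, 3]))
print("weights, one 5x5 layer:   ", single_5x5)
print("weights, two 3x3 layers:  ", stacked_3x3)
print("ratio (stacked / single): ", stacked_3x3 / single_5x5)
```

Under these assumptions the stacked version covers the same 5x5 receptive field with fewer weights, but each resulting layer is smaller, which is the kind of shape change the accelerator has to cope with.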

Eyeriss v2: designed to deal with widely varying layer shapes and sizes.

Overall, with sparse MobileNet, Eyeriss v2 in a 65nm CMOS process achieves a throughput of 1470.6 inferences/sec and 2560.3 inferences/J at a batch size of 1, which is 12.6x faster and 2.5x more energy efficient than the original Eyeriss running MobileNet.

Eyeriss v2 runs inference 12.6x faster than v1 and is 2.5x more energy efficient.

Implemented in a 65nm CMOS process, the accelerator runs about 1470 MobileNet inferences per second.
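The quoted figures can be turned into a few derived quantities with simple arithmetic. The sketch below only rearranges the numbers stated above; the v1 values are what the 12.6x and 2.5x ratios imply, not separately reported measurements, and the average power assumes throughput and efficiency were measured under the same conditions.

```python
# Figures quoted above: sparse MobileNet on Eyeriss v2, 65nm CMOS, batch size 1.
throughput_inf_per_s = 1470.6
efficiency_inf_per_j = 2560.3
speedup_vs_v1 = 12.6
efficiency_gain_vs_v1 = 2.5

# Average time and energy spent on one inference.
time_per_inference_ms = 1e3 / throughput_inf_per_s
energy_per_inference_mj = 1e3 / efficiency_inf_per_j

# (inferences / s) / (inferences / J) = J / s = W
average_power_w = throughput_inf_per_s / efficiency_inf_per_j

# Values implied for Eyeriss v1 running MobileNet (derived, not measured here).
v1_throughput_inf_per_s = throughput_inf_per_s / speedup_vs_v1
v1_efficiency_inf_per_j = efficiency_inf_per_j / efficiency_gain_vs_v1

print(f"time per inference:   {time_per_inference_ms:.2f} ms")
print(f"energy per inference: {energy_per_inference_mj:.2f} mJ")
print(f"average power:        {average_power_w:.2f} W")
print(f"implied v1 throughput: {v1_throughput_inf_per_s:.1f} inferences/sec")
print(f"implied v1 efficiency: {v1_efficiency_inf_per_j:.1f} inferences/J")
```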

Challenges For Compact DNN

Data reuse

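In on-chip inference, data movement is costly, so the biggest challenge is data reuse: reusing weights, inputs, and psums. [A psum, or partial sum, is the running accumulation of the multiply results in a convolution.] The figure above shows the data reuse of each data type across the layers of several networks. For compact small networks such as MobileNet the reuse pattern changes: weight reuse stays roughly the same, while psum and input reuse drop. [Reason: the model is smaller and has fewer weights.]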
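To make the trend concrete, the sketch below estimates reuse as the number of MACs each data value takes part in. The formulas assume stride 1 and ignore border effects, the layer shapes are made-up examples rather than values from the figure, and the `Layer` class and `reuse` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    C: int      # input channels accumulated into one output (1 for depth-wise)
    M: int      # output channels that share one input channel (1 for depth-wise)
    R: int      # filter height
    S: int      # filter width
    E: int      # output feature map height
    F: int      # output feature map width
    N: int = 1  # batch size


def reuse(layer):
    """Approximate MACs per data value (stride 1, borders ignored)."""
    return {
        "weight": layer.N * layer.E * layer.F,  # one weight is applied at every output position
        "input": layer.M * layer.R * layer.S,   # one input feeds R*S positions in each of M filters
        "psum": layer.C * layer.R * layer.S,    # one output accumulates C*R*S products
    }


layers = [
    Layer("standard 3x3 conv", C=64, M=64, R=3, S=3, E=56, F=56),
    Layer("depth-wise 3x3 conv", C=1, M=1, R=3, S=3, E=56, F=56),
    Layer("point-wise 1x1 conv", C=64, M=64, R=1, S=1, E=56, F=56),
]

for layer in layers:
    print(layer.name, reuse(layer))
```

With these example shapes the weight reuse is identical for all three layers, because it depends only on the output size and batch, while input and psum reuse fall sharply for the depth-wise and point-wise layers that compact networks rely on.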

To address the reduced data reuse, Eyeriss v2 changes both how the PEs are organized and how well they are utilized.

Side by side, the structural changes are as follows:

Eyeriss v1: overall architecture and PE structure