- Title: A 28nm SoC with a 1.2GHz 568nJ/Prediction Sparse Deep-Neural-Network Engine with >0.1 Timing Error Rate Tolerance for IoT Applications
- Year: 2017
- Conference: ISSCC (14.3)
- Institution: Harvard University
- Link: https://blog.csdn.net/tiaozhanzhe1900/article/details/83275688
- Reference link 1: https://blog.csdn.net/xbinworld/article/details/55118537
- Reference link 2: https://mp.weixin.qq.com/s?__biz=MzI3MDQ2MjA3OA==&mid=2247483726&idx=1&sn=46da35379241adb013498a67d15faab4&chksm=ead1fc5fdda67549b2b4d9dd09e1cd33061ece3143f57627203420d143b8ac0cd937e193c142&scene=21#wechat_redirect
1 Abbreviations
- PVTA: process/voltage/temperature/aging
- FC: fully-connected
- DNN: deep-neural-network
- IPBUF: input buffer
- ReLU: rectified linear unit
- MxV: matrix-vector
- TC: two's complement
- SM: sign-magnitude
- RZFF: Razor flip-flop
2 overall architecture
This paper presents a 28nm SoC with a programmable FC-DNN accelerator design that:
- elides unnecessary computation to exploit data sparsity
- uses sign-magnitude number format for weights and datapath computation
- improves circuit-level timing-violation tolerance in datapath logic via time borrowing
- uses Razor timing-violation detection to reduce energy
3 five-stage DNN accelerator
The DNN Engine is a 5-stage SIMD-style programmable sparse matrix-vector (MxV) machine for processing arbitrary DNNs.
SBUF: double-buffered to allow simultaneous reads from the previous layer and writes to the current layer
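The double-buffering idea can be illustrated with a minimal software sketch. It assumes two equally sized activation buffers that swap roles after each FC layer: one is read as the previous layer's output while the other receives the current layer's results. The names `run_fc_layers`, `buffers`, and `read_sel` are hypothetical and do not come from the paper; the real engine does this in hardware, across pipeline stages.

```python
def relu(x):
    """Rectified linear unit on a single value."""
    return x if x > 0 else 0


def run_fc_layers(layers, inputs):
    """Evaluate a stack of FC layers using two ping-pong buffers.

    layers: list of weight matrices, each a list of rows (one row per output).
    inputs: list of input activations for the first layer.
    """
    # Both buffers are sized for the widest layer (an assumption of this sketch).
    width = max([len(inputs)] + [len(w) for w in layers])
    buffers = [[0] * width, [0] * width]
    buffers[0][:len(inputs)] = inputs
    read_sel = 0          # index of the buffer holding the previous layer
    n_valid = len(inputs)  # number of valid activations in that buffer

    for weights in layers:
        src = buffers[read_sel]      # read side: previous layer
        dst = buffers[1 - read_sel]  # write side: current layer
        for row_idx, row in enumerate(weights):
            acc = sum(w * src[col] for col, w in enumerate(row))
            dst[row_idx] = relu(acc)
        n_valid = len(weights)
        read_sel = 1 - read_sel      # swap roles for the next layer

    return buffers[read_sel][:n_valid]


if __name__ == "__main__":
    layers = [[[1, -1], [2, 0], [0, 1]], [[1, 1, 1]]]
    print(run_fc_layers(layers, [3, 1]))
```

Because reads and writes never target the same buffer within a layer, a layer's outputs cannot overwrite inputs that are still needed.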
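4 zero operands
Prior work: clock-gates the functional units to save power, but this leaves bubbles in the pipeline
This work: zero operands are eliminated dynamically when results are written back to XBUF
Even some small non-zero values are skipped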
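A rough software analogy of this operand skipping, assuming activations are filtered at write-back into a list of (index, value) pairs so that later MAC work touches only the surviving entries. The function names, the `threshold` parameter, and the loop order are assumptions made for illustration; the paper's actual buffer format and scheduling are not described in these notes.

```python
def compress_activations(activations, threshold=0):
    """Keep only (index, value) pairs whose magnitude exceeds the threshold.

    threshold=0 drops exact zeros; a larger threshold also drops
    small non-zero values.
    """
    return [(i, a) for i, a in enumerate(activations) if abs(a) > threshold]


def sparse_mxv(weights, compressed):
    """Matrix-vector product that visits only the surviving activations.

    weights: list of rows, indexed as weights[row][col].
    compressed: output of compress_activations.
    Returns the result vector and the number of MAC operations performed.
    """
    out = [0] * len(weights)
    macs = 0
    for col, a in compressed:
        for row in range(len(weights)):
            out[row] += weights[row][col] * a
            macs += 1
    return out, macs


if __name__ == "__main__":
    acts = [0, 5, 0, 1, 0, 0, 7, 0]
    weights = [[1] * 8, [0, 1, 0, 2, 0, 0, -1, 0]]
    print(sparse_mxv(weights, compress_activations(acts)))
    print(sparse_mxv(weights, compress_activations(acts, threshold=1)))
```

With zeros removed before they enter the pipeline, no cycle is spent on an operand that contributes nothing, which is the difference from clock gating. Raising the threshold trades a small change in the result for fewer MACs.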
5 error-tolerant operation
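To achieve error-tolerant operation, the design adds Razor flip-flops (RZFFs) at the endpoints of the two timing-critical paths: the W-MEM load and the MAC unit. A MUX in the dual-mode RZFF selects between normal datapath flip-flop operation and a latch with time borrowing.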
time borrowing:
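Flip-flop F2 is replaced with latch L2. Because L2 is transparent while the clock is high, the path leading into the latch can borrow time from the path that follows it, so the data does not have to be ready before the rising clock edge. This is called time borrowing.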
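The effect of time borrowing can be sketched with a simplified timing check. The model assumes two consecutive logic stages with delays `d1` and `d2`, a clock period `period`, and a latch that is transparent for a fraction `duty` of the period after the rising edge. Setup time, hold time, and latch propagation delay are ignored. All names and numbers are illustrative and are not taken from the paper.

```python
def meets_timing_ff(d1, d2, period):
    """With flip-flops on both boundaries, each stage must fit in one period."""
    return d1 <= period and d2 <= period


def meets_timing_latch(d1, d2, period, duty=0.5):
    """Middle flip-flop replaced by a latch that is transparent while clock is high.

    The first stage may finish late, as long as it arrives while the latch
    is still transparent; the lateness is taken from the second stage.
    Returns (timing_met, borrowed_time).
    """
    borrowed = max(0, d1 - period)
    if borrowed > duty * period:
        # Data arrives after the latch has closed.
        return False, borrowed
    return d2 + borrowed <= period, borrowed


if __name__ == "__main__":
    # Delays in picoseconds (made-up values).
    print(meets_timing_ff(1200, 700, 1000))
    print(meets_timing_latch(1200, 700, 1000))
```

In this example the first stage overruns the period by 200 ps, which fails with flip-flops but passes with the latch because the second stage has 300 ps of slack to give up.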
sign-magnitude: reduces switching activity in the MSBs and thus bit-flips
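A small experiment makes the MSB argument concrete. The sketch below encodes a sequence of small values that alternate in sign, once in two's complement and once in sign-magnitude, and counts how many bits toggle between consecutive words. The helper names and the 8-bit width are assumptions chosen for illustration, and values are assumed to fit in the magnitude field.

```python
def to_twos_complement(value, bits=8):
    """Encode a signed integer as a two's complement bit pattern."""
    return value & ((1 << bits) - 1)


def to_sign_magnitude(value, bits=8):
    """Encode a signed integer as sign bit (MSB) plus magnitude.

    Assumes abs(value) fits in bits - 1 bits.
    """
    sign = 1 if value < 0 else 0
    return (sign << (bits - 1)) | abs(value)


def count_toggles(words):
    """Total number of bit positions that change between consecutive words."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


if __name__ == "__main__":
    values = [1, -1, 2, -2]
    tc = [to_twos_complement(v) for v in values]
    sm = [to_sign_magnitude(v) for v in values]
    print("two's complement toggles:", count_toggles(tc))  # 20
    print("sign-magnitude toggles:", count_toggles(sm))    # 5
```

In two's complement, a small negative number has all upper bits set, so every sign change flips most of the word. In sign-magnitude only the sign bit and the few low magnitude bits change.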