Table of Contents
- Title: Angel-Eye: A Complete Design Flow for Mapping CNN Onto Embedded FPGA
- Year: 2018
- Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
- Institution: Tsinghua University / Song Han
1 Abbreviations
2 abstract & introduction & motivation
We propose Angel-Eye, a programmable and flexible CNN accelerator architecture, together with a data quantization strategy and a compilation tool.
It is common to use a single-layer implementation with a static loop-unrolling strategy, which is also the approach taken in this paper.
3 flow description
3.1 data quantization
- fine-tuning to further improve accuracy
- a greedy strategy that optimizes the radix-point position layer by layer
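The greedy radix-position search can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: the 8-bit width, the search range, and the squared-error criterion are assumptions.

```python
import numpy as np

def quantize(x, frac_bits, total_bits=8):
    """Dynamic fixed-point: total_bits wide, frac_bits of them fractional."""
    scale = 2.0 ** frac_bits
    qmax = 2 ** (total_bits - 1) - 1
    qmin = -2 ** (total_bits - 1)
    q = np.clip(np.round(x * scale), qmin, qmax)
    return q / scale

def best_frac_bits(values, total_bits=8, search_range=range(-4, 16)):
    """Greedily pick, for one layer, the radix-point position that
    minimizes quantization error (squared error assumed here)."""
    errors = {f: float(np.sum((values - quantize(values, f, total_bits)) ** 2))
              for f in search_range}
    return min(errors, key=errors.get)
```

Running this per layer, in order, gives each layer its own radix point while the bit width stays fixed, which is the essence of the dynamic fixed-point scheme.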
3.2 Hardware Architecture
PE Array: each PE contains multiple convolution engines
- kernel-level parallelism: each convolution engine handles one convolution kernel
- input-channel parallelism: different convolution engines work on different input channels
- output-channel parallelism: different PEs share the same input but use different kernels, computing different output channels
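The three parallelism levels can be read as a convolution loop nest, sketched below. The PE and engine counts are illustrative; in hardware the marked loops are unrolled into parallel units rather than executed sequentially.

```python
import numpy as np

def conv_pe_array(ifmaps, kernels, n_pe=2, engines_per_pe=2):
    """Sketch of the PE-array mapping: output channels across PEs,
    input channels across convolution engines within a PE, and the
    K*K multiplications of one kernel window inside an engine."""
    C, H, W = ifmaps.shape           # input channels, height, width
    M, C2, K, _ = kernels.shape      # output channels, input channels, kernel size
    assert C == C2
    out = np.zeros((M, H - K + 1, W - K + 1))
    for m0 in range(0, M, n_pe):                       # PEs in parallel (output channels)
        for m in range(m0, min(m0 + n_pe, M)):
            for c0 in range(0, C, engines_per_pe):     # engines in parallel (input channels)
                for c in range(c0, min(c0 + engines_per_pe, C)):
                    for y in range(out.shape[1]):
                        for x in range(out.shape[2]):
                            # kernel-level parallelism: one window's MACs at once
                            out[m, y, x] += np.sum(
                                ifmaps[c, y:y+K, x:x+K] * kernels[m, c])
    return out
```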
3.3 compiler
Scheduling principles: fully exploit data locality to reduce I/O
- input channel first: compute as many outputs as possible from the same input feature
- output channel second: for the same input block, compute all of its output blocks, since different kernels applied to one input block simply produce different output channels
- no intermediate result out: once the output buffer is full, load the next input feature so the outputs can be accumulated on chip
- back and forth: traverse the blocks in alternating directions
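One possible reading of these principles as a tile-visit order is sketched below. The tile counts and the exact dimension swept back and forth are assumptions for illustration, not details taken from the paper.

```python
def schedule(num_in_tiles, num_out_tiles):
    """Yield (input_tile, output_tile) pairs: for each input tile, visit
    every output tile it contributes to (partial sums accumulate in the
    output buffer), sweeping back and forth so the most recently touched
    tile is reused first."""
    order = []
    forward = True
    for ci in range(num_in_tiles):                     # input channel first
        out_tiles = range(num_out_tiles) if forward \
            else range(num_out_tiles - 1, -1, -1)
        for mo in out_tiles:                           # output channel second
            order.append((ci, mo))                     # no intermediate result out
        forward = not forward                          # back and forth
    return order
```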
Compilation flow
- block partition
- memory mapping
- dependency check
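The block-partition step could look like the sketch below: a feature map is split into blocks that fit the on-chip buffer, with adjacent blocks overlapping so each output block can be computed independently. The buffer dimensions and overlap are illustrative assumptions, not the paper's actual parameters.

```python
def block_partition(h, w, buf_h, buf_w, overlap):
    """Split an h x w feature map into (y0, y1, x0, x1) blocks fitting a
    buf_h x buf_w buffer; neighbors overlap by `overlap` (typically
    kernel_size - 1) rows/cols so no halo must be re-fetched mid-block."""
    blocks = []
    step_h, step_w = buf_h - overlap, buf_w - overlap
    for y in range(0, h - overlap, step_h):
        for x in range(0, w - overlap, step_w):
            blocks.append((y, min(y + buf_h, h), x, min(x + buf_w, w)))
    return blocks
```

Memory mapping would then assign each block an address range, and the dependency check verifies that a block's inputs are resident before its computation is issued.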
3.4 run-time work flow
The trained model parameters are stored in a binary file, parameter.bin, and copied into memory during the initialization phase. The host CPU runs first; when a CNN command is to be executed, the image is copied into the corresponding memory region and the FPGA starts working, while the CPU can continue with non-CNN tasks.
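The host/FPGA overlap described above can be simulated with a small sketch. All names here are hypothetical stand-ins (there is no real driver API in the paper), and a thread plays the role of the accelerator.

```python
import threading

class FakeFpga:
    """Stand-in for the accelerator: parameters are copied once at
    initialization; start() launches a 'CNN' asynchronously."""
    def __init__(self, params):
        self.params = params              # parameter.bin contents, loaded at init
        self.done = threading.Event()
        self.result = None

    def start(self, image):
        self.done.clear()
        def work():                       # dummy computation in place of the CNN
            self.result = sum(image) * self.params["scale"]
            self.done.set()
        threading.Thread(target=work).start()

def run_inference(fpga, image, host_task):
    fpga.start(image)      # copy the image in and kick off the FPGA
    host_task()            # CPU overlaps non-CNN work meanwhile
    fpga.done.wait()       # synchronize before reading the output
    return fpga.result
```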