Table of Contents
- Title: Angel-Eye: A Complete Design Flow for Mapping CNN Onto Embedded FPGA
- Year: 2018
- Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
- Institution: Tsinghua University / Song Han
1 Abbreviations
2 abstract & introduction & motivation
We propose Angel-Eye, a programmable and flexible CNN accelerator architecture, together with a data quantization strategy and a compilation tool.
It is common to use a single-layer implementation with a static loop-unrolling strategy, which is also the approach taken in this paper.
3 flow description
3.1 data quantization
- fine-tuning to further improve accuracy
- a greedy strategy that optimizes the radix-point position layer by layer
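The greedy radix-position search can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: the 8-bit width, the search range, and the squared-error criterion are assumptions.

```python
import numpy as np

def quantize(x, frac_bits, total_bits=8):
    """Dynamic fixed-point: total_bits wide, frac_bits of them fractional."""
    scale = 2.0 ** frac_bits
    qmax = 2 ** (total_bits - 1) - 1
    qmin = -2 ** (total_bits - 1)
    q = np.clip(np.round(x * scale), qmin, qmax)
    return q / scale

def best_frac_bits(values, total_bits=8, search_range=range(-4, 16)):
    """Greedily pick, for one layer, the radix-point position that
    minimizes quantization error (squared error assumed here)."""
    errors = {f: float(np.sum((values - quantize(values, f, total_bits)) ** 2))
              for f in search_range}
    return min(errors, key=errors.get)
```

Running this per layer, in order, gives each layer its own radix point while the bit width stays fixed, which is the essence of the dynamic fixed-point scheme.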
3.2 Hardware Architecture
PE Array: each PE contains multiple convolution engines
- kernel-level parallelism: each convolution engine handles one convolution kernel
- input-channel parallelism: different convolution engines work on different input channels
- output-channel parallelism: different PEs share the same input but use different kernels, computing different output channels
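The three parallelism levels can be read as a convolution loop nest, sketched below. The PE and engine counts are illustrative; in hardware the marked loops are unrolled into parallel units rather than executed sequentially.

```python
import numpy as np

def conv_pe_array(ifmaps, kernels, n_pe=2, engines_per_pe=2):
    """Sketch of the PE-array mapping: output channels across PEs,
    input channels across convolution engines within a PE, and the
    K*K multiplications of one kernel window inside an engine."""
    C, H, W = ifmaps.shape           # input channels, height, width
    M, C2, K, _ = kernels.shape      # output channels, input channels, kernel size
    assert C == C2
    out = np.zeros((M, H - K + 1, W - K + 1))
    for m0 in range(0, M, n_pe):                       # PEs in parallel (output channels)
        for m in range(m0, min(m0 + n_pe, M)):
            for c0 in range(0, C, engines_per_pe):     # engines in parallel (input channels)
                for c in range(c0, min(c0 + engines_per_pe, C)):
                    for y in range(out.shape[1]):
                        for x in range(out.shape[2]):
                            # kernel-level parallelism: one window's MACs at once
                            out[m, y, x] += np.sum(
                                ifmaps[c, y:y+K, x:x+K] * kernels[m, c])
    return out
```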
3.3 compiler
Scheduling principles: fully exploit data locality to reduce I/O
- input channel first: compute as many outputs as possible from the same input feature
- output channel second: for the same input block, compute all of its output blocks, since different kernels applied to one input block simply produce different output channels
- no intermediate result out: once the output buffer is full, load the next input feature so the outputs can be accumulated on chip
- back and forth: traverse the blocks in alternating directions
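One possible reading of these principles as a tile-visit order is sketched below. The tile counts and the exact dimension swept back and forth are assumptions for illustration, not details taken from the paper.

```python
def schedule(num_in_tiles, num_out_tiles):
    """Yield (input_tile, output_tile) pairs: for each input tile, visit
    every output tile it contributes to (partial sums accumulate in the
    output buffer), sweeping back and forth so the most recently touched
    tile is reused first."""
    order = []
    forward = True
    for ci in range(num_in_tiles):                     # input channel first
        out_tiles = range(num_out_tiles) if forward \
            else range(num_out_tiles - 1, -1, -1)
        for mo in out_tiles:                           # output channel second
            order.append((ci, mo))                     # no intermediate result out
        forward = not forward                          # back and forth
    return order
```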
Compilation flow
- block partition
- memory mapping
- dependency check
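The block-partition step could look like the sketch below: a feature map is split into blocks that fit the on-chip buffer, with adjacent blocks overlapping so each output block can be computed independently. The buffer dimensions and overlap are illustrative assumptions, not the paper's actual parameters.

```python
def block_partition(h, w, buf_h, buf_w, overlap):
    """Split an h x w feature map into (y0, y1, x0, x1) blocks fitting a
    buf_h x buf_w buffer; neighbors overlap by `overlap` (typically
    kernel_size - 1) rows/cols so no halo must be re-fetched mid-block."""
    blocks = []
    step_h, step_w = buf_h - overlap, buf_w - overlap
    for y in range(0, h - overlap, step_h):
        for x in range(0, w - overlap, step_w):
            blocks.append((y, min(y + buf_h, h), x, min(x + buf_w, w)))
    return blocks
```

Memory mapping would then assign each block an address range, and the dependency check verifies that a block's inputs are resident before its computation is issued.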
3.4 run-time work flow
The trained model parameters are stored in a binary file, parameter.bin, and copied into memory during the initialization phase. The host CPU runs first; when a CNN command is to be executed, the image is copied into the corresponding memory region and the FPGA starts working, while the CPU can continue with non-CNN tasks.
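The host/FPGA overlap described above can be simulated with a small sketch. All names here are hypothetical stand-ins (there is no real driver API in the paper), and a thread plays the role of the accelerator.

```python
import threading

class FakeFpga:
    """Stand-in for the accelerator: parameters are copied once at
    initialization; start() launches a 'CNN' asynchronously."""
    def __init__(self, params):
        self.params = params              # parameter.bin contents, loaded at init
        self.done = threading.Event()
        self.result = None

    def start(self, image):
        self.done.clear()
        def work():                       # dummy computation in place of the CNN
            self.result = sum(image) * self.params["scale"]
            self.done.set()
        threading.Thread(target=work).start()

def run_inference(fpga, image, host_task):
    fpga.start(image)      # copy the image in and kick off the FPGA
    host_task()            # CPU overlaps non-CNN work meanwhile
    fpga.done.wait()       # synchronize before reading the output
    return fpga.result
```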