Paper Notes (4.ICCAD.2017.COMBA)

Jason_1141

Tags: hls fpga

Article link: https://blog.csdn.net/qq_40884849/article/details/112801385

COMBA: A Comprehensive Model-Based Analysis Framework for High Level Synthesis of Real Applications

Information

Paper: COMBA: A Comprehensive Model-Based Analysis Framework for High Level Synthesis of Real Applications
Author: Jieru Zhao
Key words:

Backgrounds

Previous work supports only a limited number of pragmas, which is not sufficient for real applications.

Work

Framework overview:
[Figure: COMBA framework overview]

1. Recursive data collection (RDC)

RDC analyzes the LLVM IR to compute the required parameters.

Static information is obtained directly by analyzing the assembly instructions in the LLVM IR.
Dynamic information depends on the code structure and the optimization pragmas applied, and is computed using the data-flow graph (DFG).
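
To make the "static information" step concrete, here is a minimal sketch that counts instruction opcodes in a textual LLVM IR function. It only illustrates the idea: the IR snippet, the regular expression, and the name `count_opcodes` are assumptions made for this note, not COMBA's actual implementation.

```python
import re
from collections import Counter

# Hypothetical IR snippet for illustration only; not taken from the paper.
SAMPLE_IR = """
define i32 @mac(i32 %a, i32 %b, i32 %c) {
entry:
  %mul = mul nsw i32 %a, %b
  %add = add nsw i32 %mul, %c
  %mul2 = mul nsw i32 %add, %b
  ret i32 %mul2
}
"""

# Matches "%x = opcode ..." or a bare "opcode ..." instruction line.
_INSTR = re.compile(r"^\s*(?:%[\w.]+\s*=\s*)?([a-z_]+)\b")


def count_opcodes(ir_text):
    """Count opcodes inside function bodies of a textual LLVM IR string."""
    counts = Counter()
    in_function = False
    for line in ir_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("define"):
            in_function = True   # entering a function body
            continue
        if stripped == "}":
            in_function = False  # leaving the function body
            continue
        # Skip everything outside functions, blank lines, labels, comments.
        if (not in_function or not stripped
                or stripped.endswith(":") or stripped.startswith(";")):
            continue
        match = _INSTR.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    print(count_opcodes(SAMPLE_IR))
```

Operator counts like these are the kind of static parameter the later models can consume; the dynamic information would additionally need the DFG and the applied pragmas.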

2. Performance Model

The model covers five pragmas: loop unrolling, loop pipelining, array partitioning, function pipelining, and dataflow.

Loop unrolling: latency is computed for three nested-loop structures: perfectly nested loops, non-perfectly nested loops, and multiple loops.
Loop pipelining: latency is considered in terms of three factors: pipeline depth, initiation interval (II), and trip count.
Array partitioning: supports multi-dimensional array partitioning with three options: block, cyclic, and complete.
Function pipelining: the II of the function is calculated to measure how many function outputs are produced per cycle. (Vivado HLS unrolls all sub-loops completely and pipelines each sub-function inside a pipelined function.)
Dataflow: sub-functions need not be pipelined and sub-loops need not be unrolled, but this technique can only be applied to functions or loops at the top level.
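
As a rough illustration of how pipeline depth, II, and trip count combine, the sketch below uses the commonly quoted estimate `depth + II * (trip_count - 1)` for a pipelined loop. This is a textbook-style approximation, not necessarily the exact formula in the paper; the function names are made up for this note, and treating a perfect nest as one flattened loop is a simplifying assumption.

```python
import math


def pipelined_loop_latency(depth, ii, trip_count):
    """Approximate cycles for a pipelined loop.

    The first iteration takes `depth` cycles; every later iteration
    starts `ii` cycles after the previous one.
    """
    if trip_count <= 0:
        return 0
    return depth + ii * (trip_count - 1)


def unrolled_trip_count(trip_count, factor):
    """Trip count left after unrolling by `factor` (last pass may be partial)."""
    return math.ceil(trip_count / factor)


def flattened_trip_count(trip_counts):
    """Assumption: a perfect nest behaves like one loop whose trip count
    is the product of the trip counts of all levels."""
    return math.prod(trip_counts)


if __name__ == "__main__":
    # 100 iterations, depth 5, II 1
    print(pipelined_loop_latency(5, 1, 100))
    # the same loop unrolled by 4 before pipelining
    print(pipelined_loop_latency(5, 1, unrolled_trip_count(100, 4)))
    # a 4 x 8 perfect nest, pipelined with depth 3 and II 2
    print(pipelined_loop_latency(3, 2, flattened_trip_count([4, 8])))
```

Non-perfect nests and multiple sibling loops need the latencies of their sub-elements to be combined level by level, which this sketch does not attempt.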

3. Resource Model

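The resource model focuses on DSP and BRAM usage.
DSP (operators):

For LUT-based and small-bandwidth operations, the number of instances equals the number of operations.
For DSP-based operators, the number of instances is the number of operations used in one iteration divided by the II.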
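
The two rules above can be sketched as follows. Rounding the division up to a whole instance and the per-operator DSP costs in the usage example are assumptions for illustration; the names `operator_instances` and `estimate_dsp` are hypothetical.

```python
import math


def operator_instances(op_count, ii, dsp_based):
    """Number of hardware instances for one operator type.

    DSP-based operators can be time-shared across the II cycles of a
    pipelined iteration, so roughly op_count / II instances are needed
    (rounded up here, which is an assumption). Other operators get one
    instance per operation.
    """
    if dsp_based:
        return math.ceil(op_count / ii)
    return op_count


def estimate_dsp(op_counts, ii, dsp_per_instance):
    """Sum DSP usage over all DSP-based operator types.

    op_counts:        {operator name: operations per iteration}
    dsp_per_instance: {operator name: DSP blocks per instance}
                      (placeholder costs supplied by the caller)
    """
    return sum(
        operator_instances(count, ii, dsp_based=True) * dsp_per_instance[name]
        for name, count in op_counts.items()
    )


if __name__ == "__main__":
    # Placeholder costs, not taken from the paper.
    costs = {"fmul": 3, "fadd": 2}
    print(estimate_dsp({"fmul": 8, "fadd": 4}, ii=4, dsp_per_instance=costs))
```

A larger II lets more operations share one DSP instance, which is why the DSP estimate drops as the II grows.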

BRAM (arrays):
[Figure: BRAM estimation for arrays]

For scalars, the channel is a register. For arrays, the channels are ping-pong buffers by default. When dataflow is applied, the BRAM therefore has two copies: one for the output buffer and the other for the input.
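
A rough capacity-based sketch of the BRAM count, including the doubling caused by ping-pong buffers, might look like this. It assumes 18 Kb blocks and ignores the width and depth configurations that real tools use, so it is not the paper's BRAM formula; `bram_blocks` and its parameters are hypothetical names.

```python
import math

BRAM18K_BITS = 18 * 1024  # capacity of one 18 Kb block RAM, in bits


def bram_blocks(num_elements, bit_width, partitions=1, ping_pong=False):
    """Capacity-only estimate of 18 Kb BRAM blocks for one array.

    Each partition is stored in its own memory, so rounding up happens
    per partition. With ping_pong=True (an array used as a dataflow
    channel) the whole array is duplicated.
    """
    elems_per_partition = math.ceil(num_elements / partitions)
    blocks_per_partition = math.ceil(
        elems_per_partition * bit_width / BRAM18K_BITS
    )
    total = blocks_per_partition * partitions
    return 2 * total if ping_pong else total


if __name__ == "__main__":
    print(bram_blocks(1024, 32))                  # plain array
    print(bram_blocks(1024, 32, partitions=4))    # partitioned by 4
    print(bram_blocks(1024, 32, ping_pong=True))  # dataflow channel
```

Partitioning can raise the total because every partition is rounded up to at least one block.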

4. Metric-guided design space exploration (MGDSE)

1. Redundancy Elimination:
2. Guided Search:

MGDSE gives the top optimization priority to the longest sub-element, which is assumed to have the greatest influence.
It checks whether the DSP and BRAM usage exceeds the available resources on the FPGA.
It evaluates which array partitioning type (block or cyclic) is beneficial in dimension i.
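
One way to picture the block-versus-cyclic decision for a single dimension is to map the indices accessed in the same cycle to banks and see which scheme spreads them out better. The sketch below uses the usual definitions (cyclic: `index % factor`, block: `index // block_size`); the scoring by worst-case accesses per bank and the function names are assumptions for illustration, not the metric used by MGDSE.

```python
import math
from collections import Counter


def bank_of(index, size, factor, kind):
    """Bank holding `index` when a dimension of length `size` is split
    into `factor` banks."""
    if kind == "cyclic":
        return index % factor
    if kind == "block":
        block_size = math.ceil(size / factor)
        return index // block_size
    raise ValueError(f"unknown partition type: {kind}")


def max_accesses_per_bank(indices, size, factor, kind):
    """Worst-case number of simultaneous accesses landing in one bank."""
    banks = Counter(bank_of(i, size, factor, kind) for i in indices)
    return max(banks.values())


def preferred_partition(indices, size, factor):
    """Pick the type with fewer same-bank accesses (ties go to block)."""
    block = max_accesses_per_bank(indices, size, factor, "block")
    cyclic = max_accesses_per_bank(indices, size, factor, "cyclic")
    return "cyclic" if cyclic < block else "block"


if __name__ == "__main__":
    # Consecutive accesses, e.g. from unrolling a loop by 4.
    print(preferred_partition([0, 1, 2, 3], size=16, factor=4))
    # Strided accesses, one per quarter of the array.
    print(preferred_partition([0, 4, 8, 12], size=16, factor=4))
```

Consecutive accesses favour cyclic partitioning, while accesses spaced a block apart favour block partitioning; a multi-dimensional array would repeat this check per dimension.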