Paper Notes (4. ICCAD 2017: COMBA)

COMBA: A Comprehensive Model-Based Analysis Framework for High Level Synthesis of Real Applications

Information

  • Paper: COMBA: A Comprehensive Model-Based Analysis Framework for High Level Synthesis of Real Applications
  • Author: Jieru Zhao
  • Key words:

Backgrounds

Previous work supports only a limited number of pragmas, which is not sufficient for real applications.

Work

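Framework overview (figure omitted): COMBA consists of four parts, described in turn below: recursive data collection (RDC), a performance model, a resource model, and metric-guided design space exploration (MGDSE).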

1. Recursive data collection (RDC)

RDC analyzes the LLVM IR to compute the required parameters.

  • Static information is obtained by analyzing the assembly instructions from the LLVM IR directly
  • Dynamic information depends on the code structure and optimization
    pragmas applied, and is computed using the DFG.
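
As a rough illustration of the "static information" step, the sketch below counts instruction opcodes in a small piece of textual LLVM IR. This is not COMBA's implementation: the IR snippet, the `count_opcodes` helper and the simple regex-based parsing are all assumptions made for this note, and a real tool would walk the IR through the LLVM API rather than parse text.

```python
import re
from collections import Counter

# Hypothetical IR for a multiply-accumulate function (written by hand for this note).
IR = """
define i32 @mac(i32 %a, i32 %b, i32 %c) {
entry:
  %mul = mul nsw i32 %a, %b
  %add = add nsw i32 %mul, %c
  ret i32 %add
}
"""

def count_opcodes(ir_text):
    """Count opcodes per instruction line, skipping labels, comments and function headers."""
    counts = Counter()
    for raw in ir_text.splitlines():
        line = raw.strip()
        if not line or line.endswith(":"):
            continue  # blank line or basic-block label
        if line.startswith((";", "define", "declare", "}")):
            continue  # comment, function header, or closing brace
        # Optional "%result = " prefix, then the opcode mnemonic.
        match = re.match(r"(?:%[\w.]+\s*=\s*)?([a-z]+)", line)
        if match:
            counts[match.group(1)] += 1
    return counts

counts = count_opcodes(IR)
print(dict(counts))
```

Operator counts like these are the kind of static input the resource model can consume; the dynamic information (which depends on the pragmas) would need the DFG and is not covered here.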

2. Performance Model

Covers five pragmas: loop unrolling, loop pipelining, array partitioning, function pipelining, and dataflow.

  • Loop unrolling: latency is computed for three loop structures: perfect nested
    loops, non-perfect nested loops, and multiple loops.
  • Loop pipelining: latency is considered in terms of pipeline depth, initiation interval (II), and trip count.
  • Array partitioning: supports multi-dimensional array partitioning with three options: block, cyclic, and complete.
  • Function pipelining: the II of the function is calculated to measure the amount of function output per cycle. (Vivado HLS unrolls all sub-loops completely and pipelines each sub-function inside a pipelined function.)
  • Dataflow: does not require sub-functions to be pipelined or sub-loops to be unrolled, but it can only be applied to functions or loops at the top level.
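
To make the roles of pipeline depth, II and trip count concrete, here is a minimal sketch using the commonly quoted first-order latency formulas for pipelined and unrolled loops. These are not copied from the paper, whose models are more detailed; the function names are made up for this note, and loop entry/exit overhead is ignored.

```python
import math

def pipelined_loop_latency(depth, ii, trip_count):
    """First iteration takes `depth` cycles; each later iteration starts `ii` cycles after the previous one."""
    return depth + ii * (trip_count - 1)

def unrolled_loop_latency(iter_latency, trip_count, factor):
    """Non-pipelined loop unrolled by `factor`, assuming the unrolled copies run fully in parallel."""
    return iter_latency * math.ceil(trip_count / factor)

# Pipelined loop: depth 4, II 1, 100 iterations.
print(pipelined_loop_latency(depth=4, ii=1, trip_count=100))         # 103
# Same trip count, 3-cycle body, unrolled by 8 (13 remaining iterations).
print(unrolled_loop_latency(iter_latency=3, trip_count=100, factor=8))  # 39
```

The assumption that unrolled copies run fully in parallel only holds when there are no loop-carried dependences and enough memory ports, which is where array partitioning comes in.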

3. Resource Model

Focuses on DSP and BRAM usage.
DSP (operators):

  • LUT-based and small-bandwidth operations: the number of instances equals the number of operations.
  • DSP-based operators: the number of instances is the number of operations used in one iteration divided by the II.
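
The two rules above can be sketched as follows. The notes only say "divided by the II"; rounding up with a ceiling and the `dsp_per_instance` parameter are assumptions added here, and both function names are hypothetical.

```python
import math

def lut_operator_instances(num_operations):
    """LUT-based / small operators are not shared: one instance per operation."""
    return num_operations

def dsp_operator_instances(ops_per_iteration, ii, dsp_per_instance=1):
    """DSP-based operators can be time-shared across the II cycles of one iteration.

    The ceiling and `dsp_per_instance` are assumptions of this sketch.
    """
    instances = math.ceil(ops_per_iteration / ii)
    return instances * dsp_per_instance

# 8 multiplications per iteration: II = 1 needs 8 instances, II = 2 needs 4, II = 3 needs 3.
for ii in (1, 2, 3):
    print(ii, dsp_operator_instances(ops_per_iteration=8, ii=ii))
```

The intuition is that a larger II leaves more cycles per iteration in which one DSP instance can be reused.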

BRAM (arrays):

When dataflow is applied, the channel for a scalar is a register, while the channel for an array is a ping-pong buffer by default. The BRAM is therefore duplicated: one copy serves as the output buffer and the other as the input buffer.
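
A capacity-only sketch of how the ping-pong doubling affects a BRAM estimate is given below. It is not the paper's BRAM formula: it assumes 18Kb blocks, ignores the fixed width/depth configurations of real block RAMs, and the `bram_blocks` helper and its parameters are invented for this note.

```python
import math

BRAM18K_BITS = 18 * 1024  # assumed capacity of one 18Kb block, in bits

def bram_blocks(words, bits_per_word, banks=1, dataflow_channel=False):
    """Rough BRAM count for one array.

    banks:            number of partitions the array is split into
    dataflow_channel: True if the array is a dataflow channel (ping-pong buffer, two copies)
    """
    words_per_bank = math.ceil(words / banks)
    blocks_per_bank = math.ceil(words_per_bank * bits_per_word / BRAM18K_BITS)
    copies = 2 if dataflow_channel else 1
    return banks * blocks_per_bank * copies

print(bram_blocks(1024, 32))                         # 2
print(bram_blocks(1024, 32, dataflow_channel=True))  # 4
print(bram_blocks(1024, 32, banks=4))                # 4
```

Note that partitioning into 4 banks also raises the count here, because each bank occupies at least one block even when it is mostly empty.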

4. Metric-guided design space exploration (MGDSE)

1. Redundancy Elimination:
2. Guided Search:

  • MGDSE gives the top optimization priority to the longest-latency sub-element,
    which is assumed to have the greatest influence.
  • Checks whether the DSP and BRAM usage exceeds the resources available on the FPGA.
  • Evaluates which array partitioning type (block or cyclic) is
    beneficial in dimension i.
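
To see why the choice between block and cyclic partitioning depends on the access pattern, the sketch below maps array indices to banks under both schemes and counts how many same-cycle accesses land in the busiest bank. This is a generic illustration, not the metric MGDSE actually uses; `bank_of` and `max_accesses_per_bank` are hypothetical helpers for a single dimension.

```python
import math
from collections import Counter

def bank_of(index, size, factor, kind):
    """Bank that holds element `index` of a 1-D array of `size` elements split into `factor` banks."""
    if kind == "cyclic":
        return index % factor                      # elements are interleaved across banks
    if kind == "block":
        return index // math.ceil(size / factor)   # consecutive elements share a bank
    raise ValueError(f"unknown partition type: {kind}")

def max_accesses_per_bank(indices, size, factor, kind):
    """Worst-case number of same-cycle accesses that hit a single bank."""
    counts = Counter(bank_of(i, size, factor, kind) for i in indices)
    return max(counts.values())

size, factor = 16, 4
consecutive = [0, 1, 2, 3]   # e.g. a[i], a[i+1], a[i+2], a[i+3] after unrolling by 4
strided = [0, 4, 8, 12]      # e.g. accesses with a stride of 4

for name, idx in (("consecutive", consecutive), ("strided", strided)):
    for kind in ("block", "cyclic"):
        print(name, kind, max_accesses_per_bank(idx, size, factor, kind))
```

Consecutive accesses conflict under block partitioning (all four hit bank 0) but spread out under cyclic, while the strided pattern behaves the other way round, so the better type has to be decided per dimension from the access pattern.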