Paper Notes (1. ICCAD 2018)

HLS-Based Optimization and Design Space Exploration for Applications with Variable Loop Bounds

Information

  • Paper: HLS-Based Optimization and Design Space Exploration for Applications with Variable Loop Bounds
  • Author: Jason Cong
  • Key words: Design space exploration (DSE), variable loop bounds, loop-carried dependency

Work

1. Perform source-to-source HLS code transformations to increase the utilization of compute resources for loops with variable bounds.
2. Describe a cycle and resource estimation model to rapidly perform DSE with high accuracy.

“Our work is more focused on optimizing these innermost loops by exploiting fine-grain parallelism and pipelining, accurately estimating resource sharing among these serial loops, and efficiently allocating non-sharable resource for overall latency minimization.”

Methods

For work 1, they mainly deal with three loop patterns that have variable loop bounds: completely parallel, reduction, and prefix sum.
1. Variable bounds (completely parallel loops):
Why: HLS tools cannot fully unroll loops with variable bounds. A common optimization strategy is to pipeline the loop, but pipelining alone cannot exploit the data-level parallelism that exists in the loop. Unrolling the loop based on the maximum loop bound leads to a severe PE efficiency problem, because the PEs beyond the actual bound sit idle.
Methods:
Partial Unrolling with Pipelining:
[figure omitted]
After transformation:
[figure omitted]
Result:
[figure omitted]
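The idea of partial unrolling with pipelining can be sketched roughly as follows. This is a minimal illustration and not the paper's code: the function name `vadd_partial_unroll`, the unroll factor `UF`, and the element-wise add used as the loop body are all assumptions chosen for the example. The outer loop keeps the variable trip count and is pipelined, while the inner loop has a fixed trip count so the tool can unroll it completely.

```cpp
#include <cassert>

constexpr int UF = 4;  // hypothetical unroll factor, picked for illustration

// Element-wise c[i] = a[i] + b[i] for i in [0, n), where n is only known at run time.
void vadd_partial_unroll(const float* a, const float* b, float* c, int n) {
    // Outer loop: variable trip count ceil(n / UF), pipelined.
    for (int i = 0; i < n; i += UF) {
#pragma HLS pipeline II=1
        // Inner loop: fixed trip count UF, so it can be fully unrolled into UF PEs.
        for (int j = 0; j < UF; ++j) {
#pragma HLS unroll
            // Guard the tail when n is not a multiple of UF.
            if (i + j < n) {
                c[i + j] = a[i + j] + b[i + j];
            }
        }
    }
}
```

With this structure at most `UF - 1` PE slots are wasted in the last outer iteration, instead of (maximum bound - n) slots when unrolling by the maximum bound.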
2. Variable reduction:
Why: "#pragma HLS unroll factor=xxx" is inefficient for floating-point reduction over a loop with a variable bound. If the loop bound is much smaller than the maximum, many adders are left idle. Inserting a pipelining directive is also not very efficient for floating-point reduction because of a true loop-carried dependency: floating-point operations have a long latency, so the result of the previous iteration is not available immediately.
Methods:
Early termination.
The reduction tree in stage 2 is pipelined across each level.
[figure omitted]
After transformation:
[figure omitted]
Result:
[figure omitted]
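A two-stage reduction in this spirit might look like the sketch below. It is my own reading of the notes, not the paper's transformed code: the name `reduce_variable` and the lane count `LANES` are hypothetical. Stage 1 stops at the actual bound `n` rather than the maximum and accumulates into independent partial sums; stage 2 combines the partial sums with a binary tree. Each lane still carries a dependency across outer iterations, so the II that a tool actually achieves depends on the adder latency.

```cpp
#include <cassert>

constexpr int LANES = 8;  // hypothetical number of partial sums (must be a power of two)

// Sum x[0..n) where n is only known at run time.
float reduce_variable(const float* x, int n) {
    float partial[LANES];
#pragma HLS array_partition variable=partial complete
    for (int l = 0; l < LANES; ++l) {
#pragma HLS unroll
        partial[l] = 0.0f;
    }

    // Stage 1: the loop ends at the actual bound n, not at the maximum bound.
    for (int i = 0; i < n; i += LANES) {
#pragma HLS pipeline
        for (int l = 0; l < LANES; ++l) {
#pragma HLS unroll
            if (i + l < n) {
                partial[l] += x[i + l];
            }
        }
    }

    // Stage 2: binary reduction tree over the LANES partial sums.
    for (int stride = LANES / 2; stride > 0; stride /= 2) {
        for (int l = 0; l < stride; ++l) {
#pragma HLS unroll
            partial[l] += partial[l + stride];
        }
    }
    return partial[0];
}
```

Note that the tree changes the order of the floating-point additions, so the result can differ from a strictly sequential sum in the last bits.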
3. Variable prefix sum:
Why: The true dependency between psum[k] and psum[k-1] prevents the II from reaching 1 when the loop is pipelined and psum is a floating-point variable. Applying an unrolling directive results in serialized additions because of the dependency and does not bring any speedup.
Method:
Kogge–Stone algorithm
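For reference, a Kogge–Stone style inclusive prefix sum can be written as below. This is a plain software sketch of the algorithm, not the paper's HLS implementation: `prefix_sum_kogge_stone` and `MAX_N` are hypothetical names, and the caller is assumed to pass `n <= MAX_N`. In each step with distance `d`, every element adds the value `d` positions to its left, and all additions within a step are independent of each other, which removes the psum[k] / psum[k-1] chain.

```cpp
#include <cassert>

constexpr int MAX_N = 16;  // hypothetical maximum loop bound; requires n <= MAX_N

// Inclusive prefix sum: psum[k] = in[0] + ... + in[k] for k in [0, n).
void prefix_sum_kogge_stone(const float* in, float* psum, int n) {
    float cur[MAX_N];
    float next[MAX_N];

    for (int k = 0; k < n; ++k) {
        cur[k] = in[k];
    }

    // About log2(n) steps; the distance d doubles each step.
    for (int d = 1; d < n; d *= 2) {
        // All additions inside one step are independent, so they can run in parallel.
        for (int k = 0; k < n; ++k) {
            next[k] = (k >= d) ? cur[k] + cur[k - d] : cur[k];
        }
        for (int k = 0; k < n; ++k) {
            cur[k] = next[k];
        }
    }

    for (int k = 0; k < n; ++k) {
        psum[k] = cur[k];
    }
}
```

The trade-off is more total additions (on the order of n log n instead of n) in exchange for a dependency depth of about log2(n) steps.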

For work 2: cycle / resource estimation
Separate the sharable and non-sharable resources of a loop.
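One way to picture why the split matters is the toy estimate below. It is an assumption of mine and not the paper's model: the struct `LoopCost`, the function `estimate_units`, and the counting rule are all hypothetical. The sketch assumes that loops executed one after another can reuse the same sharable units, so only the largest sharable demand is counted, while non-sharable units are added up per loop.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-loop resource demand, in abstract "units" (e.g. DSPs).
struct LoopCost {
    int sharable;     // units that serially executed loops could reuse
    int nonsharable;  // units that must be dedicated to this loop
};

// Toy estimate: max over sharable parts + sum over non-sharable parts.
int estimate_units(const LoopCost* loops, int count) {
    int shared = 0;
    int dedicated = 0;
    for (int i = 0; i < count; ++i) {
        shared = std::max(shared, loops[i].sharable);
        dedicated += loops[i].nonsharable;
    }
    return shared + dedicated;
}
```

Simply summing every loop's full demand would overestimate the design size under this assumption, which is presumably why the notes call out the separation.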

OVERALL FLOW:

[figure omitted]

EXPERIMENTAL RESULT:

[figure omitted]
