UG1270 Vivado HLS Directives: Notes on Common Optimization Techniques

高纳德

Article link: https://blog.csdn.net/konghaocc/article/details/116668163

OPTIMIZATION GOALS

Minimize idle cycles (bubbles) as seen in the Schedule Viewer.
Throughput: the data processing rate of the design, measured by the initiation interval (II). This is the most critical goal.
Latency
Area/resources

COMMON TRADE-OFF STRATEGY

Trade resources/area for time.

COMMON INDICATORS

Initiation interval (II): the number of clock cycles before the function can accept new inputs, i.e., the interval between successive invocations of the function. For an ideal design that processes one sample per clock cycle, the best case takes N+1 cycles (N is not defined in these notes; presumably it is the number of samples per invocation).
Loop initiation interval: the number of clock cycles between the start of one loop iteration and the start of the next.
Loop iteration latency: the number of clock cycles needed to complete one iteration.
Loop latency: the number of clock cycles needed to complete all iterations of the loop.
Latency: the number of clock cycles the function needs to produce all of its outputs.
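To see how these indicators relate, here is a small arithmetic sketch. It assumes the usual model for a pipelined loop: a new iteration starts every II cycles, and the last iteration still needs one full iteration latency to finish. The function names are made up for illustration, and the numbers a tool reports may differ by a few cycles of loop entry/exit overhead.

```cpp
#include <cassert>

// Loop latency of a pipelined loop, in clock cycles (illustrative model).
//   trip_count        : number of loop iterations
//   ii                : loop initiation interval
//   iteration_latency : loop iteration latency
// Iterations start every `ii` cycles; the last one started still needs
// `iteration_latency` cycles to complete.
long pipelined_loop_latency(long trip_count, long ii, long iteration_latency) {
    assert(trip_count > 0 && ii > 0 && iteration_latency > 0);
    return (trip_count - 1) * ii + iteration_latency;
}

// Loop latency without pipelining: iterations run back to back.
long sequential_loop_latency(long trip_count, long iteration_latency) {
    assert(trip_count > 0 && iteration_latency > 0);
    return trip_count * iteration_latency;
}
```

For example, with 1024 iterations, an iteration latency of 3 and II = 1, this model gives 1026 cycles for the pipelined loop versus 3072 cycles without pipelining.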

COMMON TRICK

There cannot be sequential logic inside a pipeline, i.e., all loop levels below the pipelined one must be unrolled (UG1270, p. 22). Why is this required, from the perspective of hardware mapping? To run at a rate of one sample per clock, the lower-level logic has to execute in parallel so that it does not take more than one clock, hence it is unrolled.
Once a loop is pipelined, the lower-level logic has already been unrolled. Is there any point in also pipelining the lower-level logic?
A block RAM has at most two ports, so an array mapped to BRAM can supply at most two samples per clock.
Pipelining a loop means that one sample must be accessed within one clock. A sample can be a single data element or even a whole dimension of a multi-dimensional array. Because of this bandwidth requirement, PIPELINE cannot be applied to an arbitrary loop. Typically, pipeline the loop that operates at the level of one sample.
Compared with frame-based C code, a sample-based function often contains a static variable whose value must be remembered (in a register) between invocations of the function.
For sample-based C code, pipelining the function is the way to get one sample per clock.
unroll automatically vs manually
Array partitioning splits an array across multiple BRAMs, each of which is dual-port.
After synthesis, the DRC warnings and errors can point to the next optimization step. “It is important to review the output from the compilation log files and reports to understand what optimizations have been performed.”
DATAFLOW is similar to bypassing (forwarding): the data does not need to be written back and can instead be passed directly to the next function.
STREAM: to handle streaming data, use a FIFO whose depth is tuned to the production and consumption rates. FIFO: sequential streaming data. Ping-pong block RAM: arbitrary/random access. (What is a ping-pong block RAM?)
Priority: pipeline > unroll. If the II cannot be met when pipelining a loop, consider unrolling. II takes priority over latency, but reducing the latency can worsen the II.
For area, BRAM usage is usually the main concern.
The ALLOCATION and RESOURCE directives limit the hardware resources used (multipliers, etc.), even when unrolling has been done manually. If the same function is called multiple times, ALLOCATION can limit it to a single hardware instance.
interface
Are arrays only ever mapped to BRAM? If so, when an array is too large, a FIFO is another option, using DATAFLOW and STREAM.
array initialization:
Data reuse: use small local storage or a FIFO to avoid repeated reads and redundant accesses. How should a local cache be used? By declaring a small local array or variable?
Analyze the data read and written in each iteration.
assert
How to add a conditional branch? By adding an if statement?
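The notes above on pipelining the sample-level loop and on array partitioning can be sketched with a small frame-based FIR filter. This is only an illustration: `fir_frame`, the sizes and the coefficient layout are hypothetical, the pragmas follow Vivado HLS syntax (an ordinary C++ compiler ignores them, possibly with a warning), and whether II=1 is actually reached depends on the target device and clock.

```cpp
#include <cassert>

const int N_SAMPLES = 8;  // samples per frame (hypothetical size)
const int N_TAPS = 4;     // filter taps (hypothetical size)

// Frame-based FIR filter. The outer loop works at the level of one sample,
// so that is the loop that gets pipelined.
void fir_frame(const int in[N_SAMPLES], const int coeff[N_TAPS],
               int out[N_SAMPLES]) {
    int shift_reg[N_TAPS] = {0};
// All taps are touched in every iteration, which needs more than two
// accesses per clock; partitioning the arrays into individual elements
// removes the BRAM port limit.
#pragma HLS ARRAY_PARTITION variable=shift_reg complete dim=1
#pragma HLS ARRAY_PARTITION variable=coeff complete dim=1
    for (int n = 0; n < N_SAMPLES; ++n) {
// Pipelining this loop causes the two inner loops to be unrolled.
#pragma HLS PIPELINE II=1
        for (int k = N_TAPS - 1; k > 0; --k) {
            shift_reg[k] = shift_reg[k - 1];  // shift older samples down
        }
        shift_reg[0] = in[n];                 // one new sample per iteration
        int acc = 0;
        for (int k = 0; k < N_TAPS; ++k) {
            acc += coeff[k] * shift_reg[k];   // multiply-accumulate over taps
        }
        out[n] = acc;
    }
}
```

With all coefficients set to 1, each output is the sum of the last four inputs, which makes the behavior easy to check in a C test bench before synthesis.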
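For the notes on sample-based code, a minimal sketch might look like the following. The function `running_sum` and its `reset` argument are hypothetical; the point is only that the static variable carries state from one call to the next, and that the PIPELINE directive is placed on the function itself rather than on a loop.

```cpp
#include <cassert>

// Sample-based accumulator: called once per input sample.
int running_sum(int sample, bool reset) {
#pragma HLS PIPELINE II=1
    // State that must survive between invocations; in hardware this is
    // expected to become a register.
    static int acc = 0;
    if (reset) {
        acc = 0;  // a plain if statement is enough for a conditional branch
    }
    acc += sample;
    return acc;
}
```

Each call consumes one sample and returns the updated sum, so a pipelined implementation with II=1 would accept one sample per clock.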
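The DATAFLOW and STREAM notes can be illustrated with two small stages connected by an intermediate array. This is a sketch under assumptions: the stage names, the array size and the FIFO depth of 2 are all invented for the example. Without the STREAM directive, the intermediate array in a dataflow region would by default be implemented as a ping-pong buffer (two block RAMs that the producer and consumer swap between), which allows random access; a FIFO only works here because both stages touch the data strictly in order.

```cpp
#include <cassert>

const int STREAM_LEN = 8;  // hypothetical frame size

// Producer stage: writes mid[] sequentially.
static void scale_stage(const int in[STREAM_LEN], int mid[STREAM_LEN]) {
    for (int i = 0; i < STREAM_LEN; ++i) {
        mid[i] = in[i] * 2;
    }
}

// Consumer stage: reads mid[] in the same sequential order.
static void offset_stage(const int mid[STREAM_LEN], int out[STREAM_LEN]) {
    for (int i = 0; i < STREAM_LEN; ++i) {
        out[i] = mid[i] + 1;
    }
}

void scale_then_offset(const int in[STREAM_LEN], int out[STREAM_LEN]) {
// Let the two stages run concurrently instead of one after the other.
#pragma HLS DATAFLOW
    int mid[STREAM_LEN];
// Implement mid as a FIFO; the depth is a guess that should be tuned to
// the produce and consume rates.
#pragma HLS STREAM variable=mid depth=2
    scale_stage(in, mid);
    offset_stage(mid, out);
}
```

Because the consumer can start as soon as the first element is available, the intermediate data never has to be stored as a full array.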
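On the data reuse question, one possible answer is to keep recently read values in a few local variables so that each input element is read from the array only once. The sketch below assumes a 3-point sum with zero padding at the edges; `smooth3` and the length are hypothetical.

```cpp
#include <cassert>

const int WINDOW_LEN = 8;  // hypothetical array length

// out[i] = in[i-1] + in[i] + in[i+1], with zeros beyond the array edges.
void smooth3(const int in[WINDOW_LEN], int out[WINDOW_LEN]) {
    int prev = 0;      // local copy of in[i-1]
    int curr = in[0];  // local copy of in[i]
    for (int i = 0; i < WINDOW_LEN; ++i) {
#pragma HLS PIPELINE II=1
        // The only array read in the loop body: one new element per iteration.
        int next = (i + 1 < WINDOW_LEN) ? in[i + 1] : 0;
        out[i] = prev + curr + next;
        // Slide the window instead of re-reading the input array.
        prev = curr;
        curr = next;
    }
}
```

A naive version would read `in` three times per iteration, which exceeds the two ports of a BRAM; here every element is read exactly once and reused from local storage.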