UG1270 Vivado HLS Directives: Understanding Common Optimization Techniques

OPTIMIZATION GOALS

  • Minimize idle cycles (bubbles) in the Schedule Viewer.
  • Initiation Interval (II), i.e., the data processing rate of the design (the most critical goal)
  • Latency
  • Area/resources

COMMON TRADE-OFF STRATEGY

  • Trade resources/area for time (speed).

COMMON INDICATORS

  • Initiation interval (II): the number of clock cycles before the function can accept new inputs, i.e., the interval between successive invocations of the function. An ideal design processes data at a rate of one sample per clock cycle (II = 1); the optimal such design consumes N+1 cycles for N samples.
  • Loop initiation interval: the number of clock cycles between the start of two consecutive loop iterations.
  • Loop iteration latency: the number of clock cycles it takes to complete one iteration.
  • Loop latency: the number of clock cycles needed to complete all iterations of the loop.
  • Latency: the number of clock cycles the function needs to produce all of its outputs.
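The loop indicators above combine into a simple estimate. Below is a minimal C++ sketch, assuming a pipelined loop with trip count N, loop II and iteration latency L. The last iteration starts at cycle (N - 1) × II and finishes L cycles later. The function names are hypothetical, and tool reports may add a few cycles of loop entry/exit overhead that this sketch ignores.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: estimated cycle count of a pipelined loop.
// trip_count   : number of iterations N
// ii           : loop initiation interval (cycles between iteration starts)
// iter_latency : cycles for one iteration to complete (L)
// The last iteration starts at (N - 1) * II and needs L more cycles.
// Loop entry/exit overhead reported by the tool is ignored here.
std::uint64_t pipelined_loop_latency(std::uint64_t trip_count,
                                     std::uint64_t ii,
                                     std::uint64_t iter_latency) {
    if (trip_count == 0) return 0;
    return (trip_count - 1) * ii + iter_latency;
}

// Without pipelining, iterations run back to back: N * L cycles.
std::uint64_t sequential_loop_latency(std::uint64_t trip_count,
                                      std::uint64_t iter_latency) {
    return trip_count * iter_latency;
}
```

For example, with II = 1 and a 2-cycle iteration latency, 8 samples take 9 cycles, i.e., N+1, which is one way to read the N+1 figure above. The same loop without pipelining takes 16 cycles.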

COMMON TRICKS

  • There can be no sequential logic inside a pipeline; that is, all loop levels below the pipelined loop or function must be unrolled (UG1270, p. 22). Why is this required, from the perspective of hardware mapping? To process one sample per clock, the lower-level logic must execute in parallel so that it takes no more than one clock cycle; hence it is unrolled.
  • Once a loop is pipelined and its lower-level logic has been unrolled, is there any point in also pipelining the lower-level logic?
  • A BRAM provides at most two ports (dual-port), so it can supply at most two samples per clock.
  • Pipelining a loop means that one sample must be accessed per clock, where a sample may be a single data element or even an entire dimension of a multi-dimensional array. Because of memory bandwidth limits, pipelining cannot be applied to just any loop; typically, pipeline the loop that operates at the level of a single sample.
  • Sample-based C code, unlike frame-based code, often contains a static variable whose value must be retained (in a register) between invocations of the function.
  • For sample-based C code, pipelining the function achieves a throughput of one sample per clock.
  • Unrolling automatically (with the UNROLL directive) vs. manually (rewriting the loop body by hand).
  • Array partitioning splits an array across multiple BRAMs, each with its own two ports, which increases the number of accesses available per clock.
  • After synthesis, the DRC warnings and errors can point to the next optimization step. “It is important to review the output from the compilation log files and reports to understand what optimizations have been performed.”
  • DATAFLOW is similar to bypassing (forwarding): the data does not need to be written back and can instead be passed directly to the next function.
  • STREAM: for streaming data, use a FIFO whose depth is set according to the production and consumption rates. FIFOs suit sequentially accessed streaming data; ping-pong block RAMs suit arbitrary/random access. (What is a ping-pong block RAM?)
  • Priority: pipeline first, then unroll. If pipelining a loop cannot meet the II target, consider unrolling. Likewise, II takes priority over latency, but reducing latency can degrade the II.
  • For area, BRAM usage is usually the main concern.
  • The ALLOCATION and RESOURCE directives limit the hardware resources used (multipliers, etc.), even when a loop has been unrolled manually. If the same function is called multiple times, ALLOCATION can restrict the design to a single hardware instance of it.
  • Interfaces (the INTERFACE directive).
  • Are arrays only ever mapped to BRAM? When an array is too large, a FIFO is another option, using DATAFLOW and STREAM.
  • Array initialization:
  • Data reuse: use small local storage (e.g., a FIFO) to avoid repeated reads and redundant accesses. How should a local cache be implemented: as a small local array or as local variables?
  • Analyze the data read and written in each iteration.
  • assert
  • How to add a conditional branch: with an if statement?
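The pipeline/unroll rule and the sample-based style can be seen together in a small sample-based FIR filter. This is a sketch, not code from UG1270; the tap count and coefficients are hypothetical. The function is pipelined with II=1, the static delay line keeps its state in registers between calls, and the inner loops are unrolled by the tool because they sit below a pipelined function. The delay line is completely partitioned so that all taps can be accessed in the same cycle.

```cpp
#include <cassert>

// Hypothetical tap count for illustration.
const int N_TAPS = 4;

// Sample-based FIR: one input sample in, one output sample out per call.
int fir_sample(int x) {
#pragma HLS PIPELINE II=1
    // Hypothetical constant coefficients.
    static const int coeff[N_TAPS] = {1, 2, 3, 4};
    // Static: the delay line keeps its contents between calls.
    static int shift_reg[N_TAPS] = {0};
#pragma HLS ARRAY_PARTITION variable=shift_reg complete

    // Shift the delay line. Under PIPELINE, the tool unrolls this loop
    // so that all taps move in the same cycle.
    for (int i = N_TAPS - 1; i > 0; --i) {
        shift_reg[i] = shift_reg[i - 1];
    }
    shift_reg[0] = x;

    // Multiply-accumulate, also unrolled into N_TAPS parallel multipliers.
    int acc = 0;
    for (int i = 0; i < N_TAPS; ++i) {
        acc += shift_reg[i] * coeff[i];
    }
    return acc;
}
```

Feeding an impulse (1 followed by zeros) returns the coefficients one per call, which confirms that the static state persists across invocations.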
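For the automatic vs. manual unrolling point, here is a minimal sketch (hypothetical names) of the same dot product unrolled by a factor of 2: once with the UNROLL directive and once by hand. A plain C++ compiler ignores the pragma, so both functions compute the same result in C simulation. The difference lies in who performs the transformation: the tool or the programmer.

```cpp
#include <cassert>

const int M = 8;  // hypothetical length, a multiple of the unroll factor

// Automatic: the tool replicates the loop body; factor=2 means two
// original iterations per unrolled body.
int dot_auto(const int a[M], const int b[M]) {
    int acc = 0;
    for (int i = 0; i < M; ++i) {
#pragma HLS UNROLL factor=2
        acc += a[i] * b[i];
    }
    return acc;
}

// Manual: the same unroll-by-2 written out by hand in C.
int dot_manual(const int a[M], const int b[M]) {
    int acc = 0;
    for (int i = 0; i < M; i += 2) {
        acc += a[i] * b[i];
        acc += a[i + 1] * b[i + 1];
    }
    return acc;
}
```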
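Because one BRAM offers at most two ports, reading four elements per clock needs more banks. The sketch below assumes a hypothetical 16-element array. Cyclic partitioning with factor 4 places element i in bank i % 4, so the four reads in each pipelined iteration all land in different banks.

```cpp
#include <cassert>

const int LEN = 16;  // hypothetical array length
static_assert(LEN % 4 == 0, "LEN must be a multiple of the partition factor");

// Four reads per iteration exceed the two ports of a single BRAM, so the
// array is split cyclically into 4 banks: data[i + k] lives in bank k.
int sum_partitioned(const int data[LEN]) {
#pragma HLS ARRAY_PARTITION variable=data cyclic factor=4
    int acc = 0;
    for (int i = 0; i < LEN; i += 4) {
#pragma HLS PIPELINE II=1
        acc += data[i] + data[i + 1] + data[i + 2] + data[i + 3];
    }
    return acc;
}
```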
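Here is a sketch of DATAFLOW with a streaming channel between two tasks. In Vivado HLS the channel would be an `hls::stream<int>` (from `hls_stream.h`). Here, a small `std::queue`-based `Fifo` class is a hypothetical stand-in so the sketch compiles with any C++ compiler, and the function names are illustrative. Since the producer and consumer both handle one sample per iteration, a shallow FIFO depth is assumed to be enough.

```cpp
#include <cassert>
#include <queue>

// Hypothetical stand-in for hls::stream<T>; only write() and read()
// are modeled so that the sketch runs in plain C++.
template <typename T>
class Fifo {
public:
    void write(const T& v) { q_.push(v); }
    T read() { T v = q_.front(); q_.pop(); return v; }
    bool empty() const { return q_.empty(); }
private:
    std::queue<T> q_;
};

const int FRAME = 8;  // hypothetical frame length

// Producer: scales each sample and pushes it into the FIFO in order.
void scale(const int in[FRAME], Fifo<int>& out) {
    for (int i = 0; i < FRAME; ++i) {
#pragma HLS PIPELINE II=1
        out.write(in[i] * 2);
    }
}

// Consumer: reads the samples in the same order and adds an offset.
void offset(Fifo<int>& in, int out[FRAME]) {
    for (int i = 0; i < FRAME; ++i) {
#pragma HLS PIPELINE II=1
        out[i] = in.read() + 1;
    }
}

// Top level: with DATAFLOW the two tasks can overlap in hardware, linked
// by the FIFO instead of a full intermediate array. In C simulation they
// simply run one after the other.
void top(const int in[FRAME], int out[FRAME]) {
#pragma HLS DATAFLOW
    Fifo<int> link;
#pragma HLS STREAM variable=link depth=2
    scale(in, link);
    offset(link, out);
}
```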
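A ping-pong buffer (double buffering) uses two memory banks alternately. While the consumer reads the block written in the previous time slot from one bank, the producer fills the other, and the roles swap every slot. Below is a hypothetical software model of that behavior, not HLS code. In hardware, the read and write halves of `step()` would happen in the same slot.

```cpp
#include <cassert>

const int BLK = 4;  // hypothetical block size

// Software model of a ping-pong buffer with two banks.
struct PingPong {
    int bank[2][BLK] = {};
    int write_sel = 0;      // bank the producer fills in this slot
    bool has_data = false;  // true once the first block has been written

    // One time slot: the consumer copies the previous block from the read
    // bank into `out`, and the producer writes `in` into the other bank.
    // Returns false in the first slot, when no previous block exists.
    bool step(const int in[BLK], int out[BLK]) {
        int read_sel = 1 - write_sel;
        bool valid = has_data;
        if (valid) {
            for (int i = 0; i < BLK; ++i) out[i] = bank[read_sel][i];
        }
        for (int i = 0; i < BLK; ++i) bank[write_sel][i] = in[i];
        has_data = true;
        write_sel = read_sel;  // swap the roles of the banks
        return valid;
    }
};
```

The output stream lags the input by one block, which is the price of letting the producer and consumer work concurrently on separate banks.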
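Here is a sketch of the ALLOCATION directive limiting a function that is called several times to one hardware instance. It assumes the `instances=<name> limit=<n> function` form described in the Vivado HLS documentation; `mac` and `three_macs` are hypothetical. Sharing one instance saves area but serializes the calls, so latency may grow.

```cpp
#include <cassert>

// Hypothetical multiply-accumulate function, called three times below.
int mac(int a, int b, int c) {
    return a * b + c;
}

// Without the directive, the tool may build one mac instance per call.
// limit=1 asks for a single instance that is reused for every call.
int three_macs(int x0, int x1, int x2, int w) {
#pragma HLS ALLOCATION instances=mac limit=1 function
    int acc = mac(x0, w, 0);
    acc = mac(x1, w, acc);
    acc = mac(x2, w, acc);
    return acc;
}
```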
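For array initialization, a common pattern is a `static const` table. It is initialized once rather than on every call, and a read-only table like this is typically implemented as a ROM. The table contents and function name below are hypothetical.

```cpp
#include <cassert>

// Hypothetical lookup: squares of 0..7 stored in a constant table.
int lookup_square(int idx) {
    // static const: initialized once, read-only, a natural ROM candidate.
    static const int table[8] = {0, 1, 4, 9, 16, 25, 36, 49};
    assert(idx >= 0 && idx < 8);
    return table[idx];
}
```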
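Here is a sketch of data reuse with a small local cache, using a hypothetical 3-point sliding-window sum. Analyzing each iteration's reads shows that a naive version would read three inputs per output. Keeping the last three samples in local variables (registers) means each input is read from memory only once. The `if` also shows a conditional branch inside a pipelined loop. It is written as an ordinary if statement, which the tool typically turns into multiplexer or enable logic.

```cpp
#include <cassert>

const int LEN = 8;  // hypothetical input length

// 3-point sliding-window sum: out[k] = in[k] + in[k + 1] + in[k + 2].
void window_sum(const int in[LEN], int out[LEN - 2]) {
    int w0 = 0, w1 = 0, w2 = 0;  // local cache of the last three samples
    for (int i = 0; i < LEN; ++i) {
#pragma HLS PIPELINE II=1
        w0 = w1;
        w1 = w2;
        w2 = in[i];  // the only memory read in this iteration
        if (i >= 2) {  // conditional branch: write only once the window is full
            out[i - 2] = w0 + w1 + w2;
        }
    }
}
```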
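Here is a sketch of `assert` used to convey range information. The loop bound `n` is only known at run time, and the assertion states its upper limit. The Vivado HLS documentation describes `assert` as usable in synthesis for such range information (e.g., loop upper bounds); in C simulation it also catches out-of-range calls. `MAX_N` and `sum_first` are hypothetical.

```cpp
#include <cassert>

const int MAX_N = 16;  // hypothetical maximum trip count

// Sum the first n elements, where n is only known at run time.
// The assert bounds n, which also bounds the loop trip count.
int sum_first(const int data[MAX_N], int n) {
    assert(n >= 0 && n <= MAX_N);
    int acc = 0;
    for (int i = 0; i < n; ++i) {
        acc += data[i];
    }
    return acc;
}
```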
