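Reading the HLS Bluebook (Part 1)

Which C++ coding style leads to better synthesis results? How does C++ code map to hardware? What kind of synthesized hardware can we expect from a given piece of C++ code?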

what was initially a straightforward process from specification to implementation becomes a nightmarish iterative cycle. The hand-coded RTL design is tested, bugs are reported, and time is spent trying to hunt them down and fix them individually - only to move on to the next bug. This could be an endless process if it didn’t have to end at some point to meet deadlines.

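In today's RTL design work, verification is a bottleneck, and many companies hire large numbers of verification engineers. HLS can replace much of this process with simulation in a high-level language. In addition, a smaller code base means not only less work but also a narrower area to search when locating and fixing bugs.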

Each time a code change is made the testbench should be rerun to check the change against the original design. Failure to do this may mean hours of debugging to figure out which change broke the design.

The advice above does not cover cases where large blocks of code, or even the entire algorithm structure, have to change.

Bit select is supported in Mentor Graphics' Algorithmic C datatypes; is it also supported in ap_int? What about slice select?
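To pin down what the two operations compute, here is a minimal sketch in plain C++ using shifts and masks. It deliberately uses neither ac_int nor ap_int, so it says nothing about either library's API; `bit_select` and `slice_select` are hypothetical helper names introduced only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, not part of ac_int or ap_int.

// Bit select: return bit `pos` of `x` (0 = least significant bit).
// Precondition: pos < 32.
inline bool bit_select(uint32_t x, unsigned pos) {
    return ((x >> pos) & 1u) != 0;
}

// Slice select: return the W-bit field of `x` whose lowest bit is `lsb`.
// Precondition: lsb < 32.
template <unsigned W>
inline uint32_t slice_select(uint32_t x, unsigned lsb) {
    static_assert(W >= 1 && W < 32, "slice width must be in 1..31");
    const uint32_t mask = (1u << W) - 1u;  // W ones in the low bits
    return (x >> lsb) & mask;
}
```

In hardware both operations are just wiring: no logic is needed to pick a fixed bit or a fixed range of bits out of a word.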

Unroll and Pipeline

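Even when the code contains no explicit loop, calling the top-level function repeatedly creates an implicit loop, the main loop, similar to a module in Verilog. For example: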

#include "accum.h"
void accumulate(int a, int b, int c, int d, int &dout){
	int t1,t2;
	t1 = a + b;    // first adder
	t2 = t1 + c;   // second adder, depends on t1
	dout = t2 + d; // third adder, depends on t2
}

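[Figure: Main_Loop]
Is each adder operation scheduled into its own clock cycle by default? An adder is purely combinational logic internally, so it constrains the critical path.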
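The implicit main loop can be mimicked in a C++ testbench: each call to the top-level function stands for one iteration of the main loop. The sketch below reuses the accumulate function from the example above; `run_main_loop` and the input pattern are hypothetical and chosen only so that the expected sum is easy to check.

```cpp
#include <cassert>

// Top-level design from the example above: a chain of three additions.
void accumulate(int a, int b, int c, int d, int &dout) {
    int t1 = a + b;
    int t2 = t1 + c;
    dout = t2 + d;
}

// Hypothetical testbench helper. Each call to accumulate() corresponds to
// one iteration of the implicit main loop. Returns the number of mismatches.
int run_main_loop(int iterations) {
    int mismatches = 0;
    for (int n = 0; n < iterations; ++n) {
        int dout = 0;
        accumulate(n, n + 1, n + 2, n + 3, dout);
        const int expected = 4 * n + 6;  // n + (n+1) + (n+2) + (n+3)
        if (dout != expected) ++mismatches;
    }
    return mismatches;
}
```

Calling `run_main_loop(8)` from a `main` function exercises eight iterations of the main loop and should report zero mismatches.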

If a loop is left “rolled”, each iteration of the loop takes at least one clock cycle to execute in hardware. This is because there is an implied “wait until clock” for the loop body.

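Is this related to the for loop testing the loop variable before it exits? Even on the final iteration, the loop variable is updated first, and only then does the loop exit.
[Figure: Unroll_2]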
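To make the rolled/unrolled distinction concrete, the sketch below writes a four-element accumulation both as a rolled loop and as the same loop manually unrolled by a factor of 2. In an HLS flow, unrolling is normally requested through a tool directive rather than by editing the source, so the manual rewrite only illustrates the effect; the function names are hypothetical.

```cpp
#include <cassert>

// Rolled form: one addition per loop iteration, four iterations.
int sum_rolled(const int din[4]) {
    int acc = 0;
    for (int i = 0; i < 4; i++) {
        acc += din[i];
    }
    return acc;
}

// Manually unrolled by 2: two additions per iteration, two iterations.
// The loop body is duplicated and the counter advances by 2.
int sum_unrolled_by_2(const int din[4]) {
    int acc = 0;
    for (int i = 0; i < 4; i += 2) {
        acc += din[i];
        acc += din[i + 1];
    }
    return acc;
}
```

Both functions return the same sum. The unrolled form halves the number of loop iterations, at the cost of more work (and typically more hardware or a longer combinational path) per iteration.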

  1. Having a variable as the loop upper or lower bound often results in the loop counter hardware being larger than needed
  2. Having a variable as the loop upper bound requires one extra clock cycle to test the loop condition
  3. Having an unconstrained bit width on the loop exit condition results in control logic larger than needed
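The counter-size points above come down to simple arithmetic: a counter needs enough bits to hold the largest value it can reach. The sketch below computes that width; `bits_for` is a hypothetical helper, and it assumes a straightforward implementation in which the counter of a loop with bound N counts from 0 up to N before the exit test fails.

```cpp
#include <cassert>

// Hypothetical helper: number of bits needed for an unsigned counter that
// must hold every value in 0..max_value.
constexpr unsigned bits_for(unsigned long long max_value) {
    return max_value < 2 ? 1u : 1u + bits_for(max_value >> 1);
}

// A loop with a constant bound of 4 counts 0,1,2,3,4: 3 bits are enough.
static_assert(bits_for(4) == 3, "constant bound of 4 needs 3 bits");

// A bound given by a 32-bit unsigned variable can be as large as 2^32 - 1,
// so the counter needs at least 32 bits unless the range is constrained.
static_assert(bits_for(0xFFFFFFFFull) == 32, "full 32-bit range needs 32 bits");
```

When the bound is a constant, the tool can see this range directly; when it is a variable, the range has to be assumed from the variable's declared type.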

Loop With Conditional Bounds

#include "accum.h"
#include <ac_int.h>
void accumulate(int din[4], int &dout, unsigned int ctrl){
	int acc=0;
	// The upper bound is the run-time value ctrl, so the trip count is unknown at compile time
	ACCUM:for(int i=0;i<ctrl;i++){
		acc += din[i];
	}
	dout = acc;
}

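[Figure: Loop with conditional bounds]
Because the HLS tool cannot know the value range of ctrl in advance, it plays safe and generates 33-bit control logic, even though ctrl is declared as a 32-bit value. Even when ctrl only ranges from 0 to 4, that is, when at most four values are summed, a 32-bit counter is used instead of a 3-bit one. To reduce the bit width, the loop upper bound has to be fixed, and a conditional break can be added to exit early. This gives the following form (assuming the loop body is executed at least once):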

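#include "accum.h"
#include <ac_int.h>
void accumulate(int din[4], int &dout, int ctrl){
	int acc=0;
	// Fixed upper bound of 4. The loop body from here on is reconstructed from the
	// description above (conditional break, body executed at least once).
	ACCUM:for(int i=0;i<4;i++){
		acc += din[i];
		if(i>=ctrl-1) break; // exit once ctrl elements have been summed
	}
	dout = acc;
}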
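A change like this is exactly the kind that should be rechecked with a testbench. The sketch below compares a variable-bound accumulator with a fixed-bound one that uses a conditional break, for ctrl from 1 to 4. The names `accumulate_var`, `accumulate_fixed`, and `compare_versions` are hypothetical, and the break condition `i >= ctrl - 1` is an assumption based on the description above, not a quotation from the Bluebook.

```cpp
#include <cassert>

// Variable upper bound: the trip count depends directly on ctrl.
void accumulate_var(const int din[4], int &dout, unsigned int ctrl) {
    int acc = 0;
    for (unsigned int i = 0; i < ctrl; i++) {
        acc += din[i];
    }
    dout = acc;
}

// Fixed upper bound with a conditional break (assumes ctrl >= 1).
void accumulate_fixed(const int din[4], int &dout, int ctrl) {
    int acc = 0;
    for (int i = 0; i < 4; i++) {
        acc += din[i];
        if (i >= ctrl - 1) break;  // stop once ctrl elements are summed
    }
    dout = acc;
}

// Hypothetical testbench: count mismatches between the two versions
// over the valid range ctrl = 1..4.
int compare_versions(const int din[4]) {
    int mismatches = 0;
    for (int ctrl = 1; ctrl <= 4; ctrl++) {
        int a = 0, b = 0;
        accumulate_var(din, a, static_cast<unsigned int>(ctrl));
        accumulate_fixed(din, b, ctrl);
        if (a != b) ++mismatches;
    }
    return mismatches;
}
```

The two versions agree for ctrl from 1 to 4. They differ for ctrl equal to 0: the variable-bound loop sums nothing, while the fixed-bound loop always adds din[0] before testing the break condition, which is why the "executed at least once" assumption matters.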