HLS Lesson 17 (pragmas: array, data)

Any control over how a function is compiled is expressed through pragmas.
The commonly used pragmas are described in detail below.

+++++++++++++++++++++++++++++++++++++++
pragma HLS array_map

By default, each array is mapped into its own block RAM.
The basic block RAM unit provided in an FPGA is 18 Kbit.
If many small arrays each use only part of an 18K block, block RAM resources are used more efficiently by mapping the small arrays into a single larger array.

int8 array1[M];
int12 array2[N];
#pragma HLS ARRAY_MAP variable=array1 instance=array3 horizontal
#pragma HLS ARRAY_MAP variable=array2 instance=array3 horizontal

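Horizontal mapping is the default mode. It concatenates the arrays to form a new
array with more elements (here M + N). Vertical mapping instead concatenates the words of the arrays to form a new array with wider words.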
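To make the horizontal layout concrete, here is a plain-C model (not synthesizable HLS code) of how indices could translate after the two pragmas above. It assumes array1 occupies array3[0..M-1] and array2 follows starting at array3[M]. The sizes M = 5 and N = 7 and all helper names are hypothetical, and the tool's actual placement may differ.

```c
#include <stdint.h>

#define M 5 /* hypothetical size of array1 */
#define N 7 /* hypothetical size of array2 */

/* Model of the merged memory array3: M + N entries. Each entry must be wide
   enough for the widest member (12 bits), so int16_t is used here. */
static int16_t array3[M + N];

/* Assumption: array1[i] lives at array3[i] */
static void write_array1(int i, int16_t v) { array3[i] = v; }
static int16_t read_array1(int i) { return array3[i]; }

/* Assumption: array2[i] lives at array3[M + i], right after array1 */
static void write_array2(int i, int16_t v) { array3[M + i] = v; }
static int16_t read_array2(int i) { return array3[M + i]; }

/* Depth of the merged memory: horizontal mapping adds the element counts */
static int array3_depth(void) { return M + N; }
```

Both logical arrays now share one physical memory, so they also share its ports. That is the price paid for saving block RAMs.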

+++++++++++++++++++++++++++++++++++++++++++++++
pragma HLS array_partition

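Arrays that are accessed randomly are implemented as block RAM, which has at most two read/write ports.
So in one clock cycle, at most two elements can be read or written.
For example, if the C code contains
m[i] = m[k]+m[j];
at least three ports (two reads and one write) are needed to complete it in one cycle.
To read or write more elements in a single clock cycle, more ports must be created.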
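Below is a minimal HLS-style sketch of the case above. The three accesses in `m[i] = m[k] + m[j]` cannot all be served by one dual-port block RAM in a single cycle, so the array is completely partitioned into registers. The function name `update`, the size `SIZE` and the choice of `complete` are illustrative assumptions. A standard C compiler ignores the `#pragma HLS` line, so the code also runs as ordinary C.

```c
#define SIZE 8 /* hypothetical array size */

/* Hypothetical kernel: two reads and one write of m in one statement */
void update(int m[SIZE], int i, int j, int k) {
    /* Split m into SIZE individual registers so that all three accesses
       can be scheduled in the same clock cycle */
#pragma HLS ARRAY_PARTITION variable=m complete dim=1
    m[i] = m[k] + m[j];
}
```

A `cyclic` or `block` factor would also add ports while using fewer resources. However, it only helps if i, j and k are guaranteed to fall into different banks.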

Partitioning results in RTL with multiple small memories or multiple registers instead of one large memory.
This effectively increases the number of read and write ports for the storage,
and can improve the throughput of the design.

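ARRAY_PARTITION supports three types: block, cyclic and complete.
Complete partitioning decomposes the array into individual elements. For a
one-dimensional array, this corresponds to resolving a memory into individual registers.
This is the default type.
It is equivalent to a factor equal to the array size.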

#pragma HLS array_partition variable=AB block factor=4

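This example partitions the 13-element array, AB[13], into four arrays using block partitioning.
Block mode splits the array into contiguous chunks of (nearly) equal size. Because 4 does not divide 13 evenly, three of the new arrays hold 3 elements each and one holds 4 (AB[9] to AB[12]).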
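The two common distribution schemes can be modelled in a few lines of plain C:

- `block` gives each bank a contiguous run of elements.
- `cyclic` deals elements out round-robin, so element i goes to bank i % factor.

The remainder handling below (leftover elements go to the last bank) is an assumption chosen to reproduce the 3, 3, 3, 4 split described for AB[13]. It has not been checked against the tool for larger remainders. The function names are hypothetical.

```c
/* Bank that element i lands in under block partitioning.
   Each bank gets n / factor elements; the last bank also takes the remainder.
   Assumes n >= factor. */
int block_bank(int i, int n, int factor) {
    int base = n / factor;
    int bank = i / base;
    return bank < factor ? bank : factor - 1;
}

/* Number of elements held by a given bank under block partitioning */
int block_bank_size(int bank, int n, int factor) {
    int base = n / factor;
    return bank == factor - 1 ? n - base * (factor - 1) : base;
}

/* Bank that element i lands in under cyclic partitioning (round-robin) */
int cyclic_bank(int i, int factor) {
    return i % factor;
}

/* Address of element i inside its cyclic bank */
int cyclic_offset(int i, int factor) {
    return i / factor;
}
```

Cyclic is usually the better fit when a loop touches consecutive elements in the same cycle (e.g. a[2k] and a[2k+1]), because neighbours end up in different banks.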

#pragma HLS array_partition variable=AB block factor=2 dim=2

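This example partitions dimension two of the two-dimensional array, AB[6][4], into two new
arrays of dimension [6][2].
In HLS, dimensions are numbered from left to right starting at 1: for AB[6][4], dim=1 is the [6] dimension and dim=2 is the [4] dimension. Note that this may differ from what you would intuitively expect.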
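The following plain-C sketch shows where an element AB[r][c] would end up after `block factor=2 dim=2`. Only the second (right-hand) index is split, so the row index is unchanged. The struct and function names are hypothetical.

```c
#define ROWS 6
#define COLS 4
#define FACTOR 2

/* Location of an element after partitioning: two new arrays, each [6][2] */
typedef struct {
    int bank; /* which of the two new arrays */
    int row;  /* dim=1 index, unchanged because dim=1 is not partitioned */
    int col;  /* dim=2 index inside the new array */
} location;

/* Block partitioning of dim=2: columns 0-1 go to bank 0, columns 2-3 to bank 1 */
location map_dim2_block(int r, int c) {
    location loc;
    loc.bank = c / (COLS / FACTOR);
    loc.row = r;
    loc.col = c % (COLS / FACTOR);
    return loc;
}
```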

int in_local[MAX_SIZE][MAX_DIM];
#pragma HLS ARRAY_PARTITION variable=in_local complete dim=2

This example partitions the second dimension of the two-dimensional in_local array into
individual elements.
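Here is a sketch of why `complete dim=2` is useful. When each row's MAX_DIM values sit in separate registers, an unrolled loop over that dimension can read all of them in the same cycle. The function `row_sums`, the sizes and the surrounding loops are hypothetical. Only the ARRAY_PARTITION line comes from the example above.

```c
#define MAX_SIZE 16 /* hypothetical */
#define MAX_DIM 4   /* hypothetical */

/* Sum each row of a local copy of the input */
void row_sums(int in[MAX_SIZE][MAX_DIM], int out[MAX_SIZE]) {
    int in_local[MAX_SIZE][MAX_DIM];
#pragma HLS ARRAY_PARTITION variable=in_local complete dim=2
    for (int r = 0; r < MAX_SIZE; r++)
        for (int d = 0; d < MAX_DIM; d++)
            in_local[r][d] = in[r][d];

    for (int r = 0; r < MAX_SIZE; r++) {
#pragma HLS PIPELINE II=1
        int acc = 0;
        /* All MAX_DIM columns live in separate storage, so this unrolled
           loop can read them concurrently */
        for (int d = 0; d < MAX_DIM; d++) {
#pragma HLS UNROLL
            acc += in_local[r][d];
        }
        out[r] = acc;
    }
}
```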

++++++++++++++++++++++++++++++++++++++++
pragma HLS array_reshape

ARRAY_RESHAPE combines partitioning with vertical mapping. This reduces the number of block RAMs consumed while
still providing the primary benefit of partitioning: parallel access to the data.
This pragma creates a new array with fewer elements but with greater bit-width, allowing more data to be accessed in a single clock cycle.
A factor of 2 splits the array in half, while doubling the bit-width.
A factor of 3 divides the array into three, with triple the bit-width.

int array1[N];
int array2[N];
int array3[N];
#pragma HLS ARRAY_RESHAPE variable=array1 block factor=2 dim=1
#pragma HLS ARRAY_RESHAPE variable=array2 cyclic factor=2 dim=1
#pragma HLS ARRAY_RESHAPE variable=array3 complete dim=1

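The first, array1, is split in block mode (contiguous chunks; less commonly used).
The second, array2, is split in cyclic mode (elements interleaved round-robin; commonly used).
The third, array3, is split completely (equivalent to factor = N).

For a one-dimensional array, complete reshaping leaves a RAM that is only one element deep (but N elements wide), which is in effect a register.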
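A plain-C model of what factor-2 reshaping does to a 32-bit array:

- Cyclic reshaping concatenates the two cyclic banks side by side, so word k holds elements 2k and 2k+1.
- Block reshaping concatenates the two halves, so word k holds elements k and k + N/2.

Putting the lower-indexed element in the low 32 bits is an assumption for illustration. N = 8 and the function names are hypothetical.

```c
#include <stdint.h>

#define N 8 /* hypothetical array length, assumed even */

/* Cyclic reshape, factor 2: word k = { in[2k+1] (high), in[2k] (low) } */
void reshape_cyclic2(const int32_t in[N], uint64_t out[N / 2]) {
    for (int k = 0; k < N / 2; k++) {
        out[k] = ((uint64_t)(uint32_t)in[2 * k + 1] << 32) | (uint32_t)in[2 * k];
    }
}

/* Block reshape, factor 2: word k = { in[k + N/2] (high), in[k] (low) } */
void reshape_block2(const int32_t in[N], uint64_t out[N / 2]) {
    for (int k = 0; k < N / 2; k++) {
        out[k] = ((uint64_t)(uint32_t)in[k + N / 2] << 32) | (uint32_t)in[k];
    }
}

/* Recover original element i from the cyclic-reshaped array */
int32_t read_cyclic2(const uint64_t words[N / 2], int i) {
    uint64_t w = words[i / 2];
    return (int32_t)(uint32_t)(i % 2 ? w >> 32 : w);
}
```

A loop that reads a[2k] and a[2k+1] together gets both from one 64-bit word in cyclic mode. This is why cyclic is the commonly used mode above.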

#pragma HLS array_reshape variable=AB block factor=4

Reshapes (partitions and maps) an 8-bit array with 17 elements, AB[17], into a new 32-bit array
with five elements using block mapping.

#pragma HLS array_reshape variable=AB block factor=2 dim=2

Reshapes the two-dimensional array AB[6][4] into a new array of dimension [6][2], in which
dimension 2 has twice the bit-width.

#pragma HLS array_reshape variable=AB complete dim=0

Reshapes the three-dimensional 8-bit array, AB[4][2][2] in function foo, into a new single-element
array (a register) that is 128 bits wide (4*2*2*8).
dim=0 means that all dimensions of the array are reshaped.
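The sizes quoted in the three reshape examples follow from simple arithmetic:

- Reshaping by factor f multiplies the word width by f.
- It divides the depth by f, rounded up.
- `complete dim=0` folds every element into one word.

The small helpers below (hypothetical names) reproduce those numbers.

```c
/* Width in bits of one word after reshaping with the given factor */
int reshaped_width(int elem_bits, int factor) {
    return elem_bits * factor;
}

/* Number of words after reshaping n elements with the given factor (rounded up) */
int reshaped_depth(int n, int factor) {
    return (n + factor - 1) / factor;
}

/* Width of the single register produced by "complete dim=0":
   every element of every dimension is concatenated */
int complete_all_width(int elem_bits, const int dims[], int ndims) {
    int total = elem_bits;
    for (int d = 0; d < ndims; d++) {
        total *= dims[d];
    }
    return total;
}
```

For AB[17] with block factor=4 this gives 32-bit words and a depth of 5. For dim 2 of AB[6][4] with factor 2 it gives a depth of 2. For AB[4][2][2] with complete dim=0 it gives 128 bits.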

+++++++++++++++++++++++++++++++++++++++++++++
pragma HLS data_pack

Packs the data fields of a struct into a single scalar with a wider word width.
The DATA_PACK pragma is used for packing all the elements of a struct into a single wide vector to reduce the memory required for the variable, while allowing all members of the struct to be read and written to simultaneously.
The first field takes the LSB of the vector, and the final element of the struct is aligned with the MSB of the vector.
Any arrays declared inside the struct are completely partitioned and reshaped into a wide scalar and packed with other scalar fields.

typedef struct{
unsigned char R, G, B;
} pixel;
pixel AB;
#pragma HLS data_pack variable=AB

Packs the struct variable AB, which has three 8-bit fields (typedef struct {unsigned char R, G, B;} pixel),
into a single new 24-bit scalar.

typedef struct{
unsigned char R, G, B;
} pixel;
pixel AB[17];
#pragma HLS data_pack variable=AB

Packs the struct array AB[17], with three 8-bit fields (R, G, B), into a new 17-element array of
24-bit words.
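The bit layout described above can be modelled in plain C. The first field (R) occupies the least significant byte and the last field (B) the most significant byte of the 24-bit word. `pack_pixel` and `unpack_pixel` are hypothetical helpers; the tool does this packing itself.

```c
#include <stdint.h>

typedef struct {
    unsigned char R, G, B;
} pixel;

/* Model of the packed word: R in bits 7:0, G in bits 15:8, B in bits 23:16 */
uint32_t pack_pixel(pixel p) {
    return (uint32_t)p.R | ((uint32_t)p.G << 8) | ((uint32_t)p.B << 16);
}

/* Inverse of pack_pixel: extract each 8-bit field */
pixel unpack_pixel(uint32_t word) {
    pixel p;
    p.R = word & 0xFF;
    p.G = (word >> 8) & 0xFF;
    p.B = (word >> 16) & 0xFF;
    return p;
}
```

Because all three fields sit in one word, a single memory access reads or writes the whole pixel.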

void rgb_to_hsv(
	RGBcolor* in, // Access global memory as RGBcolor structwise
	HSVcolor* out, // Access Global Memory as HSVcolor structwise
	int size
)
{
#pragma HLS data_pack variable=in struct_level
#pragma HLS data_pack variable=out struct_level
...
}

In this example, the DATA_PACK pragma is specified for the in and out arguments of the rgb_to_hsv
function. It instructs the compiler to pack each structure on an 8-bit boundary to improve
memory access.

++++++++++++++++++++++++++++++++++++++++
pragma HLS dataflow

All operations are performed sequentially in a C description.
The DATAFLOW optimization enables the operations in a function or loop to start operation before
the previous function or loop completes all its operations.

Vivado HLS analyzes the dataflow between sequential functions or loops and creates channels (based on ping-pong RAMs or FIFOs) between them.

For DATAFLOW to be effective, data must flow forward through the design from one task to the next.

wr_loop_j: for (int j = 0; j < TILE_PER_ROW; ++j) {
#pragma HLS DATAFLOW
	wr_buf_loop_m: for (int m = 0; m < TILE_HEIGHT; ++m) {
		wr_buf_loop_n: for (int n = 0; n < TILE_WIDTH; ++n) {
		#pragma HLS PIPELINE
		// should burst TILE_WIDTH in WORD beat
		outFifo >> tile[m][n];
		}
	}
	
	wr_loop_m: for (int m = 0; m < TILE_HEIGHT; ++m) {
		wr_loop_n: for (int n = 0; n < TILE_WIDTH; ++n) {
		#pragma HLS PIPELINE
			outx[TILE_HEIGHT*TILE_PER_ROW*TILE_WIDTH*i
					+TILE_PER_ROW*TILE_WIDTH*m+TILE_WIDTH*j+n] = tile[m][n];
		}
	}
}

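This specifies DATAFLOW optimization within the loop wr_loop_j.
DATAFLOW analyzes the data dependencies between the execution blocks (here, the two loop nests inside wr_loop_j) and tries to insert FIFOs between them so that the blocks can overlap in time.
"FIFO" is meant in a general sense here: a FIFO of depth 1 is really just a register.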
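Here is a smaller, self-contained HLS-style sketch of the same idea: two functions connected by an intermediate array, with DATAFLOW applied in the caller. The tool can then start the second stage before the first finishes, turning `tmp` into a channel. Whether that channel is a ping-pong RAM or a FIFO is the tool's decision. All names and sizes are hypothetical, and as plain C the code simply runs the stages one after the other.

```c
#define N 16 /* hypothetical block length */

/* Stage 1 (hypothetical producer): scale every input value */
static void scale(const int in[N], int tmp[N]) {
    for (int i = 0; i < N; i++) {
#pragma HLS PIPELINE II=1
        tmp[i] = in[i] * 2;
    }
}

/* Stage 2 (hypothetical consumer): add a constant offset */
static void add_one(const int tmp[N], int out[N]) {
    for (int i = 0; i < N; i++) {
#pragma HLS PIPELINE II=1
        out[i] = tmp[i] + 1;
    }
}

/* Data flows strictly forward: in -> scale -> tmp -> add_one -> out,
   which is the pattern DATAFLOW needs in order to overlap the stages */
void top(const int in[N], int out[N]) {
#pragma HLS DATAFLOW
    int tmp[N];
    scale(in, tmp);
    add_one(tmp, out);
}
```

If add_one also wrote back into tmp, or scale read from out, the data would no longer flow one way. The stages could then not be overlapped in the same way.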

++++++++++++++++++++++++++++++++++++++++++++++++++
pragma HLS dependence

Vivado HLS automatically detects dependencies:
Within a loop iteration (loop-independent dependence).
Between different iterations of a loop (loop-carried dependence).
The DEPENDENCE pragma allows you to explicitly specify the dependence and resolve a false dependence.

intra: dependence within the same loop iteration.
inter: dependence between different loop iterations.
RAW (Read-After-Write - true dependence): the read instruction uses a value produced by the
write instruction.
WAR (Write-After-Read - anti dependence): the read instruction gets a value that is later
overwritten by the write instruction.
WAW (Write-After-Write - output dependence): two write instructions write to the same
location, in a certain order.
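The three kinds of inter-iteration dependence can each be seen in a one-line loop. These are plain-C illustrations with hypothetical function names. In each case, reordering or overlapping the iterations carelessly would change the result, which is exactly what the scheduler must respect unless a DEPENDENCE pragma states otherwise.

```c
#define N 8 /* hypothetical length */

/* inter RAW (true dependence): iteration i reads a[i-1],
   which iteration i-1 has just written */
void prefix_sum(int a[N]) {
    for (int i = 1; i < N; i++) {
        a[i] = a[i] + a[i - 1];
    }
}

/* inter WAR (anti dependence): iteration i reads b[i+1]
   before iteration i+1 overwrites it */
void shift_left(int b[N]) {
    for (int i = 0; i < N - 1; i++) {
        b[i] = b[i + 1];
    }
}

/* inter WAW (output dependence): every iteration writes *last;
   the final value must come from the last iteration */
void keep_last(const int c[N], int *last) {
    for (int i = 0; i < N; i++) {
        *last = c[i];
    }
}
```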


for (row = 0; row < rows + 1; row++) {
	for (col = 0; col < cols + 1; col++) {
	#pragma HLS PIPELINE II=1
	#pragma HLS dependence variable=buff_A inter false
	#pragma HLS dependence variable=buff_B inter false
	
		if (col < cols) {
			buff_A[2][col] = buff_A[1][col]; // read from buff_A[1][col]
			buff_A[1][col] = buff_A[0][col]; // write to buff_A[1][col]
			buff_B[1][col] = buff_B[0][col];
			temp = buff_A[0][col];
		}
	}
}

Vivado HLS has no knowledge of the value of cols, so it
conservatively assumes that there is always a dependence between the write to buff_A[1][col] and the read from buff_A[1][col].
Use the DEPENDENCE pragma to state that there is no dependence
between loop iterations (in this case, for both buff_A and buff_B), so that the loop can be pipelined with II=1.

++++++++++++++++++++++++++++++++++++++++++++++++++++++
pragma HLS function_instantiate

By default, functions remain as separate hierarchy blocks in the RTL, and all instances of a function at the same level of hierarchy share a single RTL implementation (block).
The FUNCTION_INSTANTIATE pragma is used to create a unique RTL implementation for each
instance of a function, allowing each instance to be locally optimized according to the function
call.

char foo_sub(char inval, char incr) {
#pragma HLS function_instantiate variable=incr
	return inval + incr;
}

void foo(char inval1, char inval2, char inval3,
char *outval1, char *outval2, char * outval3)
{
	*outval1 = foo_sub(inval1, 1);
	*outval2 = foo_sub(inval2, 2);
	*outval3 = foo_sub(inval3, 3);
}

variable: A required argument that defines the function argument to use as a constant.

Without the FUNCTION_INSTANTIATE pragma, the code above results in a single RTL
implementation of function foo_sub shared by all three calls in foo.

With the pragma, the code results in three different
implementations of function foo_sub, each independently optimized for its value of the incr argument.
After FUNCTION_INSTANTIATE optimization, foo_sub is effectively transformed into three
separate functions, each optimized for the specified value of incr.
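The result can be pictured by writing the specialization out by hand: one copy of foo_sub per call site, with incr folded in as a constant. The names foo_sub_1/2/3 and foo_specialized are hypothetical. This shows conceptually what the pragma achieves, not code the tool emits.

```c
/* One specialized copy per call site; incr is now a constant */
static char foo_sub_1(char inval) { return inval + 1; }
static char foo_sub_2(char inval) { return inval + 2; }
static char foo_sub_3(char inval) { return inval + 3; }

/* Same behaviour as foo, but each call uses its own specialized block */
void foo_specialized(char inval1, char inval2, char inval3,
                     char *outval1, char *outval2, char *outval3) {
    *outval1 = foo_sub_1(inval1);
    *outval2 = foo_sub_2(inval2);
    *outval3 = foo_sub_3(inval3);
}
```

Each copy adds a constant instead of a variable, so its hardware can be smaller than one shared adder with a variable input. The trade-off is that three blocks now exist instead of one.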

+++++++++++++++++++++++++++++++++++++++++++++
pragma HLS inline

After inlining, the function is dissolved
into the calling function and no longer appears as a separate level of hierarchy in the RTL.

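#pragma HLS inline                    // inline this function into its callers
#pragma HLS inline off                // prevent this function from being inlined
#pragma HLS inline region             // inline all functions called within this function (not the function itself)
#pragma HLS inline recursive          // inline this function and, recursively, everything it calls
#pragma HLS inline region recursive   // inline every function in the region, all the way down the call tree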
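A small HLS-style sketch of the first form is shown below. The helper `mac` carries `#pragma HLS INLINE`, so after synthesis it would not appear as its own RTL block; its multiply-add becomes part of the caller's datapath. Both function names are hypothetical, and as plain C the pragma is simply ignored.

```c
/* Hypothetical helper: with INLINE it is dissolved into its caller */
static int mac(int acc, int a, int b) {
#pragma HLS INLINE
    return acc + a * b;
}

/* Hypothetical caller: after inlining, the multiply-adds are part of
   dot4's own hierarchy level */
int dot4(const int x[4], const int y[4]) {
    int acc = 0;
    for (int i = 0; i < 4; i++) {
        acc = mac(acc, x[i], y[i]);
    }
    return acc;
}
```

Inlining removes the function-call boundary. This lets the tool optimize the helper's logic together with the surrounding code, at the cost of losing a separately reusable block.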