HLS Lesson 34 (UG871: Integrating an HLS IP into an SoC System)

Huskar_Liu

Article link: https://blog.csdn.net/weixin_42418557/article/details/121123243

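A port that connects to other blocks over AXI4-Stream
needs the axis interface constraint.

A port that connects to a DMA over AXI4
needs the m_axi interface constraint.

A port through which the block is attached to the SoC system
needs the s_axilite interface constraint.

Let us start with the s_axilite interface constraint.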
+++++++++++++++++++++++++++++++++++++++++++++
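A port constrained with s_axilite can be connected to the ARM processor. It is typically used to pass configuration values or to return results.
The bundle option maps several ports onto registers of the same AXI4-Lite bus. Ports without an explicit bundle are grouped together into a default bundle (named AXILiteS in Vivado HLS).
The offset option manually sets the offset address of the register that a port is mapped to.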

void hls_macc(int a, int b, int *accum, bool accum_clr)
{
#pragma HLS INTERFACE s_axilite port=return bundle=HLS_MACC

#pragma HLS INTERFACE s_axilite port=a bundle=HLS_MACC
#pragma HLS INTERFACE s_axilite port=b bundle=HLS_MACC
#pragma HLS INTERFACE s_axilite port=accum_clr bundle=HLS_MACC

#pragma HLS INTERFACE s_axilite port=accum bundle=HLS_MACC



   static int acc_reg = 0;
   
   if (accum_clr)
      acc_reg = 0;
      
   acc_reg += a * b;
   *accum = acc_reg;
}
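
Before synthesis, the accumulate-and-clear behaviour can be checked on the host with plain C. The following is only a sketch: it repeats the function body without the interface pragmas, and `macc_sequence` is a hypothetical helper introduced here for illustration, not something taken from UG871.

```c
#include <stdbool.h>

/* Same behaviour as hls_macc above; the interface pragmas are left out
 * because they only matter to the HLS tool. */
void hls_macc(int a, int b, int *accum, bool accum_clr)
{
    static int acc_reg = 0;   /* persists across calls, like a hardware register */

    if (accum_clr)
        acc_reg = 0;

    acc_reg += a * b;
    *accum = acc_reg;
}

/* Hypothetical helper: clear the accumulator on the first call,
 * then feed n (a, b) pairs and return the final accumulated value. */
int macc_sequence(const int *a, const int *b, int n)
{
    int result = 0;
    for (int i = 0; i < n; i++)
        hls_macc(a[i], b[i], &result, i == 0);
    return result;
}
```

Because `acc_reg` is static, the order of calls matters. That is exactly what the `accum_clr` register lets the processor control at run time.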

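In this example,
return is assigned to the AXI4-Lite bus, so the block-level control and status, together with any return value, are accessed through registers on that bus. When the block's FSM finishes, the processor observes the completion by reading those registers.
Note that when return is assigned to the AXI4-Lite bus, the block always gets an interrupt output, which is usually associated with the ap_done or ap_ready signal.

a and b are assigned to the AXI4-Lite bus, so when the block's FSM starts it reads the values of the registers that correspond to a and b.
accum_clr is assigned to the AXI4-Lite bus as well, so its register is also read when the FSM starts.

Pay attention to accum. It is a pointer that is also assigned to the AXI4-Lite bus, so HLS first treats it as an array. During code analysis HLS finds that accum is never accessed with an offset, concludes that the array has a single element, and optimizes the array into a single variable.
In HLS, when an array is assigned to the AXI4-Lite bus, the array is given a base address and a contiguous address range, and its elements become one or more registers on the AXI4-Lite bus. A read or write of an array element in the C code is therefore carried out as a read or write of the register at the corresponding address.
HLS also prefers to reason about arrays rather than pointers, but operations written with pointers are legal. HLS converts the pointer operation, for example: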

*accum = acc_reg;
//accum[0] = acc_reg;

++++++++++++++++++++++++++++++++++++++++++
The AXI4-Lite bus and interrupts
IP blocks created with the Vivado HLS tool can generate two kinds of interrupts:
1 Task completion (ap_done)
2 Task pipelining
In addition, the APIs have been designed with support for up to 32 interrupt sources per IP block.
Issuing an interrupt to the processor is controlled by two functions:

XExample_InterruptGlobalEnable()
XExample_InterruptGlobalDisable()

There are also APIs to control the behavior of individual interrupt sources within the IP block.
Interrupt sources are 1-hot encoded onto a 32-bit control word inside the generated IP block.
Bit 0 is the least significant bit of the word and encodes the status of the done signal interrupt.
Enabling and disabling interrupt sources in the IP block is carried out by:

XExample_InterruptEnable(…)
XExample_InterruptDisable(…)

In cases where the IP block has more than one interrupt source, the processor must determine the cause of the interrupt. The cause can be determined by using these functions:

XExample_InterruptGetEnabled()
XExample_InterruptGetStatus()

Multiple internal sources can be active at the same time. Interrupt priority and handling is determined by the ISR.
Once the processor has determined the interrupt source, it must clear the source in the IP block. Clearing an interrupt is achieved using the clear function:

XExample_InterruptClear(…)
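
The one-hot encoding and the enable, status and clear flow described above can be pictured with a small host-side model. This is only a sketch under simplifying assumptions: `irq_model_t`, the `irq_*` helpers and the second source bit are hypothetical names invented for illustration. They are not the generated XExample driver, and the real register behaviour may differ in detail.

```c
#include <stdint.h>

/* Host-side model of the interrupt registers (not the generated driver). */
typedef struct {
    uint32_t global_enable;  /* bit 0: master enable for the interrupt output */
    uint32_t enable;         /* one bit per source, 1 = source enabled */
    uint32_t status;         /* one bit per source, 1 = interrupt pending */
} irq_model_t;

#define IRQ_SRC_AP_DONE (1u << 0)   /* bit 0 encodes the done interrupt */
#define IRQ_SRC_SECOND  (1u << 1)   /* hypothetical second source */

void irq_enable(irq_model_t *m, uint32_t mask)  { m->enable |= mask; }
void irq_disable(irq_model_t *m, uint32_t mask) { m->enable &= ~mask; }

/* Hardware side: a source fires. Assumption: it is latched only if enabled. */
void irq_raise(irq_model_t *m, uint32_t mask)   { m->status |= (mask & m->enable); }

/* Clear the selected pending sources. */
void irq_clear(irq_model_t *m, uint32_t mask)   { m->status &= ~mask; }

/* Level of the interrupt line seen by the processor: 1 = asserted. */
int irq_line(const irq_model_t *m)
{
    return ((m->global_enable & 1u) != 0) && ((m->status & m->enable) != 0);
}
```

With two sources pending, clearing only one of them leaves the line asserted. This is why the ISR has to inspect the status word and clear each source it has handled.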

On the CPU side, initializing the interrupt controller and registering the ISR follows the usual flow.

#include "xparameters.h"
#include "xscugic.h"
#include "xil_exception.h"
#include "xexample.h"   // driver header generated by HLS for the IP

// Instances shared by the code in this section
XScuGic ScuGic;
XExample ex;

void ExampleISR(void *InstancePtr);

int SetupInterrupt(){
	int result;
	// Find the interrupt configuration table
	XScuGic_Config *pCfg =
							XScuGic_LookupConfig(XPAR_SCUGIC_SINGLE_DEVICE_ID);
	// Initialize the Interrupt Controller
	result = XScuGic_CfgInitialize(&ScuGic, pCfg, pCfg->CpuBaseAddress);
	if(result != XST_SUCCESS){
		return result;
	}
	
	//Initialize the exception handler
	Xil_ExceptionInit();
	//Register the exception handler
	Xil_ExceptionRegisterHandler(XIL_EXCEPTION_ID_INT,
										(Xil_ExceptionHandler) XScuGic_InterruptHandler,&ScuGic);
	//Enable the exception handler
	Xil_ExceptionEnable();
	
	//Connect the Example ISR to the exception table
	result = XScuGic_Connect( &ScuGic,
								XPAR_EXAMPLE_INTERRUPT_INTR,
								(Xil_InterruptHandler)ExampleISR,&ex);
	...
	//Enable the Example ISR
	XScuGic_Enable(&ScuGic,XPAR_EXAMPLE_INTERRUPT_INTR);
	
	return result;
}

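First the GIC is configured: look up the GIC configuration, then use it to initialize the GIC.
Next, exception handling is configured: initialize it, register the exception handler, then enable exceptions.
Then the interrupt is configured: connect the ISR to the GIC.
Finally the interrupt is switched on by enabling its interrupt ID in the GIC.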

The following code shows how to create an ISR for the example function:

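// Flag shared between the ISR and main(); volatile so that the spin loop in main() re-reads it
volatile int NewResult = 0;

void ExampleISR(void *InstancePtr){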
	int enabled_list;
	int status_list;
	XExample *pEx = (XExample *) InstancePtr;
	
	//Disable Global Interrupt
	XExample_InterruptGlobalDisable(pEx);
	
	//Get list of enabled interrupts
	enabled_list = XExample_InterruptGetEnabled(pEx);
	//Get interrupt status list
	status_list = XExample_InterruptGetStatus(pEx);
	
	//Check ap_done created the interrupt
	if((enabled_list & 1) && (status_list & 1)){
		//Clear the ap_done interrupt
		XExample_InterruptClear(pEx,1);
		
		//Set a result status flag
		NewResult = 1;
	}
}

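Note that the value passed to XExample_InterruptClear is a bit mask, not a bit index.
The ap_done source sits at bit 0, so it is selected by the mask value 1 (that is, 1 << 0).

This code uses a shared variable so that the ISR and the main program can communicate. The shared variable serves as a flag:
when the ISR detects ap_done, it sets the NewResult flag.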

int main(){
	int result;
	//Initialize the IP
	XExample_Config ex_config = {
									0,
									XPAR_EXAMPLE_TOP_0_S_AXI_EXAMPLE_BASEADDR};
	XExample_Initialize(&ex,&ex_config);
	
	//Setup the Interrupt for the System
	SetupInterrupt();
	
	//Write the values for port A and B
	XExample_SetA(&ex,5);
	XExample_SetB(&ex, 10);
	XExample_SetB_Vld(&ex);
	
	//Start the IP
	XExample_Start(&ex);
	
	//Wait for the core interrupt
	while(1)
	{
		while(!NewResult);
	
		//Get the return value of the IP
		result = XExample_GetReturn(&ex);
	
		printf("IP result = %d\n\r",result);
	}
	
	return 0;
}

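The ISR runs in the foreground and is triggered by the interrupt event.
main runs in the background and spin-waits in a while(!NewResult) loop to synchronize with the ISR.
Once the ISR has set the NewResult flag, main leaves the spin loop and reads the return value.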
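
One way to make this handshake reusable for more than one result is to clear the flag after it has been consumed. The sketch below runs on the host: `fake_isr` stands in for the interrupt handler, and `wait_for_result` with its spin limit is a hypothetical helper. None of these names come from the generated driver.

```c
/* Flag and payload shared between interrupt context and main.
 * volatile forces the spin loop to re-read the flag on every iteration. */
static volatile int new_result = 0;
static volatile int latest_value = 0;

/* Stand-in for the ISR body: record the value, then raise the flag. */
void fake_isr(int value_from_ip)
{
    latest_value = value_from_ip;
    new_result = 1;
}

/* Main-side consumer: spin until the flag is set or the spin budget runs out.
 * Returns 1 and stores the value in *out on success, 0 on timeout. */
int wait_for_result(long max_spins, int *out)
{
    while (!new_result) {
        if (max_spins-- <= 0)
            return 0;
    }
    new_result = 0;        /* re-arm the flag for the next interrupt */
    *out = latest_value;
    return 1;
}
```

The spin limit is a design choice for the sketch. On a real system an unbounded spin hides a missed interrupt, whereas a bounded one at least returns control to the caller.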

++++++++++++++++++++++++++++++++++++++++++++++++++++
Now let us look at the AXIS interface.
An AXI4-Stream is used without side-channels when the function argument does not contain any AXI4 side-channel elements.
The AXIS side-channel signals include TUSER, TKEEP, TLAST and others.
For example:

void example(int A[50], int B[50]) {
#pragma HLS INTERFACE axis port=A
#pragma HLS INTERFACE axis port=B
	...
}

A and B are given the AXIS interface constraint.
When ports A and B are implemented, their signal sets contain only TDATA, TVALID and TREADY.

Multiple variables can be combined into the same AXI4-Stream interface by using a struct and the DATA_PACK directive.
The DATA_PACK directive may be used to pack the elements of a struct into a single wide-vector, allowing all elements of the struct to be implemented in the same AXI4-Stream interface.
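If side-channel signals are needed, first wrap the data and the side-channel members in a struct whose member names match the AXI4-Stream signals.
DATA_PACK serves the other case described above: packing the members of an ordinary struct into one wide vector so that they share a single AXI4-Stream interface. When that packing is what you want, stating the directive explicitly is safer than relying on a default.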

A common example is ap_axiu.

template<int D,int U,int TI,int TD>
struct ap_axiu{
	ap_uint<D> data;
	ap_uint<D/8> keep;
	ap_uint<D/8> strb;
	ap_uint<U> user;
	ap_uint<1> last;
	ap_uint<TI> id;
	ap_uint<TD> dest;
};

Here is an example that uses it.

void example(ap_axiu<32,2,5,6> A[50], ap_axiu<32,2,5,6> B[50])
{
//Map ports to Vivado HLS interfaces
#pragma HLS INTERFACE axis port=A
//#pragma HLS DATA_PACK variable=A
#pragma HLS INTERFACE axis port=B
//#pragma HLS DATA_PACK variable=B
	...
}

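The arguments are arrays of struct objects constrained as AXIS. Because the member names of ap_axiu match the AXI4-Stream signal names, HLS maps each member of an element to the corresponding signal (data to TDATA, keep to TKEEP, last to TLAST, and so on). This is why the DATA_PACK lines are left commented out in this example.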
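
To make the role of a side-channel such as TLAST concrete, here is a host-side sketch of a block that processes one packet and stops at the beat marked as last. `axis_beat_t` and `stream_add_offset` are hypothetical stand-ins that can be compiled without the HLS headers. They model only the data and last fields and are not the ap_axiu type itself.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for one AXI4-Stream beat. */
typedef struct {
    uint32_t data;  /* TDATA */
    uint8_t  last;  /* TLAST: 1 on the final beat of a packet */
} axis_beat_t;

/* Copy one packet from in to out, adding offset to every data word.
 * Stops after the beat whose last flag is set, or after max_beats.
 * Returns the number of beats processed. */
size_t stream_add_offset(const axis_beat_t *in, axis_beat_t *out,
                         size_t max_beats, uint32_t offset)
{
    size_t n = 0;
    while (n < max_beats) {
        out[n].data = in[n].data + offset;
        out[n].last = in[n].last;   /* side-channel passes through unchanged */
        n++;
        if (in[n - 1].last)
            break;
    }
    return n;
}
```

Passing the last flag through unchanged matters when the consumer is an AXI DMA, which typically uses TLAST to recognise the end of a transfer.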

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
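Now let us look at the AXI4 (m_axi) interface constraint.
You can use an AXI4 master interface on array or pointer/reference arguments.
To use the m_axi interface constraint, the corresponding argument must be an array or a pointer. In C++ code it can also be a reference.
If you want HLS to treat the reads and writes on the m_axi interface as burst mode accesses, using the memcpy function is recommended.
For example: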

#include <string.h>   // memcpy

void example(volatile int *a){
#pragma HLS INTERFACE m_axi depth=50 port=a
#pragma HLS INTERFACE s_axilite port=return
	//Port a is assigned to an AXI4 master interface
	int i;
	int buff[50];
	
	//memcpy creates a burst access to memory
	memcpy(buff,(const int*)a,50*sizeof(int));
	
	for(i=0; i < 50; i++){
		buff[i] = buff[i] + 100;
	}
	memcpy((int *)a,buff,50*sizeof(int));
	/*
	for(i=0; i < 50; i++){
	#pragma HLS PIPELINE
		a[i] = buff[i];
	}
	*/
	
}

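If you prefer not to use memcpy, a for loop also works, with some restrictions:
the whole copy must be fully pipelined, the read and write addresses must increase monotonically, and the accesses must not be placed under conditional control.
In other words, the for loop must behave exactly like memcpy, so that HLS can recognize it as a memcpy-style operation.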
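
As a rough sketch of the loop form, the function below is a hypothetical variant of the memcpy example (`example_loops` and `BUF_LEN` are names made up here). It replaces both copies with pipelined loops that follow the restrictions listed above. An ordinary C compiler ignores the HLS pragmas, so the code also runs on the host. Whether the tool actually infers bursts should be confirmed in the synthesis report.

```c
#define BUF_LEN 50

void example_loops(volatile int *a)
{
#pragma HLS INTERFACE m_axi depth=50 port=a
#pragma HLS INTERFACE s_axilite port=return
    int buff[BUF_LEN];
    int i;

    /* Read: fixed bounds, index rises by one per iteration, no conditions. */
    for (i = 0; i < BUF_LEN; i++) {
#pragma HLS PIPELINE
        buff[i] = a[i];
    }

    /* Local processing on the on-chip buffer. */
    for (i = 0; i < BUF_LEN; i++) {
        buff[i] = buff[i] + 100;
    }

    /* Write: same access pattern as the read loop. */
    for (i = 0; i < BUF_LEN; i++) {
#pragma HLS PIPELINE
        a[i] = buff[i];
    }
}
```

Keeping the copy loops separate from the processing loop is deliberate: each copy loop then contains nothing but the sequential access.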

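When the m_axi interface constraint is used, the offset=slave option is usually used with it.
In that case an additional interface constraint is needed for the same port, binding it to an s_axilite interface, so that the processor can write the base address into a register.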

#pragma HLS INTERFACE m_axi port=a depth=50 offset=slave
#pragma HLS INTERFACE s_axilite port=a bundle=AXI_Lite_1

+++++++++++++++++++++++++++++++++++++++++++
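Next question: in an SoC, how is data in RAM turned into a stream?

The CPU can only access RAM, that is, memory-mapped (AXI4) interfaces. It cannot access an AXIS interface directly.
Most HLS blocks, however, receive their data over AXIS interfaces, so a bridge is needed that converts between RAM and stream in both directions.
The answer is the AXI DMA.

The CPU configures and starts the AXI DMA. In one direction the AXI DMA reads data from RAM, converts it into a stream and sends it to the HLS block. In the other direction it takes the stream sent by the HLS block, converts it back into RAM data and writes it to the target RAM address.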

int init_dma(XAxiDma *axiDmaPtr){
   XAxiDma_Config *CfgPtr;
   int status;
  	
  	// Get pointer to DMA configuration
   CfgPtr = XAxiDma_LookupConfig(XPAR_AXIDMA_0_DEVICE_ID);
   if(!CfgPtr){
      print("Error looking for AXI DMA config\n\r");
      return XST_FAILURE;
   }
   // Initialize the DMA handle
   status = XAxiDma_CfgInitialize(axiDmaPtr,CfgPtr);
   if(status != XST_SUCCESS){
      print("Error initializing DMA\n\r");
      return XST_FAILURE;
   }
   
   //check for scatter gather mode - this example must have simple mode only
   if(XAxiDma_HasSg(axiDmaPtr)){
      print("Error DMA configured in SG mode\n\r");
      return XST_FAILURE;
   }
   //disable the interrupts
   XAxiDma_IntrDisable(axiDmaPtr, XAXIDMA_IRQ_ALL_MASK,XAXIDMA_DEVICE_TO_DMA);
   XAxiDma_IntrDisable(axiDmaPtr, XAXIDMA_IRQ_ALL_MASK,XAXIDMA_DMA_TO_DEVICE);

   return XST_SUCCESS;
}

This function uses the functions provided by the driver API to initialize the AXI DMA.

Starting one RAM-to-stream transfer looks like this:

	// *IMPORTANT* - flush contents of 'realdata' from data cache to memory
   	// before DMA. Otherwise DMA is likely to get stale or uninitialized data
	Xil_DCacheFlushRange((unsigned)realdata, 4 * REAL_FFT_LEN * sizeof(short));
	
   	// DMA enough data to push out first result data set completely
   	status = XAxiDma_SimpleTransfer(&axiDma, (u32)realdata,
		   										4 * REAL_FFT_LEN * sizeof(short), XAXIDMA_DMA_TO_DEVICE);
   	status = XAxiDma_SimpleTransfer(&axiDma, (u32)realdata,
		   										4 * REAL_FFT_LEN * sizeof(short), XAXIDMA_DMA_TO_DEVICE);
	...

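Before starting the DMA, flush the D-cache once so that the data held in the cache is written out to SDRAM and RAM is up to date.
In a CPU context with a cache, a write does not necessarily reach SDRAM. It may only land in the cache.
So before the DMA runs, the cached data must be flushed to RAM to synchronize cache and RAM.

Then XAxiDma_SimpleTransfer configures the DMA so that it reads the data from RAM by itself and converts it into a stream. Note that the direction flag used here is XAXIDMA_DMA_TO_DEVICE.
XAxiDma_SimpleTransfer is called twice in a row here. If more transfers are needed, it can be called again, once per transfer. In simple mode the driver accepts a new transfer only when the channel is idle, so the returned status should be checked.
Each call sets up one transfer by programming the DMA's address and length registers. Simple mode does not use descriptor blocks, which belong to scatter-gather mode.

Starting one stream-to-RAM transfer looks like this: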

		// Setup DMA from PL to PS memory using AXI DMA's 'simple' transfer mode
	   	status = XAxiDma_SimpleTransfer(&axiDma, (u32)realspectrum,
			   									REAL_FFT_LEN / 2 * sizeof(complex16), XAXIDMA_DEVICE_TO_DMA);
      	// Poll the AXI DMA core
	   	do {
		   	status = XAxiDma_Busy(&axiDma, XAXIDMA_DEVICE_TO_DMA);
	   	} while(status);
	   	// Data cache must be invalidated for 'realspectrum' buffer after DMA
	   	Xil_DCacheInvalidateRange((unsigned)realspectrum,
	   											REAL_FFT_LEN / 2 * sizeof(complex16));

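XAxiDma_SimpleTransfer configures the DMA so that it stores the stream data into RAM by itself. Note that the direction flag used here is XAXIDMA_DEVICE_TO_DMA.
The DMA is not set up several times here because we want to process the received data one block at a time.

To synchronize with the DMA, a
do { ... } while (status);
loop is used as a spin wait.
Inside the do-while, XAxiDma_Busy is called repeatedly to test the state.
When the transfer has finished, status is 0 and the loop exits.
While it has not finished, status is non-zero and the loop keeps testing.
The state is polled in this way until the transfer completes.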
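
An unbounded spin hangs forever if the transfer never completes, so a bounded variant can be useful. The sketch below is host-runnable: `poll_until_idle`, `busy_fn` and `fake_busy` are hypothetical names introduced for illustration. On the target, the probe would be a small wrapper around XAxiDma_Busy.

```c
/* Probe signature: returns non-zero while the channel is still busy. */
typedef int (*busy_fn)(void *ctx);

/* Poll until the probe reports idle, or until max_polls checks were made.
 * Returns 0 when idle was seen, -1 on timeout. */
int poll_until_idle(busy_fn is_busy, void *ctx, unsigned long max_polls)
{
    for (unsigned long i = 0; i < max_polls; i++) {
        if (!is_busy(ctx))
            return 0;
    }
    return -1;
}

/* Test double: reports busy for the next *ctx calls, then idle. */
int fake_busy(void *ctx)
{
    int *remaining = (int *)ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return 1;
    }
    return 0;
}
```

Passing the probe as a function pointer keeps the loop independent of the DMA driver, which is what makes it testable without hardware.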

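Before the data that the DMA wrote to SDRAM is used, the cache must be updated as well.
Xil_DCacheInvalidateRange does this: by invalidating the cache lines it forces the next read of that data to load it from RAM into the cache.
Likewise, in a CPU context with a cache, a read does not necessarily read SDRAM. It may return the data held in the cache.
After the DMA has run, the data in RAM is newer than the data in the cache, so the cache must be invalidated to synchronize cache and RAM.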
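
The two synchronization rules can be illustrated with a toy model of a write-back cache in front of RAM. Everything in this sketch is hypothetical (`mem_model_t` and the helper functions are invented for illustration). It models single words rather than cache lines and only mimics the roles played by Xil_DCacheFlushRange and Xil_DCacheInvalidateRange, not their implementation.

```c
#include <string.h>

#define WORDS 4

typedef struct {
    int ram[WORDS];     /* what the DMA sees */
    int cache[WORDS];   /* what the CPU sees while the entry is valid */
    int valid[WORDS];
    int dirty[WORDS];
} mem_model_t;

void model_init(mem_model_t *m) { memset(m, 0, sizeof *m); }

/* CPU write lands in the cache only (write-back behaviour). */
void cpu_write(mem_model_t *m, int i, int v)
{
    m->cache[i] = v;
    m->valid[i] = 1;
    m->dirty[i] = 1;
}

/* CPU read: a hit returns the cached copy, a miss loads from RAM first. */
int cpu_read(mem_model_t *m, int i)
{
    if (!m->valid[i]) {
        m->cache[i] = m->ram[i];
        m->valid[i] = 1;
        m->dirty[i] = 0;
    }
    return m->cache[i];
}

/* Flush: write dirty entries back to RAM (needed before the DMA reads). */
void cache_flush(mem_model_t *m)
{
    for (int i = 0; i < WORDS; i++) {
        if (m->valid[i] && m->dirty[i]) {
            m->ram[i] = m->cache[i];
            m->dirty[i] = 0;
        }
    }
}

/* Invalidate: drop cached copies (needed after the DMA has written). */
void cache_invalidate(mem_model_t *m)
{
    for (int i = 0; i < WORDS; i++) {
        m->valid[i] = 0;
        m->dirty[i] = 0;
    }
}

/* The DMA bypasses the cache and touches RAM directly. */
int  dma_read(const mem_model_t *m, int i)   { return m->ram[i]; }
void dma_write(mem_model_t *m, int i, int v) { m->ram[i] = v; }
```

In the model, a DMA read issued before the flush returns the old RAM content, and a CPU read issued before the invalidate returns the old cached content. These are the two stale-data cases the text describes.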
++++++++++++++++++++++++++++++++++++++++++++++++
