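Using an HLS-compiled IP core on the Zynq UltraScale+ MPSoC
Preface
My advisor mailed the board to my home a few days ago. While building an image-processing IP core I ran into one problem after another, so today I am writing them down.
Getting started
Write your own program in HLS, using the following template for the top-level function: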
void HLS_accel (AXI_VAL INPUT_STREAM[IMG_LEN], AXI_VAL OUTPUT_STREAM[64])
{
#pragma HLS INTERFACE s_axilite port=return bundle=CONTROL_BUS
#pragma HLS INTERFACE axis port=OUTPUT_STREAM
#pragma HLS INTERFACE axis port=INPUT_STREAM
    wrapped_lc_model_hw(INPUT_STREAM, OUTPUT_STREAM);
    return;
}
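Here wrapped_lc_model_hw is the algorithm I wrote myself, wrapped so it can be converted to hardware. Note that if you use the AXI bus, the input and output ports must be declared with an AXI interface mode such as axis. Declare the interfaces first, then expand your own algorithm: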
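void wrapped_lc_model_hw(AXI_VAL in_stream[IMG_LEN], AXI_VAL out_stream[64])
{
#pragma HLS INLINE
    int in[64*64];   // must hold at least IMG_LEN elements
    int out[64];
    int i = 0;
    // Unpack the input stream into a local buffer
    for (i = 0; i < IMG_LEN; i++)
    {
#pragma HLS PIPELINE
        in[i] = pop_stream<int,4,5,5>(in_stream[i]);
    }
    // Run the user algorithm on the buffered image
    model_kernel(in, out);
    // Pack the results; the second argument flags the final word (i == 63)
    for (i = 0; i < 64; i++) {
#pragma HLS PIPELINE II=1
        out_stream[i] = push_stream<int,4,5,5>(out[i], i == (64-1));
    }
    return;
}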
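The first step reads the image data out of the AXI-Stream interface, the second passes it to your own kernel, and the third pushes the results back out.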
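To make the read, process, push flow easier to try on a PC, here is a minimal host-side sketch. It is not the HLS code itself: AxiWord is a hypothetical plain struct standing in for AXI_VAL, pop_stream and push_stream are simplified stand-ins that only keep the same template shape (I assume the three integers are side-channel widths and ignore them), IMG_LEN is assumed to be 64*64, and model_kernel_stub is a placeholder that sums each row, not my real kernel.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr int IMG_LEN = 64 * 64;  // assumption: a 64x64 image
constexpr int OUT_LEN = 64;

// Hypothetical stand-in for one AXI4-Stream word (AXI_VAL in the HLS code).
struct AxiWord {
    std::uint32_t data;  // payload
    std::uint8_t  keep;  // byte-valid mask
    std::uint8_t  strb;  // byte-strobe mask
    bool          last;  // marks the final word of a packet
};

// Extract the 32-bit payload of a stream word as type T.
template <typename T, int U, int TI, int TD>
T pop_stream(const AxiWord &e) {
    static_assert(sizeof(T) == sizeof(std::uint32_t), "sketch assumes 32-bit payloads");
    T v;
    std::memcpy(&v, &e.data, sizeof(T));
    return v;
}

// Wrap a value of type T into a stream word, optionally flagging it as last.
template <typename T, int U, int TI, int TD>
AxiWord push_stream(const T &v, bool last = false) {
    static_assert(sizeof(T) == sizeof(std::uint32_t), "sketch assumes 32-bit payloads");
    AxiWord e{};
    std::memcpy(&e.data, &v, sizeof(T));
    e.keep = 0xF;  // all four bytes valid
    e.strb = 0xF;
    e.last = last;
    return e;
}

// Placeholder kernel (NOT the real model_kernel): sums each 64-pixel row.
void model_kernel_stub(const int in[IMG_LEN], int out[OUT_LEN]) {
    for (int r = 0; r < OUT_LEN; r++) {
        int sum = 0;
        for (int c = 0; c < 64; c++) sum += in[r * 64 + c];
        out[r] = sum;
    }
}

// Same three stages as the wrapper: unpack, compute, pack with a last flag.
void wrapped_sketch(const AxiWord in_stream[IMG_LEN], AxiWord out_stream[OUT_LEN]) {
    assert(in_stream != nullptr && out_stream != nullptr);
    int in[IMG_LEN];
    int out[OUT_LEN];
    for (int i = 0; i < IMG_LEN; i++) in[i] = pop_stream<int, 4, 5, 5>(in_stream[i]);
    model_kernel_stub(in, out);
    for (int i = 0; i < OUT_LEN; i++)
        out_stream[i] = push_stream<int, 4, 5, 5>(out[i], i == OUT_LEN - 1);
}
```

Only the final output word carries the last flag; as far as I understand, that is what tells the DMA on the receiving side that the packet is complete.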
Then synthesize the design and export the IP. I won't go into the optimization process here; see the Xilinx HLS videos on Bilibili.
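Generating the IP
Once the IP is generated, it has to be wired up to the Zynq in a block design. Mine looks like this:
Note that in the Address Editor you need to unmap the OCM:
Then generate the bitstream and export the hardware to the SDK.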
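SDK programming
The main job here is moving data from memory into the IP. Note that the transfer size in the official demo is small; with my data a single DMA transfer was too large and failed, so I split the transfer into fixed-size chunks as follows: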
#define ONE_TRANS_NUM (4096*2) /* bytes per DMA transfer */
int Run_LC_MODEL_Accelerator(int src[DIM*DIM], int roi_out[64], int dma_size)
{
    int tick = dma_size / ONE_TRANS_NUM; /* number of full-size transfers */
    int rest = dma_size % ONE_TRANS_NUM; /* bytes left for a final short transfer */
    int status = 0;
    /* If the data cache is enabled, src must be flushed before sending and
       roi_out invalidated after receiving, unless the caller already does so. */
    /* Send the input to the Vivado HLS block, ONE_TRANS_NUM bytes at a time.
       UINTPTR keeps the full address on the 64-bit Cortex-A53; a cast to
       unsigned int would truncate it. */
    for (int i = 0; i < tick; i++) {
        status = XAxiDma_SimpleTransfer(&AxiDma, (UINTPTR) src + (i * ONE_TRANS_NUM), ONE_TRANS_NUM, XAXIDMA_DMA_TO_DEVICE);
        if (status != XST_SUCCESS) {
            print("Error: DMA transfer to Vivado HLS block failed1\n");
            return XST_FAILURE;
        }
        /* Wait for transfer to be done */
        while (XAxiDma_Busy(&AxiDma, XAXIDMA_DMA_TO_DEVICE)) ;
    }
    /* Send whatever is left after the full-size transfers */
    if (rest) {
        status = XAxiDma_SimpleTransfer(&AxiDma, (UINTPTR) src + (tick * ONE_TRANS_NUM), rest, XAXIDMA_DMA_TO_DEVICE);
        if (status != XST_SUCCESS) {
            print("Error: DMA transfer to Vivado HLS block failed1\n");
            return XST_FAILURE;
        }
        /* Wait for transfer to be done */
        while (XAxiDma_Busy(&AxiDma, XAXIDMA_DMA_TO_DEVICE)) ;
    }
    /* Get results from the Vivado HLS block */
    status = XAxiDma_SimpleTransfer(&AxiDma, (UINTPTR) roi_out, 64 * sizeof(int), XAXIDMA_DEVICE_TO_DMA);
    if (status != XST_SUCCESS) {
        print("Error: DMA transfer from Vivado HLS block failed\n");
        return XST_FAILURE;
    }
    /* Wait until both channels are idle */
    while (XAxiDma_Busy(&AxiDma, XAXIDMA_DEVICE_TO_DMA) || XAxiDma_Busy(&AxiDma, XAXIDMA_DMA_TO_DEVICE)) ;
    print("trans ok!");
    return XST_SUCCESS;
}
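The chunking arithmetic is easy to check on a PC before touching the board. The sketch below is only an illustration: plan_chunks and Chunk are hypothetical names, and the 14-bit buffer length register is my assumption about how the AXI DMA IP is configured (check the value in your own block design). Under that assumption one transfer can carry at most 16383 bytes, so a 64x64 image of int (16384 bytes) would be one byte too large for a single transfer, which would be consistent with the failure I saw.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One planned DMA transfer: byte offset into the source buffer and byte length.
struct Chunk {
    std::size_t   offset;
    std::uint32_t length;
};

// Assumption: the DMA buffer length register is 14 bits wide.
constexpr std::uint32_t kAssumedLenRegBits   = 14;
constexpr std::uint32_t kMaxBytesPerTransfer = (1u << kAssumedLenRegBits) - 1;  // 16383

// Split total_bytes into transfers of at most chunk_bytes each.
// The last chunk is shorter when total_bytes is not a multiple of chunk_bytes.
std::vector<Chunk> plan_chunks(std::size_t total_bytes, std::uint32_t chunk_bytes) {
    assert(chunk_bytes > 0 && chunk_bytes <= kMaxBytesPerTransfer);
    std::vector<Chunk> chunks;
    for (std::size_t off = 0; off < total_bytes; off += chunk_bytes) {
        std::size_t left = total_bytes - off;
        std::uint32_t len = static_cast<std::uint32_t>(left < chunk_bytes ? left : chunk_bytes);
        chunks.push_back({off, len});
    }
    return chunks;
}
```

With chunk_bytes set to 8192, the same value as ONE_TRANS_NUM, a 16384-byte image becomes two full transfers and no short one.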
Finally, compare the results from the board against those computed on the PC.
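A minimal sketch of such a comparison, assuming the PC reference has been saved as an int array of the same length as roi_out. The names compare_results and CompareResult are hypothetical, not part of any Xilinx library.

```cpp
#include <cassert>
#include <cstddef>

// Summary of an element-wise comparison.
struct CompareResult {
    std::size_t mismatches;  // number of differing elements
    long        first_bad;   // index of the first difference, -1 if none
};

// Compare the hardware output against the PC reference, element by element.
CompareResult compare_results(const int *hw, const int *ref, std::size_t n) {
    assert(hw != nullptr && ref != nullptr);
    CompareResult r{0, -1};
    for (std::size_t i = 0; i < n; i++) {
        if (hw[i] != ref[i]) {
            if (r.first_bad < 0) r.first_bad = static_cast<long>(i);
            r.mismatches++;
        }
    }
    return r;
}
```

Reporting the index of the first mismatch helps tell a wrong kernel result apart from a transfer that stopped early: in the second case I would expect everything after some index to be wrong.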