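In an FPGA design, the main job of the AXI DMA IP core is to move large blocks of data between the programmable logic (the Verilog side) and the software running on the processor (the C side). On the logic side it uses AXI4-Stream; on the memory side it reads and writes memory over memory-mapped AXI.
Many Xilinx IP cores are built on AXI4-Stream, for example the Floating-point IP core and the Tri Mode Ethernet MAC IP core. To bring data from the Verilog side into a C program for processing, you need the DMA IP core.
This article uses the Floating-point IP core, configured to convert fixed-point numbers to floating-point numbers, as a worked example of how to use the AXI DMA IP core.
The Floating-point IP core's input and output are both 32 bits wide and both use AXI4-Stream. The C program first sends the fixed-point values to the Floating-point IP core through the DMA. Once the core has converted them, the single-precision results come back to the C program through the DMA and are printed with printf.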
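The fixed-point values are stored as int, with the binary point 4 bits above the least significant bit. Written in hexadecimal that is XXXXXXX.X: the integer part takes 28 bits and the fractional part takes 4 bits, so a raw value of 314 represents 314/16 = 19.625.
The converted values are of type float and can be printed directly with printf's %f.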
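As a cross-check for the hardware results, the same conversion can be modelled in software. The following is a minimal sketch, assuming the 28.4 format described above; the names `FRAC_BITS`, `q28_4_to_float` and `float_to_q28_4` are made up for this illustration and do not come from any Xilinx library.

```c
#include <stdint.h>

/* Number of fractional bits in the fixed-point format used in this article. */
#define FRAC_BITS 4

/* Software reference model: interpret a signed 32-bit raw value as 28.4
 * fixed point and return the real number it represents. */
float q28_4_to_float(int32_t raw)
{
    return (float)raw / (float)(1 << FRAC_BITS);
}

/* Inverse direction: scale by 16 and round to the nearest step of 1/16. */
int32_t float_to_q28_4(float value)
{
    float scaled = value * (float)(1 << FRAC_BITS);
    return (int32_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}
```

For example, `q28_4_to_float(314)` gives 19.625 and `q28_4_to_float(-628)` gives -39.25, which are the values the Floating-point IP core is expected to return for those inputs.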
Project download: https://pan.baidu.com/s/1SXppHMdhroFT8vGCIysYTQ (extraction code: u7wf)
How to create the MicroBlaze C project is not repeated here; see: https://blog.csdn.net/ZLK1214/article/details/111824576
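First add the Floating-point IP core, which sits on the peripheral (stream) side of the DMA; the memory side is BRAM:
Note that TLAST must be enabled here, otherwise the receive side of the DMA reports a DMA Internal Error:
The Xilinx DMA manual describes the DMA Internal Error as follows: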
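When a transfer fails, the raw DMASR value is easier to interpret if the error bits are decoded. The sketch below is an assumption-laden illustration: the bit positions (4, 5 and 6 for the internal, slave and decode errors) are my reading of the DMASR description in the AXI DMA product guide and should be checked against the manual for your IP version. The macro, enum and function names are local to this example, not the driver's own.

```c
#include <stdint.h>

/* Assumed DMASR error bit positions (verify against the AXI DMA manual).
 * These are local names, not macros from the Xilinx driver. */
#define SR_DMA_INT_ERR  (1u << 4)   /* DMA Internal Error */
#define SR_DMA_SLV_ERR  (1u << 5)   /* DMA Slave Error    */
#define SR_DMA_DEC_ERR  (1u << 6)   /* DMA Decode Error   */

typedef enum
{
    DMA_ERR_NONE,
    DMA_ERR_INTERNAL,
    DMA_ERR_SLAVE,
    DMA_ERR_DECODE
} dma_err_t;

/* Return the first error flagged in a DMASR value, or DMA_ERR_NONE. */
dma_err_t dmasr_classify(uint32_t sr)
{
    if (sr & SR_DMA_INT_ERR) return DMA_ERR_INTERNAL;
    if (sr & SR_DMA_SLV_ERR) return DMA_ERR_SLAVE;
    if (sr & SR_DMA_DEC_ERR) return DMA_ERR_DECODE;
    return DMA_ERR_NONE;
}

/* Human-readable name for a classified error. */
const char *dma_err_name(dma_err_t err)
{
    switch (err)
    {
    case DMA_ERR_INTERNAL: return "DMA Internal Error";
    case DMA_ERR_SLAVE:    return "DMA Slave Error";
    case DMA_ERR_DECODE:   return "DMA Decode Error";
    default:               return "no error";
    }
}
```

A timeout handler could pass the value it reads from the status register to `dmasr_classify()` and print `dma_err_name()` of the result next to the hexadecimal value.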
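Add the AXI DMA IP core:
The IP cores are now in place but not yet wired up:
Click Run Connection Automation to connect the DMA's S_AXI_LITE interface automatically:
Automatically connect the clock pin of the Floating-point IP core:
Add a BRAM controller:
The final wiring:
Change the size of the newly created BRAM to 64KB:
The final address assignment:
Save the Block Design, then generate the bitstream:
Once the bitstream has been generated, export the xsa file:
Re-import the xsa file into the Vitis platform project: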
Modify the C program (helloworld.c) as follows:
(It is best to replace XPAR_BRAM_2_BASEADDR with 0xc0000000, because the BRAM index in the generated xparameters.h may change.)
/******************************************************************************
*
* Copyright (C) 2009 - 2014 Xilinx, Inc. All rights reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* Use of the Software is limited solely to applications:
* (a) running on a Xilinx device, or
* (b) that interact with a Xilinx device through a bus or interconnect.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* XILINX BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF
* OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*
* Except as contained in this notice, the name of the Xilinx shall not be used
* in advertising or otherwise to promote the sale, use or other dealings in
* this Software without prior written authorization from Xilinx.
*
******************************************************************************/
/*
* helloworld.c: simple test application
*
* This application configures UART 16550 to baud rate 9600.
* PS7 UART (Zynq) is not initialized by this application, since
* bootrom/bsp configures it to baud rate 115200
*
* ------------------------------------------------
* | UART TYPE BAUD RATE |
* ------------------------------------------------
* uartns550 9600
* uartlite Configurable only in HW design
* ps7_uart 115200 (configured by bootrom/bsp)
*/
#include <stdio.h>
#include <xaxidma.h>
#include "platform.h"

// The DMA cannot reach the MicroBlaze's own local BRAM through the AXI Interconnect.
// It can only access memory that is attached to the AXI Interconnect.
#define _countof(arr) (sizeof(arr) / sizeof(*(arr)))

typedef struct
{
    int numbers_in[40];
    float numbers_out[40];
} BRAM2_Data;

static BRAM2_Data *bram2_data = (BRAM2_Data *)XPAR_BRAM_2_BASEADDR;
static XAxiDma xaxidma;

int main(void)
{
    int i, ret = 0;
    XAxiDma_Config *xaxidma_cfg;

    init_platform();
    printf("Hello World\n");
    printf("Successfully ran Hello World application\n");

    // Initialize the DMA
    xaxidma_cfg = XAxiDma_LookupConfig(XPAR_AXIDMA_0_DEVICE_ID);
    if (xaxidma_cfg == NULL)
    {
        printf("XAxiDma_LookupConfig() failed!\n");
        goto err;
    }
    ret = XAxiDma_CfgInitialize(&xaxidma, xaxidma_cfg);
    if (ret != XST_SUCCESS)
    {
        printf("XAxiDma_CfgInitialize() failed! ret=%d\n", ret);
        goto err;
    }
    ret = XAxiDma_Selftest(&xaxidma);
    if (ret != XST_SUCCESS)
    {
        printf("XAxiDma_Selftest() failed! ret=%d\n", ret);
        goto err;
    }

    // Fill in the DMA input data: 314, -628, 942, ... (28.4 fixed point)
    printf("numbers_in=%p, numbers_out=%p\n", bram2_data->numbers_in, bram2_data->numbers_out);
    for (i = 0; i < (int)_countof(bram2_data->numbers_in); i++)
    {
        bram2_data->numbers_in[i] = 314 * (i + 1);
        if (i & 1)
            bram2_data->numbers_in[i] = -bram2_data->numbers_in[i];
    }

    // Start the DMA transmit (the Length parameter is in bytes)
    ret = XAxiDma_SimpleTransfer(&xaxidma, (uintptr_t)bram2_data->numbers_in, sizeof(bram2_data->numbers_in), XAXIDMA_DMA_TO_DEVICE);
    if (ret != XST_SUCCESS)
    {
        printf("XAxiDma_SimpleTransfer(XAXIDMA_DMA_TO_DEVICE) failed! ret=%d\n", ret);
        goto err;
    }

    // Start the DMA receive
    ret = XAxiDma_SimpleTransfer(&xaxidma, (uintptr_t)bram2_data->numbers_out, sizeof(bram2_data->numbers_out), XAXIDMA_DEVICE_TO_DMA);
    if (ret != XST_SUCCESS)
    {
        printf("XAxiDma_SimpleTransfer(XAXIDMA_DEVICE_TO_DMA) failed! ret=%d\n", ret);
        goto err;
    }

    // Wait for the DMA transmit to finish
    i = 0;
    while (XAxiDma_Busy(&xaxidma, XAXIDMA_DMA_TO_DEVICE))
    {
        i++;
        if (i == 200000)
        {
            // The memory accessed by the DMA must be attached directly to the AXI Interconnect.
            // Otherwise a DMA Decode Error is reported here (the address request points to an invalid address).
            printf("DMA Tx timeout! DMASR=0x%08lx\n", XAxiDma_ReadReg(xaxidma.RegBase + XAXIDMA_TX_OFFSET, XAXIDMA_SR_OFFSET));
            goto err;
        }
    }
    printf("DMA Tx complete!\n");

    // Wait for the DMA receive to finish
    i = 0;
    while (XAxiDma_Busy(&xaxidma, XAXIDMA_DEVICE_TO_DMA))
    {
        i++;
        if (i == 200000)
        {
            // In the Floating-point IP core configuration, the TLAST checkbox of channel A must be ticked
            // so that both the input and the output carry a TLAST signal.
            // Otherwise s_axis_s2mm_tlast stays at 0, the DMA assumes the data has not finished arriving,
            // and it reports a DMA Internal Error
            // (the incoming packet is bigger than what is specified in the DMA length register).
            printf("DMA Rx timeout! DMASR=0x%08lx\n", XAxiDma_ReadReg(xaxidma.RegBase + XAXIDMA_RX_OFFSET, XAXIDMA_SR_OFFSET));
            goto err;
        }
    }
    printf("DMA Rx complete!\n");

err:
    // Print the output buffer (also reached on error, to help with debugging)
    for (i = 0; i < (int)_countof(bram2_data->numbers_out); i++)
        printf("numbers_out[%d]=%f\n", i, bram2_data->numbers_out[i]);
    cleanup_platform();
    return 0;
}
Output of the C program:
Hello World
Successfully ran Hello World application
numbers_in=0xc0000000, numbers_out=0xc00000a0
DMA Tx complete!
DMA Rx complete!
numbers_out[0]=19.625000
numbers_out[1]=-39.250000
numbers_out[2]=58.875000
numbers_out[3]=-78.500000
numbers_out[4]=98.125000
numbers_out[5]=-117.750000
numbers_out[6]=137.375000
numbers_out[7]=-157.000000
numbers_out[8]=176.625000
numbers_out[9]=-196.250000
numbers_out[10]=215.875000
numbers_out[11]=-235.500000
numbers_out[12]=255.125000
numbers_out[13]=-274.750000
numbers_out[14]=294.375000
numbers_out[15]=-314.000000
numbers_out[16]=333.625000
numbers_out[17]=-353.250000
numbers_out[18]=372.875000
numbers_out[19]=-392.500000
numbers_out[20]=412.125000
numbers_out[21]=-431.750000
numbers_out[22]=451.375000
numbers_out[23]=-471.000000
numbers_out[24]=490.625000
numbers_out[25]=-510.250000
numbers_out[26]=529.875000
numbers_out[27]=-549.500000
numbers_out[28]=569.125000
numbers_out[29]=-588.750000
numbers_out[30]=608.375000
numbers_out[31]=-628.000000
numbers_out[32]=647.625000
numbers_out[33]=-667.250000
numbers_out[34]=686.875000
numbers_out[35]=-706.500000
numbers_out[36]=726.125000
numbers_out[37]=-745.750000
numbers_out[38]=765.375000
numbers_out[39]=-785.000000
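The two addresses printed at the top of the output follow directly from the layout of BRAM2_Data. The sketch below reproduces them on a host machine, assuming a 4-byte int and float and the 0xc0000000 base address from this article's address map; `BRAM2_BASE`, `numbers_in_addr` and `numbers_out_addr` are names invented for the illustration. The `block` parameter covers the case where several such structs are placed back to back.

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as the struct in the program above. */
typedef struct
{
    int numbers_in[40];
    float numbers_out[40];
} BRAM2_Data;

/* Base address of the DMA-visible BRAM in this article's address map. */
#define BRAM2_BASE 0xc0000000u

/* Address of numbers_in inside the block-th BRAM2_Data struct. */
uint32_t numbers_in_addr(uint32_t block)
{
    return BRAM2_BASE + block * (uint32_t)sizeof(BRAM2_Data)
           + (uint32_t)offsetof(BRAM2_Data, numbers_in);
}

/* Address of numbers_out inside the block-th BRAM2_Data struct. */
uint32_t numbers_out_addr(uint32_t block)
{
    return BRAM2_BASE + block * (uint32_t)sizeof(BRAM2_Data)
           + (uint32_t)offsetof(BRAM2_Data, numbers_out);
}
```

With 40 four-byte inputs, numbers_out starts 160 bytes (0xa0) after numbers_in, which matches the `numbers_out=0xc00000a0` line printed by the program.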
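Next, the Scatter Gather interface that was disabled earlier. Once it is enabled, the C code above no longer runs.
Without Scatter Gather, only one DMA request per direction can be outstanding at a time: the next request cannot be submitted until the current one has finished transferring its data.
With Scatter Gather, many DMA requests can be queued in one go, and the CPU is then free to do other work while they complete. This can improve transfer efficiency considerably.
In addition, Scatter Gather can combine several buffers located at different memory addresses into a single AXI4-Stream packet.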
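How buffers are grouped into packets is decided by the start-of-frame (SOF) and end-of-frame (EOF) flags in each transmit descriptor. The sketch below illustrates the idea with a simplified, hypothetical descriptor type (`sketch_bd_t`); the real XAxiDma_Bd is manipulated through driver calls such as XAxiDma_BdSetCtrl. The two flag values are local stand-ins that I believe match XAXIDMA_BD_CTRL_TXSOF_MASK and XAXIDMA_BD_CTRL_TXEOF_MASK, but treat that as an assumption and use the driver's macros in real code.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the driver's SOF/EOF control masks (assumed values). */
#define CTRL_SOF 0x08000000u
#define CTRL_EOF 0x04000000u

/* Hypothetical, simplified descriptor used only for this illustration. */
typedef struct
{
    uintptr_t addr;    /* buffer address        */
    uint32_t  length;  /* buffer length, bytes  */
    uint32_t  ctrl;    /* SOF/EOF control flags */
} sketch_bd_t;

/* Gather all buffers into ONE packet: SOF on the first, EOF on the last. */
void mark_single_packet(sketch_bd_t *bds, size_t count)
{
    for (size_t i = 0; i < count; i++)
    {
        bds[i].ctrl = 0;
        if (i == 0)
            bds[i].ctrl |= CTRL_SOF;
        if (i == count - 1)
            bds[i].ctrl |= CTRL_EOF;
    }
}

/* Send every buffer as its own packet: SOF and EOF on each descriptor. */
void mark_separate_packets(sketch_bd_t *bds, size_t count)
{
    for (size_t i = 0; i < count; i++)
        bds[i].ctrl = CTRL_SOF | CTRL_EOF;
}
```

The program in this article takes the second approach: it sets both flags on every transmit descriptor, so each of the 3 buffers goes out as a separate packet.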
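The example below shows how to use Scatter Gather to send and receive 3 packets in one batch.
With Scatter Gather enabled, the DMA gains an M_AXI_SG interface. Click Run Connection Automation to connect it to the AXI Interconnect:
In Vivado, run Generate Bitstream and then export the xsa file. Back in Vitis, the platform project must be deleted and recreated, otherwise the value of XPAR_AXI_DMA_0_INCLUDE_SG is not updated.
The original C program can no longer be used; modify the code as follows: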
/******************************************************************************
*
* Copyright (C) 2009 - 2014 Xilinx, Inc. All rights reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* Use of the Software is limited solely to applications:
* (a) running on a Xilinx device, or
* (b) that interact with a Xilinx device through a bus or interconnect.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* XILINX BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF
* OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*
* Except as contained in this notice, the name of the Xilinx shall not be used
* in advertising or otherwise to promote the sale, use or other dealings in
* this Software without prior written authorization from Xilinx.
*
******************************************************************************/
/*
* helloworld.c: simple test application
*
* This application configures UART 16550 to baud rate 9600.
* PS7 UART (Zynq) is not initialized by this application, since
* bootrom/bsp configures it to baud rate 115200
*
* ------------------------------------------------
* | UART TYPE BAUD RATE |
* ------------------------------------------------
* uartns550 9600
* uartlite Configurable only in HW design
* ps7_uart 115200 (configured by bootrom/bsp)
*/
#include <stdio.h>
#include <xaxidma.h>
#include "platform.h"

/* Official Xilinx example: C:\Xilinx\Vitis\2020.1\data\embeddedsw\XilinxProcessorIPLib\drivers\axidma_v9_11\examples\xaxidma_example_sg_poll.c */
// The DMA cannot reach the MicroBlaze's own local BRAM through the AXI Interconnect.
// It can only access memory that is attached to the AXI Interconnect.
#define _countof(arr) (sizeof(arr) / sizeof(*(arr)))

typedef struct
{
    int numbers_in[40];
    float numbers_out[40];
} BRAM2_Data;

// Storage for the transmit and receive descriptor rings
typedef struct
{
    uint8_t txbuf[640];
    uint8_t rxbuf[640];
} BRAM2_BdRingBuffer;

static BRAM2_Data *bram2_data = (BRAM2_Data *)0xc0000000;
static BRAM2_BdRingBuffer *bram2_bdringbuf = (BRAM2_BdRingBuffer *)0xc0008000;
static XAxiDma xaxidma;

int main(void)
{
    int i, n, ret = 0;
    XAxiDma_Bd *bd, *p;
    XAxiDma_BdRing *txring, *rxring;
    XAxiDma_Config *cfg;

    init_platform();
    printf("Hello World\n");
    printf("Successfully ran Hello World application\n");

    // Initialize the DMA
    cfg = XAxiDma_LookupConfig(XPAR_AXIDMA_0_DEVICE_ID);
    if (cfg == NULL)
    {
        printf("XAxiDma_LookupConfig() failed!\n");
        goto err;
    }
    ret = XAxiDma_CfgInitialize(&xaxidma, cfg);
    if (ret != XST_SUCCESS)
    {
        printf("XAxiDma_CfgInitialize() failed! ret=%d\n", ret);
        goto err;
    }
    ret = XAxiDma_Selftest(&xaxidma);
    if (ret != XST_SUCCESS)
    {
        printf("XAxiDma_Selftest() failed! ret=%d\n", ret);
        goto err;
    }
    if (!XAxiDma_HasSg(&xaxidma))
    {
        printf("XPAR_AXI_DMA_0_INCLUDE_SG=%d\n", XPAR_AXI_DMA_0_INCLUDE_SG);
        printf("Please recreate and build Vitis platform project!\n");
        goto err;
    }

    // Fill in the DMA input data for the 3 packets
    printf("[0] numbers_in=%p, numbers_out=%p\n", bram2_data[0].numbers_in, bram2_data[0].numbers_out);
    printf("[1] numbers_in=%p, numbers_out=%p\n", bram2_data[1].numbers_in, bram2_data[1].numbers_out);
    printf("[2] numbers_in=%p, numbers_out=%p\n", bram2_data[2].numbers_in, bram2_data[2].numbers_out);
    for (i = 0; i < (int)_countof(bram2_data[0].numbers_in); i++)
    {
        bram2_data[0].numbers_in[i] = 314 * (i + 1);
        bram2_data[1].numbers_in[i] = -141 * (i + 1);
        bram2_data[2].numbers_in[i] = -2718 * (i + 1);
        if (i & 1)
        {
            bram2_data[0].numbers_in[i] = -bram2_data[0].numbers_in[i];
            bram2_data[1].numbers_in[i] = -bram2_data[1].numbers_in[i];
            bram2_data[2].numbers_in[i] = -bram2_data[2].numbers_in[i];
        }
    }

    // Set up the DMA transmit descriptors
    txring = XAxiDma_GetTxRing(&xaxidma);
    n = XAxiDma_BdRingCntCalc(XAXIDMA_BD_MINIMUM_ALIGNMENT, sizeof(bram2_bdringbuf->txbuf));
    ret = XAxiDma_BdRingCreate(txring, (uintptr_t)bram2_bdringbuf->txbuf, (uintptr_t)bram2_bdringbuf->txbuf, XAXIDMA_BD_MINIMUM_ALIGNMENT, n);
    if (ret != XST_SUCCESS)
    {
        printf("XAxiDma_BdRingCreate(txring) failed! ret=%d\n", ret);
        goto err;
    }
    printf("BdRing Tx count: %d\n", n);
    ret = XAxiDma_BdRingAlloc(txring, 3, &bd);
    if (ret != XST_SUCCESS)
    {
        printf("XAxiDma_BdRingAlloc(txring) failed! ret=%d\n", ret);
        goto err;
    }
    p = bd;
    for (i = 0; i < 3; i++)
    {
        // One descriptor per packet: each carries both SOF and EOF
        XAxiDma_BdSetBufAddr(p, (uintptr_t)bram2_data[i].numbers_in);
        XAxiDma_BdSetLength(p, sizeof(bram2_data[i].numbers_in), txring->MaxTransferLen);
        XAxiDma_BdSetCtrl(p, XAXIDMA_BD_CTRL_TXSOF_MASK | XAXIDMA_BD_CTRL_TXEOF_MASK);
        XAxiDma_BdSetId(p, i);
        p = (XAxiDma_Bd *)XAxiDma_BdRingNext(txring, p);
    }
    ret = XAxiDma_BdRingToHw(txring, 3, bd);
    if (ret != XST_SUCCESS)
    {
        printf("XAxiDma_BdRingToHw(txring) failed! ret=%d\n", ret);
        goto err;
    }

    // Set up the DMA receive descriptors
    rxring = XAxiDma_GetRxRing(&xaxidma);
    n = XAxiDma_BdRingCntCalc(XAXIDMA_BD_MINIMUM_ALIGNMENT, sizeof(bram2_bdringbuf->rxbuf));
    ret = XAxiDma_BdRingCreate(rxring, (uintptr_t)bram2_bdringbuf->rxbuf, (uintptr_t)bram2_bdringbuf->rxbuf, XAXIDMA_BD_MINIMUM_ALIGNMENT, n);
    if (ret != XST_SUCCESS)
    {
        printf("XAxiDma_BdRingCreate(rxring) failed! ret=%d\n", ret);
        goto err;
    }
    printf("BdRing Rx count: %d\n", n);
    ret = XAxiDma_BdRingAlloc(rxring, 3, &bd);
    if (ret != XST_SUCCESS)
    {
        printf("XAxiDma_BdRingAlloc(rxring) failed! ret=%d\n", ret);
        goto err;
    }
    p = bd;
    for (i = 0; i < 3; i++)
    {
        XAxiDma_BdSetBufAddr(p, (uintptr_t)bram2_data[i].numbers_out);
        XAxiDma_BdSetLength(p, sizeof(bram2_data[i].numbers_out), rxring->MaxTransferLen);
        XAxiDma_BdSetCtrl(p, 0);
        XAxiDma_BdSetId(p, i);
        p = (XAxiDma_Bd *)XAxiDma_BdRingNext(rxring, p);
    }
    ret = XAxiDma_BdRingToHw(rxring, 3, bd);
    if (ret != XST_SUCCESS)
    {
        printf("XAxiDma_BdRingToHw(rxring) failed! ret=%d\n", ret);
        goto err;
    }

    // Start transmitting
    ret = XAxiDma_BdRingStart(txring);
    if (ret != XST_SUCCESS)
    {
        printf("XAxiDma_BdRingStart(txring) failed! ret=%d\n", ret);
        goto err;
    }

    // Start receiving
    ret = XAxiDma_BdRingStart(rxring);
    if (ret != XST_SUCCESS)
    {
        printf("XAxiDma_BdRingStart(rxring) failed! ret=%d\n", ret);
        goto err;
    }

    // Wait until all transfers have finished (3 Tx + 3 Rx descriptors = 6)
    n = 0;
    while (n < 6)
    {
        // Check whether any transmit descriptors have completed
        ret = XAxiDma_BdRingFromHw(txring, XAXIDMA_ALL_BDS, &bd);
        if (ret != 0)
        {
            n += ret;
            p = bd;
            for (i = 0; i < ret; i++)
            {
                printf("DMA Tx%lu Complete!\n", XAxiDma_BdGetId(p));
                p = (XAxiDma_Bd *)XAxiDma_BdRingNext(txring, p);
            }
            ret = XAxiDma_BdRingFree(txring, ret, bd);
            if (ret != XST_SUCCESS)
                printf("XAxiDma_BdRingFree(txring) failed! ret=%d\n", ret);
        }

        // Check whether any receive descriptors have completed
        ret = XAxiDma_BdRingFromHw(rxring, XAXIDMA_ALL_BDS, &bd);
        if (ret != 0)
        {
            n += ret;
            p = bd;
            for (i = 0; i < ret; i++)
            {
                printf("DMA Rx%lu Complete!\n", XAxiDma_BdGetId(p));
                p = (XAxiDma_Bd *)XAxiDma_BdRingNext(rxring, p);
            }
            ret = XAxiDma_BdRingFree(rxring, ret, bd);
            if (ret != XST_SUCCESS)
                printf("XAxiDma_BdRingFree(rxring) failed! ret=%d\n", ret);
        }
    }

err:
    // Print the output buffers (also reached on error, to help with debugging)
    for (i = 0; i < (int)_countof(bram2_data[0].numbers_out); i++)
        printf("numbers_out[%d]=%f,%f,%f\n", i, bram2_data[0].numbers_out[i], bram2_data[1].numbers_out[i], bram2_data[2].numbers_out[i]);
    cleanup_platform();
    return 0;
}
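Unlike the simple-mode program, the wait loop above has no timeout, so it spins forever if a descriptor never completes (for example when TLAST is missing). One possible way to bound it is sketched below. This is a generic pattern rather than part of the Xilinx driver: `poll_fn`, `poll_until` and `fake_poll` are names invented for the illustration, and `fake_poll` merely stands in for a function that would call XAxiDma_BdRingFromHw and return the number of descriptors it retrieved.

```c
#include <stdbool.h>
#include <stdint.h>

/* A poll callback returns how many items completed since the last call. */
typedef int (*poll_fn)(void *ctx);

/* Poll until `target` items have completed or `max_polls` attempts are
 * used up. Returns true on success, false on timeout. */
bool poll_until(poll_fn poll, void *ctx, int target, uint32_t max_polls)
{
    int done = 0;
    for (uint32_t i = 0; i < max_polls; i++)
    {
        done += poll(ctx);
        if (done >= target)
            return true;
    }
    return false;
}

/* Stand-in for the hardware: reports nothing for *ctx calls, then
 * reports 2 completed items on every later call. */
int fake_poll(void *ctx)
{
    int *idle_calls_left = (int *)ctx;
    if (*idle_calls_left > 0)
    {
        (*idle_calls_left)--;
        return 0;
    }
    return 2;
}
```

In the program above, the target would be 6 (3 transmit plus 3 receive descriptors), and a false return would be the point at which to print the status registers and jump to the error path.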
Program output:
Hello World
Successfully ran Hello World application
[0] numbers_in=0xc0000000, numbers_out=0xc00000a0
[1] numbers_in=0xc0000140, numbers_out=0xc00001e0
[2] numbers_in=0xc0000280, numbers_out=0xc0000320
BdRing Tx count: 10
BdRing Rx count: 10
DMA Tx0 Complete!
DMA Tx1 Complete!
DMA Tx2 Complete!
DMA Rx0 Complete!
DMA Rx1 Complete!
DMA Rx2 Complete!
numbers_out[0]=19.625000,-8.812500,-169.875000
numbers_out[1]=-39.250000,17.625000,339.750000
numbers_out[2]=58.875000,-26.437500,-509.625000
numbers_out[3]=-78.500000,35.250000,679.500000
numbers_out[4]=98.125000,-44.062500,-849.375000
numbers_out[5]=-117.750000,52.875000,1019.250000
numbers_out[6]=137.375000,-61.687500,-1189.125000
numbers_out[7]=-157.000000,70.500000,1359.000000
numbers_out[8]=176.625000,-79.312500,-1528.875000
numbers_out[9]=-196.250000,88.125000,1698.750000
numbers_out[10]=215.875000,-96.937500,-1868.625000
numbers_out[11]=-235.500000,105.750000,2038.500000
numbers_out[12]=255.125000,-114.562500,-2208.375000
numbers_out[13]=-274.750000,123.375000,2378.250000
numbers_out[14]=294.375000,-132.187500,-2548.125000
numbers_out[15]=-314.000000,141.000000,2718.000000
numbers_out[16]=333.625000,-149.812500,-2887.875000
numbers_out[17]=-353.250000,158.625000,3057.750000
numbers_out[18]=372.875000,-167.437500,-3227.625000
numbers_out[19]=-392.500000,176.250000,3397.500000
numbers_out[20]=412.125000,-185.062500,-3567.375000
numbers_out[21]=-431.750000,193.875000,3737.250000
numbers_out[22]=451.375000,-202.687500,-3907.125000
numbers_out[23]=-471.000000,211.500000,4077.000000
numbers_out[24]=490.625000,-220.312500,-4246.875000
numbers_out[25]=-510.250000,229.125000,4416.750000
numbers_out[26]=529.875000,-237.937500,-4586.625000
numbers_out[27]=-549.500000,246.750000,4756.500000
numbers_out[28]=569.125000,-255.562500,-4926.375000
numbers_out[29]=-588.750000,264.375000,5096.250000
numbers_out[30]=608.375000,-273.187500,-5266.125000
numbers_out[31]=-628.000000,282.000000,5436.000000
numbers_out[32]=647.625000,-290.812500,-5605.875000
numbers_out[33]=-667.250000,299.625000,5775.750000
numbers_out[34]=686.875000,-308.437500,-5945.625000
numbers_out[35]=-706.500000,317.250000,6115.500000
numbers_out[36]=726.125000,-326.062500,-6285.375000
numbers_out[37]=-745.750000,334.875000,6455.250000
numbers_out[38]=765.375000,-343.687500,-6625.125000
numbers_out[39]=-785.000000,352.500000,6795.000000
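The "BdRing Tx count: 10" and "BdRing Rx count: 10" lines can be checked by hand: each ring has 640 bytes of storage, and a count of 10 is what you get if every descriptor occupies one 64-byte aligned slot. The sketch below redoes that arithmetic. It assumes a 64-byte descriptor and 64-byte minimum alignment, which is consistent with the printed count but should be confirmed against the driver headers; `SKETCH_BD_BYTES` and `bd_ring_count` are illustrative names, not driver symbols.

```c
#include <stdint.h>

/* Assumed size of one buffer descriptor in bytes (not read from the driver). */
#define SKETCH_BD_BYTES 64u

/* How many descriptors fit into `bytes` of ring storage when each one is
 * rounded up to a multiple of `alignment` (which must be a power of two). */
uint32_t bd_ring_count(uint32_t alignment, uint32_t bytes)
{
    uint32_t slot = (SKETCH_BD_BYTES + (alignment - 1u)) & ~(alignment - 1u);
    return bytes / slot;
}
```

With these assumptions `bd_ring_count(64, 640)` is 10, so only 3 of the 10 available descriptors in each ring are used by the example.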