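The previous post drove the DMA by writing its registers directly; this post uses an external open-source library instead. Many blog posts already cover this approach (see the reference links below), so the focus here is on the points to watch out for when using it.
Implementation with the open-source AXI DMA library
References: https://blog.csdn.net/sements/article/details/90230188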
https://blog.csdn.net/baidu_15814023/article/details/105650711
After downloading the repository, change into the xilinx_axidma-master source directory and build the driver:
make CROSS_COMPILE=/usr/local/gcc-linaro-7.5.0-2019.12-x86_64_arm-linux-gnueabihf/bin/arm-linux-gnueabihf- ARCH=arm KBUILD_DIR=/home/francis/linux-xlnx driver
Then build the examples in the same source tree:
make CROSS_COMPILE=/usr/local/gcc-linaro-7.5.0-2019.12-x86_64_arm-linux-gnueabihf/bin/arm-linux-gnueabihf- ARCH=arm examples
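Finally, all build products are generated in the outputs subdirectory. The .so shared library found there must be copied to /usr/lib/ on the development board.
Next, edit the device tree file pl.dtsi. Note: the amba_pl --> axi_dma_0 node in this file describes the two DMA channels that the PL side creates over the AXI bus. The axidma_chrdev node is then added next to it, inside amba_pl: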
/*
* CAUTION: This file is automatically generated by Xilinx.
* Version:
* Today is: Wed Oct 28 12:02:33 2020
*/
/ {
amba_pl: amba_pl {
#address-cells = <1>;
#size-cells = <1>;
compatible = "simple-bus";
ranges ;
axi_dma_0: dma@40400000 {
#dma-cells = <1>;
clock-names = "s_axi_lite_aclk", "m_axi_sg_aclk", "m_axi_mm2s_aclk", "m_axi_s2mm_aclk";
clocks = <&clkc 15>, <&clkc 15>, <&clkc 15>, <&clkc 15>;
compatible = "xlnx,axi-dma-1.00.a";
interrupt-parent = <&intc>;
interrupts = <0 29 4 0 30 4>;
reg = <0x40400000 0x10000>;
xlnx,addrwidth = <0x20>;
dma-channel@40400000 {
compatible = "xlnx,axi-dma-mm2s-channel";
dma-channels = <0x1>;
interrupts = <0 29 4>;
xlnx,datawidth = <0x20>;
xlnx,device-id = <0x0>;
};
dma-channel@40400030 {
compatible = "xlnx,axi-dma-s2mm-channel";
dma-channels = <0x1>;
interrupts = <0 30 4>;
xlnx,datawidth = <0x20>;
xlnx,device-id = <0x1>;
};
};
axidma_chrdev@0{
compatible = "xlnx,axidma-chrdev";
dmas = <&axi_dma_0 0 &axi_dma_0 1>;
dma-names = "tx_channel", "rx_channel";
};
};
};
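Copy the resulting files to the board with scp, create a file hello.c with arbitrary content, and run the example to see the result.
Note: run this way, the first execution succeeds, but the second and later executions may hang with a timeout error, so the stock program still has a problem. To run the program again, first unload the driver and then load it again.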
Driver notes and key points
1 For the points to watch, see the posts linked above and follow their steps as written; otherwise the setup may not work.
2 Code structure
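// driver folder
|---> axi_dma.c <---|
| |
called by |<--- axidma_chrdev.c <---| |
| |
|---> axidma_dma.c --->| called by --->| called by
|
called by |<--- axidma_of.c
axidma_mod.c miscellaneous
// the examples folder can be ignored; it only contains test programs
// the library folder holds the source files we need, which can be built into a library
libaxidma.c // contains the functions we call from our own code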
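When calling axidma_oneway_transfer, the last argument, wait, behaves as follows:
wait = true: the call blocks, but gives up after a timeout; try to avoid this mode in our program;
wait = false: the call returns immediately, and the program can use a semaphore to hand off between threads.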
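The non-blocking pattern can be sketched as follows. This is a minimal, self-contained illustration and not code from the library: `fake_async_transfer` is a hypothetical stand-in for a transfer started with wait = false, which fires the completion callback from a worker thread, and `wait_done` is a hypothetical helper that bounds the wait with `sem_timedwait`. In a real program the callback would be registered with `axidma_set_callback`, as in the listing later in this post.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>
#include <time.h>

/* Same shape as the callbacks registered with axidma_set_callback in this post. */
typedef void (*done_cb_t)(int channel_id, void *data);

/* Hypothetical: stands in for the DMA engine, for illustration only. */
struct fake_xfer {
    int channel_id;
    done_cb_t cb;
    void *cb_data;
};

static sem_t xfer_done;

/* Completion callback: only signal the waiting thread, keep it short. */
static void on_xfer_done(int channel_id, void *data)
{
    (void)channel_id;
    sem_post((sem_t *)data);
}

/* Worker thread that pretends the hardware has finished the transfer. */
static void *fake_dma_worker(void *arg)
{
    struct fake_xfer *x = arg;
    x->cb(x->channel_id, x->cb_data);
    return NULL;
}

/* Hypothetical stand-in for a non-blocking transfer: returns immediately. */
static int fake_async_transfer(struct fake_xfer *x, pthread_t *tid)
{
    return pthread_create(tid, NULL, fake_dma_worker, x);
}

/* Wait for completion, giving up after timeout_s seconds. Returns 0 or -1. */
static int wait_done(sem_t *sem, int timeout_s)
{
    struct timespec deadline;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_s;
    while (sem_timedwait(sem, &deadline) != 0) {
        if (errno != EINTR)
            return -1; /* ETIMEDOUT or another error */
    }
    return 0;
}

/* One full round trip: start, wait, clean up. Returns 0 on success. */
static int run_one_transfer(void)
{
    struct fake_xfer x = { 1, on_xfer_done, &xfer_done };
    pthread_t tid;
    int rc;

    if (sem_init(&xfer_done, 0, 0) != 0)
        return -1;
    if (fake_async_transfer(&x, &tid) != 0) {
        sem_destroy(&xfer_done);
        return -1;
    }
    rc = wait_done(&xfer_done, 2);
    pthread_join(tid, NULL);
    sem_destroy(&xfer_done);
    return rc;
}
```

Calling `run_one_transfer()` returns 0 once the callback has posted the semaphore. The point of the design is that the callback does nothing except `sem_post`, while the waiting thread chooses its own timeout and decides what to do when it expires.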
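3 User space maps registers through /dev/mem, an interface exposed by the kernel: pass in the physical address and get back the corresponding virtual address. This gives access to that address range and is enough to exchange a small amount of critical data with the FPGA. For example, the ARM can tell the FPGA that it has taken the data the FPGA sent, so that the FPGA can start its next cycle.
One might ask: since mapping /dev/mem with mmap is possible, can't it handle all of the data exchange between the ARM and the FPGA directly?
No, for two reasons:
a) The region mapped by mmap may not be contiguous. That is, the virtual addresses are contiguous but the underlying physical memory may not be, so the ARM side may fail to read or write the correct data;
b) mmap only maps the memory so that user space can access it directly; it does not allocate it. Nothing prevents other programs on the board from using the same memory and overwriting its contents.
Also, when mapping with mmap, keep the returned (virtual) address in a pointer-sized type such as unsigned long, not unsigned int. With unsigned int, the subsequent register reads and writes through the mapped address misbehaved; I did not determine the cause.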
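A minimal sketch of such a register mapping, assuming 32-bit registers. The helper names `map_regs`, `reg_write32` and `reg_read32` are hypothetical, and the path and base offset are passed in as parameters; on the board they would be "/dev/mem" and the register base used in this post, 0x43c00000. The sketch keeps the mapping in a `volatile uint32_t *` so that the address is never squeezed into an integer type, and it checks the result against `MAP_FAILED`, which is what mmap returns on failure.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map len bytes starting at the page-aligned offset base of path.
 * Returns NULL on failure. */
static volatile uint32_t *map_regs(const char *path, off_t base, size_t len)
{
    void *p;
    int fd = open(path, O_RDWR | O_SYNC);

    if (fd < 0)
        return NULL;
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, base);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;
    return (volatile uint32_t *)p;
}

/* Write one 32-bit register; index counts registers, not bytes. */
static void reg_write32(volatile uint32_t *regs, size_t index, uint32_t value)
{
    regs[index] = value;
}

/* Read one 32-bit register; index counts registers, not bytes. */
static uint32_t reg_read32(volatile uint32_t *regs, size_t index)
{
    return regs[index];
}
```

With this in place, starting the ADC as in the listing below would read `reg_write32(regs, 1, 0x0000000f)`, and the mapping is released with `munmap((void *)regs, len)` when the program exits.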
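4 When the FPGA side uses the ADC to acquire data, the XADC-related options in the Linux kernel configuration must be disabled on the ARM side.
Otherwise, during testing the PS side may hang when the PL bitstream is loaded first. My guess is that the XADC driver occupies PS resources, or that a shared resource leads to an endless wait.
In this case it is best to rebuild the kernel with the XADC options removed.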
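5 While testing the complete ADC and data-processing DMA chain, I found that the order of the DMA channels in the device tree affects how the code runs
(the code below has been tested successfully). For example, take the following code: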
#include <stdlib.h>
#include <stdio.h>
#include <stdbool.h>
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>
#include <getopt.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <sys/mman.h>
#include <math.h>
#include <signal.h>
#include "util.h" // Miscellaneous utilities
#include "conversion.h" // Convert bytes to MiBs
#include "libaxidma.h" // Interface to the AXI DMA library
#define PAGE_SIZE 4096
#define MMAP_SIZE 8192 // Size of each mapped/allocated region, a multiple of PAGE_SIZE
#define ADC_SIZE 512 // Number of samples to acquire per ADC channel
#define ADC_CHANNEL_COUNT 4 // Number of ADC channels
// #define ADC_DMA_SIZE 4096
// #define DMA1_SIZE 8192
#define ADC_DMA_SIZE (ADC_SIZE * 2 * ADC_CHANNEL_COUNT) // 2 = bytes per u16 raw sample
#define DMA1_SIZE (ADC_SIZE * 4 * ADC_CHANNEL_COUNT) // 4 = bytes per float
#define PI 3.1415926
axidma_dev_t axidma_dev;
const array_t *tx_chans, *rx_chans;
float *send_buffer0;
short *recv_buffer_adc;
float *recv_buffer0;
int sig_base;
unsigned long *sig_mem_base;
#define SIG_REGBASE (0x43c00000)
sem_t sem_rx_channel_adc, sem_rx_channel0;
float hanning_win[ADC_SIZE];
void free_memory(int sel)
{
switch (sel)
{
case 0:
axidma_destroy(axidma_dev);
break;
case 1:
axidma_free(axidma_dev, (void *)send_buffer0, MMAP_SIZE);
axidma_destroy(axidma_dev);
break;
case 2:
axidma_free(axidma_dev, (void *)recv_buffer_adc, MMAP_SIZE);
axidma_free(axidma_dev, (void *)send_buffer0, MMAP_SIZE);
axidma_destroy(axidma_dev);
break;
case 3:
axidma_free(axidma_dev, (void *)recv_buffer_adc, MMAP_SIZE);
axidma_free(axidma_dev, (void *)recv_buffer0, MMAP_SIZE);
axidma_free(axidma_dev, (void *)send_buffer0, MMAP_SIZE);
axidma_destroy(axidma_dev);
break;
case 4:
close(sig_base);
axidma_free(axidma_dev, (void *)recv_buffer_adc, MMAP_SIZE);
axidma_free(axidma_dev, (void *)recv_buffer0, MMAP_SIZE);
axidma_free(axidma_dev, (void *)send_buffer0, MMAP_SIZE);
axidma_destroy(axidma_dev);
break;
case 5:
munmap((void *)sig_mem_base, MMAP_SIZE);
close(sig_base);
axidma_free(axidma_dev, (void *)recv_buffer_adc, MMAP_SIZE);
axidma_free(axidma_dev, (void *)recv_buffer0, MMAP_SIZE);
axidma_free(axidma_dev, (void *)send_buffer0, MMAP_SIZE);
axidma_destroy(axidma_dev);
break;
default:
break;
}
}
void sigint_handler(int signum) // signal handlers take the signal number as their argument
{
free_memory(5);
printf("exit!\n");
exit(0);
}
/****************************************************
* sig_mem_base: first register, DMA control register
* | 31-8 || 76 54 32 10 |
* | **-**|| ** ** 00 11 | receive ADC data
* | **-**|| ** ** 00 10 | off
* **************************************************
* | **-**|| ** ** 11 00 | receive FFT data
* | **-**|| ** ** 10 00 | off
* **************************************************
* sig_mem_base + 1: second register, ADC configuration register
* | 31-8 || 76 54 32 10 |
* | **-**|| ** ** 11 11 | start ADC
* | **-**|| ** ** 11 10 | start ADC (second and later starts)
* | **-**|| ** ** 10 10 | stop ADC
****************************************************/
void start_receive_rx_channel_adc()
{
*(volatile unsigned long*)(sig_mem_base) = 0x00000003;
}
void stop_receive_rx_channel_adc()
{
*(volatile unsigned long*)(sig_mem_base) = 0x00000002;
}
void start_receive_rx_channel0()
{
*(volatile unsigned long*)(sig_mem_base) = 0x0000000c;
}
void stop_receive_rx_channel0() // counterpart of start_receive_rx_channel0 (same bits 3:2)
{
*(volatile unsigned long*)(sig_mem_base) = 0x00000008;
}
void init_start_adc()
{
printf("Init and start ADC...\n");
*(volatile unsigned long*)(sig_mem_base + 1) = 0x0000000f;
}
void start_adc()
{
printf("Start ADC ...\n");
*(volatile unsigned long*)(sig_mem_base + 1) = 0x0000000e;
}
void stop_adc()
{
printf("Stop ADC...\n");
*(volatile unsigned long*)(sig_mem_base + 1) = 0x0000000a;
}
void add_offset()
{
for (int i = 0; i < (ADC_SIZE * ADC_CHANNEL_COUNT); i++)
{
if (*(recv_buffer_adc + i) > 2048)
{
*(recv_buffer_adc + i) = *(recv_buffer_adc + i) - 2048;
}
else
{
*(recv_buffer_adc + i) = *(recv_buffer_adc + i) + 2048;
}
}
}
void init_hanning_win()
{
for (int i = 0; i < ADC_SIZE; i++)
{
hanning_win[i] = 0.5 - 0.5 * cos((2 * PI * i) / (ADC_SIZE - 1)); // i is zero-based, so no (i - 1) offset
// if (i%16)
// {
// printf("\n");
// }
// printf("%f ",hanning_win[i]);
}
}
void add_hanning_win()
{
/* i: column (sample index), j: row (channel index) */
int i,j = 0;
float temp_buffer[ADC_CHANNEL_COUNT][ADC_SIZE];
memset(temp_buffer, 0, sizeof(temp_buffer));
for (i = 0; i < ADC_SIZE; i++)
{
for ( j = 0; j < ADC_CHANNEL_COUNT; j++)
{
temp_buffer[j][i] = hanning_win[i] * (*(recv_buffer_adc + i*ADC_CHANNEL_COUNT+j));
}
}
memcpy(send_buffer0, temp_buffer, sizeof(temp_buffer));
j = 0;
for ( i = 0; i < ADC_SIZE; i++)
{
if (i%8 == 0)
{
printf(" --%d\n", j++);
}
printf("%f ",*(send_buffer0 + i));
}
printf("Add hanning windows done...\n");
}
void callback_rx_channel_adc(int channelid,void* data)
{
static int channel1_count = 0;
int i, j = 0;
axidma_stop_transfer(axidma_dev,rx_chans->data[1]);
stop_adc();
printf("RECVINFO_ADC: callback func triggered, channelid: %d |------| COUNT: %d \n", channelid, channel1_count);
add_offset();
for(i = 0; i < (ADC_SIZE * ADC_CHANNEL_COUNT); i++)
{
if (i%4 == 0)
{
// printf("\n");
printf(" --%d\n",j++);
}
printf("%d\t", *(recv_buffer_adc + i));
}
channel1_count++;
printf("\nReceived adc buffer successfully.\n");
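// Apply the window; the source data must not be modified while windowing
add_hanning_win();
// After windowing and reordering, signal that: 1. the newly assembled data is sent out, 2. reception of the next acquisition can continue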
axidma_oneway_transfer(axidma_dev,tx_chans->data[0],send_buffer0,DMA1_SIZE,false);
}
void callback_tx_channel0(int channelid,void* data)
{
axidma_stop_transfer(axidma_dev,tx_chans->data[0]);
printf("SENDINFO: callback func triggered, channelid: %d\n",channelid);
// sem_post(&sem_rx_channel_adc);
}
void callback_rx_channel0(int channelid,void* data)
{
static int channel0_count = 0;
int i, j = 0;
printf("RECVINFO_0: callback func triggered, channelid: %d |------| COUNT: %d \n", channelid, channel0_count);
for(i = 0; i < (ADC_SIZE * ADC_CHANNEL_COUNT); i++)
{
if (i%8 == 0)
{
printf(" --%d\n", j++);
}
printf("%f\t",*(recv_buffer0 + i));
}
channel0_count++;
printf("\nReceived fft data successfully.\n");
// exit(0);
// sem_post(&sem_rx_channel0);
}
void *thread_rx_channel_adc(void *arg)
{
static unsigned int first = 0;
while (1)
{
axidma_oneway_transfer(axidma_dev,rx_chans->data[1],recv_buffer_adc,ADC_DMA_SIZE,false);
start_receive_rx_channel_adc();
if (!first)
{
init_start_adc();
first = 1;
}
else
{
start_adc();
}
sem_wait(&sem_rx_channel_adc);
}
}
void *thread_rx_channel0(void *arg)
{
while (1)
{
axidma_oneway_transfer(axidma_dev,rx_chans->data[0],recv_buffer0,DMA1_SIZE,false);
start_receive_rx_channel0();
sem_wait(&sem_rx_channel0);
}
}
int main(int argc, char **argv)
{
int ret = 0;
pthread_t pth[2];
struct sigaction act;
axidma_dev = axidma_init();
if (axidma_dev == NULL) {
fprintf(stderr, "Error: Failed to initialize the AXI DMA device.\n");
exit(-1);
}
tx_chans = axidma_get_dma_tx(axidma_dev);
if (tx_chans->len < 1) {
fprintf(stderr, "Error: No transmit channels were found.\n");
free_memory(0);
return 1;
}
rx_chans = axidma_get_dma_rx(axidma_dev);
if (rx_chans->len < 2) {
fprintf(stderr, "Error: Not enough receive channels were found.\n");
free_memory(0);
return 1;
}
printf("tx_chans->len = %d \n",tx_chans->len);
printf("rx_chans->len = %d \n",rx_chans->len);
send_buffer0 = (float *)axidma_malloc(axidma_dev, MMAP_SIZE);
if (send_buffer0 == NULL) {
fprintf(stderr, "Failed to allocate the send buffer.\n");
free_memory(0);
return 1;
}
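// Allocate in the order that free_memory() unwinds: send_buffer0, recv_buffer_adc, recv_buffer0
recv_buffer_adc = (short *)axidma_malloc(axidma_dev, MMAP_SIZE);
if (recv_buffer_adc == NULL) {
fprintf(stderr, "Failed to allocate the ADC receive buffer.\n");
free_memory(1); // only send_buffer0 exists at this point
return 1;
}
recv_buffer0 = (float *)axidma_malloc(axidma_dev, MMAP_SIZE);
if (recv_buffer0 == NULL) {
fprintf(stderr, "Failed to allocate the FFT receive buffer.\n");
free_memory(2); // send_buffer0 and recv_buffer_adc exist at this point
return 1;
}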
sig_base = open("/dev/mem",O_RDWR | O_SYNC);
if(sig_base < 0)
{
printf("ERROR: can not open /dev/mem\n");
free_memory(3);
return 1;
}
sig_mem_base = (unsigned long *)mmap(NULL,MMAP_SIZE,PROT_READ|PROT_WRITE,MAP_SHARED,sig_base,SIG_REGBASE);
if(sig_mem_base == MAP_FAILED) // mmap reports failure with MAP_FAILED, not NULL
{
printf("ERROR: unable to mmap sig registers\n");
free_memory(4);
return 1;
}
init_hanning_win();
printf("send_buffer0 = %p \n",(void *)send_buffer0);
printf("recv_buffer0 = %p \n",(void *)recv_buffer0);
printf("recv_buffer_adc = %p \n",(void *)recv_buffer_adc);
printf("sig_mem_base = %p \n",(void *)sig_mem_base);
axidma_stop_transfer(axidma_dev,tx_chans->data[0]);
axidma_stop_transfer(axidma_dev,rx_chans->data[0]);
axidma_stop_transfer(axidma_dev,rx_chans->data[1]);
axidma_set_callback(axidma_dev,tx_chans->data[0],callback_tx_channel0,NULL);
axidma_set_callback(axidma_dev,rx_chans->data[0],callback_rx_channel0,NULL);
axidma_set_callback(axidma_dev,rx_chans->data[1],callback_rx_channel_adc,NULL);
printf("Callback funcs have been set.\n");
sem_init(&sem_rx_channel_adc, 0, 0);
sem_init(&sem_rx_channel0, 0, 0);
ret = pthread_create(&pth[0],NULL,thread_rx_channel_adc,NULL);
if (ret != 0)
{
free_memory(5);
printf("3.error!\n");
exit(-1);
}
act.sa_handler = sigint_handler;
sigemptyset(&act.sa_mask);
act.sa_flags = 0;
sigaction(SIGINT, &act, 0);
printf("3.\n");
ret = pthread_create(&pth[1],NULL,thread_rx_channel0,NULL);
if (ret != 0)
{
free_memory(5);
printf("2.error!\n");
exit(-1);
}
printf("2.\n");
// Wait here so that the program does not exit prematurely
pthread_join(pth[0],NULL);
pthread_join(pth[1],NULL);
return 0;
}
This code uses two DMAs with three channels in total, so pl.dtsi should be:
/*
* CAUTION: This file is automatically generated by Xilinx.
* Version:
* Today is: Thu Nov 12 10:22:56 2020
*/
/ {
amba_pl: amba_pl {
#address-cells = <1>;
#size-cells = <1>;
compatible = "simple-bus";
ranges ;
/* 1. Do not list the DMA channels in the order below, otherwise the code reports an error
axidma_chrdev@1{
compatible = "xlnx,axidma-chrdev";
dmas = <&axi_dma_0 0 &axi_dma_1 0 &axi_dma_1 1>;
dma-names = "rx_channel_adc","tx_channel","rx_channel";
};
*/
axidma_chrdev@1{
compatible = "xlnx,axidma-chrdev";
dmas = <&axi_dma_1 0 &axi_dma_1 1 &axi_dma_0 0>;
dma-names = "tx_channel","rx_channel", "rx_channel_adc";
};
afe5401_collect_0: afe5401_collect@43c00000 {
compatible = "xlnx,afe5401-collect-1.0";
reg = <0x43c00000 0x10000>;
};
axi_dma_0: dma@40400000 {
#dma-cells = <1>;
clock-names = "s_axi_lite_aclk", "m_axi_sg_aclk", "m_axi_s2mm_aclk";
clocks = <&clkc 15>, <&clkc 15>, <&clkc 15>;
compatible = "xlnx,axi-dma-1.00.a";
interrupt-parent = <&intc>;
interrupts = <0 29 4>;
reg = <0x40400000 0x10000>;
xlnx,addrwidth = <0x20>;
// 2. Do not enable the DMA's SG (scatter-gather) mode, because the PL design does not use it
// xlnx,include-sg;
// xlnx,sg-length-width = <0x10>;
dma-channel@40400030 {
compatible = "xlnx,axi-dma-s2mm-channel";
dma-channels = <0x1>;
interrupts = <0 29 4>;
xlnx,datawidth = <0x10>;
xlnx,device-id = <0x2>;
};
};
axi_dma_1: dma@40410000 {
#dma-cells = <1>;
clock-names = "s_axi_lite_aclk", "m_axi_sg_aclk", "m_axi_mm2s_aclk", "m_axi_s2mm_aclk";
clocks = <&clkc 15>, <&clkc 15>, <&clkc 15>, <&clkc 15>;
compatible = "xlnx,axi-dma-1.00.a";
interrupt-parent = <&intc>;
interrupts = <0 30 4 0 31 4>;
reg = <0x40410000 0x10000>;
xlnx,addrwidth = <0x20>;
// xlnx,include-sg;
// xlnx,sg-length-width = <0x10>;
dma-channel@40410000 {
compatible = "xlnx,axi-dma-mm2s-channel";
dma-channels = <0x1>;
interrupts = <0 30 4>;
xlnx,datawidth = <0x20>;
xlnx,device-id = <0x0>;
};
dma-channel@40410030 {
compatible = "xlnx,axi-dma-s2mm-channel";
dma-channels = <0x1>;
interrupts = <0 31 4>;
xlnx,datawidth = <0x20>;
xlnx,device-id = <0x1>;
};
};
};
};
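Returning to the C listing for a moment, its buffer sizes and the data layout used by `add_hanning_win` can be checked in isolation. This is a minimal sketch, not the author's code: it assumes the ADC samples arrive interleaved by channel (sample i of channel j at index i * channels + j), as the indexing in the listing implies, and that `short` is 2 bytes and `float` is 4 bytes, as on the Zynq's ARM cores. The names `adc_dma_bytes`, `fft_dma_bytes` and `window_deinterleave` are illustrative, and the window weights are passed in by the caller.

```c
#include <stddef.h>

/* Bytes in one raw ADC capture (mirrors ADC_DMA_SIZE in the listing). */
static size_t adc_dma_bytes(size_t samples, size_t channels)
{
    return samples * channels * sizeof(short);
}

/* Bytes in one float frame (mirrors DMA1_SIZE in the listing). */
static size_t fft_dma_bytes(size_t samples, size_t channels)
{
    return samples * channels * sizeof(float);
}

/* Apply per-sample weights w[0..n-1] to interleaved input and store the
 * result channel by channel:
 *   in[i * channels + j]  ->  out[j * n + i]
 * The input buffer is left untouched. */
static void window_deinterleave(const short *in, const float *w, float *out,
                                size_t n, size_t channels)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < channels; j++)
            out[j * n + i] = w[i] * (float)in[i * channels + j];
    }
}
```

For 512 samples and 4 channels the two size helpers give 4096 and 8192 bytes, so both frames fit in the 8192-byte buffers allocated with MMAP_SIZE. Writing straight into the output buffer also avoids the temporary array and the memcpy, although whether that matters at this frame size has not been measured here.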
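6 This approach does implement data exchange between the ARM and the FPGA, and data arriving from the FPGA triggers an interrupt on the ARM side through this driver, which makes user-space development much easier. Even so, it still has one big problem: it is not fast enough.
The chip manual gives a nominal rate of 250~275 MB/s for S2MM or MM2S transfers between the PS and the PL (I do not remember the exact figures; they are listed in a table). My test with this library reached only 170 MB/s, well short of that, and I do not know where the time is being spent. Whether this approach is suitable therefore remains open to question.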
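To narrow down where the time goes, the throughput figure can be computed the same way for different transfer sizes. The helpers below are a minimal sketch with hypothetical names (`elapsed_s`, `throughput_mb_s`); they assume timestamps taken with `clock_gettime(CLOCK_MONOTONIC, ...)` just before the transfer is started and again in the completion callback, and they count 1 MB as 10^6 bytes.

```c
#include <stddef.h>
#include <time.h>

/* Seconds elapsed between two timestamps, t0 taken before t1. */
static double elapsed_s(const struct timespec *t0, const struct timespec *t1)
{
    return (double)(t1->tv_sec - t0->tv_sec)
         + (double)(t1->tv_nsec - t0->tv_nsec) / 1e9;
}

/* Throughput in MB/s (1 MB = 10^6 bytes).
 * Returns 0 for a zero or negative interval. */
static double throughput_mb_s(size_t bytes, double seconds)
{
    if (seconds <= 0.0)
        return 0.0;
    return (double)bytes / 1e6 / seconds;
}
```

Timing many back-to-back transfers and dividing the total byte count by the total time, once with small buffers and once with large ones, would show whether a fixed per-transfer cost or the streaming rate itself limits the result.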