Vitis + ZCU104案例教程

最新推荐文章于 2024-04-09 09:39:01 发布

弱咩咩

最新推荐文章于 2024-04-09 09:39:01 发布

阅读量5.3k

点赞数 2

分类专栏：学习文章标签：嵌入式 linux fpga

本文链接：https://blog.csdn.net/qq_39229006/article/details/108821786

版权

学习专栏收录该内容

7 篇文章 3 订阅

订阅专栏

Vitis+ZCU104案例教程

参考：https://www.xilinx.com/html_docs/xilinx2020_1/vitis_doc/dvy1591145410207.html

开发环境：
操作系统：Ubuntu18.04

开发平台：Xilinx ZCU104

开发工具：Vitis 2020.1

镜像烧写工具：balenaEtcher

串口调试工具：minicom

1.准备工作

1.1安装OpenCL客户端驱动加载程序：

sudo apt-get install ocl-icd-libopencl1 opencl-headers ocl-icd-opencl-dev

1.2安装需要的软件包：

sudo add-apt-repository ppa:xorg-edgers/ppa 
sudo apt-get update
sudo apt-get install libgl1-mesa-glx
sudo apt-get install libgl1-mesa-dri
sudo apt-get install libgl1-mesa-dev
sudo add-apt-repository --remove ppa:xorg-edgers/ppa
sudo apt install net-tools
sudo apt-get install -y unzip
sudo apt install gcc
sudo apt install g++
sudo apt install python
ln -s /usr/bin/python2 /usr/bin/python

将文件解压并放到/opt/xilinx/platforms/目录下

1.6设置Vitis和XRT环境变量：

sudo gedit ./.bashrc

在文件末尾加上：

#setup XILINX_VITIS and XILINX_VIVADO variables
source <Vitis_install_path>/Vitis/2020.1/settings64.sh
#setup XILINX_XRT
source /opt/xilinx/xrt/setup.sh

1.7下载嵌入式平台常用镜像：

https://china.xilinx.com/support/download/index.html/content/xilinx/zh/downloadNav/embedded-platforms.html

下载文件：ZYNQMP通用映像（其他平台下载相应的镜像）

将文件解压到/opt/xilinx/目录下，解压后的文件名为xilinx-zynqmp-common-v2020.1，进入该文件，解压rootfs.ext4.gz文件

sudo gunzip ./rootfs.ext4.gz

解压后的文件名为：rootfs.ext4，接着为了得到平台所需要的sysroot文件，在当前文件目录下运行：

./sdk.sh -y -dir ./ -p

执行完成后得到文件夹ir。

至此，准备工作已经基本完成。

2.执行案例

命令行运行vitis，打开Vitis 2020.1，设置好Vitis工作空间后，点击File->new->Application Project…，在弹出界面点击Next，在Platform界面选择xilinx_zcu104_base_202010_1，点击Next，

输入项目名称：例如：test，点击Next，
在这里插入图片描述
在Application settings中设置Sysroot path、Root FS、Kernel Image：

Sysroot path：/opt/xilinx/xilinx-zynqmp-common-v2020.1/ir/sysroots/aarch64-xilinx-linux

Root FS：/opt/xilinx/xilinx-zynqmp-common-v2020.1/rootfs.ext4

Kernel Image：/opt/xilinx/xilinx-zynqmp-common-v2020.1/Image

点击Next，
在这里插入图片描述
选择Empty Application，点击Finish。

在test_system->test->src右键new->File，创建内核函数，输入文件名，如:kernel_test.cpp，需要注意内核函数名不能与关键词同名
在这里插入图片描述
kernel_test.cpp:

#define BUFFER_SIZE 256
extern "C" {

void kernel_test(
		 int* a,
		 int* b,
		 int* c,
         const int n_elements)
{

#pragma HLS INTERFACE m_axi offset=SLAVE bundle=gmem port=a max_read_burst_length = 256
#pragma HLS INTERFACE m_axi offset=SLAVE bundle=gmem port=b max_read_burst_length = 256
#pragma HLS INTERFACE m_axi offset=SLAVE bundle=gmem1 port=c max_write_burst_length = 256

#pragma HLS INTERFACE s_axilite port=a  bundle=control
#pragma HLS INTERFACE s_axilite port=b  bundle=control
#pragma HLS INTERFACE s_axilite port=c  bundle=control

#pragma HLS INTERFACE s_axilite port=n_elements  bundle=control
#pragma HLS INTERFACE s_axilite port=return bundle=control

	int arrayA[BUFFER_SIZE];
	int arrayB[BUFFER_SIZE];
	
	for (int i = 0 ; i < n_elements ; i += BUFFER_SIZE)
	{
	    int size = BUFFER_SIZE;
	    //boundary check
	    if (i + size > n_elements) size = n_elements - i;
	
	    //Burst reading A and B
	    readA: for (int j = 0 ; j < size ; j++) {
		#pragma HLS pipeline ii = 1 rewind
	        arrayA[j] = a[i+j];
	        arrayB[j] = b[i+j];
	    }
	
	    //Burst reading B and calculating C and Burst writing
	    // to  Global memory
	vadd_wrteC: for (int j = 0 ; j < size ; j++){
		#pragma HLS pipeline ii = 1 rewind
	        c[i+j] = arrayA[j] + arrayB[j];
	    }
	}

}
}

该内核函数简单实现了一个c=a+b的功能，运用了HLS的一些语法和优化。

然后再用一样的方法创建主机程序：host.cpp与host.h

host.cpp:

#include <stdlib.h>
#include <fstream>
#include <iostream>
#include "host.h"

static const int DATA_SIZE = 4096;

static const std::string error_message =
    "Error: Result mismatch:\n"
    "i = %d CPU result = %d Device result = %d\n";

int main(int argc, char* argv[]) {

    //TARGET_DEVICE macro needs to be passed from gcc command line
    if(argc != 2) {
    	std::cout << "Usage: " << argv[0] <<" <xclbin>" << std::endl;
    	return EXIT_FAILURE;
    }
    
    char* xclbinFilename = argv[1];
    
    // Compute the size of array in bytes
    size_t size_in_bytes = DATA_SIZE * sizeof(int);
    
    // Creates a vector of DATA_SIZE elements with an initial value of 10 and 32
    // using customized allocator for getting buffer alignment to 4k boundary
    
    std::vector<cl::Device> devices;
    cl::Device device;
    std::vector<cl::Platform> platforms;
    bool found_device = false;
    
    //traversing all Platforms To find Xilinx Platform and targeted
    //Device in Xilinx Platform
    cl::Platform::get(&platforms);
    for(size_t i = 0; (i < platforms.size() ) & (found_device == false) ;i++){
        cl::Platform platform = platforms[i];
        std::string platformName = platform.getInfo<CL_PLATFORM_NAME>();
        if ( platformName == "Xilinx"){
            devices.clear();
            platform.getDevices(CL_DEVICE_TYPE_ACCELERATOR, &devices);
        if (devices.size()){
    	    device = devices[0];
    	    found_device = true;
    	    break;
        }
        }
    }
    if (found_device == false){
       std::cout << "Error: Unable to find Target Device "
           << device.getInfo<CL_DEVICE_NAME>() << std::endl;
       return EXIT_FAILURE;
    }
    
    // Creating Context and Command Queue for selected device
    cl::Context context(device);
    cl::CommandQueue q(context, device, CL_QUEUE_PROFILING_ENABLE);
    
    // Load xclbin
    std::cout << "Loading: '" << xclbinFilename << "'\n";
    std::ifstream bin_file(xclbinFilename, std::ifstream::binary);
    bin_file.seekg (0, bin_file.end);
    unsigned nb = bin_file.tellg();
    bin_file.seekg (0, bin_file.beg);
    char *buf = new char [nb];
    bin_file.read(buf, nb);
    
    // Creating Program from Binary File
    cl::Program::Binaries bins;
    bins.push_back({buf,nb});
    devices.resize(1);
    cl::Program program(context, devices, bins);
    
    // This call will get the kernel object from program. A kernel is an
    // OpenCL function that is executed on the FPGA.
    cl::Kernel krnl_vector_add(program,"kernel_test");
    
    // These commands will allocate memory on the Device. The cl::Buffer objects can
    // be used to reference the memory locations on the device.
    cl::Buffer buffer_a(context, CL_MEM_READ_ONLY, size_in_bytes);
    cl::Buffer buffer_b(context, CL_MEM_READ_ONLY, size_in_bytes);
    cl::Buffer buffer_result(context, CL_MEM_WRITE_ONLY, size_in_bytes);
    
    //set the kernel Arguments
    int narg=0;
    krnl_vector_add.setArg(narg++,buffer_a);
    krnl_vector_add.setArg(narg++,buffer_b);
    krnl_vector_add.setArg(narg++,buffer_result);
    krnl_vector_add.setArg(narg++,DATA_SIZE);
    
    //We then need to map our OpenCL buffers to get the pointers
    int *ptr_a = (int *) q.enqueueMapBuffer (buffer_a , CL_TRUE , CL_MAP_WRITE , 0, size_in_bytes);
    int *ptr_b = (int *) q.enqueueMapBuffer (buffer_b , CL_TRUE , CL_MAP_WRITE , 0, size_in_bytes);
    int *ptr_result = (int *) q.enqueueMapBuffer (buffer_result , CL_TRUE , CL_MAP_READ , 0, size_in_bytes);
    
    //setting input data
    for(int i = 0 ; i< DATA_SIZE; i++){
        ptr_a[i] = 10;
        ptr_b[i] = 20;
    }
    
    // Data will be migrated to kernel space
    q.enqueueMigrateMemObjects({buffer_a,buffer_b},0/* 0 means from host*/);
    
    //Launch the Kernel
    q.enqueueTask(krnl_vector_add);
    
    // The result of the previous kernel execution will need to be retrieved in
    // order to view the results. This call will transfer the data from FPGA to
    // source_results vector
    q.enqueueMigrateMemObjects({buffer_result},CL_MIGRATE_MEM_OBJECT_HOST);
    
    q.finish();
    
    //Verify the result
    int match = 0;
    for (int i = 0; i < DATA_SIZE; i++) {
        int host_result = ptr_a[i] + ptr_b[i];
        if (ptr_result[i] != host_result) {
            printf(error_message.c_str(), i, host_result, ptr_result[i]);
            match = 1;
            break;
        }
    }
    
    q.enqueueUnmapMemObject(buffer_a , ptr_a);
    q.enqueueUnmapMemObject(buffer_b , ptr_b);
    q.enqueueUnmapMemObject(buffer_result , ptr_result);
    q.finish();
    
    std::cout << "TEST " << (match ? "FAILED" : "PASSED") << std::endl;
    return (match ? EXIT_FAILURE :  EXIT_SUCCESS);

}

host.h:

#pragma once

#define CL_HPP_CL_1_2_DEFAULT_BUILD
#define CL_HPP_TARGET_OPENCL_VERSION 120
#define CL_HPP_MINIMUM_OPENCL_VERSION 120
#define CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY 1

#include <CL/cl2.hpp>

//Customized buffer allocation for 4K boundary alignment
template <typename T>
struct aligned_allocator
{
  using value_type = T;
  T* allocate(std::size_t num)
  {
    void* ptr = nullptr;
    if (posix_memalign(&ptr,4096,num*sizeof(T)))
      throw std::bad_alloc();
    return reinterpret_cast<T*>(ptr);
  }
  void deallocate(T* p, std::size_t num)
  {
    free(p);
  }
};

该主机函数负责将内核函数传送到设备上执行，并验证结果是否正确。

在Application Project Settings界面，添加Hardware Funcation，点击在这里插入图片描述
添加二进制容器，点击
，添加kernel_test函数，并设置Active build configuration:Hardware

在Assistant界面右键Hardware->Build进行编译，编译成功后在工程目录下生成如下文件：

其中sd_card.img用来烧写进SD卡的镜像

运用镜像烧写工具将sd_card.img烧写进SD卡，插上SD卡启动ZCU104,通过串口连接将显示以下信息：
在这里插入图片描述
执行以下命令：

cd /mnt/sd-mncblk0p1
source ./init.sh

运行程序：

./test binary_container_1.xclbin

显示如下：
在这里插入图片描述
显示TEST PASSED，成功

弱咩咩

关注

2
点赞
踩
17

收藏

觉得还不错? 一键收藏
3
评论
Vitis + ZCU104案例教程

Vitis+ZCU104案例教程参考：https://www.xilinx.com/html_docs/xilinx2020_1/vitis_doc/dvy1591145410207.html开发环境：操作系统：Ubuntu18.04开发平台：Xilinx ZCU104开发工具：Vitis 2020.1镜像烧写工具：balenaEtcher串口调试工具：minicom1.准备工作1.1安装OpenCL客户端驱动加载程序：sudo apt-get install ocl-icd-libop
复制链接

扫一扫