Vitis + ZCU104案例教程

Vitis+ZCU104案例教程

参考:https://www.xilinx.com/html_docs/xilinx2020_1/vitis_doc/dvy1591145410207.html

开发环境:
操作系统:Ubuntu18.04

开发平台:Xilinx ZCU104

开发工具:Vitis 2020.1

镜像烧写工具:balenaEtcher

串口调试工具:minicom

1.准备工作

1.1安装OpenCL客户端驱动加载程序:
sudo apt-get install ocl-icd-libopencl1 opencl-headers ocl-icd-opencl-dev
1.2安装需要的软件包:
sudo add-apt-repository ppa:xorg-edgers/ppa 
sudo apt-get update
sudo apt-get install libgl1-mesa-glx
sudo apt-get install libgl1-mesa-dri
sudo apt-get install libgl1-mesa-dev
sudo add-apt-repository --remove ppa:xorg-edgers/ppa
sudo apt install net-tools
sudo apt-get install -y unzip
sudo apt install gcc
sudo apt install g++
sudo apt install python
ln -s /usr/bin/python2 /usr/bin/python
1.3安装Vitis软件平台:

https://www.xilinx.com/support/download/index.html/content/xilinx/en/downloadNav/vitis.html

1.4安装Xilinx运行时:

https://china.xilinx.com/support/download/index.html/content/xilinx/zh/downloadNav/embedded-platforms.html

1.5安装目标平台文件(ZCU104 Base 2020.1 ):

https://china.xilinx.com/support/download/index.html/content/xilinx/zh/downloadNav/embedded-platforms.html

将文件解压并放到/opt/xilinx/platforms/目录下

1.6设置Vitis和XRT环境变量:
sudo gedit ./.bashrc

在文件末尾加上:

#setup XILINX_VITIS and XILINX_VIVADO variables
source <Vitis_install_path>/Vitis/2020.1/settings64.sh
#setup XILINX_XRT
source /opt/xilinx/xrt/setup.sh
1.7下载嵌入式平台常用镜像:

https://china.xilinx.com/support/download/index.html/content/xilinx/zh/downloadNav/embedded-platforms.html

下载文件:ZYNQMP通用映像(其他平台下载相应的镜像)

将文件解压到/opt/xilinx/目录下,解压后的文件名为xilinx-zynqmp-common-v2020.1,进入该文件,解压rootfs.ext4.gz文件

sudo gunzip ./rootfs.ext4.gz

解压后的文件名为:rootfs.ext4,接着为了得到平台所需要的sysroot文件,在当前文件目录下运行:

./sdk.sh -y -dir ./ -p

执行完成后得到文件夹ir

至此,准备工作已经基本完成。

2.执行案例

命令行运行vitis,打开Vitis 2020.1,设置好Vitis工作空间后,点击File->new->Application Project…,在弹出界面点击Next,在Platform界面选择xilinx_zcu104_base_202010_1,点击Next


输入项目名称:例如:test,点击Next
在这里插入图片描述
Application settings中设置Sysroot path、Root FS、Kernel Image:

Sysroot path:/opt/xilinx/xilinx-zynqmp-common-v2020.1/ir/sysroots/aarch64-xilinx-linux

Root FS:/opt/xilinx/xilinx-zynqmp-common-v2020.1/rootfs.ext4

Kernel Image:/opt/xilinx/xilinx-zynqmp-common-v2020.1/Image

点击Next
在这里插入图片描述
选择Empty Application,点击Finish

test_system->test->src右键new->File,创建内核函数,输入文件名,如:kernel_test.cpp,需要注意内核函数名不能与关键词同名
在这里插入图片描述
kernel_test.cpp:

#define BUFFER_SIZE 256
extern "C" {

void kernel_test(
		 int* a,
		 int* b,
		 int* c,
         const int n_elements)
{

#pragma HLS INTERFACE m_axi offset=SLAVE bundle=gmem port=a max_read_burst_length = 256
#pragma HLS INTERFACE m_axi offset=SLAVE bundle=gmem port=b max_read_burst_length = 256
#pragma HLS INTERFACE m_axi offset=SLAVE bundle=gmem1 port=c max_write_burst_length = 256

#pragma HLS INTERFACE s_axilite port=a  bundle=control
#pragma HLS INTERFACE s_axilite port=b  bundle=control
#pragma HLS INTERFACE s_axilite port=c  bundle=control

#pragma HLS INTERFACE s_axilite port=n_elements  bundle=control
#pragma HLS INTERFACE s_axilite port=return bundle=control

	int arrayA[BUFFER_SIZE];
	int arrayB[BUFFER_SIZE];
	
	for (int i = 0 ; i < n_elements ; i += BUFFER_SIZE)
	{
	    int size = BUFFER_SIZE;
	    //boundary check
	    if (i + size > n_elements) size = n_elements - i;
	
	    //Burst reading A and B
	    readA: for (int j = 0 ; j < size ; j++) {
		#pragma HLS pipeline ii = 1 rewind
	        arrayA[j] = a[i+j];
	        arrayB[j] = b[i+j];
	    }
	
	    //Burst reading B and calculating C and Burst writing
	    // to  Global memory
	vadd_wrteC: for (int j = 0 ; j < size ; j++){
		#pragma HLS pipeline ii = 1 rewind
	        c[i+j] = arrayA[j] + arrayB[j];
	    }
	}

}
}

该内核函数简单实现了一个c=a+b的功能,运用了HLS的一些语法和优化。

然后再用一样的方法创建主机程序:host.cpp与host.h

host.cpp:

#include <stdlib.h>
#include <fstream>
#include <iostream>
#include "host.h"

static const int DATA_SIZE = 4096;

static const std::string error_message =
    "Error: Result mismatch:\n"
    "i = %d CPU result = %d Device result = %d\n";

int main(int argc, char* argv[]) {

    //TARGET_DEVICE macro needs to be passed from gcc command line
    if(argc != 2) {
    	std::cout << "Usage: " << argv[0] <<" <xclbin>" << std::endl;
    	return EXIT_FAILURE;
    }
    
    char* xclbinFilename = argv[1];
    
    // Compute the size of array in bytes
    size_t size_in_bytes = DATA_SIZE * sizeof(int);
    
    // Creates a vector of DATA_SIZE elements with an initial value of 10 and 32
    // using customized allocator for getting buffer alignment to 4k boundary
    
    std::vector<cl::Device> devices;
    cl::Device device;
    std::vector<cl::Platform> platforms;
    bool found_device = false;
    
    //traversing all Platforms To find Xilinx Platform and targeted
    //Device in Xilinx Platform
    cl::Platform::get(&platforms);
    for(size_t i = 0; (i < platforms.size() ) & (found_device == false) ;i++){
        cl::Platform platform = platforms[i];
        std::string platformName = platform.getInfo<CL_PLATFORM_NAME>();
        if ( platformName == "Xilinx"){
            devices.clear();
            platform.getDevices(CL_DEVICE_TYPE_ACCELERATOR, &devices);
        if (devices.size()){
    	    device = devices[0];
    	    found_device = true;
    	    break;
        }
        }
    }
    if (found_device == false){
       std::cout << "Error: Unable to find Target Device "
           << device.getInfo<CL_DEVICE_NAME>() << std::endl;
       return EXIT_FAILURE;
    }
    
    // Creating Context and Command Queue for selected device
    cl::Context context(device);
    cl::CommandQueue q(context, device, CL_QUEUE_PROFILING_ENABLE);
    
    // Load xclbin
    std::cout << "Loading: '" << xclbinFilename << "'\n";
    std::ifstream bin_file(xclbinFilename, std::ifstream::binary);
    bin_file.seekg (0, bin_file.end);
    unsigned nb = bin_file.tellg();
    bin_file.seekg (0, bin_file.beg);
    char *buf = new char [nb];
    bin_file.read(buf, nb);
    
    // Creating Program from Binary File
    cl::Program::Binaries bins;
    bins.push_back({buf,nb});
    devices.resize(1);
    cl::Program program(context, devices, bins);
    
    // This call will get the kernel object from program. A kernel is an
    // OpenCL function that is executed on the FPGA.
    cl::Kernel krnl_vector_add(program,"kernel_test");
    
    // These commands will allocate memory on the Device. The cl::Buffer objects can
    // be used to reference the memory locations on the device.
    cl::Buffer buffer_a(context, CL_MEM_READ_ONLY, size_in_bytes);
    cl::Buffer buffer_b(context, CL_MEM_READ_ONLY, size_in_bytes);
    cl::Buffer buffer_result(context, CL_MEM_WRITE_ONLY, size_in_bytes);
    
    //set the kernel Arguments
    int narg=0;
    krnl_vector_add.setArg(narg++,buffer_a);
    krnl_vector_add.setArg(narg++,buffer_b);
    krnl_vector_add.setArg(narg++,buffer_result);
    krnl_vector_add.setArg(narg++,DATA_SIZE);
    
    //We then need to map our OpenCL buffers to get the pointers
    int *ptr_a = (int *) q.enqueueMapBuffer (buffer_a , CL_TRUE , CL_MAP_WRITE , 0, size_in_bytes);
    int *ptr_b = (int *) q.enqueueMapBuffer (buffer_b , CL_TRUE , CL_MAP_WRITE , 0, size_in_bytes);
    int *ptr_result = (int *) q.enqueueMapBuffer (buffer_result , CL_TRUE , CL_MAP_READ , 0, size_in_bytes);
    
    //setting input data
    for(int i = 0 ; i< DATA_SIZE; i++){
        ptr_a[i] = 10;
        ptr_b[i] = 20;
    }
    
    // Data will be migrated to kernel space
    q.enqueueMigrateMemObjects({buffer_a,buffer_b},0/* 0 means from host*/);
    
    //Launch the Kernel
    q.enqueueTask(krnl_vector_add);
    
    // The result of the previous kernel execution will need to be retrieved in
    // order to view the results. This call will transfer the data from FPGA to
    // source_results vector
    q.enqueueMigrateMemObjects({buffer_result},CL_MIGRATE_MEM_OBJECT_HOST);
    
    q.finish();
    
    //Verify the result
    int match = 0;
    for (int i = 0; i < DATA_SIZE; i++) {
        int host_result = ptr_a[i] + ptr_b[i];
        if (ptr_result[i] != host_result) {
            printf(error_message.c_str(), i, host_result, ptr_result[i]);
            match = 1;
            break;
        }
    }
    
    q.enqueueUnmapMemObject(buffer_a , ptr_a);
    q.enqueueUnmapMemObject(buffer_b , ptr_b);
    q.enqueueUnmapMemObject(buffer_result , ptr_result);
    q.finish();
    
    std::cout << "TEST " << (match ? "FAILED" : "PASSED") << std::endl;
    return (match ? EXIT_FAILURE :  EXIT_SUCCESS);

}

host.h:

#pragma once

#define CL_HPP_CL_1_2_DEFAULT_BUILD
#define CL_HPP_TARGET_OPENCL_VERSION 120
#define CL_HPP_MINIMUM_OPENCL_VERSION 120
#define CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY 1

#include <CL/cl2.hpp>

//Customized buffer allocation for 4K boundary alignment
template <typename T>
struct aligned_allocator
{
  using value_type = T;
  T* allocate(std::size_t num)
  {
    void* ptr = nullptr;
    if (posix_memalign(&ptr,4096,num*sizeof(T)))
      throw std::bad_alloc();
    return reinterpret_cast<T*>(ptr);
  }
  void deallocate(T* p, std::size_t num)
  {
    free(p);
  }
};

该主机函数负责将内核函数传送到设备上执行,并验证结果是否正确。

Application Project Settings界面,添加Hardware Funcation,点击在这里插入图片描述
添加二进制容器,点击
在这里插入图片描述,添加kernel_test函数,并设置Active build configuration:Hardware
在这里插入图片描述
在Assistant界面右键Hardware->Build进行编译,编译成功后在工程目录下生成如下文件:
在这里插入图片描述
其中sd_card.img用来烧写进SD卡的镜像

运用镜像烧写工具将sd_card.img烧写进SD卡,插上SD卡启动ZCU104,通过串口连接将显示以下信息:
在这里插入图片描述
执行以下命令:

cd /mnt/sd-mncblk0p1
source ./init.sh

运行程序:

./test binary_container_1.xclbin

显示如下:
在这里插入图片描述
显示TEST PASSED,成功

  • 2
    点赞
  • 17
    收藏
    觉得还不错? 一键收藏
  • 3
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 3
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值