PipeCNN: An FPGA-Based CNN Acceleration Project

PipeCNN

About

PipeCNN is an OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks (CNNs).
There is a growing trend among the FPGA community to utilize High Level Synthesis (HLS) tools to design
and implement customized circuits on FPGAs. Compared with the RTL-based design methodology, HLS tools provide a faster hardware development
cycle by automatically synthesizing an algorithm written in a high-level language (e.g., C/C++) into RTL/hardware. OpenCL™ is an open, emerging cross-platform parallel programming language that can be used in both GPU and FPGA development. The main goal of this project is to provide a generic, yet efficient, OpenCL-based design of a CNN accelerator on FPGAs. Our design is scalable in both performance and hardware resource usage, and can thus be deployed on a variety of FPGA platforms.

How to Use

First, download the pre-trained CNN models, input test vectors and golden reference files from PipeCNN’s own ModelZoo, and place the data in the correct folders. Compile the project using the provided Makefile. After compilation finishes, simply type the following command to run PipeCNN:

./run.exe conv.aocx

For users who are using Xilinx’s SDx environments, it is recommended to use the IDE instead of makefiles. Currently, only Intel’s OpenCL SDK v16.1 and Xilinx’s SDAccel v2017.2 are supported. Please read the User Instructions carefully before use.

Boards and Performances

Currently, we use Intel’s OpenCL SDK v16.1 toolset for compiling the OpenCL code and implementing the generated RTL on Altera’s FPGAs. For Xilinx FPGAs, the SDAccel development environment v2017.2 can be used. PipeCNN has been tested and evaluated on the following FPGA boards/platforms. Note that SDSoC has not been fully tested; if you have any results, please kindly email us the latest updates.

The following boards have been tested with Intel OpenCL SDK v16.1:

The following boards have been tested with Xilinx SDAccel v2017.2:

The following table lists performance and cost information for some of the boards we used, as a reference. For each FPGA device, one needs to perform design space exploration (over the hardware parameters VEC_SIZE, LANE_NUM and CONV_GP_SIZE_X) to find the optimal design that maximizes the throughput or minimizes the execution time. Suggested hardware parameters for the above boards are summarized here. Since we are constantly optimizing the design and updating the code, the performance data in the following table might be out of date; please use the latest version to get the exact numbers. We welcome other vendors/researchers to provide the latest performance and cost information on other FPGA platforms/boards.

Boards     Execution Time*   Batch Size   DSP Consumed   Frequency
DE1-soc    150 ms            1            68             122 MHz
DE5-net    15 ms             16           228            206 MHz

*Note: AlexNet was used as the benchmark. Image size is 227x227x3.

Demos

Now you can run ImageNet classification on PipeCNN, and measure the top-1/5 accuracy on your own dataset.

First, set USE_OPENCV = 1 in the Makefile. Second, download the ImageNet validation dataset, then extract and place all the pictures in the “/data” folder. Modify the variable “picture_file_path_head” in the host file to point to the correct image dataset path. Finally, recompile the host program and run PipeCNN.

The following picture shows the demo running on our own computer with the DE5-net board.

DE5-net-Demo

Update Plans

  • Support for sparse or Winograd-based convolution algorithms.
  • Implementation of Faster-RCNN and YOLO9000.

Citation

Please kindly cite our work of PipeCNN if it helps your research:

Dong Wang, Ke Xu and Diankun Jiang, “PipeCNN: An OpenCL-Based Open-Source FPGA Accelerator for Convolution Neural Networks”, FPT 2017.

Related Works

There are other FPGA accelerators that also adopt an HLS-based design scheme. Some brilliant works are listed as follows. Note that PipeCNN is the first, and so far the only one, that is open-source ( ̄︶ ̄)↗

  • U. Aydonat, S. O’Connell, D. Capalija, A. C. Ling, and G. R. Chiu. “An OpenCL™ Deep Learning Accelerator on Arria 10,” in Proc. FPGA 2017.
  • N. Suda, V. Chandra, G. Dasika, A. Mohanty, Y. F. Ma, S. Vrudhula, J. S. Seo, and Y. Cao, “Throughput-Optimized OpenCL-based FPGA accelerator for large-scale convolutional neural networks,” in Proc. FPGA 2016.
  • C. Zhang, P. Li, G. Sun, Y. Guan, B. J. Xiao, and J. Cong, “Optimizing FPGA-based accelerator design for deep convolutional neural networks,” in Proc. FPGA 2015.
An FPGA (Field-Programmable Gate Array) pipeline refers to the use of pipelining in FPGA designs to improve system performance and throughput. Pipelining decomposes a complex computation into several simpler sub-tasks and chains them together so that they can execute concurrently.

In FPGA design, pipelining increases the utilization of the logic circuits and raises the achievable clock frequency and throughput. By splitting a computation into stages that execute in parallel over consecutive clock cycles, both the latency of the computation and its resource occupancy can be reduced. Data flows through the stages in order, allowing the computation to proceed more efficiently.

Designing an FPGA pipeline involves several considerations. First, the computation must be partitioned into appropriate stages according to its structure and data flow. Second, the pipeline's control logic must keep the stages synchronized and coordinated, so that data moves through the whole pipeline correctly. In addition, the pipeline registers and timing constraints must be handled so that the pipeline operates correctly at high clock frequencies.

In short, a well-designed pipeline can significantly improve a system's performance and throughput, and the programmability of FPGAs makes pipeline designs flexible and easy to optimize and adjust for the needs of a specific application.
