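Background: ZynqNet can implement deep compression on Xilinx FPGAs.
Goal: run the ZynqNet code.
Source code: https://github.com/dgschwend/zynqnet
Directory structure
This is the program structure of the project. For our project, we need to understand both the HLS code and the ARM-side code.
The ARM-side code is the one in _FIRMWARE; the FPGA-side code is the one in _HLS_CODE.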
1. _TRAINED_MODEL
This part contains the trained Caffe model and the pretrained weights.
2. _FIRMWARE
This part of the code targets the ARM processor of the Zynq 7Z035. We built it with make and ran it on a server, where one iteration took about 3590 ms.
make
./test CPU|FPGA indata.bin (-quiet)
2.1 Printed output of a run
gpu@gpu-SYS-7048GR-TR:~/datasets/xxr/zynqnet/_FIRMWARE$ ./test CPU indata.bin
______ _ _ _
|___ / | \ | | | |
/ / _ _ _ __ __ _| \| | ___| |_
/ / | | | | '_ \ / _` | . ` |/ _ \ __|
./ /__| |_| | | | | (_| | |\ | __/ |_
\_____/\__, |_| |_|\__, \_| \_/\___|\__|
__/ | | |
|___/ |_| (c) 2016 davidgs
CPU: Load Network Configuration
c1 : 256x256 x 3 > 64 , CONV (3x3)/2p + ReLU, IN @mem( 0- 786432B), OUT @mem( 786432B), WEIGHTS @mem( 0- 7168B)
f2/s3 : 128x128 x 64 > 16 , CONV (3x3)/2p + ReLU, IN @mem( 786432- 4980736B), OUT @mem( 4980736B), WEIGHTS @mem( 7168- 44096B)
f2/e1 : 64x64 x 16 > 64 , CONV (1x1)/1 + ReLU, IN @mem( 4980736- 5242880B), OUT @mem( 5242880B), WEIGHTS @mem( 44096- 48448B) (split1)
f2/e3 : 64x64 x 16 > 64 , CONV (3x3)/1p + ReLU, IN @mem( 4980736- 5242880B), OUT @mem( 5243136B), WEIGHTS @mem( 48448- 85568B) (split2)
f3/s1 : 64x64 x 128 > 16 , CONV (1x1)/1 + ReLU, IN @mem( 5242880- 7340032B), OUT @mem( 7340032B), WEIGHTS @mem( 85568- 93824B)
f3/e1 : 64x64 x 16 > 64 , CONV (1x1)/1 + ReLU, IN @mem( 7340032- 7602176B), OUT @mem( 7602176B), WEIGHTS @mem( 93824- 98176B) (split1)
f3/e3 : 64x64 x 16 > 64 , CONV (3x3)/1p + ReLU, IN @mem( 7340032- 7602176B), OUT @mem( 7602432B), WEIGHTS @mem( 98176- 135296B) (split2)
f4/s3 : 64x64 x 128 > 32 , CONV (3x3)/2p + ReLU, IN @mem( 7602176- 9699328B), OUT @mem( 9699328B), WEIGHTS @mem( 135296- 282880B)
f4/e1 : 32x32 x 32 > 128, CONV (1x1)/1 + ReLU, IN @mem( 9699328- 9830400B), OUT @mem( 9830400B), WEIGHTS @mem( 282880- 299776B) (split1)
f4/e3 : 32x32 x 32 > 128, CONV (3x3)/1p + ReLU, IN @mem( 9699328- 9830400B), OUT @mem( 9830912B), WEIGHTS @mem( 299776- 447744B) (split2)
f5/s1 : 32x32 x 256 > 32 , CONV (1x1)/1 + ReLU, IN @mem( 9830400-10878976B), OUT @mem(10878976B), WEIGHTS @mem( 447744- 480640B)
f5/e1 : 32x32 x 32 > 128, CONV (1x1)/1 + ReLU, IN @mem(10878976-11010048B), OUT @mem(11010048B), WEIGHTS @mem( 480640- 497536B) (split1)
f5/e3 : 32x32 x 32 > 128, CONV (3x3)/1p + ReLU, IN @mem(10878976-11010048B), OUT @mem(11010560B), WEIGHTS @mem( 497536- 645504B) (split2)
f6/s3 : 32x32 x 256 > 64 , CONV (3x3)/2p + ReLU, IN @mem(11010048-12058624B), OUT @mem(12058624B), WEIGHTS @mem( 645504- 1235584B)
f6/e1 : 16x16 x 64 > 256, CONV (1x1)/1 + ReLU, IN @mem(12058624-12124160B), OUT @mem(12124160B), WEIGHTS @mem( 1235584- 1302144B) (split1)
f6/e3 : 16x16 x 64 > 256, CONV (3x3)/1p + ReLU, IN @mem(12058624-12124160B), OUT @mem(12125184B), WEIGHTS @mem( 1302144- 1892992B) (split2)
f7/s1 : 16x16 x 512 > 64 , CONV (1x1)/1 + ReLU, IN @mem(12124160-12648448B), OUT @mem(12648448B), WEIGHTS @mem( 1892992- 2024320B)
f7/e1 : 16x16 x 64 > 192, CONV (1x1)/1 + ReLU, IN @mem(12648448-12713984B), OUT @mem(12713984B), WEIGHTS @mem( 2024320- 2074240B) (split1)
f7/e3 : 16x16 x 64 > 192, CONV (3x3)/1p + ReLU, IN @mem(12648448-12713984B), OUT @mem(12714752B), WEIGHTS @mem( 2074240- 2517376B) (split2)
f8/s3 : 16x16 x 384 > 112, CONV (3x3)/2p + ReLU, IN @mem(12713984-13107200B), OUT @mem(13107200B), WEIGHTS @mem( 2517376- 4066112B)
f8/e1 : 8x8 x 112 > 256, CONV (1x1)/1 + ReLU, IN @mem(13107200-13135872B), OUT @mem(13135872B), WEIGHTS @mem( 4066112- 4181824B) (split1)
f8/e3 : 8x8 x 112 > 256, CONV (3x3)/1p + ReLU, IN @mem(13107200-13135872B), OUT @mem(13136896B), WEIGHTS @mem( 4181824- 5215040B) (split2)
f9/s1 : 8x8 x 512 > 112, CONV (1x1)/1 + ReLU, IN @mem(13135872-13266944B), OUT @mem(13266944B), WEIGHTS @mem( 5215040- 5444864B)
f9/e1 : 8x8 x 112 > 368, CONV (1x1)/1 + ReLU, IN @mem(13266944-13295616B), OUT @mem(13295616B), WEIGHTS @mem( 5444864- 5611200B) (split1)
f9/e3 : 8x8 x 112 > 368, CONV (3x3)/1p + ReLU, IN @mem(13266944-13295616B), OUT @mem(13297088B), WEIGHTS @mem( 5611200- 7096448B) (split2)
c10/p1: 8x8 x 736 > 512, CONV (1x1)/1 , IN @mem(13295616-13484032B), OUT @mem(13484032B), WEIGHTS @mem( 7096448- 8605824B) (split1) GLOBAL POOL
c10/p2: 8x8 x 736 > 512, CONV (1x1)/1 , IN @mem(13295616-13484032B), OUT @mem(13486080B), WEIGHTS @mem( 8605824-10115200B) (split2) GLOBAL POOL
CPU: FPGA DRAM Memory Allocation:
Bytes allocated: 0B (config) + 9878KB (weights) + 13296KB (data)
region: 140609957294096 - 140609981024400
CPU: Copy Weights: 9878KB (weights)
CPU: Load Input Data from file indata.bin (768KB)
CPU: Copy Input Image (768KB)
## Iteration 0000 ##
CPU: Offload CONV Layer c1 : 256x256 x 3 > 64 , CONV (3x3)/2p + ReLU, IN @mem( 0- 786432B), OUT @mem( 786432B), WEIGHTS @mem( 0- 7168B)
FPGA: Computing .........................................................................................
.........................................................................................
.............................................................................. done.
run time: 118ms
CPU: Offload CONV Layer f2/s3 : 128x128 x 64 > 16 , CONV (3x3)/2p + ReLU, IN @mem( 786432- 4980736B), OUT @mem( 4980736B), WEIGHTS @mem( 7168- 44096B)
FPGA: Computing ................................................................................................................................ done.
run time: 146ms
CPU: Offload CONV Layer f2/e1 : 64x64 x 16 > 64 , CONV (1x1)/1 + ReLU, IN @mem( 4980736- 5242880B), OUT @mem( 5242880B), WEIGHTS @mem( 44096- 48448B) (split1)
FPGA: Computing ................................................................ done.
run time: 95ms
CPU: Offload CONV Layer f2/e3 : 64x64 x 16 > 64 , CONV (3x3)/1p + ReLU, IN @mem( 4980736- 5242880B), OUT @mem( 5243136B), WEIGHTS @mem( 48448- 85568B) (split2)
FPGA: Computing ................................................................ done.
run time: 102ms
CPU: Offload CONV Layer f3/s1 : 64x64 x 128 > 16 , CONV (1x1)/1 + ReLU, IN @mem( 5242880- 7340032B), OUT @mem( 7340032B), WEIGHTS @mem( 85568- 93824B)
FPGA: Computing ................................................................ done.
run time: 247ms
CPU: Offload CONV Layer f3/e1 : 64x64 x 16 > 64 , CONV (1x1)/1 + ReLU, IN @mem( 7340032- 7602176B), OUT @mem( 7602176B), WEIGHTS @mem( 93824- 98176B) (split1)
FPGA: Computing ................................................................ done.
run time: 113ms
CPU: Offload CONV Layer f3/e3 : 64x64 x 16 > 64 , CONV (3x3)/1p + ReLU, IN @mem( 7340032- 7602176B), OUT @mem( 7602432B), WEIGHTS @mem( 98176- 135296B) (split2)
FPGA: Computing ................................................................ done.
run time: 102ms
CPU: Offload CONV Layer f4/s3 : 64x64 x 128 > 32 , CONV (3x3)/2p + ReLU, IN @mem( 7602176- 9699328B), OUT @mem( 9699328B), WEIGHTS @mem( 135296- 282880B)
FPGA: Computing ................................................................ done.
run time: 106ms
CPU: Offload CONV Layer f4/e1 : 32x32 x 32 > 128, CONV (1x1)/1 + ReLU, IN @mem( 9699328- 9830400B), OUT @mem( 9830400B), WEIGHTS @mem( 282880- 299776B) (split1)
FPGA: Computing ................................ done.
run time: 90ms
CPU: Offload CONV Layer f4/e3 : 32x32 x 32 > 128, CONV (3x3)/1p + ReLU, IN @mem( 9699328- 9830400B), OUT @mem( 9830912B), WEIGHTS @mem( 299776- 447744B) (split2)
FPGA: Computing ................................ done.
run time: 98ms
CPU: Offload CONV Layer f5/s1 : 32x32 x 256 > 32 , CONV (1x1)/1 + ReLU, IN @mem( 9830400-10878976B), OUT @mem(10878976B), WEIGHTS @mem( 447744- 480640B)
FPGA: Computing ................................ done.
run time: 191ms
CPU: Offload CONV Layer f5/e1 : 32x32 x 32 > 128, CONV (1x1)/1 + ReLU, IN @mem(10878976-11010048B), OUT @mem(11010048B), WEIGHTS @mem( 480640- 497536B) (split1)
FPGA: Computing ................................ done.
run time: 90ms
CPU: Offload CONV Layer f5/e3 : 32x32 x 32 > 128, CONV (3x3)/1p + ReLU, IN @mem(10878976-11010048B), OUT @mem(11010560B), WEIGHTS @mem( 497536- 645504B) (split2)
FPGA: Computing ................................ done.
run time: 98ms
CPU: Offload CONV Layer f6/s3 : 32x32 x 256 > 64 , CONV (3x3)/2p + ReLU, IN @mem(11010048-12058624B), OUT @mem(12058624B), WEIGHTS @mem( 645504- 1235584B)
FPGA: Computing ................................ done.
run time: 106ms
CPU: Offload CONV Layer f6/e1 : 16x16 x 64 > 256, CONV (1x1)/1 + ReLU, IN @mem(12058624-12124160B), OUT @mem(12124160B), WEIGHTS @mem( 1235584- 1302144B) (split1)
FPGA: Computing ................ done.
run time: 94ms
CPU: Offload CONV Layer f6/e3 : 16x16 x 64 > 256, CONV (3x3)/1p + ReLU, IN @mem(12058624-12124160B), OUT @mem(12125184B), WEIGHTS @mem( 1302144- 1892992B) (split2)
FPGA: Computing ................ done.
run time: 98ms
CPU: Offload CONV Layer f7/s1 : 16x16 x 512 > 64 , CONV (1x1)/1 + ReLU, IN @mem(12124160-12648448B), OUT @mem(12648448B), WEIGHTS @mem( 1892992- 2024320B)
FPGA: Computing ................ done.
run time: 181ms
CPU: Offload CONV Layer f7/e1 : 16x16 x 64 > 192, CONV (1x1)/1 + ReLU, IN @mem(12648448-12713984B), OUT @mem(12713984B), WEIGHTS @mem( 2024320- 2074240B) (split1)
FPGA: Computing ................ done.
run time: 66ms
CPU: Offload CONV Layer f7/e3 : 16x16 x 64 > 192, CONV (3x3)/1p + ReLU, IN @mem(12648448-12713984B), OUT @mem(12714752B), WEIGHTS @mem( 2074240- 2517376B) (split2)
FPGA: Computing ................ done.
run time: 73ms
CPU: Offload CONV Layer f8/s3 : 16x16 x 384 > 112, CONV (3x3)/2p + ReLU, IN @mem(12713984-13107200B), OUT @mem(13107200B), WEIGHTS @mem( 2517376- 4066112B)
FPGA: Computing ................ done.
run time: 67ms
CPU: Offload CONV Layer f8/e1 : 8x8 x 112 > 256, CONV (1x1)/1 + ReLU, IN @mem(13107200-13135872B), OUT @mem(13135872B), WEIGHTS @mem( 4066112- 4181824B) (split1)
FPGA: Computing ........ done.
run time: 38ms
CPU: Offload CONV Layer f8/e3 : 8x8 x 112 > 256, CONV (3x3)/1p + ReLU, IN @mem(13107200-13135872B), OUT @mem(13136896B), WEIGHTS @mem( 4181824- 5215040B) (split2)
FPGA: Computing ........ done.
run time: 44ms
CPU: Offload CONV Layer f9/s1 : 8x8 x 512 > 112, CONV (1x1)/1 + ReLU, IN @mem(13135872-13266944B), OUT @mem(13266944B), WEIGHTS @mem( 5215040- 5444864B)
FPGA: Computing ........ done.
run time: 78ms
CPU: Offload CONV Layer f9/e1 : 8x8 x 112 > 368, CONV (1x1)/1 + ReLU, IN @mem(13266944-13295616B), OUT @mem(13295616B), WEIGHTS @mem( 5444864- 5611200B) (split1)
FPGA: Computing ........ done.
run time: 55ms
CPU: Offload CONV Layer f9/e3 : 8x8 x 112 > 368, CONV (3x3)/1p + ReLU, IN @mem(13266944-13295616B), OUT @mem(13297088B), WEIGHTS @mem( 5611200- 7096448B) (split2)
FPGA: Computing ........ done.
run time: 63ms
CPU: Offload CONV Layer c10/p1: 8x8 x 736 > 512, CONV (1x1)/1 , IN @mem(13295616-13484032B), OUT @mem(13484032B), WEIGHTS @mem( 7096448- 8605824B) (split1) GLOBAL POOL
FPGA: Computing ........ done.
run time: 501ms
CPU: Offload CONV Layer c10/p2: 8x8 x 736 > 512, CONV (1x1)/1 , IN @mem(13295616-13484032B), OUT @mem(13486080B), WEIGHTS @mem( 8605824-10115200B) (split2) GLOBAL POOL
FPGA: Computing ........ done.
run time: 499ms
CPU: Copy Results from FPGA DRAM (4096 Bytes)
Total run time: 3590ms
Result (top-5):
====================
88.38%: class 207 (output 18.94)
4.42%: class 852 (output 15.95)
4.25%: class 208 (output 15.91)
1.65%: class 219 (output 14.97)
0.20%: class 929 (output 12.85)
TestBench Result: SUCCESS
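The @mem offsets in the layer table above can be reproduced with a little arithmetic. The sketch below assumes 4 bytes per value (32-bit floats) and one bias per output channel stored after each layer's weights; both are inferred from the numbers in the log, not taken from the source code, and the helper names are hypothetical.

```cpp
#include <cstddef>

// Assumption: every activation and weight takes 4 bytes (32-bit float).
constexpr std::size_t kBytesPerValue = 4;

// Bytes used by a width x height x channels feature map.
constexpr std::size_t feature_map_bytes(std::size_t w, std::size_t h, std::size_t ch) {
    return w * h * ch * kBytesPerValue;
}

// Bytes used by a k x k convolution's weights plus one bias per output channel.
constexpr std::size_t weight_bytes(std::size_t k, std::size_t ch_in, std::size_t ch_out) {
    return (k * k * ch_in * ch_out + ch_out) * kBytesPerValue;
}

// c1 input: 256x256x3 occupies bytes 0 to 786432.
static_assert(feature_map_bytes(256, 256, 3) == 786432, "c1 input size");
// c1 output: 128x128x64 starts at 786432 and ends at 4980736.
static_assert(786432 + feature_map_bytes(128, 128, 64) == 4980736, "c1 output end");
// c1 weights: 3x3, 3 -> 64 channels, occupy bytes 0 to 7168.
static_assert(weight_bytes(3, 3, 64) == 7168, "c1 weights size");
// f2/s3 weights: 3x3, 64 -> 16 channels, occupy bytes 7168 to 44096.
static_assert(7168 + weight_bytes(3, 64, 16) == 44096, "f2/s3 weights end");
// f2/e3 (split2) writes 64 channels * 4 bytes = 256 bytes after f2/e1 (split1).
static_assert(5242880 + 64 * kBytesPerValue == 5243136, "split2 output offset");
```

The last check suggests that the two expand layers of a fire module write into the same output region with their channels interleaved per pixel, which is why split2 starts only 256 bytes after split1.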
If the argument is FPGA instead, the following is printed after the memory has been allocated:
XFPGA Driver: Initialize
test: could not open /dev/mem. need to be root: Permission denied
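The message above points to a failed open() on /dev/mem, which ordinary users are not allowed to open. A minimal sketch of such a check follows; try_open_device is a hypothetical helper for illustration, not a function from the ZynqNet driver.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: try to open a device node for reading and writing.
// Returns 0 on success, otherwise the errno value set by open().
int try_open_device(const char* path) {
    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0) {
        return errno;  // e.g. EACCES when not running as root
    }
    close(fd);
    return 0;
}
```

Calling try_open_device("/dev/mem") as a normal user would typically return EACCES, which matches the "Permission denied" text in the log. Running as root removes that error, but the FPGA path presumably still needs the actual Zynq hardware rather than a server.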
3. _HLS_CODE
3.1 C simulation
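This part is the C code used for HLS. fpga_top is the top-level function and cpu_top is its test bench. When we ran the C simulation, the early output was normal, but at layer c10/p1, during "FPGA: Computing", it aborted with SIGSEGV, which may be a memory-related problem.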
3.2 Synthesis
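We fixed two bugs in this step. The first is in unittests.cpp, which tests the individual units. The compiler reported that setiosflags was not declared in the statement below. Since that statement does very little, we simply deleted it.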
// unittests.cpp line 50
std::cout << std::setiosflags(std::ios::fixed) << std::setprecision(2)
<< "ERROR: " << acquired << " != " << expected << " in " << fn
<< " (" << file << ", line " << line << ")" << std::endl;
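An alternative to deleting the statement, assuming the error is only a missing header: std::setiosflags and std::setprecision are declared in <iomanip>, so including it should be enough. The sketch below builds the same message into a string; format_mismatch is a hypothetical helper name, not part of unittests.cpp.

```cpp
#include <iomanip>  // std::setiosflags, std::setprecision
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper that formats the same message as the statement above.
std::string format_mismatch(double acquired, double expected,
                            const std::string& fn,
                            const std::string& file, int line) {
    std::ostringstream out;
    out << std::setiosflags(std::ios::fixed) << std::setprecision(2)
        << "ERROR: " << acquired << " != " << expected << " in " << fn
        << " (" << file << ", line " << line << ")";
    return out.str();
}
```

With fixed notation and a precision of 2, the two floating-point values are printed with exactly two decimal places, while the integer line number is unaffected.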
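The second bug is in netconfig.cpp: the compiler reported that functions such as fopen and printf were not declared, so we added #include <stdio.h> to that file. The remaining problems in the code still require us to keep reading and understanding it.
Synthesis then passes, and we export the RTL.
3.3 Building the system and generating the bitstream
We add the PS and customize it to enable HP0, set the clock frequency to 200 MHz, add the interrupt and connect it, and then run the automatic connection.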