FPGA Implementation of the Convolution Function (Part 5): Running HLS on the IPcore and Finding Bugs


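Background: We have written the 3x3 convolution IPcore and added its HLS pragmas (precompiler directives). The program runs by calling the 3x3 convolution IPcore, and the HLS pragmas map it onto a hardware structure. Now we need to run HLS on the IPcore code.

Purpose: run HLS on the convolution IPcore.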

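Contents

1. Writing test_convBench

1.1 Building and running the program with cmake on Linux

1.2 Hidden problems and bugs

1.3 Writing the testbench

  Convolution size

  Convolution and result comparison

2. C simulation

  Workflow after a bug fix

3. Several bugs and their fixes

3.1 The reg issue

3.2 The DRAM interface problem

3.3 The DATAFLOW error

3.4 Debugging N_PE

4. Locating the bug

4.1 processInputChannel

  function instantiate

  WBRAM

  Loop 'L_CH_OUT' in 'processAll_channelOut'

  OBRAM gets no RTL port

4.2 HLS console output for the whole IPcore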


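1. Writing test_convBench

The original program depends on OpenCV and calls the IPcore too many times to be usable as an HLS testbench. We therefore write a simple testbench first, to make sure the IPcore is correct and usable on its own.

1.1 Building and running the program with cmake on Linux

Create an HLS_test folder and put the source files into its src subfolder. Then create a CMakeLists.txt file in HLS_test: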

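cmake_minimum_required(VERSION 2.8)
project(main)

# Left over from an OpenCV/CUDA project; has no effect on this build
set(CUDA_USE_STATIC_CUDA_RUNTIME OFF)
# Compile as C++11 (QMAKE_CXXFLAGS is a qmake variable that CMake ignores)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11")

# Collect every source file in ./src into DIR_SRCS
AUX_SOURCE_DIRECTORY(./src DIR_SRCS)
add_executable(test_convBench ${DIR_SRCS})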

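cmake_minimum_required sets the minimum required CMake version. project() sets the project name (here main). set() assigns a variable; here it is used for compiler flags. AUX_SOURCE_DIRECTORY collects all source files in ./src into the variable DIR_SRCS, and add_executable gives the name of the executable to build (test_convBench) and the sources it is built from.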

xxr@gpu-SYS-7048GR-TR:~/Desktop/xxr2/HLS_test$ cmake .
-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done
-- Generating done
-- Build files have been written to: /home/xxr/Desktop/xxr2/HLS_test
xxr@gpu-SYS-7048GR-TR:~/Desktop/xxr2/HLS_test$ make
Scanning dependencies of target test_convBench
[ 25%] Building CXX object CMakeFiles/test_convBench.dir/src/pBox.cpp.o
[ 50%] Building CXX object CMakeFiles/test_convBench.dir/src/fpgaAcc.cpp.o
[ 75%] Building CXX object CMakeFiles/test_convBench.dir/src/test_convBench.cpp.o
[100%] Linking CXX executable test_convBench
[100%] Built target test_convBench
xxr@gpu-SYS-7048GR-TR:~/Desktop/xxr2/HLS_test$ ./test_convBench
test SUCCESS!

1.2 Hidden problems and bugs

void convolution_3x3(const Weight *weightIn, const pBox *pboxIn, pBox *outpBox)
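For reference, the two argument types can be sketched from the fields that the testbench in 1.3 assigns. This is a hypothetical reconstruction, not the actual definitions in pBox.h: member order, member types, and the helpers weightCount and pixelCount are assumptions made for illustration.

```cpp
#include <cstddef>

// Hypothetical reconstruction: only the fields assigned in the testbench are
// known; the real definitions in pBox.h may differ in order and content.
struct Weight {
    float *pdata;         // kernelSize*kernelSize*in_ChannelNum*out_ChannelNum values
    float *pbias;         // one bias per output channel (commented out in the testbench)
    int out_ChannelNum;
    int in_ChannelNum;
    int kernelSize;
    int stride;
    int leftPad;
    int rightPad;
};

struct pBox {
    float *pdata;         // width*height*channel values
    int width;
    int height;
    int channel;
};

// Number of floats the weight buffer must hold.
inline std::size_t weightCount(const Weight &w) {
    return static_cast<std::size_t>(w.kernelSize) * w.kernelSize *
           w.in_ChannelNum * w.out_ChannelNum;
}

// Number of floats a feature-map buffer must hold.
inline std::size_t pixelCount(const pBox &b) {
    return static_cast<std::size_t>(b.width) * b.height * b.channel;
}

// Signature of the IPcore entry point, as given above.
void convolution_3x3(const Weight *weightIn, const pBox *pboxIn, pBox *outpBox);
```

Note that both structs carry their data through a pointer member, which matters for the interface error discussed in 3.2.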

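First we fix the shape of the convolution, as given in the table below; inputs and weights are generated randomly. The 3x3 convolution is run both on the PS side and in the IPcore, and the two outputs are compared to check correctness.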

| InputSize | KernelSize | Stride | Padding | OutputSize |
|---|---|---|---|---|
| 25*25*64 | 3*3*64 | 2 | Valid (no padding) | 12*12*64 |

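Two related points:

  • outputSize = (InSize - Kernel)/Stride + 1; this holds for a convolution without padding.
  • The number of biases in a convolution equals the number of output channels. After each 3-D kernel has been multiplied and accumulated over the input, one bias is added, and every position of the same output channel gets the same bias.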
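Both points can be seen in a minimal reference convolution. This is a sketch, not the project's convolution(): it assumes channel-major layouts ([ci][y][x] input, [co][ci][ky][kx] weights, [co][y][x] output), which may differ from what pBox and Weight actually use, and the names validOutputSize and convValid are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Output size of a "valid" (unpadded) convolution: (in - kernel) / stride + 1.
inline int validOutputSize(int inSize, int kernel, int stride) {
    return (inSize - kernel) / stride + 1;
}

// Naive reference convolution (sketch). Assumed layouts:
//   input   [ci][y][x], weights [co][ci][ky][kx], output [co][y][x]
//   bias    one value per output channel (empty vector -> no bias)
std::vector<float> convValid(const std::vector<float> &in, int inSize, int inCh,
                             const std::vector<float> &w, const std::vector<float> &bias,
                             int outCh, int k, int stride) {
    const int outSize = validOutputSize(inSize, k, stride);
    std::vector<float> out(static_cast<std::size_t>(outCh) * outSize * outSize);
    for (int co = 0; co < outCh; ++co)
        for (int oy = 0; oy < outSize; ++oy)
            for (int ox = 0; ox < outSize; ++ox) {
                // Every position of output channel co starts from the same bias.
                float acc = bias.empty() ? 0.0f : bias[co];
                for (int ci = 0; ci < inCh; ++ci)
                    for (int ky = 0; ky < k; ++ky)
                        for (int kx = 0; kx < k; ++kx) {
                            const int iy = oy * stride + ky;
                            const int ix = ox * stride + kx;
                            acc += in[(ci * inSize + iy) * inSize + ix] *
                                   w[((co * inCh + ci) * k + ky) * k + kx];
                        }
                out[(co * outSize + oy) * outSize + ox] = acc;
            }
    return out;
}
```

With the shapes used in this article, validOutputSize(25, 3, 2) returns 12 and validOutputSize(23, 3, 1) returns 21.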

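Causes of the bugs:

  • The stride gives inconsistent results between the two convolution implementations (this bug needs to be debugged with weights exported from TensorFlow).
  • OBRAM is too small.
  • IBRAM is too small.
  • WBRAM is too small.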
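While TensorFlow weights are not yet available, one possible self-check for the stride bug (a suggestion, not the method used in this article) relies on the fact that a valid convolution with stride s equals the stride-1 result sampled at every s-th row and column. The sketch uses a single-channel, row-major conv2d as an assumed stand-in; conv2d and matchesSubsampledStride1 are hypothetical names. Applied to the IPcore, the stride-2 output of convolution_3x3 would be compared against its own stride-1 output in the same way.

```cpp
#include <cstddef>
#include <vector>

// Single-channel valid convolution with an arbitrary stride (sketch).
// img is size x size, ker is k x k, both row-major.
std::vector<float> conv2d(const std::vector<float> &img, int size,
                          const std::vector<float> &ker, int k, int stride) {
    const int out = (size - k) / stride + 1;
    std::vector<float> res(static_cast<std::size_t>(out) * out, 0.0f);
    for (int y = 0; y < out; ++y)
        for (int x = 0; x < out; ++x) {
            float acc = 0.0f;
            for (int ky = 0; ky < k; ++ky)
                for (int kx = 0; kx < k; ++kx)
                    acc += img[(y * stride + ky) * size + (x * stride + kx)] * ker[ky * k + kx];
            res[y * out + x] = acc;
        }
    return res;
}

// True if `strided` equals the stride-1 result sampled every s rows/columns.
// Both paths sum the same products in the same order, so exact equality is safe.
bool matchesSubsampledStride1(const std::vector<float> &img, int size,
                              const std::vector<float> &ker, int k, int s,
                              const std::vector<float> &strided) {
    const std::vector<float> full = conv2d(img, size, ker, k, 1);
    const int fullOut = size - k + 1;
    const int out = (size - k) / s + 1;
    if (static_cast<int>(strided.size()) != out * out) return false;
    for (int y = 0; y < out; ++y)
        for (int x = 0; x < out; ++x)
            if (strided[y * out + x] != full[(y * s) * fullOut + (x * s)]) return false;
    return true;
}
```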

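1.3 Writing the testbench

Because of the bugs above, the layer under test must not be too large, otherwise OBRAM runs out of space.

  Convolution size

After running and checking, we found that the stride is wrong: conv3*3 (convolution_3x3) and convolution agree for stride=1 but not for stride=2. The flow is long and hard to trace, so we will debug it again once we can export the parameters; for now we set stride to 1 and look at the results.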

| InputSize | KernelSize | Stride | Padding | OutputSize |
|---|---|---|---|---|
| 24*24*64 | 3*3*64 | 1 | Valid (no padding) | 22*22*64 |

//conv parameters
int inputSize=23; int inChannelNum=32;
int outputSize=21;  int OutChannelNum=64;
int kernelSize=3; int Stride=1;
int Input_Pixels=inputSize*inputSize*inChannelNum;
int Output_Pixels=outputSize*outputSize*OutChannelNum;
int weightkernel_Pixels=9*inChannelNum*OutChannelNum;

//conv variable
Weight weightIn;
pBox featureIn;
pBox conv_PL_out;
pBox conv_PS_out;

//initialize conv weight variable
weightIn.out_ChannelNum=OutChannelNum;
weightIn.in_ChannelNum=inChannelNum;
weightIn.kernelSize=kernelSize;
weightIn.stride=Stride;
weightIn.leftPad=0;
weightIn.rightPad=0;
weightIn.pdata=(float *)malloc(sizeof(float)*weightkernel_Pixels);
//weightIn.pbias=(float *)malloc(sizeof(float)*OutChannelNum);
for (int i=0;i<weightkernel_Pixels;i++){
	weightIn.pdata[i]=(rand()%100)/100.0;
	//weightIn.pdata[i]=(i%10)/10.0;   // 10.0, not 10: (i%10)/10 is integer division and always 0
}

//initialize conv Input variable
featureIn.width=inputSize;
featureIn.height=inputSize;
featureIn.channel=inChannelNum;
featureIn.pdata=(float*)malloc(sizeof(float)*Input_Pixels);
for (int i=0;i<Input_Pixels;i++){
	featureIn.pdata[i]=(rand()%100)/100.0;
	//featureIn.pdata[i]=(i%10)/10.0;  // 10.0, not 10: (i%10)/10 is integer division and always 0
}

//initialize conv Output variable
conv_PL_out.width=outputSize;
conv_PL_out.height=outputSize;
conv_PL_out.channel=OutChannelNum;
conv_PS_out.width=outputSize;
conv_PS_out.height=outputSize;
conv_PS_out.channel=OutChannelNum;	
conv_PS_out.pdata=(float*)malloc(sizeof(float)*Output_Pixels);
conv_PL_out.pdata=(float*)malloc(sizeof(float)*Output_Pixels);

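This code sets up the layer dimensions, allocates the buffers, and fills the weights and the input feature map with random values. Note that the values actually used in the code are a 23*23*32 input and 64 output channels (a 3*3*32 kernel per output channel and a 21*21*64 output), not the 24*24*64 shape in the table above.

 Convolution and result comparison

We compare the IPcore convolution with the reference convolution to check whether the results agree.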

//conv in PS
convolution(&weightIn,&featureIn,&conv_PS_out);

//conv in PL
convolution_3x3(&weightIn,&featureIn,&conv_PL_out);

//compare in PS and PL
int error=0;
for(int i=0;i<Output_Pixels;i++){
	if(conv_PS_out.pdata[i]!=conv_PL_out.pdata[i]){
		printf("Convolution ERROR!\n");
		printf("i is %d, value in PS= %f, in PL is %f\n",i,conv_PS_out.pdata[i],conv_PL_out.pdata[i]);
		error=1;
	}
}
printf("Compare DONE SUCCESS!\n");
if(error==0)
	printf("PS and PL conv match SUCCESS!\n");
else
	printf("PS and PL conv match FAILURE!\n");
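The comparison above uses exact `!=` on floats. If the PS code and the IPcore sum the same products in a different order, the results can differ in the last bits even when both are correct. A tolerance-based comparison is sketched below; countMismatches and its default tolerances are assumptions to be tuned, not part of the original testbench.

```cpp
#include <cmath>
#include <cstdio>

// Count elements where PS and PL differ by more than absTol + relTol*|ps|.
// Prints at most maxReports mismatches so a bad run does not flood the console.
int countMismatches(const float *ps, const float *pl, int n,
                    float absTol = 1e-4f, float relTol = 1e-5f, int maxReports = 10) {
    int errors = 0;
    for (int i = 0; i < n; ++i) {
        const float diff = std::fabs(ps[i] - pl[i]);
        const float limit = absTol + relTol * std::fabs(ps[i]);
        if (!(diff <= limit)) {   // written this way so NaN also counts as a mismatch
            if (errors < maxReports)
                std::printf("mismatch at %d: PS=%f PL=%f\n", i, ps[i], pl[i]);
            ++errors;
        }
    }
    return errors;
}
```

In the testbench, main() could end with `return countMismatches(conv_PS_out.pdata, conv_PL_out.pdata, Output_Pixels) == 0 ? 0 : 1;`. Vivado HLS C simulation treats a nonzero return value from main() as a failed test, so a mismatch is then reported by the tool instead of only in the printed text.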

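2. C simulation

Related: FPGA Practice Tutorial (1): Generating an IPcore from a C Program with HLS https://blog.csdn.net/weixin_36474809/article/details/80597166

Once the IPcore runs successfully in HLS_test on the server, it can be put directly into HLS for C simulation.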

INFO: [HLS 200-10] Setting target device to 'xc7z035ffg676-2'
INFO: [SIM 211-2] *************** CSIM start ***************
INFO: [SIM 211-4] CSIM will launch GCC as the compiler.
   Compiling ../../../../src/test_convBench.cpp in debug mode
   Compiling ../../../../src/fpgaAcc.cpp in debug mode
   Generating csim.exe
Test Start SUCCESS!
Variable init SUCCESS!
Conv in PS SUCCESS!
33convConv in PL SUCCESS!
Compare DONE SUCCESS!
PS and PL conv match SUCCESS!
INFO: [SIM 211-1] CSim done with 0 errors.

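  Workflow after a bug fix

  • Copy the modified files into FPGA-mtcnn on the server, then compile and run. Success shows the code still works when embedded in the original program.
  • Copy the same files into HLS_test on the server, then compile and run. Success shows the code works with the testbench.
  • Put the files into the src folder on the virtual machine and rerun C simulation in HLS. Once it reproduces the same result, proceed to synthesis.

3. Several bugs and their fixes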

3.1 The reg issue

3 errors generated.
ERROR: [HLS 200-70] Compilation errors found:
Pragma processor failed: In file included from src/fpgaAcc.cpp:1:
src/fpgaAcc.cpp:166:11: error: use of undeclared identifier 'reg'
 float px=reg(input_ptr[load_pixel_offset+in_channel_pixel_offset]);
          ^
src/fpgaAcc.cpp:173:15: error: use of undeclared identifier 'reg'
 float read = reg(weight_DRAM_ptr[weight_loc]);
              ^
src/fpgaAcc.cpp:208:12: error: use of undeclared identifier 'reg'
  float px=reg(ImageCache::get_IBRAM_Pixel(IBRAM_line_offset,pixel_col_to_load,
           ^
3 errors generated.
Failed checking during preprocessing.
    while executing
"source /home/osrc/Desktop/document/conv_Core/HLS_Conv/conv3x3_IPcore/solution1/csynth.tcl"
    invoked from within
"hls::main /home/osrc/Desktop/document/conv_Core/HLS_Conv/conv3x3_IPcore/solution1/csynth.tcl"
    ("uplevel" body line 1)
    invoked from within
"uplevel 1 hls::main {*}$args"
    (procedure "hls_proc" line 5)
    invoked from within
"hls_proc $argv"

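The .h extension is conventionally used for C headers and .hpp for C++ headers, so we tried renaming the .h file to .hpp. The extension does not change how the included code is compiled, however, and the bug remained.

The cause was the #ifndef __SYNTHESIS__ placed before the definition of reg. Vivado HLS defines the __SYNTHESIS__ macro when it synthesizes the code, so anything inside #ifndef __SYNTHESIS__ is compiled for C simulation only and disappears during synthesis. That is why synthesis reports reg as an undeclared identifier while C simulation works. We do not know why ZynqNet adds this guard; after deleting it, the bug was fixed.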
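A minimal sketch of the fix: keep the reg() helper outside any __SYNTHESIS__ guard and use the guard only for simulation-only code such as debug printing. The pragma lines inside reg() are an assumption modelled on ZynqNet-style register helpers (check them against the Vivado HLS user guide for your version); an ordinary C++ compiler ignores them, at most warning about unknown pragmas. loadPixel is a hypothetical caller.

```cpp
#include <cstdio>

// reg(): identity function used to force a register on a value in hardware.
// Declared outside any #ifndef __SYNTHESIS__ block, so C simulation and
// synthesis both see it. The pragma form is an assumption; verify it.
template <typename T>
T reg(T in) {
#pragma HLS INLINE off
#pragma HLS INTERFACE ap_none register port=return
    return in;
}

// Hypothetical caller, shaped like the failing lines in fpgaAcc.cpp.
float loadPixel(const float *input_ptr, int offset) {
    float px = reg(input_ptr[offset]);
#ifndef __SYNTHESIS__
    // Simulation-only debug output: __SYNTHESIS__ is defined by Vivado HLS
    // during synthesis, so this block never reaches the hardware.
    std::printf("pixel %d = %f\n", offset, px);
#endif
    return px;
}
```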

3.2 The DRAM interface problem

INFO: [XFORM 203-603] Inlining function 'MemoryController::writeBackOutputChannel' into 'convolution_3x3' (src/fpgaAcc.cpp:107).
INFO: [HLS 200-111] Finished Standard Transforms Time (s): cpu = 00:00:34 ; elapsed = 00:00:26 . Memory (MB): peak = 361.926 ; gain = 13.668 ; free physical = 607 ; free virtual = 33200
INFO: [HLS 200-10] Checking synthesizability ...
INFO: [XFORM 203-602] Inlining function 'ImageCache::writeNextChannelPixel_2_IBRAM' into 'convolution_3x3' (src/fpgaAcc.cpp:336->src/fpgaAcc.cpp:327->src/fpgaAcc.cpp:84) automatically.
ERROR: [SYNCHK 200-11] src/fpgaAcc.cpp:259: Argument 'weightIn.pdata' of function 'convolution_3x3' (src/fpgaAcc.cpp:45) has an unsynthesizable type (possible cause(s): pointer to pointer or global pointer).
ERROR: [SYNCHK 200-61] src/fpgaAcc.cpp:174: unsupported memory access on variable 'weightIn.pdata' which is (or contains) an array with unknown size at compile time.
INFO: [SYNCHK 200-10] 2 error(s), 0 warning(s).
ERROR: [HLS 200-70] Synthesizability check failed.
command 'ap_source' returned error code
    while executing

ERROR: [SYNCHK 200-11] src/fpgaAcc.cpp:259: Argument 'weightIn.pdata' of function 'convolution_3x3' (src/fpgaAcc.cpp:45) has an unsynthesizable type (possible cause(s): pointer to pointer or global pointer).

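weightIn.pdata has a type that HLS cannot synthesize, such as a pointer to a pointer or a global pointer. Here pdata is a pointer stored inside the Weight struct, which is itself passed to convolution_3x3 through a pointer, so accessing it is effectively a pointer-to-pointer access.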

ERROR: [SYNCHK 200-61] src/fpgaAcc.cpp:174: unsupported memory access on variable 'weightIn.pdata' which is (or contains) an array with unknown size at compile time.

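weightIn.pdata is (or contains) an array whose size is unknown at compile time.

We therefore have to add pragmas that tell HLS how to synthesize the interface. See:

FPGA Implementation of MTCNN (Part 4): HLS of the Interface https://blog.csdn.net/weixin_36474809/article/details/84940846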
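One common way to make such an interface synthesizable is to avoid structs that carry pointers and instead pass a single flat DRAM pointer with integer offsets, exposed as an AXI master (m_axi) port, with the scalars on an AXI-Lite slave. The sketch is hypothetical: conv3x3_top, its parameters, the naive loop body and depth=65536 are illustrative (depth only matters for C/RTL co-simulation), and it assumes channel-major layouts. The port name memorybus matches the m_axi port that appears in the later logs.

```cpp
// Hypothetical top level: one flat DRAM pointer plus offsets instead of
// structs containing pointers. Layouts: input [ci][y][x],
// weights [co][ci][ky][kx], output [co][y][x] (all assumptions).
void conv3x3_top(float *memorybus,
                 int weight_offset, int input_offset, int output_offset,
                 int inputSize, int inChannels, int outChannels, int stride) {
#pragma HLS INTERFACE m_axi port=memorybus offset=slave depth=65536
#pragma HLS INTERFACE s_axilite port=weight_offset
#pragma HLS INTERFACE s_axilite port=input_offset
#pragma HLS INTERFACE s_axilite port=output_offset
#pragma HLS INTERFACE s_axilite port=inputSize
#pragma HLS INTERFACE s_axilite port=inChannels
#pragma HLS INTERFACE s_axilite port=outChannels
#pragma HLS INTERFACE s_axilite port=stride
#pragma HLS INTERFACE s_axilite port=return
    const int k = 3;
    const int outSize = (inputSize - k) / stride + 1;
    for (int co = 0; co < outChannels; ++co)
        for (int oy = 0; oy < outSize; ++oy)
            for (int ox = 0; ox < outSize; ++ox) {
                float acc = 0.0f;
                for (int ci = 0; ci < inChannels; ++ci)
                    for (int ky = 0; ky < k; ++ky)
                        for (int kx = 0; kx < k; ++kx)
                            acc += memorybus[input_offset +
                                             (ci * inputSize + oy * stride + ky) * inputSize +
                                             ox * stride + kx] *
                                   memorybus[weight_offset + ((co * inChannels + ci) * k + ky) * k + kx];
                memorybus[output_offset + (co * outSize + oy) * outSize + ox] = acc;
            }
}
```

On the PS side, the driver would copy weights and input into one DRAM buffer and pass the three offsets, so the IPcore never has to dereference a pointer stored inside a struct.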

3.3 The DATAFLOW error

WARNING: [XFORM 203-562] Loop 'L_CH_OUT' (src/fpgaAcc.cpp:241) in function 'processAll_channelOu' has unknown bound because it has multiple exiting blocks.
WARNING: [XFORM 203-713] Function 'processInputChannel..1' (src/fpgaAcc.cpp:226:1) failed dataflow checking:  A dataflow region cannot be instantiated from with a pipelined loop  (src/fpgaAcc.cpp:226:1). Ignoring pipeline directive to allow the dataflow directive to take precedence. This behavior can be disabled by using 'config_compile -disable_dataflow_pipeline_check'.
Instruction does not dominate all uses!
  %tmp_60 = add i32 %WeightsCache_inChan_1, %tmp_59
  %memorybus_addr_rd_re = call i1 @_ssdm_op_ReadReq.m_axi.floatP(float* %memorybus_addr, i32 %tmp_60), !dbg !1031
Broken module found, compilation aborted!
Stack dump:
0.	Running pass 'Function Pass Manager' on module '/home/osrc/Desktop/document/conv_Core/HLS_Conv/conv3x3_IPcore/solution1/.autopilot/db/a.o.2.bc'.
1.	Running pass 'Module Verifier' on function '@convolution_3x3'
/mnt/workspace/Xilinx/Vivado/2017.4/bin/loader: line 194: 15937 Aborted                 (core dumped) "$RDI_PROG" "$@"
Finished C synthesis.

Important messages:

  • WARNING [XFORM 203-713]: processInputChannel (src/fpgaAcc.cpp:226) contains a DATAFLOW region but is instantiated from a pipelined loop. HLS ignores the pipeline directive so that DATAFLOW takes precedence.
  • "Instruction does not dominate all uses! ... Broken module found, compilation aborted!": this comes from the LLVM module verifier that the tool runs on @convolution_3x3; the instruction involved is the m_axi read request on memorybus. Synthesis then aborts with a core dump, so this is an internal compiler failure rather than an ordinary error pointing at a source line.
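The XFORM 203-713 warning describes a structural rule: a DATAFLOW region should not be called from inside a pipelined loop. One way to respect it, sketched below with hypothetical names (loadStage, computeStage, processChannel, processAllChannels, CH_LEN), is to move the pipelining into the dataflow stages and leave the calling loop unpipelined. Whether this alone also avoids the module-verifier crash is not established here.

```cpp
// Hypothetical sizes and names, for illustration only.
const int CH_LEN = 8;

// Stage 1: copy one channel into a local buffer (pipelined inside the stage).
static void loadStage(const float *src, float buf[CH_LEN]) {
    for (int i = 0; i < CH_LEN; ++i) {
#pragma HLS PIPELINE II=1
        buf[i] = src[i];
    }
}

// Stage 2: scale the buffer by the channel weight (also pipelined inside).
static void computeStage(const float buf[CH_LEN], float w, float dst[CH_LEN]) {
    for (int i = 0; i < CH_LEN; ++i) {
#pragma HLS PIPELINE II=1
        dst[i] = buf[i] * w;
    }
}

// Dataflow region: the two stages may overlap in hardware.
static void processChannel(const float *src, float w, float *dst) {
#pragma HLS DATAFLOW
    float buf[CH_LEN];
    loadStage(src, buf);
    computeStage(buf, w, dst);
}

// Caller: a plain loop with no PIPELINE pragma, because a dataflow region
// must not be instantiated inside a pipelined loop (XFORM 203-713).
void processAllChannels(const float *src, const float *weights, float *dst, int channels) {
    for (int c = 0; c < channels; ++c)
        processChannel(src + c * CH_LEN, weights[c], dst + c * CH_LEN);
}
```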

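3.4 Debugging N_PE

We suspected that many of the optimization directives might not have been applied, because the directive panel showed exclamation-mark warnings next to some of them and we could not tell whether they had taken effect.

Judging from the later console output, they probably were applied, for two reasons:

  • ZynqNet's IPcore shows the same exclamation-mark warnings during synthesis, yet still produces RTL successfully.
  • The console confirms that directives were carried out, e.g. INFO: [XFORM 203-101] Partitioning array 'OutputCache::OBRAM'  in dimension 1 with a cyclic factor 8.

Relevant console output: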
WARNING: [XFORM 203-631] Renaming function 'ProcessingElement::processAll_channelOut' to 'processAll_channelOu' (src/fpgaAcc.cpp:192:43)
INFO: [XFORM 203-811] Inferring bus burst read of variable length on port 'memorybus' (src/fpgaAcc.cpp:178:15).
WARNING: [XFORM 203-562] Loop 'L_CH_OUT' (src/fpgaAcc.cpp:241) in function 'processAll_channelOu' has unknown bound because it has multiple exiting blocks.
Instruction does not dominate all uses!
  %tmp_64 = add i32 %WeightsCache_inChan_1, %tmp_63
  %memorybus_addr_rd_re = call i1 @_ssdm_op_ReadReq.m_axi.floatP(float* %memorybus_addr, i32 %tmp_64), !dbg !1031
Broken module found, compilation aborted!
Stack dump:
0.	Running pass 'Function Pass Manager' on module '/home/osrc/Desktop/document/conv_Core/HLS_Conv/conv3x3_IPcore/solution1/.autopilot/db/a.o.2.bc'.
1.	Running pass 'Module Verifier' on function '@convolution_3x3'
/mnt/workspace/Xilinx/Vivado/2017.4/bin/loader: line 194: 35285 Aborted                 (core dumped) "$RDI_PROG" "$@"
Finished C synthesis.

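To check whether the unroll factor N_PE in processAll_channelOut is actually applied, we changed N_PE directly to 16; the same error remained.

Removing pipeline II=1 also left the error unchanged.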
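One plausible reading of the log in section 4.1 (an interpretation, not a confirmed diagnosis): L_CH_OUT is unrolled by a factor of 16, but OBRAM is partitioned cyclically by only 8, so each OBRAM bank must serve two read-modify-write accesses per iteration, and the scheduler reports limited memory ports on OBRAM_0 with a final II of 2. The sketch keeps both factors tied to one constant; it targets that II message, not the verifier crash. N_PE, MAX_CH_OUT and accumulateAllChannels are illustrative, and if your HLS version does not accept a named constant inside a pragma, write the literal value instead.

```cpp
// Illustrative constants. Keep the unroll factor and the cyclic partition
// factor equal so every unrolled copy of the loop body gets its own bank.
const int N_PE = 16;
const int MAX_CH_OUT = 64;

// Output buffer, one accumulator per output channel (zero-initialized).
static float OBRAM[MAX_CH_OUT];

void accumulateAllChannels(const float partial[MAX_CH_OUT], int out_channelNum) {
#pragma HLS ARRAY_PARTITION variable=OBRAM cyclic factor=N_PE dim=1
L_CH_OUT:
    for (int co = 0; co < MAX_CH_OUT; ++co) {
#pragma HLS PIPELINE II=1
#pragma HLS UNROLL factor=N_PE
        // Fixed trip count with a guard instead of an early exit.
        if (co < out_channelNum)
            OBRAM[co] += partial[co];
    }
}
```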

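4. Locating the bug

The nested IPcore is too large to debug as a whole, so we split it into smaller units and test them separately. We set processInputChannel as the top function and obtained the following results: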

4.1 processInputChannel

Starting C synthesis ...
/mnt/workspace/Xilinx/Vivado/2017.4/bin/vivado_hls /home/osrc/Desktop/document/conv_Core/HLS_Conv/conv3x3_IPcore/solution1/csynth.tcl
INFO: [HLS 200-10] Running '/mnt/workspace/Xilinx/Vivado/2017.4/bin/unwrapped/lnx64.o/vivado_hls'
INFO: [HLS 200-10] For user 'osrc' on host 'osrc-virtual-machine' (Linux_x86_64 version 4.13.0-32-generic) on Wed Dec 12 10:37:09 CST 2018
INFO: [HLS 200-10] On os Ubuntu 16.04.3 LTS
INFO: [HLS 200-10] In directory '/home/osrc/Desktop/document/conv_Core/HLS_Conv'
INFO: [HLS 200-10] Opening project '/home/osrc/Desktop/document/conv_Core/HLS_Conv/conv3x3_IPcore'.
INFO: [HLS 200-10] Adding design file 'src/fpgaAcc.cpp' to the project
INFO: [HLS 200-10] Adding design file 'src/fpgaAcc.hpp' to the project
INFO: [HLS 200-10] Adding design file 'src/pBox.cpp' to the project
INFO: [HLS 200-10] Adding design file 'src/pBox.h' to the project
INFO: [HLS 200-10] Adding test bench file 'src/test_convBench.cpp' to the project
INFO: [HLS 200-10] Opening solution '/home/osrc/Desktop/document/conv_Core/HLS_Conv/conv3x3_IPcore/solution1'.
INFO: [SYN 201-201] Setting up clock 'default' with a period of 10ns.
INFO: [HLS 200-10] Setting target device to 'xc7z035ffg676-2'
INFO: [HLS 200-10] Analyzing design file 'src/pBox.cpp' ...
INFO: [HLS 200-10] Analyzing design file 'src/fpgaAcc.cpp' ...
INFO: [HLS 200-10] Validating synthesis directives ...
INFO: [HLS 200-111] Finished Checking Pragmas Time (s): cpu = 00:00:31 ; elapsed = 00:00:19 . Memory (MB): peak = 361.637 ; gain = 13.375 ; free physical = 395 ; free virtual = 32671
INFO: [HLS 200-111] Finished Linking Time (s): cpu = 00:00:33 ; elapsed = 00:00:21 . Memory (MB): peak = 361.637 ; gain = 13.375 ; free physical = 393 ; free virtual = 32671
INFO: [HLS 200-10] Starting code transformations ...
INFO: [XFORM 203-603] Inlining function 'ImageCache::calcu_IBRAM_row_offset' into 'ProcessingElement::loadPixel_buffer' (src/fpgaAcc.cpp:209).
INFO: [XFORM 203-603] Inlining function 'ImageCache::get_IBRAM_Pixel' into 'ProcessingElement::loadPixel_buffer' (src/fpgaAcc.cpp:213).
INFO: [XFORM 203-603] Inlining function 'ProcessingElement::loadPixel_buffer' into 'ProcessingElement::processInputChannel' (src/fpgaAcc.cpp:230).
INFO: [XFORM 203-603] Inlining function 'WeightsCache::get_WBRAM_addr' into 'WeightsCache::get_9_weights_to_buffer' (src/fpgaAcc.cpp:307).
INFO: [XFORM 203-603] Inlining function 'WeightsCache::get_9_weights_to_buffer' into 'ProcessingElement::processAll_channelOut' (src/fpgaAcc.cpp:247).
INFO: [XFORM 203-603] Inlining function 'ProcessingElement::macc2d' into 'ProcessingElement::processAll_channelOut' (src/fpgaAcc.cpp:249).
INFO: [XFORM 203-603] Inlining function 'OutputCache::setOutChannel' into 'OutputCache::accumulateChannel' (src/fpgaAcc.cpp:384).
INFO: [XFORM 203-603] Inlining function 'OutputCache::setOutChannel' into 'ProcessingElement::processAll_channelOut' (src/fpgaAcc.cpp:252).
INFO: [XFORM 203-603] Inlining function 'OutputCache::getOutChannel' into 'OutputCache::accumulateChannel' (src/fpgaAcc.cpp:382).
INFO: [XFORM 203-603] Inlining function 'OutputCache::accumulateChannel' into 'ProcessingElement::processAll_channelOut' (src/fpgaAcc.cpp:254).
WARNING: [XFORM 203-623] Cannot instantiate function 'ProcessingElement::processInputChannel'(src/fpgaAcc.cpp:225:1) for 'cur_ci' since none of the actual argument(s) of 'cur_ci' are constant or global.
INFO: [HLS 200-111] Finished Standard Transforms Time (s): cpu = 00:00:35 ; elapsed = 00:00:23 . Memory (MB): peak = 361.910 ; gain = 13.648 ; free physical = 383 ; free virtual = 32663
INFO: [HLS 200-10] Checking synthesizability ...
INFO: [HLS 200-111] Finished Checking Synthesizability Time (s): cpu = 00:00:35 ; elapsed = 00:00:23 . Memory (MB): peak = 361.910 ; gain = 13.648 ; free physical = 380 ; free virtual = 32661
INFO: [XFORM 203-502] Unrolling all sub-loops inside loop 'L_CH_OUT' (src/fpgaAcc.cpp:241) in function 'ProcessingElement::processAll_channelOut' for pipelining.
INFO: [XFORM 203-501] Unrolling loop 'L_CH_OUT' (src/fpgaAcc.cpp:241) in function 'ProcessingElement::processAll_channelOut' partially with a factor of 16.
INFO: [XFORM 203-501] Unrolling loop 'Loop-1.1' (src/fpgaAcc.cpp:308) in function 'ProcessingElement::processAll_channelOut' completely.
INFO: [XFORM 203-501] Unrolling loop 'L_MACC_multiply' (src/fpgaAcc.cpp:190) in function 'ProcessingElement::processAll_channelOut' completely.
INFO: [XFORM 203-501] Unrolling loop 'L_MACC_accumulate' (src/fpgaAcc.cpp:195) in function 'ProcessingElement::processAll_channelOut' completely.
INFO: [XFORM 203-101] Partitioning array 'pixel_buffer'  in dimension 1 completely.
INFO: [XFORM 203-101] Partitioning array 'weights_local' (src/fpgaAcc.cpp:244) in dimension 1 completely.
INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM'  in dimension 1 completely.
INFO: [XFORM 203-101] Partitioning array 'multresult' (src/fpgaAcc.cpp:187) in dimension 1 completely.
INFO: [XFORM 203-101] Partitioning array 'OutputCache::OBRAM'  in dimension 1 with a cyclic factor 8.
INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.0'  in dimension 2 completely.
INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.1'  in dimension 2 completely.
INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.2'  in dimension 2 completely.
INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.3'  in dimension 2 completely.
INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.4'  in dimension 2 completely.
INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.5'  in dimension 2 completely.
INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.6'  in dimension 2 completely.
INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.7'  in dimension 2 completely.
WARNING: [XFORM 203-623] Cannot instantiate function 'ProcessingElement::processInputChannel'(src/fpgaAcc.cpp:225:1) for 'cur_ci' since none of the actual argument(s) of 'cur_ci' are constant or global.
INFO: [HLS 200-111] Finished Pre-synthesis Time (s): cpu = 00:00:37 ; elapsed = 00:00:25 . Memory (MB): peak = 489.633 ; gain = 141.371 ; free physical = 353 ; free virtual = 32635
WARNING: [XFORM 203-631] Renaming function 'ProcessingElement::processAll_channelOut' to 'processAll_channelOu' (src/fpgaAcc.cpp:241:43)
WARNING: [XFORM 203-562] Loop 'L_CH_OUT' (src/fpgaAcc.cpp:241) in function 'processAll_channelOu' has unknown bound because it has multiple exiting blocks.
INFO: [HLS 200-111] Finished Architecture Synthesis Time (s): cpu = 00:00:38 ; elapsed = 00:00:27 . Memory (MB): peak = 489.633 ; gain = 141.371 ; free physical = 349 ; free virtual = 32632
INFO: [HLS 200-10] Starting hardware synthesis ...
INFO: [HLS 200-10] Synthesizing 'ProcessingElement::processInputChannel' ...
WARNING: [SYN 201-103] Top function name 'ProcessingElement::processInputChannel' is not a legal RTL name.
WARNING: [SYN 201-303] Cannot apply memory assignment of 'RAM_S2P_BRAM' (src/fpgaAcc.cpp:305->src/fpgaAcc.cpp:247): 'WBRAM_0_0' does not exist or is optimized away.
WARNING: [SYN 201-303] Cannot apply memory assignment of 'RAM_S2P_BRAM' (src/fpgaAcc.cpp:305->src/fpgaAcc.cpp:247): 'WBRAM_0_0' does not exist or is optimized away.
WARNING: [SYN 201-303] Cannot apply memory assignment of 'RAM_S2P_BRAM' (src/fpgaAcc.cpp:305->src/fpgaAcc.cpp:247): 'WBRAM_0_0' does not exist or is optimized away.
WARNING: [SYN 201-303] Cannot apply memory assignment of 'RAM_S2P_BRAM' (src/fpgaAcc.cpp:305->src/fpgaAcc.cpp:247): 'WBRAM_0_0' does not exist or is optimized away.
WARNING: [SYN 201-303] Cannot apply memory assignment of 'RAM_S2P_BRAM' (src/fpgaAcc.cpp:305->src/fpgaAcc.cpp:247): 'WBRAM_0_0' does not exist or is optimized away.
WARNING: [SYN 201-303] Cannot apply memory assignment of 'RAM_S2P_BRAM' (src/fpgaAcc.cpp:305->src/fpgaAcc.cpp:247): 'WBRAM_0_0' does not exist or is optimized away.
WARNING: [SYN 201-303] Cannot apply memory assignment of 'RAM_S2P_BRAM' (src/fpgaAcc.cpp:305->src/fpgaAcc.cpp:247): 'WBRAM_0_0' does not exist or is optimized away.
WARNING: [SYN 201-303] Cannot apply memory assignment of 'RAM_S2P_BRAM' (src/fpgaAcc.cpp:305->src/fpgaAcc.cpp:247): 'WBRAM_0_0' does not exist or is optimized away.
WARNING: [SYN 201-303] Cannot apply memory assignment of 'RAM_S2P_BRAM' (src/fpgaAcc.cpp:305->src/fpgaAcc.cpp:247): 'WBRAM_0_0' does not exist or is optimized away.
WARNING: [SYN 201-303] Cannot apply memory assignment of 'RAM_S2P_BRAM' (src/fpgaAcc.cpp:305->src/fpgaAcc.cpp:247): 'WBRAM_0_0' does not exist or is optimized away.
WARNING: [SYN 201-303] Cannot apply memory assignment of 'RAM_S2P_BRAM' (src/fpgaAcc.cpp:305->src/fpgaAcc.cpp:247): 'WBRAM_0_0' does not exist or is optimized away.
WARNING: [SYN 201-303] Cannot apply memory assignment of 'RAM_S2P_BRAM' (src/fpgaAcc.cpp:305->src/fpgaAcc.cpp:247): 'WBRAM_0_0' does not exist or is optimized away.
WARNING: [SYN 201-303] Cannot apply memory assignment of 'RAM_S2P_BRAM' (src/fpgaAcc.cpp:305->src/fpgaAcc.cpp:247): 'WBRAM_0_0' does not exist or is optimized away.
WARNING: [SYN 201-303] Cannot apply memory assignment of 'RAM_S2P_BRAM' (src/fpgaAcc.cpp:305->src/fpgaAcc.cpp:247): 'WBRAM_0_0' does not exist or is optimized away.
WARNING: [SYN 201-303] Cannot apply memory assignment of 'RAM_S2P_BRAM' (src/fpgaAcc.cpp:305->src/fpgaAcc.cpp:247): 'WBRAM_0_0' does not exist or is optimized away.
WARNING: [SYN 201-303] Cannot apply memory assignment of 'RAM_S2P_BRAM' (src/fpgaAcc.cpp:305->src/fpgaAcc.cpp:247): 'WBRAM_0_0' does not exist or is optimized away.
WARNING: [SYN 201-103] Top function name 'ProcessingElement::processInputChannel' is not a legal RTL name and is changed to 'ProcessingElement_processInputChannel'; this may result in automatic C/RTL co-simulation failure.
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [HLS 200-42] -- Implementing module 'processAll_channelOu'
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [SCHED 204-11] Starting scheduling ...
INFO: [SCHED 204-61] Pipelining loop 'L_CH_OUT'.
WARNING: [SCHED 204-69] Unable to schedule 'store' operation (src/fpgaAcc.cpp:395->src/fpgaAcc.cpp:384->src/fpgaAcc.cpp:254) of variable 'new_ch', src/fpgaAcc.cpp:383->src/fpgaAcc.cpp:254 on array 'OBRAM_0' due to limited memory ports. Please consider using a memory core with more ports or partitioning the array 'OBRAM_0'.
INFO: [SCHED 204-61] Pipelining result : Target II = 1, Final II = 2, Depth = 10.
INFO: [SCHED 204-11] Finished scheduling.
INFO: [HLS 200-111]  Elapsed time: 27.98 seconds; current allocated memory: 89.244 MB.
INFO: [BIND 205-100] Starting micro-architecture generation ...
INFO: [BIND 205-101] Performing variable lifetime analysis.
INFO: [BIND 205-101] Exploring resource sharing.
INFO: [BIND 205-101] Binding ...
INFO: [BIND 205-100] Finished micro-architecture generation.
INFO: [HLS 200-111]  Elapsed time: 0.59 seconds; current allocated memory: 90.872 MB.
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [HLS 200-42] -- Implementing module 'ProcessingElement_processInputChannel'
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [SCHED 204-11] Starting scheduling ...
INFO: [SCHED 204-11] Finished scheduling.
INFO: [HLS 200-111]  Elapsed time: 0.4 seconds; current allocated memory: 90.982 MB.
INFO: [BIND 205-100] Starting micro-architecture generation ...
INFO: [BIND 205-101] Performing variable lifetime analysis.
INFO: [BIND 205-101] Exploring resource sharing.
INFO: [BIND 205-101] Binding ...
INFO: [BIND 205-100] Finished micro-architecture generation.
INFO: [HLS 200-111]  Elapsed time: 0.08 seconds; current allocated memory: 91.045 MB.
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [HLS 200-10] -- Generating RTL for module 'processAll_channelOu'
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [SYN 201-210] Renamed object name 'processAll_channelOu_OBRAM_0' to 'processAll_channebkb' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'processAll_channelOu_OBRAM_1' to 'processAll_channecud' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'processAll_channelOu_OBRAM_2' to 'processAll_channedEe' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'processAll_channelOu_OBRAM_3' to 'processAll_channeeOg' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'processAll_channelOu_OBRAM_4' to 'processAll_channefYi' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'processAll_channelOu_OBRAM_5' to 'processAll_channeg8j' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'processAll_channelOu_OBRAM_6' to 'processAll_channehbi' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'processAll_channelOu_OBRAM_7' to 'processAll_channeibs' due to the length limit 20
INFO: [SYN 201-210] Renamed object name 'ProcessingElement_processInputChannel_fadd_32ns_32ns_32_4_full_dsp_1' to 'ProcessingElementjbC' due to the length limit 20
INFO: [RTGEN 206-100] Generating core module 'ProcessingElementjbC': 8 instance(s).
INFO: [RTGEN 206-100] Finished creating RTL model for 'processAll_channelOu'.
INFO: [HLS 200-111]  Elapsed time: 0.92 seconds; current allocated memory: 93.369 MB.
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [HLS 200-10] -- Generating RTL for module 'ProcessingElement_processInputChannel'
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [RTGEN 206-500] Setting interface mode on port 'ProcessingElement_processInputChannel/cur_row_times_stride' to 'ap_none'.
INFO: [RTGEN 206-500] Setting interface mode on port 'ProcessingElement_processInputChannel/cur_col_times_stride' to 'ap_none'.
INFO: [RTGEN 206-500] Setting interface mode on port 'ProcessingElement_processInputChannel/cur_ci' to 'ap_none'.
INFO: [RTGEN 206-500] Setting interface mode on port 'ProcessingElement_processInputChannel/out_channelNum' to 'ap_none'.
INFO: [RTGEN 206-500] Setting interface mode on function 'ProcessingElement_processInputChannel' to 'ap_ctrl_hs'.
WARNING: [RTGEN 206-101] Global array 'OBRAM_0' will not be exposed as RTL port.
WARNING: [RTGEN 206-101] Global array 'OBRAM_1' will not be exposed as RTL port.
WARNING: [RTGEN 206-101] Global array 'OBRAM_2' will not be exposed as RTL port.
WARNING: [RTGEN 206-101] Global array 'OBRAM_3' will not be exposed as RTL port.
WARNING: [RTGEN 206-101] Global array 'OBRAM_4' will not be exposed as RTL port.
WARNING: [RTGEN 206-101] Global array 'OBRAM_5' will not be exposed as RTL port.
WARNING: [RTGEN 206-101] Global array 'OBRAM_6' will not be exposed as RTL port.
WARNING: [RTGEN 206-101] Global array 'OBRAM_7' will not be exposed as RTL port.
WARNING: [RTGEN 206-101] Port 'ProcessingElement_processInputChannel/cur_row_times_stride' has no fanin or fanout and is left dangling.
               Please use C simulation to confirm this function argument can be read from or written to.
WARNING: [RTGEN 206-101] Port 'ProcessingElement_processInputChannel/cur_col_times_stride' has no fanin or fanout and is left dangling.
               Please use C simulation to confirm this function argument can be read from or written to.
INFO: [RTGEN 206-100] Finished creating RTL model for 'ProcessingElement_processInputChannel'.
INFO: [HLS 200-111]  Elapsed time: 1.04 seconds; current allocated memory: 97.566 MB.
INFO: [RTMG 210-278] Implementing memory 'processAll_channebkb_ram (RAM_T2P_BRAM)' using block RAMs with power-on initialization.
INFO: [HLS 200-111] Finished generating all RTL models Time (s): cpu = 00:00:42 ; elapsed = 00:00:32 . Memory (MB): peak = 489.633 ; gain = 141.371 ; free physical = 320 ; free virtual = 32614
INFO: [SYSC 207-301] Generating SystemC RTL for ProcessingElement_processInputChannel.
INFO: [VHDL 208-304] Generating VHDL RTL for ProcessingElement_processInputChannel.
INFO: [VLOG 209-307] Generating Verilog RTL for ProcessingElement_processInputChannel.
INFO: [HLS 200-112] Total elapsed time: 32.18 seconds; peak allocated memory: 97.566 MB.
Finished C synthesis.

Points to note:

 function instantiate

WARNING: [XFORM 203-623] Cannot instantiate function 'ProcessingElement::processInputChannel'(src/fpgaAcc.cpp:225:1) for 'cur_ci' since none of the actual argument(s) of 'cur_ci' are constant or global.

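This warning appears twice, but ZynqNet also produces it when only its input-channel function is synthesized.

It does not appear when the whole IPcore is synthesized. A likely reason is that, as the top function, processInputChannel receives cur_ci from an RTL port, so the argument can never be a constant.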
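For context, the warning presumably comes from a FUNCTION_INSTANTIATE directive on cur_ci, which asks HLS to create a specialized copy of the function for each constant value of that argument. The sketch below, with hypothetical names, shows the situation in which the directive can take effect: every call site passes a literal. When the function is the top level, as in this test, the argument arrives through a port and no copy can be made.

```cpp
// Hypothetical function with an instantiated argument: at each call site
// below cur_ci is a literal, so HLS can build one specialized copy per value
// and fold the branch away in each copy.
static float scaleByChannel(float x, int cur_ci) {
#pragma HLS FUNCTION_INSTANTIATE variable=cur_ci
    return (cur_ci == 0) ? x : x * static_cast<float>(cur_ci);
}

void processThreeChannels(const float in[3], float out[3]) {
    out[0] = scaleByChannel(in[0], 0);  // constant argument 0
    out[1] = scaleByChannel(in[1], 1);  // constant argument 1
    out[2] = scaleByChannel(in[2], 2);  // constant argument 2
}
```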

 WBRAM

All of the WBRAM warnings are identical:

WARNING: [SYN 201-303] Cannot apply memory assignment of 'RAM_S2P_BRAM' (src/fpgaAcc.cpp:305->src/fpgaAcc.cpp:247): 'WBRAM_0_0' does not exist or is optimized away.
WARNING: [SYN 201-303] Cannot apply memory assignment of 'RAM_S2P_BRAM' (src/fpgaAcc.cpp:305->src/fpgaAcc.cpp:247): 'WBRAM_0_0' does not exist or is optimized away.
WARNING: [SYN 201-303] Cannot apply memory assignment of 'RAM_S2P_BRAM' (src/fpgaAcc.cpp:305->src/fpgaAcc.cpp:247): 'WBRAM_0_0' does not exist or is optimized away.
WARNING: [SYN 201-303] Cannot apply memory assignment of 'RAM_S2P_BRAM' (src/fpgaAcc.cpp:305->src/fpgaAcc.cpp:247): 'WBRAM_0_0' does not exist or is optimized away.

Even at the array-partitioning step, WBRAM already behaves differently from ZynqNet: the partitioned array names carry fewer indices.

INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.7'  in dimension 2 completely.

ZynqNet's INFO message is:

INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.15.2'  in dimension 2 completely.

However, the same warning appears afterwards; possibly the name of WBRAM changed after partitioning.

WARNING: [SYN 201-303] Cannot apply memory assignment of 'RAM_S2P_BRAM' (src/fpgaAcc.cpp:305->src/fpgaAcc.cpp:247): 'WBRAM_0_0' does not exist or is optimized away.

  Loop 'L_CH_OUT'  in 'processAll_channelOut'

MTCNN produces one more warning than ZynqNet:

WARNING: [XFORM 203-562] Loop 'L_CH_OUT' (src/fpgaAcc.cpp:241) in function 'processAll_channelOu' has unknown bound because it has multiple exiting blocks.
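This warning usually means the loop has more than one exit, for example a break or return inside the body, so HLS cannot compute its trip count. Below is a sketch of the usual rewrite, with hypothetical names and a hypothetical "before" shape in the comment: let the loop condition be the only exit and give HLS a LOOP_TRIPCOUNT hint so the latency report has bounds. Whether fpgaAcc.cpp:241 actually contains such a break is not shown in this article.

```cpp
const int MAX_CH_OUT = 64;   // illustrative upper bound

// Hypothetical "before" shape that produces XFORM 203-562:
//
//   for (int co = 0; co < MAX_CH_OUT; ++co) {
//       if (co >= out_channelNum) break;   // a second exit from the loop
//       out[co] += in[co];
//   }
//
// "After": the loop condition is the only exit. LOOP_TRIPCOUNT only informs
// the latency report; it does not change the generated hardware.
// The caller must ensure out_channelNum <= MAX_CH_OUT.
void accumulateChannels(const float in[MAX_CH_OUT], float out[MAX_CH_OUT], int out_channelNum) {
L_CH_OUT:
    for (int co = 0; co < out_channelNum; ++co) {
#pragma HLS LOOP_TRIPCOUNT min=1 max=64
        out[co] += in[co];
    }
}
```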

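 OBRAM gets no RTL port

Possibly because the OBRAM partitioning does not match its size, the final console output for OBRAM contains the following warnings that ZynqNet's does not. (Vivado HLS does not expose global arrays as top-level ports by default, so with processInputChannel as the top function, the global OBRAM is implemented as internal memory; this warning by itself does not necessarily indicate a bug.)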

WARNING: [RTGEN 206-101] Global array 'OBRAM_0' will not be exposed as RTL port.
WARNING: [RTGEN 206-101] Global array 'OBRAM_1' will not be exposed as RTL port.
...

4.2 HLS console output for the whole IPcore

Starting C synthesis ...
/mnt/workspace/Xilinx/Vivado/2017.4/bin/vivado_hls /home/osrc/Desktop/document/conv_Core/HLS_Conv/conv3x3_IPcore/solution1/csynth.tcl
INFO: [HLS 200-10] Running '/mnt/workspace/Xilinx/Vivado/2017.4/bin/unwrapped/lnx64.o/vivado_hls'
INFO: [HLS 200-10] For user 'osrc' on host 'osrc-virtual-machine' (Linux_x86_64 version 4.13.0-32-generic) on Tue Dec 11 18:46:57 CST 2018
INFO: [HLS 200-10] On os Ubuntu 16.04.3 LTS
INFO: [HLS 200-10] In directory '/home/osrc/Desktop/document/conv_Core/HLS_Conv'
INFO: [HLS 200-10] Opening project '/home/osrc/Desktop/document/conv_Core/HLS_Conv/conv3x3_IPcore'.
INFO: [HLS 200-10] Adding design file 'src/fpgaAcc.cpp' to the project
INFO: [HLS 200-10] Adding design file 'src/fpgaAcc.hpp' to the project
INFO: [HLS 200-10] Adding design file 'src/pBox.cpp' to the project
INFO: [HLS 200-10] Adding design file 'src/pBox.h' to the project
INFO: [HLS 200-10] Adding test bench file 'src/test_convBench.cpp' to the project
INFO: [HLS 200-10] Opening solution '/home/osrc/Desktop/document/conv_Core/HLS_Conv/conv3x3_IPcore/solution1'.
INFO: [SYN 201-201] Setting up clock 'default' with a period of 10ns.
INFO: [HLS 200-10] Setting target device to 'xc7z035ffg676-2'
INFO: [HLS 200-10] Analyzing design file 'src/pBox.cpp' ...
INFO: [HLS 200-10] Analyzing design file 'src/fpgaAcc.cpp' ...
INFO: [HLS 200-10] Validating synthesis directives ...
INFO: [HLS 200-111] Finished Checking Pragmas Time (s): cpu = 00:00:31 ; elapsed = 00:00:20 . Memory (MB): peak = 361.641 ; gain = 13.375 ; free physical = 320 ; free virtual = 32652
INFO: [HLS 200-111] Finished Linking Time (s): cpu = 00:00:33 ; elapsed = 00:00:21 . Memory (MB): peak = 361.641 ; gain = 13.375 ; free physical = 318 ; free virtual = 32651
INFO: [HLS 200-10] Starting code transformations ...
INFO: [XFORM 203-603] Inlining function 'MemoryController::setLayerConfig' into 'convolution_3x3' (src/fpgaAcc.cpp:77).
INFO: [XFORM 203-603] Inlining function 'ImageCache::setLayerConfig' into 'convolution_3x3' (src/fpgaAcc.cpp:78).
INFO: [XFORM 203-603] Inlining function 'WeightsCache::setLayerConfig' into 'convolution_3x3' (src/fpgaAcc.cpp:79).
INFO: [XFORM 203-603] Inlining function 'WeightsCache::get_WBRAM_addr' into 'WeightsCache::get_9_weights_to_buffer' (src/fpgaAcc.cpp:307).
INFO: [XFORM 203-603] Inlining function 'WeightsCache::get_WBRAM_addr' into 'WeightsCache::load_WBRAM_from_DRAM' (src/fpgaAcc.cpp:284).
INFO: [XFORM 203-603] Inlining function 'MemoryController::load_weight_2_reg' into 'WeightsCache::load_WBRAM_from_DRAM' (src/fpgaAcc.cpp:291).
INFO: [XFORM 203-603] Inlining function 'WeightsCache::load_WBRAM_from_DRAM' into 'convolution_3x3' (src/fpgaAcc.cpp:83).
INFO: [XFORM 203-603] Inlining function 'MemoryController::setPixelLoadRowOffset' into 'convolution_3x3' (src/fpgaAcc.cpp:94).
INFO: [XFORM 203-603] Inlining function 'MemoryController::setPixelLoadRowOffset' into 'convolution_3x3' (src/fpgaAcc.cpp:87).
INFO: [XFORM 203-603] Inlining function 'MemoryController::setPixelLoadRowOffset' into 'convolution_3x3' (src/fpgaAcc.cpp:85).
INFO: [XFORM 203-603] Inlining function 'MemoryController::setPixelLoadOffset' into 'ImageCache::loadRowDRAM_2_IBRAM' (src/fpgaAcc.cpp:330).
INFO: [XFORM 203-603] Inlining function 'MemoryController::loadInputChannelPixel' into 'ImageCache::loadPixelDRAM_2_IBRAM' (src/fpgaAcc.cpp:339).
INFO: [XFORM 203-603] Inlining function 'ImageCache::loadPixelDRAM_2_IBRAM' into 'ImageCache::loadRowDRAM_2_IBRAM' (src/fpgaAcc.cpp:331).
INFO: [XFORM 203-603] Inlining function 'ImageCache::loadRowDRAM_2_IBRAM' into 'convolution_3x3' (src/fpgaAcc.cpp:95).
INFO: [XFORM 203-603] Inlining function 'ImageCache::loadRowDRAM_2_IBRAM' into 'convolution_3x3' (src/fpgaAcc.cpp:88).
INFO: [XFORM 203-603] Inlining function 'ImageCache::loadRowDRAM_2_IBRAM' into 'convolution_3x3' (src/fpgaAcc.cpp:86).
INFO: [XFORM 203-603] Inlining function 'MemoryController::setPixelOutOffset' into 'convolution_3x3' (src/fpgaAcc.cpp:99).
INFO: [XFORM 203-603] Inlining function 'ImageCache::calcu_IBRAM_row_offset' into 'ProcessingElement::loadPixel_buffer' (src/fpgaAcc.cpp:209).
INFO: [XFORM 203-603] Inlining function 'ImageCache::get_IBRAM_Pixel' into 'ProcessingElement::loadPixel_buffer' (src/fpgaAcc.cpp:213).
INFO: [XFORM 203-603] Inlining function 'ProcessingElement::loadPixel_buffer' into 'ProcessingElement::processInputChannel' (src/fpgaAcc.cpp:230).
INFO: [XFORM 203-603] Inlining function 'WeightsCache::get_9_weights_to_buffer' into 'ProcessingElement::processAll_channelOut' (src/fpgaAcc.cpp:247).
INFO: [XFORM 203-603] Inlining function 'ProcessingElement::macc2d' into 'ProcessingElement::processAll_channelOut' (src/fpgaAcc.cpp:249).
INFO: [XFORM 203-603] Inlining function 'OutputCache::setOutChannel' into 'OutputCache::accumulateChannel' (src/fpgaAcc.cpp:384).
INFO: [XFORM 203-603] Inlining function 'OutputCache::setOutChannel' into 'ProcessingElement::processAll_channelOut' (src/fpgaAcc.cpp:252).
INFO: [XFORM 203-603] Inlining function 'OutputCache::getOutChannel' into 'OutputCache::accumulateChannel' (src/fpgaAcc.cpp:382).
INFO: [XFORM 203-603] Inlining function 'OutputCache::accumulateChannel' into 'ProcessingElement::processAll_channelOut' (src/fpgaAcc.cpp:254).
INFO: [XFORM 203-603] Inlining function 'MemoryController::writeBackOutputChannel' into 'convolution_3x3' (src/fpgaAcc.cpp:109).
INFO: [HLS 200-111] Finished Standard Transforms Time (s): cpu = 00:00:34 ; elapsed = 00:00:23 . Memory (MB): peak = 361.930 ; gain = 13.664 ; free physical = 307 ; free virtual = 32642
INFO: [HLS 200-10] Checking synthesizability ...
INFO: [XFORM 203-602] Inlining function 'ImageCache::writeNextChannelPixel_2_IBRAM' into 'convolution_3x3' (src/fpgaAcc.cpp:340->src/fpgaAcc.cpp:331->src/fpgaAcc.cpp:86) automatically.
INFO: [HLS 200-111] Finished Checking Synthesizability Time (s): cpu = 00:00:35 ; elapsed = 00:00:23 . Memory (MB): peak = 361.930 ; gain = 13.664 ; free physical = 303 ; free virtual = 32639
INFO: [XFORM 203-502] Unrolling all sub-loops inside loop 'L_CH_OUT' (src/fpgaAcc.cpp:241) in function 'ProcessingElement::processAll_channelOut' for pipelining.
INFO: [XFORM 203-501] Unrolling loop 'L_CH_OUT' (src/fpgaAcc.cpp:241) in function 'ProcessingElement::processAll_channelOut' partially with a factor of 8.
INFO: [XFORM 203-501] Unrolling loop 'Loop-1.1' (src/fpgaAcc.cpp:308) in function 'ProcessingElement::processAll_channelOut' completely.
INFO: [XFORM 203-501] Unrolling loop 'L_MACC_multiply' (src/fpgaAcc.cpp:190) in function 'ProcessingElement::processAll_channelOut' completely.
INFO: [XFORM 203-501] Unrolling loop 'L_MACC_accumulate' (src/fpgaAcc.cpp:195) in function 'ProcessingElement::processAll_channelOut' completely.
INFO: [XFORM 203-101] Partitioning array 'pixel_buffer' (src/fpgaAcc.cpp:228) in dimension 1 completely.
INFO: [XFORM 203-101] Partitioning array 'weights_local' (src/fpgaAcc.cpp:244) in dimension 1 completely.
INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM'  in dimension 1 completely.
INFO: [XFORM 203-101] Partitioning array 'multresult' (src/fpgaAcc.cpp:187) in dimension 1 completely.
INFO: [XFORM 203-101] Partitioning array 'OutputCache::OBRAM'  in dimension 1 with a cyclic factor 8.
INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.0'  in dimension 2 completely.
INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.1'  in dimension 2 completely.
INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.2'  in dimension 2 completely.
INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.3'  in dimension 2 completely.
INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.4'  in dimension 2 completely.
INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.5'  in dimension 2 completely.
INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.6'  in dimension 2 completely.
INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.7'  in dimension 2 completely.
INFO: [XFORM 203-602] Inlining function 'ImageCache::writeNextChannelPixel_2_IBRAM' into 'convolution_3x3' (src/fpgaAcc.cpp:340->src/fpgaAcc.cpp:331->src/fpgaAcc.cpp:86) automatically.
INFO: [XFORM 203-622] Instantiating function 'ProcessingElement::processInputChannel'(src/fpgaAcc.cpp:221) to 'ProcessingElement::processInputChannel.0' at call site (src/fpgaAcc.cpp:103) by setting 'cur_ci' to 'cur_channel_in'.
INFO: [XFORM 203-721] Changing loop 'Loop_load_pixel_2_PE_row_loop_proc' (src/fpgaAcc.cpp:207) to a process function for dataflow in function 'ProcessingElement::processInputChannel.0'.
INFO: [XFORM 203-712] Applying dataflow to function 'ProcessingElement::processInputChannel.0' (src/fpgaAcc.cpp:224:1), detected/extracted 2 process function(s): 
	 'ProcessingElement::processInputChannel.0_Loop_load_pixel_2_PE_row_loop_proc5'
	 'ProcessingElement::processAll_channelOut'.
INFO: [HLS 200-111] Finished Pre-synthesis Time (s): cpu = 00:00:37 ; elapsed = 00:00:26 . Memory (MB): peak = 489.637 ; gain = 141.371 ; free physical = 275 ; free virtual = 32614
WARNING: [XFORM 203-542] Cannot flatten a loop nest 'Loop-1.1' (src/fpgaAcc.cpp:283:18) in function 'convolution_3x3' : 
the outer loop is not a perfect loop.
WARNING: [XFORM 203-542] Cannot flatten a loop nest 'Loop-1' (src/fpgaAcc.cpp:280:18) in function 'convolution_3x3' : 
the outer loop is not a perfect loop.
WARNING: [XFORM 203-542] Cannot flatten a loop nest 'L_DRAM_PRELOADROW_X' (src/fpgaAcc.cpp:329:77) in function 'convolution_3x3' : 
the outer loop is not a perfect loop.
WARNING: [XFORM 203-542] Cannot flatten a loop nest 'L_DRAM_PRELOADROW_X' (src/fpgaAcc.cpp:329:77) in function 'convolution_3x3' : 
the outer loop is not a perfect loop.
WARNING: [XFORM 203-542] Cannot flatten a loop nest 'L_DRAM_PRELOADROW_X' (src/fpgaAcc.cpp:329:77) in function 'convolution_3x3' : 
the outer loop is not a perfect loop.
WARNING: [XFORM 203-542] Cannot flatten a loop nest 'Loop-4.1' (src/fpgaAcc.cpp:93:3) in function 'convolution_3x3' : 
the outer loop is not a perfect loop.
WARNING: [XFORM 203-542] Cannot flatten a loop nest 'row_loop' (src/fpgaAcc.cpp:91:85) in function 'convolution_3x3' : 
more than one sub loop.
WARNING: [XFORM 203-631] Renaming function 'ProcessingElement::processInputChannel.0_Loop_load_pixel_2_PE_row_loop_proc5' to 'processInputChannel.' (src/fpgaAcc.cpp:207:3)
WARNING: [XFORM 203-631] Renaming function 'ProcessingElement::processInputChannel.0' to 'processInputChannel..1' (src/fpgaAcc.cpp:226:1)
WARNING: [XFORM 203-631] Renaming function 'ProcessingElement::processAll_channelOut' to 'processAll_channelOu' (src/fpgaAcc.cpp:192:43)
INFO: [XFORM 203-811] Inferring bus burst read of variable length on port 'memorybus' (src/fpgaAcc.cpp:178:15).
WARNING: [XFORM 203-562] Loop 'L_CH_OUT' (src/fpgaAcc.cpp:241) in function 'processAll_channelOu' has unknown bound because it has multiple exiting blocks.
WARNING: [XFORM 203-713] Function 'processInputChannel..1' (src/fpgaAcc.cpp:226:1) failed dataflow checking:  A dataflow region cannot be instantiated from with a pipelined loop  (src/fpgaAcc.cpp:226:1). Ignoring pipeline directive to allow the dataflow directive to take precedence. This behavior can be disabled by using 'config_compile -disable_dataflow_pipeline_check'.
Instruction does not dominate all uses!
  %tmp_60 = add i32 %WeightsCache_inChan_1, %tmp_59
  %memorybus_addr_rd_re = call i1 @_ssdm_op_ReadReq.m_axi.floatP(float* %memorybus_addr, i32 %tmp_60), !dbg !1031
Broken module found, compilation aborted!
Stack dump:
0.	Running pass 'Function Pass Manager' on module '/home/osrc/Desktop/document/conv_Core/HLS_Conv/conv3x3_IPcore/solution1/.autopilot/db/a.o.2.bc'.
1.	Running pass 'Module Verifier' on function '@convolution_3x3'
/mnt/workspace/Xilinx/Vivado/2017.4/bin/loader: line 194: 15937 Aborted                 (core dumped) "$RDI_PROG" "$@"
Finished C synthesis.

The important errors come down to two points:

the unknown loop bound, and the dataflow optimization failing to be applied.

WARNING: [XFORM 203-562] Loop 'L_CH_OUT' (src/fpgaAcc.cpp:241) in function 'processAll_channelOu' has unknown bound because it has multiple exiting blocks.
WARNING: [XFORM 203-713] Function 'processInputChannel..1' (src/fpgaAcc.cpp:226:1) failed dataflow checking:  A dataflow region cannot be instantiated from with a pipelined loop  (src/fpgaAcc.cpp:226:1). Ignoring pipeline directive to allow the dataflow directive to take precedence. This behavior can be disabled by using 'config_compile -disable_dataflow_pipeline_check'.

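These bugs still need to be debugged. My preliminary judgment is that they come from how the for loop inside processAll_channelOut is unrolled and from differences in how OBRAM is partitioned.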
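The XFORM 203-713 warning says the dataflow function processInputChannel is called from inside a pipelined loop. According to the log, the call site is src/fpgaAcc.cpp:103 in convolution_3x3. HLS then drops the pipeline directive so that dataflow can take precedence. One common way to pass this check is to leave the loop that calls the dataflow function unpipelined and to pipeline the loops inside each process instead. The sketch below shows that structure under stated assumptions. All names and sizes are hypothetical: `load_pixels` stands in for the pixel-loading step and `compute_out_channels` for the per-output-channel work. It is not the author's code.

```cpp
#include <cassert>

// Hypothetical sizes, for illustration only.
const int ROW_W = 16;
const int N_CH_OUT = 8;

// Process 1: copy one input-channel row into a local buffer.
static void load_pixels(const float in_row[ROW_W], float buf[ROW_W]) {
    for (int x = 0; x < ROW_W; x++) {
#pragma HLS PIPELINE II=1
        buf[x] = in_row[x];
    }
}

// Process 2: use the buffer for every output channel, accumulating results.
static void compute_out_channels(const float buf[ROW_W], const float w[N_CH_OUT],
                                 float out[N_CH_OUT][ROW_W]) {
    for (int co = 0; co < N_CH_OUT; co++) {
        for (int x = 0; x < ROW_W; x++) {
#pragma HLS PIPELINE II=1
            out[co][x] += w[co] * buf[x];
        }
    }
}

// Dataflow region: the two processes may overlap in hardware.
void process_input_channel_df(const float in_row[ROW_W], const float w[N_CH_OUT],
                              float out[N_CH_OUT][ROW_W]) {
#pragma HLS DATAFLOW
    float buf[ROW_W];
    load_pixels(in_row, buf);
    compute_out_channels(buf, w, out);
}

// Caller: the loop around the dataflow region deliberately has NO
// PIPELINE pragma, which is what XFORM 203-713 asks for.
void convolve_all_inputs(const float in[][ROW_W], int n_ch_in,
                         const float w[][N_CH_OUT], float out[N_CH_OUT][ROW_W]) {
    assert(n_ch_in >= 0);
    for (int ci = 0; ci < n_ch_in; ci++) {
        process_input_channel_df(in[ci], w[ci], out);
    }
}
```

In C simulation the pragmas are ignored, so the result is the plain sum over input channels. In hardware, the overlap comes from DATAFLOW inside each call, not from pipelining the outer loop.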
