Convolution IP Core: Detailed Report and Progress

祥瑞Coding

Categories: FPGA, Machine Learning

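Copyright notice: This is an original article by the author, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/weixin_36474809/article/details/85271940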

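Contents

1. IP core code overview

1.1 Interface

1.2 Function

1.3 Time and space resources

1.3.1 Space resources

1.3.2 Time resources

2. IP core correctness and verification

2.1 Calling the IP core from MTCNN

2.2 IP core testbench and C simulation

2.3 Synthesis and RTL output

2.4 System build and programming

3. Calling from the SDK

3.1 Initializing the IP core

3.2 Checking whether the IP core has finished

3.3 Run results

3.4 Summary of IP core problems

4. Comparison with zynqNet

4.1 Clock cycles of one convolution in HLS

4.2 Number of MACC operations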

1. IP core code overview

1.1 Interface

//----------------convolution in FPGA-----------------------------------
int convolution_3x3(int inHight,int inWidth,int inChanNum,int outHight,int outWidth,int OutChanNum,
int stride,
volatile float *weight_ptr,volatile float *input_ptr,volatile float *output_ptr){
#pragma HLS INTERFACE s_axilite port=inHight bundle=axilite
#pragma HLS INTERFACE s_axilite port=inWidth bundle=axilite
#pragma HLS INTERFACE s_axilite port=inChanNum bundle=axilite
#pragma HLS INTERFACE s_axilite port=outHight bundle=axilite
#pragma HLS INTERFACE s_axilite port=outWidth bundle=axilite
#pragma HLS INTERFACE s_axilite port=OutChanNum bundle=axilite
#pragma HLS INTERFACE s_axilite port=stride bundle=axilite
#pragma HLS INTERFACE s_axilite port=return bundle=axilite
#pragma HLS INTERFACE m_axi depth=DRAM_DEPTH port=weight_ptr offset=slave bundle=memorybus
#pragma HLS INTERFACE m_axi depth=DRAM_DEPTH port=input_ptr offset=slave bundle=memorybus
#pragma HLS INTERFACE m_axi depth=DRAM_DEPTH port=output_ptr offset=slave bundle=memorybus

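Interface description:

Software side: the parameters are passed in through the function interface, and the function performs the convolution.

Hardware side: the ARM passes the neural-network parameters to the IP core over the AXI-Lite protocol. Through the m_axi master interface, the IP core fetches the weights and input features from DRAM and writes the convolution output back to DRAM.

1.2 Function

Accelerate the 3*3 convolutions in MTCNN on the FPGA.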

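1.3 Time and space resources

1.3.1 Space resources

The figure below shows the resources the IP core occupies on the 7z035. On the 7z020, resource usage is higher than expected.

[Figure: IP core resource usage on the 7z035]

1.3.2 Time resources

Because the convolution has variable size, the clock-cycle count is taken for the input of the first Pnet layer. The Pnet input differs from the Rnet and Onet inputs: it is not resized to a smaller size, so the convolution is large and time-consuming. Running the first Pnet layer's convolution in software on the processor takes on the order of several seconds.

Convolution size: input 640*480*3, output 640*480*10, weights 3*3*3*10

Clock cycles:

[Figure: HLS clock-cycle report for the first Pnet layer]

The overall latency and initiation interval (II) are not large, which indicates that the parallelization is effective.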
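To put the layer size into perspective, the number of multiply-accumulate (MACC) operations can be estimated from the shapes alone. The sketch below is only an illustration: the helper name `macc_3x3` is hypothetical, and it takes the sizes exactly as stated above (output 640*480*10, 3 input channels).

```cpp
#include <cstdint>

// Multiply-accumulate count for one 3x3 convolution layer:
// each output element needs 3*3*inChan multiply-accumulates.
// (Hypothetical helper, not part of the original IP core code.)
constexpr std::uint64_t macc_3x3(std::uint64_t outH, std::uint64_t outW,
                                 std::uint64_t outChan, std::uint64_t inChan) {
    return outH * outW * outChan * inChan * 3 * 3;
}

// First Pnet layer with the sizes quoted in the text.
static_assert(macc_3x3(480, 640, 10, 3) == 82944000ULL,
              "640*480*10 outputs, 27 MACCs each");
```

With these sizes the layer needs 82,944,000 MACC operations. If the layer is unpadded, as the 5 to 3 and 23 to 21 sizes in the run logs of section 3.3 suggest, the output would be 638*478 and the count slightly lower.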

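2. IP core correctness and verification

2.1 Calling the IP core from MTCNN

The C code of the IP core is called directly inside mtcnn. Every 3*3 convolution makes one call to the IP core's C code, for example: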

    convolution_3x3(this->rgb->height,this->rgb->width,this->rgb->channel,
                   this->conv1_out->height,this->conv1_out->width,this->conv1_out->channel,
                   this->conv1_wb->stride,
                   this->conv1_wb->pdata,this->rgb->pdata,this->conv1_out->pdata);

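With these calls in place, mtcnn runs correctly and produces correct results.

2.2 IP core testbench and C simulation

A testbench was written to verify the IP core code on its own, and its output was compared with the reference convolution result. The results match, which shows that the IP core code can be separated from the MTCNN code for synthesis.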
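A testbench comparison of this kind can be sketched as follows. This is not the author's testbench: the function names are hypothetical, and the memory layouts (channel-major, row-major, no padding, no bias) are assumptions that would have to match the real IP core.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Software reference for a 3x3 convolution without padding or bias.
// Assumed layouts (not confirmed by the original code):
//   input  [inChan][inH][inW], weight [outChan][inChan][3][3],
//   output [outChan][outH][outW], all row-major.
std::vector<float> conv3x3_ref(const std::vector<float>& in,
                               const std::vector<float>& w,
                               int inH, int inW, int inChan,
                               int outChan, int stride) {
    const int outH = (inH - 3) / stride + 1;
    const int outW = (inW - 3) / stride + 1;
    std::vector<float> out(static_cast<std::size_t>(outChan) * outH * outW, 0.0f);
    for (int oc = 0; oc < outChan; ++oc)
        for (int oy = 0; oy < outH; ++oy)
            for (int ox = 0; ox < outW; ++ox) {
                float acc = 0.0f;
                for (int ic = 0; ic < inChan; ++ic)
                    for (int ky = 0; ky < 3; ++ky)
                        for (int kx = 0; kx < 3; ++kx) {
                            const int iy = oy * stride + ky;
                            const int ix = ox * stride + kx;
                            acc += in[(ic * inH + iy) * inW + ix] *
                                   w[((oc * inChan + ic) * 3 + ky) * 3 + kx];
                        }
                out[(oc * outH + oy) * outW + ox] = acc;
            }
    return out;
}

// True when both buffers have the same length and agree element-wise within tol.
bool buffers_match(const std::vector<float>& a, const std::vector<float>& b,
                   float tol = 1e-4f) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (std::fabs(a[i] - b[i]) > tol) return false;
    return true;
}
```

In a testbench, the output buffer filled by `convolution_3x3` would be copied into a vector and passed to `buffers_match` together with the reference result. A tolerance is used instead of exact equality because floating-point accumulation order may differ between the two implementations.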

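2.3 Synthesis and RTL output

The C code output and the RTL output of the IP core are normal, with no errors reported.

2.4 System build and programming

Validation of the system design showed no anomalies, the bitstream was generated normally, and the bitstream was programmed onto the FPGA through the SDK without problems.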

3. Calling from the SDK

3.1 Initializing the IP core

The HLS-generated driver is used to set the IP core's parameters and start it:

    //set conv IPcore value
   XConvolution_3x3_Set_inHight(&XConvolution_3x3_Core,featureIn.height);
   XConvolution_3x3_Set_inWidth(&XConvolution_3x3_Core,featureIn.width);
   XConvolution_3x3_Set_inChanNum(&XConvolution_3x3_Core,featureIn.channel);
   XConvolution_3x3_Set_outHight(&XConvolution_3x3_Core,conv_PL_out.height);
   XConvolution_3x3_Set_outWidth(&XConvolution_3x3_Core,conv_PL_out.width);
   XConvolution_3x3_Set_OutChanNum(&XConvolution_3x3_Core,conv_PL_out.channel);
   XConvolution_3x3_Set_stride(&XConvolution_3x3_Core,weightIn.stride);
   XConvolution_3x3_Set_weight_ptr(&XConvolution_3x3_Core,(unsigned int)weightIn.pdata);
   XConvolution_3x3_Set_input_ptr(&XConvolution_3x3_Core,(unsigned int)featureIn.pdata);
   XConvolution_3x3_Set_output_ptr(&XConvolution_3x3_Core,(unsigned int)conv_PL_out.pdata);
   printf("Set conv parameters SUCCESS!\n");

   //conv in PL
   XConvolution_3x3_Start(&XConvolution_3x3_Core);
   printf("IPcore conv start SUCCESS!\n");

3.2 Checking whether the IP core has finished

    // Poll the done flag up to 20 times, 7 ms apart.
    for(int cur_sleep_times=0;cur_sleep_times<20;cur_sleep_times++){
        if(!XConvolution_3x3_IsDone(&XConvolution_3x3_Core)){
            printf("IPcore Not Done! current sleep times is %d \n",cur_sleep_times);
        }
        else{
            printf("IP core Done SUCCESS!");
            break;
        }
        usleep(1000*7); // 7 milliseconds
    }

isDone becoming 1 indicates that the IP core has finished running normally.
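The loop above does not tell the caller whether it ended because the core finished or because the 20 polls ran out. A minimal host-side sketch of a polling helper that makes this distinction is shown below. The name `wait_until_done` and the callback are hypothetical. On the board, the callback would wrap `XConvolution_3x3_IsDone`, and the sleep would likely be `usleep` rather than `std::this_thread`.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Polls is_done up to max_polls times, sleeping `interval` after each
// unsuccessful poll. Returns the number of unsuccessful polls before
// success, or -1 if the flag never became true (timeout).
int wait_until_done(const std::function<bool()>& is_done, int max_polls,
                    std::chrono::milliseconds interval) {
    for (int polls = 0; polls < max_polls; ++polls) {
        if (is_done()) return polls;
        std::this_thread::sleep_for(interval);
    }
    return -1;
}
```

A return value of -1 would then correspond to the hang observed with large convolutions, while any non-negative value gives a rough measure of how long the core took.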

3.3 Run results

Small convolutions run normally:

--------------program start-------------
init network parameters run time is 0.000474 mili second
Output variable init SUCCESS!
Write conv data to DRAM run time is 2.690571 mili second
Initialize XConvolution_3x3_Core IPcore SUCCESS!
---------print IP core value---------
IP core return is 0
IP core isDone is 0
IP core get inHight is 5
IP core get weight prt is 1000000
Set conv parameters SUCCESS!IPcore conv start SUCCESS!
---------print IP core value---------
IP core return is 0
IP core isDone is 1
IP core get inHight is 5
IP core get weight prt is 1000000
Strat again SUCCESS!
IP core Done SUCCESS!
Input_Pixels is 50 and hex memory size is 000000c8
weight_pixels is 36 and hex memory size is 00000090
Output_Pixels is 18 and hex memory size is 00000048
Input pointer value is 01000090
Weight pointer value is 01000000
Output PS pointer value is 010000d8
InputSize is    5, In_channels   is   2, Input_Pixels is 50
OutputSize is   3, Out_channels is   2, Output_Pixels is 18
Stride   is     1, weight_pixels is 36
------------Program End SUCCESS!-----------

When the convolution is larger, the processor hangs, for example with the following size:

InputSize is 23, In_channels is 32, Input_Pixels is 16928
OutputSize is 21, Out_channels is 64, Output_Pixels is 28224
Stride is 1, weight_pixels is 18432

Input_Pixels is 16928 and hex memory size is 00010880
weight_pixels is 18432 and hex memory size is 00012000
Output_Pixels is 28224 and hex memory size is 0001b900
Input pointer value is 01012000
Weight pointer value is 01000000
Output PS pointer value is 0102d900
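The pixel counts and hexadecimal memory sizes in both logs follow directly from the layer shapes. The sketch below reproduces them, assuming square feature maps, no padding, and 4-byte floats. The struct and function names are hypothetical.

```cpp
#include <cstdint>

// Element counts of the three buffers used by one 3x3 convolution.
struct ConvBufferSizes {
    std::uint32_t input_pixels;
    std::uint32_t weight_pixels;
    std::uint32_t output_pixels;
};

// Output edge length of an unpadded 3x3 convolution.
constexpr std::uint32_t out_size(std::uint32_t in_size, std::uint32_t stride) {
    return (in_size - 3) / stride + 1;
}

// Buffer element counts for a square in_size x in_size input.
constexpr ConvBufferSizes buffer_sizes(std::uint32_t in_size, std::uint32_t in_ch,
                                       std::uint32_t out_ch, std::uint32_t stride) {
    return ConvBufferSizes{
        in_size * in_size * in_ch,
        3u * 3u * in_ch * out_ch,
        out_size(in_size, stride) * out_size(in_size, stride) * out_ch};
}

// Size in bytes of a buffer of 32-bit floats.
constexpr std::uint32_t bytes(std::uint32_t pixels) {
    return pixels * 4u;
}

// Small case from the log: 5x5x2 input, 2 output channels, stride 1.
static_assert(buffer_sizes(5, 2, 2, 1).input_pixels == 50, "Input_Pixels");
static_assert(bytes(buffer_sizes(5, 2, 2, 1).input_pixels) == 0xc8, "input bytes");
```

For the large case, `buffer_sizes(23, 32, 64, 1)` gives 16928, 18432 and 28224 elements, that is 0x10880, 0x12000 and 0x1b900 bytes, matching the log.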

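3.4 Summary of IP core problems

The IP core's pointers into DDR cannot be shared with the processor: for the small convolution, the processor cannot read back the output values that the IP core wrote to DDR.

Running a larger convolution on the IP core causes the processor to hang.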
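One check that may help when debugging shared DDR buffers is whether the address ranges overlap. The sketch below is only an illustration with hypothetical names. It takes the pointer values and byte sizes printed in the logs of section 3.3 at face value, and the article does not say whether the labels in those logs are exact.

```cpp
#include <cstdint>

// Half-open address range [begin, begin + size).
struct Range {
    std::uint32_t begin;
    std::uint32_t size;
};

// True when the two ranges share at least one byte.
constexpr bool overlaps(Range a, Range b) {
    return a.begin < b.begin + b.size && b.begin < a.begin + a.size;
}

// Small case: values copied from the log (pointer value, hex memory size).
constexpr Range small_weight{0x01000000u, 0x90u};
constexpr Range small_input{0x01000090u, 0xc8u};
constexpr Range small_output_ps{0x010000d8u, 0x48u};

static_assert(!overlaps(small_weight, small_input), "weights end where input begins");
static_assert(overlaps(small_input, small_output_ps), "logged ranges intersect");
```

With the logged values, the small-case input range ends at 0x01000158, so the range starting at the logged "Output PS pointer" 0x010000d8 lies inside it. In the large case the corresponding ranges (0x01012000 with 0x10880 bytes, and 0x0102d900 with 0x1b900 bytes) do not intersect. Whether this points to a real buffer-layout problem or only to how the log was printed cannot be decided from the article alone.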

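4. Comparison with zynqNet

4.1 Clock cycles of one convolution in HLS

The clock cycles of zynqNet are shown below:

[Figure: zynqNet clock-cycle report]

4.2 Number of MACC operations

Adding a global counter: in fpgaAcc.cpp, declare an extern int, then define the global int outside any function in pikaqiu.

MTCNN with one test image: 85,176,568, on the order of 10^7

MTCNN with another image: 43,543,288, on the order of 10^7

zynqNet's MACC count: 152,731,648, on the order of 10^8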
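The orders of magnitude and the relative workload can be checked with a few lines of arithmetic. The helper names below are hypothetical, and the counts are the ones quoted above.

```cpp
#include <cstdint>

// Largest k such that 10^k <= n (n must be non-zero).
int order_of_magnitude(std::uint64_t n) {
    int k = 0;
    while (n >= 10) {
        n /= 10;
        ++k;
    }
    return k;
}

// How many times more MACC operations `a` needs than `b`.
double macc_ratio(std::uint64_t a, std::uint64_t b) {
    return static_cast<double>(a) / static_cast<double>(b);
}
```

With the measured counts, zynqNet needs roughly 1.8 times the MACC operations of the first MTCNN run and roughly 3.5 times those of the second.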
