Next, let's implement the computation of a convolution (Conv) layer.
To keep things simple, I use an input with N=1, C=2, H=W=5, and a convolution kernel with KH=3, KW=3, IC=OC=2.
const int N = 1, H = 5, W = 5, C = 2;
const int IC = C, OC = IC, KH = 3, KW = 3;
With such a small problem we can compute the convolution by hand and compare against mkldnn's output to check that our code runs correctly.
First, create the input data buffer plus the weights and bias arrays:
// Allocate a buffer for the image
std::vector<float> image(image_size);
std::vector<float> weights(weights_size);
std::vector<float> bias(bias_size);
The image array is filled with each element's own memory offset. The weights are all 1 for OC=0 and all 2 for OC=1; the bias is 0.01 for OC=0 and 0.02 for OC=1. Since convolution computes Y = W*X + B, we only need to look at the fractional part of each output value to tell which numbers belong to output channel C=0 and which to C=1.
// Initialize the image with its own physical offsets
for (int n = 0; n < N; ++n)
    for (int h = 0; h < H; ++h)
        for (int w = 0; w < W; ++w)
            for (int c = 0; c < C; ++c) {
                int off = offset(n, h, w, c); // physical offset of this pixel
                image[off] = off;
            }
// Initialize the weights: all 1s for output channel 0, all 2s for output channel 1
for (int n = 0; n < OC; ++n)
    for (int c = 0; c < IC; ++c)
        for (int h = 0; h < KH; ++h)
            for (int w = 0; w < KW; ++w) {
                int off = offset_ws(n, c, h, w); // physical offset of this weight
                weights[off] = (n == 0) ? 1 : 2;
            }
// Initialize the bias: 0.01 for output channel 0, 0.02 for output channel 1
for (int n = 0; n < OC; ++n)
    bias[n] = 0.01f + 0.01f * n;
Now for the implementation and flow of the convolution layer itself:
memory::dims conv3_src_tz = { N, C, H, W };
memory::dims conv3_weights_tz = { OC, IC, KH, KW };
memory::dims conv3_bias_tz = { OC };
memory::dims conv3_dst_tz = { N, OC, H, W };
memory::dims conv3_strides = { 1, 1 };
memory::dims conv3_padding = { 1, 1 };
// Source memory descriptor: NHWC layout
auto src3_md = memory::desc(
        conv3_src_tz,            // logical dims, the order is defined by a primitive
        memory::data_type::f32,  // tensor's data type
        memory::format_tag::nhwc // memory format, NHWC in this case
);
// Weights memory descriptor: OIHW layout
auto conv3_weights_md = memory::desc(
        conv3_weights_tz, memory::data_type::f32,
        memory::format_tag::oihw
);
auto conv3_bias_md = memory::desc(conv3_bias_tz, memory::data_type::f32, memory::format_tag::x);
// Destination memory descriptor: NHWC layout
auto dst3_md = memory::desc(
        conv3_dst_tz,            // logical dims, the order is defined by a primitive
        memory::data_type::f32,  // tensor's data type
        memory::format_tag::nhwc // memory format, NHWC in this case
);
auto user_src_mem = memory(src3_md, cpu_engine, image.data());
auto user_conv3_weights_mem = memory(conv3_weights_md, cpu_engine, weights.data());
auto user_conv3_bias_mem = memory(conv3_bias_md, cpu_engine, bias.data());
// For dst_mem the library allocates buffer
auto user_conv3_dst_mem = memory(dst3_md, cpu_engine); //for conv output
auto conv3_d = convolution_forward::desc(prop_kind::forward_inference,
algorithm::convolution_direct, src3_md, conv3_weights_md,
conv3_bias_md,
dst3_md, conv3_strides, conv3_padding,
conv3_padding);
auto conv3_pd = convolution_forward::primitive_desc(conv3_d, cpu_engine);
// create convolution primitive and add it to net
auto conv3 = convolution_forward(conv3_pd);
conv3.execute(cpu_stream, {
        { MKLDNN_ARG_SRC, user_src_mem },
        { MKLDNN_ARG_WEIGHTS, user_conv3_weights_mem },
        { MKLDNN_ARG_BIAS, user_conv3_bias_mem },
        { MKLDNN_ARG_DST, user_conv3_dst_mem }
});
// Wait for the stream to finish executing
cpu_stream.wait();
Finally, let's inspect the contents of the arrays.
1. Input array (each pair before a "|" is one pixel: channel C=0 then C=1)
0 1 | 2 3 | 4 5 | 6 7 | 8 9 |
10 11 | 12 13 | 14 15 | 16 17 | 18 19 |
20 21 | 22 23 | 24 25 | 26 27 | 28 29 |
30 31 | 32 33 | 34 35 | 36 37 | 38 39 |
40 41 | 42 43 | 44 45 | 46 47 | 48 49 |
2. Weights array (the first six rows are OC=0, the last six are OC=1)
1 1 1 |
1 1 1 |
1 1 1 |
1 1 1 |
1 1 1 |
1 1 1 |
2 2 2 |
2 2 2 |
2 2 2 |
2 2 2 |
2 2 2 |
2 2 2 |
3. Output array
52.01 104.02 |90.01 180.02 |114.01 228.02 |138.01 276.02 |100.01 200.02 |
138.01 276.02 |225.01 450.02 |261.01 522.02 |297.01 594.02 |210.01 420.02 |
258.01 516.02 |405.01 810.02 |441.01 882.02 |477.01 954.02 |330.01 660.02 |
378.01 756.02 |585.01 1170.02 |621.01 1242.02 |657.01 1314.02 |450.01 900.02 |
292.01 584.02 |450.01 900.02 |474.01 948.02 |498.01 996.02 |340.01 680.02 |
As you can see, the memory described by src_md and dst_md is laid out as NHWC, and the fractional parts of the final output confirm that the two channels are interleaved per pixel. Let's verify one value by hand.
Given the contents of src_mem, the top-left (0,0) element of dst_mem is computed as follows. With padding 1, the 3x3 window centered at (0,0) covers only the four valid pixels (0,0), (0,1), (1,0) and (1,1), i.e. the eight NHWC values 0, 1, 2, 3, 10, 11, 12, 13:
For C=0: 0 + 2 + 10 + 12 + 1 + 3 + 11 + 13 + 0.01 = 52.01
For C=1: (0 + 2 + 10 + 12 + 1 + 3 + 11 + 13) * 2 + 0.02 = 104.02
This matches the printed output.
Looking at the mkldnn library's verbose output in the console, this Conv computation took 0.526 ms.
Done. That's a wrap!
The full code is here, for reference only:
https://github.com/tisandman555/mkldnn_study/blob/master/conv.cpp