[Caffe Source Code Reading] 1. im2col

dgh_dean

Category: caffe. Tags: Caffe source code

Article link: https://blog.csdn.net/dgh_dean/article/details/77964496

im2col.cpp

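My internship is over, so I finally have time to study the Caffe source code properly.

I am writing these notes so that I do not forget the details the next time I need to modify Caffe.

I will pick the algorithmic functions that take some effort to understand and write them up. It is also a way to keep myself accountable, and I hope I can keep it going.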

The description of the im2col algorithm is taken from: here

The description of dilated convolution is taken from: here

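im2col rearranges an image stored as a 2-D matrix so that the pixels of every convolution window end up laid out contiguously in memory. The main purpose is to turn the convolution of the input with the kernel into a matrix multiplication, which can then be accelerated with a linear algebra library such as OpenBLAS or cuBLAS.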

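The im2col procedure is as follows:

The first step is to transform the image matrix. In the transformed matrix, each column is one patch that takes part in the convolution, that is, one kernel-sized patch of the original image.

The next step is to flatten the convolution kernel into one dimension: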

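Note that Caffe stores data in row-major order, and each transformed matrix occupies a single contiguous block of memory. This is what allows the linear algebra library to apply its optimizations.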

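In the example above, the computation handed to the BLAS library is simply the product of a 1×27 matrix (the flattened kernel) and a 27×4 matrix (the im2col output), which gives a 1×4 result.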
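To make the 1×27 by 27×4 product concrete, here is a minimal sketch. It assumes a 3-channel 4×4 input, a 3×3 kernel, no padding, stride 1 and dilation 1, which is one configuration that yields those shapes. The names `im2col_simple`, `row_times_matrix` and `conv_direct` are made up for this illustration and are not part of Caffe; a real implementation would call a BLAS GEMM routine instead of the naive loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified im2col (assumptions of this sketch: no padding, stride 1,
// dilation 1). The input is planar (channel, row, column). The result is a
// row-major matrix with channels*k*k rows and out_h*out_w columns.
std::vector<float> im2col_simple(const std::vector<float>& im,
                                 int channels, int height, int width, int k) {
  assert(im.size() == static_cast<std::size_t>(channels) * height * width);
  const int out_h = height - k + 1;
  const int out_w = width - k + 1;
  std::vector<float> col;
  for (int c = 0; c < channels; ++c)
    for (int kr = 0; kr < k; ++kr)
      for (int kc = 0; kc < k; ++kc)
        for (int r = 0; r < out_h; ++r)
          for (int q = 0; q < out_w; ++q)
            col.push_back(im[(c * height + r + kr) * width + q + kc]);
  return col;
}

// Naive product of a (1 x n) row vector with a row-major (n x m) matrix.
std::vector<float> row_times_matrix(const std::vector<float>& row,
                                    const std::vector<float>& mat,
                                    int n, int m) {
  assert(row.size() == static_cast<std::size_t>(n));
  assert(mat.size() == static_cast<std::size_t>(n) * m);
  std::vector<float> out(m, 0.0f);
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < m; ++j)
      out[j] += row[i] * mat[i * m + j];
  return out;
}

// Direct sliding-window convolution (cross-correlation), for comparison.
std::vector<float> conv_direct(const std::vector<float>& im,
                               const std::vector<float>& kernel,
                               int channels, int height, int width, int k) {
  const int out_h = height - k + 1;
  const int out_w = width - k + 1;
  std::vector<float> out(static_cast<std::size_t>(out_h) * out_w, 0.0f);
  for (int r = 0; r < out_h; ++r)
    for (int q = 0; q < out_w; ++q)
      for (int c = 0; c < channels; ++c)
        for (int kr = 0; kr < k; ++kr)
          for (int kc = 0; kc < k; ++kc)
            out[r * out_w + q] += kernel[(c * k + kr) * k + kc] *
                                  im[(c * height + r + kr) * width + q + kc];
  return out;
}
```

With a 3×4×4 input, `im2col_simple(im, 3, 4, 4, 3)` returns 27 × 4 = 108 values, and `row_times_matrix(kernel, col, 27, 4)` should give the same four numbers as `conv_direct(im, kernel, 3, 4, 4, 3)`.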

Besides the basic im2col algorithm described above, Caffe's im2col code also supports dilated convolution:

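With the same 3×3 kernel, the figure above shows the different receptive fields obtained with dilation 1, 2 and 4. In Caffe's convention a dilation of 1 is an ordinary convolution.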
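The effect of dilation on sizes can be sketched with two small helpers. The arithmetic is the same expression that `im2col_cpu` uses to compute `output_h` and `output_w`; the function names `dilated_extent` and `conv_output_size` are hypothetical and chosen only for this example. Note that this is the extent of a single dilated kernel, not the receptive field of several stacked layers.

```cpp
#include <cassert>

// Number of input pixels spanned by a dilated kernel along one axis: the
// kernel taps are spaced `dilation` pixels apart, so `kernel` taps span
// dilation * (kernel - 1) + 1 pixels.
int dilated_extent(int kernel, int dilation) {
  assert(kernel >= 1 && dilation >= 1);
  return dilation * (kernel - 1) + 1;
}

// Output size along one axis, using the same expression as im2col_cpu.
int conv_output_size(int input, int kernel, int pad, int stride, int dilation) {
  assert(stride >= 1);
  return (input + 2 * pad - dilated_extent(kernel, dilation)) / stride + 1;
}
```

For a 3×3 kernel, `dilated_extent(3, 1)`, `dilated_extent(3, 2)` and `dilated_extent(3, 4)` evaluate to 3, 5 and 9, so the number of weights stays at nine while the area they cover grows.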

Here is the implementation in Caffe:

inline bool is_a_ge_zero_and_a_lt_b(int a, int b) {
	return static_cast<unsigned>(a) < static_cast<unsigned>(b);
}  // returns true iff 0 <= a < b: a negative a becomes a huge unsigned value, so one comparison checks both bounds
template <typename Dtype>
void im2col_cpu(const Dtype* data_im, const int channels,
	const int height, const int width, const int kernel_h, const int kernel_w,
	const int pad_h, const int pad_w,
	const int stride_h, const int stride_w,
	const int dilation_h, const int dilation_w,
	Dtype* data_col) {

	const int output_h = (height + 2 * pad_h -
		(dilation_h * (kernel_h - 1) + 1)) / stride_h + 1;
	const int output_w = (width + 2 * pad_w -
		(dilation_w * (kernel_w - 1) + 1)) / stride_w + 1;
	const int channel_size = height * width;
	for (int channel = channels; channel--; data_im += channel_size) {
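		// Outermost loop: iterate over all input channels (three for an RGB image); after each channel, data_im advances to the start of the next channel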
		for (int kernel_row = 0; kernel_row < kernel_h; kernel_row++) {
			for (int kernel_col = 0; kernel_col < kernel_w; kernel_col++) {
				// Then iterate over every position (kernel_row, kernel_col) inside the kernel window
				int input_row = -pad_h + kernel_row * dilation_h;
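				// input_row is the input row read by this kernel row for the first output row; it is negative inside the top padding and can be >= height inside the bottom padding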
				for (int output_rows = output_h; output_rows; output_rows--) {
					// output_h * output_w is the number of columns of the im2col matrix; this loop and the next one together write one full row of it
					if (!is_a_ge_zero_and_a_lt_b(input_row, height)) {
						for (int output_cols = output_w; output_cols; output_cols--) {
							*(data_col++) = 0;  // the whole input row lies in the padding: fill with zeros
						}
					}
					else {
						int input_col = -pad_w + kernel_col * dilation_w;
						for (int output_col = output_w; output_col; output_col--) {
							if (is_a_ge_zero_and_a_lt_b(input_col, width)) {
								*(data_col++) = data_im[input_row * width + input_col];
								// copy the value from the input image
							}
							else {
								*(data_col++) = 0;  // the input column lies in the padding: fill with zero
							}
							input_col += stride_w;
						}
					}
					input_row += stride_h;
				}
			}
		}
	}
}
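The helper `is_a_ge_zero_and_a_lt_b` at the top of the listing is worth a closer look, because it replaces two comparisons with one. The sketch below compares it with the obvious two-comparison version. `in_range_two_checks` and `versions_agree` are names invented for this example, and the equivalence assumes `b` is non-negative, which holds for `height` and `width`.

```cpp
#include <cassert>

// Same helper as in the listing: casting a negative int to unsigned yields a
// very large value, so a single unsigned comparison rejects a < 0 and a >= b.
inline bool is_a_ge_zero_and_a_lt_b(int a, int b) {
  return static_cast<unsigned>(a) < static_cast<unsigned>(b);
}

// Straightforward version with two comparisons, for reference.
inline bool in_range_two_checks(int a, int b) {
  return a >= 0 && a < b;
}

// Returns true if both versions give the same answer for every a in
// [-limit, limit]. Assumes b >= 0; for a negative b the cast would turn b
// into a huge bound and the two versions would no longer agree.
bool versions_agree(int b, int limit) {
  assert(b >= 0);
  for (int a = -limit; a <= limit; ++a) {
    if (is_a_ge_zero_and_a_lt_b(a, b) != in_range_two_checks(a, b)) {
      return false;
    }
  }
  return true;
}
```

In `im2col_cpu` this check decides whether `input_row` or `input_col` falls inside the image or inside the zero padding.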

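The analysis is in the comments above. Working through this code took me quite a while, mainly because of a misunderstanding on my part. I kept assuming that the input matrix data_im was stored with its channels interleaved, because the I/O is implemented on top of OpenCV, and in OpenCV the three color channels of each pixel are stored next to each other. So the code never quite looked right to me.

In fact, the data behind the input pointer is stored channel by channel (planar layout): all values of one channel are contiguous, followed by all values of the next channel. Once that is clear, the code is much easier to follow.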
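The difference between the two layouts can be shown with a small conversion routine. This is only an illustration of the index arithmetic, not Caffe's actual data-loading code, and `hwc_to_chw` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Converts an interleaved image (height x width x channels, the layout where
// the channels of one pixel sit next to each other) into planar layout
// (channels x height x width), where each channel is one contiguous block.
std::vector<float> hwc_to_chw(const std::vector<float>& hwc,
                              int height, int width, int channels) {
  assert(hwc.size() == static_cast<std::size_t>(height) * width * channels);
  std::vector<float> chw(hwc.size());
  for (int h = 0; h < height; ++h)
    for (int w = 0; w < width; ++w)
      for (int c = 0; c < channels; ++c)
        chw[(c * height + h) * width + w] = hwc[(h * width + w) * channels + c];
  return chw;
}
```

For a 1×2 image with three channels, the interleaved data {1, 2, 3, 4, 5, 6} becomes {1, 4, 2, 5, 3, 6} in planar form. The planar index `(c * height + h) * width + w` is why `im2col_cpu` can step to the next channel with `data_im += channel_size`.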