Hand-Coding It: Writing a Convolution Layer Yourself Without a Deep Learning Framework

Latest recommended article published 2024-06-06 17:31:44

一颗磐石


Article link: https://blog.csdn.net/just_do_myself/article/details/118391172


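Deep learning keeps evolving and is being applied ever more widely. To make development easier, major companies have open-sourced their own deep learning frameworks, such as Google's TensorFlow, Facebook's PyTorch and Baidu's PaddlePaddle (飞桨). We call their APIs all the time; they are convenient and quick to learn. In interviews, however, interviewers sometimes ask candidates to write the underlying code by hand to gauge how well they understand the principles of deep learning. Today we will practice exactly that.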

In this article, we first briefly introduce the convolutional layer and the pooling layer, and then start writing code.

Introduction to the Convolutional Layer

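In a convolutional layer we define one or more operators of a given size (3 × 3, 5 × 5, 7 × 7, ...) and shape, called convolution kernels; the size is usually fixed, although some variants relax this. Each kernel slides over the image or feature map, multiplying element-wise and summing at every position, which fuses local spatial information. One kernel sliding over the whole image or feature map produces one new feature map (that is, one channel). Each kernel spans all input channels, so the number of kernels equals the number of output channels. The kernel design imitates how the human eye perceives things: when we look at something, we first pick up local details and then combine them into global information. The figure below shows how a single convolution kernel works.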
[Figure: how a single convolution kernel works]
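The sliding multiply-and-sum described above can be sketched directly with loops. This is a minimal illustration, not the author's implementation: the function name `naive_conv2d` is hypothetical, it assumes channels-last arrays and no padding ('VALID'), and, like deep learning frameworks, it computes cross-correlation (the kernel is not flipped). Each of the `C_out` kernels covers all input channels and yields one output channel.

```python
import numpy as np

def naive_conv2d(x, kernels, bias, stride=1):
    """Direct (loop-based) 'VALID' convolution of a single image.

    x:       input feature map, shape (H, W, C_in)
    kernels: convolution kernels, shape (k, k, C_in, C_out)
    bias:    one bias per kernel, shape (C_out,)
    Returns a feature map of shape (H_out, W_out, C_out).
    """
    H, W, _ = x.shape
    k, _, _, c_out = kernels.shape
    h_out = (H - k) // stride + 1
    w_out = (W - k) // stride + 1
    out = np.zeros((h_out, w_out, c_out))
    for o in range(c_out):              # each kernel produces one output channel
        for i in range(h_out):
            for j in range(w_out):
                # the local k x k window covers all input channels
                patch = x[i * stride:i * stride + k, j * stride:j * stride + k, :]
                # element-wise multiply with the kernel, then sum
                out[i, j, o] = np.sum(patch * kernels[:, :, :, o]) + bias[o]
    return out

# A 4x4 single-channel input and one 3x3 kernel of ones:
# every output value is the sum of one 3x3 window.
x = np.arange(16, dtype=float).reshape(4, 4, 1)
kernels = np.ones((3, 3, 1, 1))
bias = np.zeros(1)
y = naive_conv2d(x, kernels, bias)
print(y[:, :, 0])
# [[45. 54.]
#  [81. 90.]]
```

The four nested loops make the cost obvious; the im2col trick used in the code below exists to replace them with a single matrix product.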
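In a CNN, the convolution operation has several parameters, such as input, out_channels, kernel_size, strides and padding. In the code below they correspond to the Conv2D arguments shape, output_channels, kernel_size, stride and method.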

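input: the input image / feature map
out_channels: the number of channels of the output image / feature map
kernel_size: the size of the convolution kernel
strides: the step by which the kernel moves at each slide
padding: how the borders of the input are padded (e.g. 'VALID' for no padding, 'SAME' for zero padding so that, with stride 1, the output keeps the input size)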
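These parameters together determine the spatial size of the output. A small sketch of the standard formula, floor((size + 2·padding − kernel_size) / stride) + 1, is shown here; the helper name `conv_output_size` is hypothetical, and it assumes `padding` is the number of zero pixels added on each side. 'VALID' corresponds to padding 0, and the 'SAME' mode in the code below pads kernel_size // 2 pixels, which for an odd kernel gives ceil(size / stride).

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution along one axis:
    floor((size + 2 * padding - kernel_size) / stride) + 1
    """
    return (size + 2 * padding - kernel_size) // stride + 1

# 'VALID': no padding, the output shrinks
print(conv_output_size(224, 3, stride=1, padding=0))  # 222
print(conv_output_size(224, 3, stride=2, padding=0))  # 111
# 'SAME' with an odd kernel and padding = kernel_size // 2
print(conv_output_size(224, 3, stride=1, padding=1))  # 224
print(conv_output_size(224, 3, stride=2, padding=1))  # 112
print(conv_output_size(5, 3, stride=2, padding=1))    # 3 == ceil(5 / 2)
```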

Code

import numpy as np
import math
import cv2

import matplotlib.pyplot as plt

class Conv2D(object):
    def __init__(self, shape, output_channels, kernel_size=3, stride=1, method='VALID'):
        self.input_shape = shape
        self.output_channels = output_channels
        self.input_channels = shape[-1]
        self.batch_size = shape[0]
        self.stride = stride
        self.ksize = kernel_size
        self.method = method
        # He-style scaling: dividing standard-normal samples by sqrt(fan_in / 2) gives std = sqrt(2 / fan_in)
        weights_scale = math.sqrt(kernel_size * kernel_size * self.input_channels / 2)
        # Initialize the kernels and biases from a standard normal distribution, then scale them
        self.weights = np.random.standard_normal((kernel_size, kernel_size, self.input_channels, self.output_channels)) / weights_scale
        self.bias = np.random.standard_normal(self.output_channels) / weights_scale
        # Output size: with 'VALID' it shrinks according to the kernel size and stride;
        # with 'SAME' (odd kernel_size) it is ceil(size / stride), i.e. unchanged when stride is 1
        if method == 'VALID':
            self.eta = np.zeros((shape[0], (shape[1] - kernel_size) // self.stride + 1, (shape[2] - kernel_size) // self.stride + 1, self.output_channels))
        if method == 'SAME':
            self.eta = np.zeros((shape[0], math.ceil(shape[1] / self.stride), math.ceil(shape[2] / self.stride), self.output_channels))
        # Initialize the gradients of the weights and biases (for a backward pass)
        self.w_gradient = np.zeros(self.weights.shape)
        self.b_gradient = np.zeros(self.bias.shape)
        self.output_shape = self.eta.shape
    # Forward pass
    def forward(self, x):
        # Reshape the kernels once so the convolution becomes a single matrix product:
        # [kernel_size, kernel_size, in_channels, out_channels] ---> [kernel_size * kernel_size * in_channels, out_channels]
        col_weights = self.weights.reshape([-1, self.output_channels])
        # To keep the output feature map the same size, zero-pad the borders (assumes an odd kernel_size)
        if self.method == 'SAME':
            x = np.pad(x, ((0, 0), (self.ksize // 2, self.ksize // 2), (self.ksize // 2, self.ksize // 2), (0, 0)), 'constant', constant_values=0)
        self.col_image = []
        conv_out = np.zeros(self.eta.shape)
        # Process each sample in the batch separately
        for i in range(self.batch_size):
            # Take the i-th sample and add back a batch axis: [1, height, width, channels]
            img_i = x[i][np.newaxis, ...]
            # Unfold the sample into a matrix of patches so the convolution can be vectorized
            self.col_image_i = self.im2col(img_i, self.ksize, self.stride)
            # Keep the patch matrix (a backward pass would need it for the weight gradient)
            self.col_image.append(self.col_image_i)
            # One matrix product computes every output position and channel at once
            conv_out[i] = np.reshape(np.dot(self.col_image_i, col_weights) + self.bias, self.eta[0].shape)
        return conv_out

    # Cut the image into patches of the same size as the kernel. Each patch holds k_size*k_size*C values
    # (C = number of input channels) and is flattened into one row of length k_size*k_size*C.
    # With col patches, the whole image becomes a matrix of shape [col, k_size*k_size*C].
    def im2col(self, image, k_size, stride):
        image_col = []
        for i in range(0, image.shape[1] - k_size + 1, stride):
            for j in range(0, image.shape[2] - k_size + 1, stride):
                col = image[:, i:i+k_size, j:j+k_size, :].reshape([-1])  # flatten one patch into a vector
                image_col.append(col)
        image_col = np.array(image_col)
        return image_col

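if __name__ == '__main__':
    # Replace with the path to your own image
    image = cv2.imread(r'C:\Users\11468\Desktop\sea_wind.jpg')
    if image is None:
        raise FileNotFoundError('Could not read the input image; check the path')
    # OpenCV loads images as BGR; convert to RGB
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    # Add a batch axis: [1, height, width, channels]
    image = image[np.newaxis, ...]
    print("input:", image.shape)
    # 6 output channels, 3x3 kernel, stride 1, 'VALID' (the output shrinks)
    conv2d = Conv2D(image.shape, 6, 3, 1, 'VALID')
    conv_out = conv2d.forward(image)
    print("output:", conv_out.shape)
    # Visualize the convolution result, one subplot per output channel
    plt.figure()
    for i in range(6):
        plt.subplot(2, 3, i + 1)
        plt.imshow(conv_out[0][:, :, i])
        plt.axis('off')
        plt.title('Channel {} of the Feature Map'.format(i + 1), fontsize=8)
    plt.show()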
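A quick way to convince yourself that im2col plus one matrix product really is a convolution is to compare it with a direct loop on random data. The sketch below is self-contained and does not import the Conv2D class; `conv_im2col` and `conv_loops` are hypothetical helper names, and `im2col` mirrors the method above (channels-last input with a batch axis of 1, 'VALID' only). Because the kernels are reshaped in (row, column, channel) order, which is the same order in which each patch is flattened, the matrix product lines up element by element.

```python
import numpy as np

def im2col(image, k, stride):
    """Same idea as Conv2D.im2col: image has shape (1, H, W, C);
    each k x k x C patch becomes one row, scanned row by row."""
    rows = []
    for i in range(0, image.shape[1] - k + 1, stride):
        for j in range(0, image.shape[2] - k + 1, stride):
            rows.append(image[:, i:i + k, j:j + k, :].reshape(-1))
    return np.array(rows)

def conv_im2col(image, weights, bias, stride=1):
    """'VALID' convolution via im2col + a single matrix product."""
    k, _, _, c_out = weights.shape
    h_out = (image.shape[1] - k) // stride + 1
    w_out = (image.shape[2] - k) // stride + 1
    cols = im2col(image, k, stride)                 # (h_out * w_out, k*k*C_in)
    out = cols @ weights.reshape(-1, c_out) + bias  # (h_out * w_out, C_out)
    return out.reshape(h_out, w_out, c_out)

def conv_loops(image, weights, bias, stride=1):
    """Reference: the same convolution with explicit loops."""
    k, _, _, c_out = weights.shape
    h_out = (image.shape[1] - k) // stride + 1
    w_out = (image.shape[2] - k) // stride + 1
    out = np.zeros((h_out, w_out, c_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = image[0, i * stride:i * stride + k, j * stride:j * stride + k, :]
            for o in range(c_out):
                out[i, j, o] = np.sum(patch * weights[..., o]) + bias[o]
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((1, 7, 9, 3))    # one 7x9 image with 3 channels
weights = rng.standard_normal((3, 3, 3, 6))  # six 3x3x3 kernels
bias = rng.standard_normal(6)

for stride in (1, 2):
    a = conv_im2col(image, weights, bias, stride)
    b = conv_loops(image, weights, bias, stride)
    print(stride, a.shape, np.allclose(a, b))
# 1 (5, 7, 6) True
# 2 (3, 4, 6) True
```

The im2col matrix duplicates overlapping pixels, so it trades memory for speed: the heavy lifting moves into one dense matrix multiplication, which NumPy hands to an optimized BLAS routine.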