[Deep Learning] Implementing CNN Operations in Python (with Code)

只搬烫手的砖

Last modified on 2023-12-04 13:39:54

Tags: python cnn deep learning

First published on 2021-10-25 13:27:35

Article link: https://blog.csdn.net/qq_44747572/article/details/120949318

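This article explores the motivation for convolutional neural networks (CNNs) in deep learning: sharing parameters to reduce network size, and coping with features whose position changes from image to image. It explains in detail how a CNN's convolution (Conv), max-pooling (MaxPool), and Softmax layers work, with code examples, and finally outlines how the CNN model is trained.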

Summary generated by C知道, powered by DeepSeek-R1 (full version).

> Reference: https://zhuanlan.zhihu.com/p/102119808 (you can read this one directly)

0 Motivation

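An ordinary (fully connected) neural network could do the job, but images keep getting larger, and a fully connected network would have too many parameters to train. For example, a 224 x 224 x 3 image has 150,528 input values; with a hidden layer of 1024 units, the first layer alone needs 150,528 x 1024 ≈ 154 million weights, which makes the network very large.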
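The arithmetic can be checked with a tiny Python sketch. `dense_params` is a hypothetical helper introduced here for illustration; the estimate above counts weights only, so the bias term is optional:

```python
def dense_params(n_inputs: int, n_units: int, bias: bool = True) -> int:
    """Number of trainable parameters in one fully connected layer (hypothetical helper)."""
    return n_inputs * n_units + (n_units if bias else 0)

n_inputs = 224 * 224 * 3                              # flattened RGB image
print(n_inputs)                                       # 150528
print(dense_params(n_inputs, 1024, bias=False))       # 154140672 weights
print(dense_params(n_inputs, 1024))                   # 154141696 with one bias per unit
```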

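Another problem is that a feature's position changes from image to image. For example, a cat's face may sit in the top-left corner of one picture and in the bottom-right corner of another, so in a fully connected network it will not activate the same neuron.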
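The following sketch (toy data, not from the original article) illustrates why a shared filter helps here: one 3 x 3 filter slid over the whole image gives the same peak response whether the pattern sits in the top-left or the bottom-right, only at a shifted location. `correlate_valid` is a hypothetical helper implementing stride-1, no-padding cross-correlation, which is what CNN "convolution" layers compute in practice:

```python
import numpy as np

def correlate_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding (hypothetical helper)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy stand-in for a "cat face" feature; the filter is the pattern itself.
pattern = np.array([[0, 1, 0],
                    [1, 1, 1],
                    [0, 1, 0]], dtype=float)

img_a = np.zeros((8, 8))
img_a[0:3, 0:3] = pattern          # feature in the top-left corner
img_b = np.zeros((8, 8))
img_b[5:8, 5:8] = pattern          # same feature in the bottom-right corner

resp_a = correlate_valid(img_a, pattern)
resp_b = correlate_valid(img_b, pattern)

# Same weights, same peak value; only the location of the peak moves.
print(resp_a.max(), np.unravel_index(resp_a.argmax(), resp_a.shape))
print(resp_b.max(), np.unravel_index(resp_b.argmax(), resp_b.shape))
```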

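Compared with a fully connected layer, a CNN shares parameters: the same filter weights are reused at every position of the image. Consider a convolutional layer with 9 filters of size 5 x 5 and stride 1 applied to a 224 x 224 x 3 image. In the standard formulation each filter spans all 3 input channels and has one bias, so the layer has (5 x 5 x 3 + 1) x 9 = 684 parameters; if each channel slice is instead given its own bias, the count is (5 x 5 + 1) x 9 x 3 = 702. Either way, the count does not depend on the 224 x 224 image size or on the stride.
Note: parameters are not shared across channels; each input channel has its own 5 x 5 slice of every filter.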
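Both counts can be reproduced with a short sketch. `conv_params` is a hypothetical helper, and its `bias_per_channel` switch exists only to show where the two bias conventions differ:

```python
def conv_params(k_h, k_w, in_channels, n_filters, bias_per_channel=False):
    """Parameter count of a conv layer (hypothetical helper).

    Image size and stride do not appear: the same weights are reused at every position.
    """
    weights = k_h * k_w * in_channels * n_filters
    biases = n_filters * in_channels if bias_per_channel else n_filters
    return weights + biases

print(conv_params(5, 5, 3, 9))                          # 684: one bias per filter (standard)
print(conv_params(5, 5, 3, 9, bias_per_channel=True))   # 702: one bias per filter per channel
```

For comparison, PyTorch's `nn.Conv2d(3, 9, 5)` follows the standard convention: a weight tensor of shape 9 x 3 x 5 x 5 plus 9 biases.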

1. Conv

[Figure: schematic of the Conv operation]
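As a small worked example (the 4 x 4 input and the edge filter are illustrative choices, not from the original), sliding a 3 x 3 filter with stride 1 over a 4 x 4 input gives a 2 x 2 output. This sketch uses NumPy's `sliding_window_view` to collect all 3 x 3 windows at once instead of looping:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

image = np.arange(16, dtype=float).reshape(4, 4)    # toy 4x4 input (a ramp: 0..15)
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)        # toy vertical-edge filter

# Every 3x3 window at stride 1: shape (2, 2, 3, 3).
windows = sliding_window_view(image, (3, 3))
# Multiply each window elementwise by the kernel and sum: one number per position.
output = np.einsum("ijkl,kl->ij", windows, kernel)

print(output.shape)   # (2, 2)
print(output)         # every entry is -6.0, because the ramp's horizontal change is constant
```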

Code:

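import numpy as np

class Conv3x3:
    # A convolution layer using 3x3 filters (stride 1, no padding).
    def __init__(self, num_filters):
        self.num_filters = num_filters
        self.filters = np.random.randn(num_filters, 3, 3) / 9       # divide by 9 to reduce the variance of the initial values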
        
    def iterate_regions(self, image):
        h, w = image.shape
        
        for i in range(h - 2):                                   # a 3x3 filter moving with stride 1 fits in (h-2) x (w-2) positions
            for j in range(w - 2):
                im_region = image[i:(i + 3), j:(j + 3)]          # the 3x3 region currently under the filter
                yield im_region, i, j
                
    def forward(self, input):
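        # input: 2D array of shape (h, w), e.g. a 28x28 grayscale image
        self.last_input = input
        
        h, w = input.shape
        output = np.zeros((h - 2, w - 2, self.num_filters))      # (h-2) x (w-2) x num_filters output, filled in below
        
        for im_region, i, j in self.iterate_regions(input):
            output[i, j] = np.sum(im_region * self.filters, axis=(1, 2))   # one value per filter at position (i, j)
            
        return output                                            # e.g. a 4x4 input and a 3x3 filter give a 2x2 output per filter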
    
    def backprop(self, d_L_d_out, learn_rate):
        # d_L_d_out: the loss gradient for this layer's outputs
        # learn_rate: a float
        d_L_d_filters = np.zeros(self.filters.shape)
        
        for im_region, i, j in self.iterate_regions(self.last_input):
            for f in range(self.num_filters):
                # d_L_d_filters[f]: 3x3 matrix
                # d_L_d_out[i, j, f]: num
                # im_region: 3x3 matrix in image
                d_L_d_filters[f] += d_L_d_out[i, j, f] * im_region
                
        # Update filters
        self.filters -= learn_rate * d_L_d_filters
        
        # We aren't returning anything here since we use Conv3x3 as
        # the first layer in our CNN. Otherwise, we'd need to return
        # the loss gradient for this layer's inputs, just like every
        # other layer in our CNN.
        return None
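One way to sanity-check the `backprop` rule above (`d_L_d_filters[f] += d_L_d_out[i, j, f] * im_region`) is a finite-difference gradient check. This is a self-contained sketch: `conv_forward` and `filter_grad` are hypothetical stand-ins that re-implement the same forward pass and filter-gradient accumulation as `Conv3x3`, and the loss `sum(out * upstream)` is an assumption chosen so that dL/d_out equals `upstream` exactly:

```python
import numpy as np

def conv_forward(image, filters):
    """Valid 3x3 cross-correlation, as in Conv3x3.forward: (h, w) -> (h-2, w-2, n)."""
    h, w = image.shape
    out = np.zeros((h - 2, w - 2, filters.shape[0]))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(image[i:i + 3, j:j + 3] * filters, axis=(1, 2))
    return out

def filter_grad(image, d_out, n_filters):
    """Same accumulation as Conv3x3.backprop: dL/dF[f] = sum over (i, j) of d_out[i, j, f] * region."""
    grad = np.zeros((n_filters, 3, 3))
    h, w = image.shape
    for i in range(h - 2):
        for j in range(w - 2):
            for f in range(n_filters):
                grad[f] += d_out[i, j, f] * image[i:i + 3, j:j + 3]
    return grad

rng = np.random.default_rng(0)
image = rng.standard_normal((6, 6))
filters = rng.standard_normal((2, 3, 3))
upstream = rng.standard_normal((4, 4, 2))      # plays the role of d_L_d_out

def loss(f):
    # Toy loss whose gradient w.r.t. the conv output is exactly `upstream`.
    return np.sum(conv_forward(image, f) * upstream)

analytic = filter_grad(image, upstream, 2)

eps = 1e-6
numeric = np.zeros_like(filters)
for idx in np.ndindex(filters.shape):
    bump = np.zeros_like(filters)
    bump[idx] = eps
    numeric[idx] = (loss(filters + bump) - loss(filters - bump)) / (2 * eps)

# Close to 0: the loss is linear in the filters, so only floating-point rounding remains.
print(np.max(np.abs(analytic - numeric)))
```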

2. MaxPool

[Figure: schematic of the MaxPool operation]
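A small numeric sketch of 2 x 2 max pooling with stride 2 (the 4 x 4 feature map is made up for illustration). `maxpool2` is a hypothetical helper that uses a reshape instead of the explicit loops in `MaxPool2`; like `h // 2` and `w // 2` in the class, it drops an odd trailing row or column:

```python
import numpy as np

def maxpool2(x):
    """2x2 max pooling, stride 2, on an (h, w, c) array (hypothetical helper)."""
    h, w, c = x.shape
    x = x[: h // 2 * 2, : w // 2 * 2]                  # drop an odd trailing row/column
    # Split rows into (block, offset) and columns into (block, offset), then max over offsets.
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 5],
                 [0, 1, 7, 2],
                 [6, 2, 3, 1]], dtype=float)[:, :, None]   # one 4x4 channel

print(maxpool2(fmap)[:, :, 0])   # block maxima: 4, 5 (top row) and 6, 7 (bottom row)
```

Each output entry is the maximum of one non-overlapping 2 x 2 block, so the spatial size is halved while the number of channels stays the same.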

Code:

class MaxPool2:
    # A Max Pooling layer using a pool size of 2.

    def iterate_regions(self, image):
        '''
        Generates non-overlapping 2x2 image regions to pool over.
        - image is a 3d numpy array of shape (h, w, num_filters),
          i.e. the output of the conv layer
        '''
        
        h, w, _ = image.shape
        new_h = h // 2
        new_w = w // 2

        for i in range(new_h):
            for j in range(new_w):
                im_region = image