Introduction to Deep Learning, Chapter 7: Convolutional Neural Networks

Latest recommended article published on 2023-02-07 17:01:06

YYLin-AI


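Category: Introduction to Deep Learning: Theory and Implementation with Python. Tags: deep learning

Copyright belongs to all the proletarians of the world

Article link: https://blog.csdn.net/qq_41776781/article/details/87907725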


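Preface: Convolutional neural networks (CNNs) are a technique in constant use in deep learning today. This chapter covers three aspects of CNNs.

Part 1: The basic operations of a CNN (principle and source-code implementation)

Part 2: The pooling layer in a CNN (principle and source-code implementation)

Part 3: Visualizing the convolution operation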

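# Basic operations of a convolutional neural network (CNN)

OK, the analysis below follows an animated GIF of the CNN computation that I found via Baidu.

The computation has three parts. The first part, on the left of the image, is the input data. The second part, in the middle, is the filter, also called the convolution kernel. The third part, on the right, is the result of applying the kernel to the input data.

The GIF makes the computation fairly clear at a glance, so I will only walk through it briefly.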

With VALID padding, the input gets no outer ring of zeros. With SAME padding, a ring of zeros may be added around the input.
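To make the difference between the two padding modes concrete, here is a minimal sketch of the output-size rules that TensorFlow documents for VALID and SAME. The helper names `conv_output_size` and `same_total_padding` are made up for this illustration and are not part of any library.

```python
import math


def conv_output_size(in_size, filter_size, stride, padding):
    """Output length along one spatial axis (hypothetical helper)."""
    if padding == "VALID":
        # No zeros are added: only windows that fit fully inside the input count.
        return math.ceil((in_size - filter_size + 1) / stride)
    if padding == "SAME":
        # Zeros are added as needed so every input position is covered.
        return math.ceil(in_size / stride)
    raise ValueError("padding must be 'VALID' or 'SAME'")


def same_total_padding(in_size, filter_size, stride):
    """Total number of zeros added along one axis under SAME (can be 0)."""
    out_size = math.ceil(in_size / stride)
    return max((out_size - 1) * stride + filter_size - in_size, 0)


print(conv_output_size(5, 3, 1, "VALID"))  # 3
print(conv_output_size(5, 3, 1, "SAME"))   # 5
print(same_total_padding(5, 3, 1))         # 2 (one zero on each side)
print(same_total_padding(5, 1, 1))         # 0 (a 1x1 filter needs no padding)
```

The last line shows why SAME only "may" add zeros: with a 1x1 filter and stride 1, nothing needs to be padded.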

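Step 1: Place the kernel on the top-left corner of the input data.

Step 2: Multiply the input values and the kernel values at corresponding positions, then sum the products: c11 = (a11 * b11) + (a12 * b12) + (a13 * b13) + ... + (a33 * b33)

Step 3: From the stride, the input size (height, width) and the padding `pad`, compute the output size (height, width), rounding the division down:

$out_h = \lfloor (H + 2 \cdot pad - filter_h) / stride \rfloor + 1$

$out_w = \lfloor (W + 2 \cdot pad - filter_w) / stride \rfloor + 1$

Step 4: Move the kernel by one stride and repeat Step 2 until the kernel reaches the bottom-right corner of the input.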
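The four steps can be sketched directly as two nested loops. This is only an illustration for a single-channel 2-D input with no batch dimension; the function name `conv2d_naive` is an assumption of this sketch, not something from the book or from TensorFlow.

```python
import numpy as np


def conv2d_naive(x, w, stride=1, pad=0):
    """Slide kernel w over a 2-D input x, following Steps 1 to 4."""
    H, W = x.shape
    fh, fw = w.shape
    # Step 3: output size from stride, input size and padding.
    out_h = (H + 2 * pad - fh) // stride + 1
    out_w = (W + 2 * pad - fw) // stride + 1
    xp = np.pad(x, pad, mode="constant")  # outer ring of zeros when pad > 0
    out = np.zeros((out_h, out_w))
    for i in range(out_h):        # Steps 1 and 4: start top-left, then move
        for j in range(out_w):
            top, left = i * stride, j * stride
            window = xp[top:top + fh, left:left + fw]
            out[i, j] = np.sum(window * w)  # Step 2: multiply, then sum
    return out


x = np.arange(25, dtype=float).reshape(5, 5)
w = np.ones((3, 3))
print(conv2d_naive(x, w).shape)         # (3, 3)
print(conv2d_naive(x, w, pad=1).shape)  # (5, 5)
print(conv2d_naive(x, w)[0, 0])         # 54.0
```

The loops are slow for real inputs, which is the motivation for unfolding the input into a matrix first.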

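OK, next let's look at how this convolution is handled in source code: an im2col routine written with NumPy, compared against TensorFlow's tf.nn.conv2d.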

import numpy as np
import tensorflow as tf


# im2col unfolds every filter-sized window of the input so that convolution can be computed as a matrix product
def im2col(input_data, filter_h, filter_w, stride=1, pad=0):

    # Step 1: get the input shape, [batch size, channels, height, width]
    N, C, H, W = input_data.shape

    # Step 2: compute the height and width of the output
    out_h = (H + 2*pad - filter_h)//stride + 1
    out_w = (W + 2*pad - filter_w)//stride + 1

    img = np.pad(input_data, [(0,0), (0,0), (pad, pad), (pad, pad)], 'constant')
    col = np.zeros((N, C, filter_h, filter_w, out_h, out_w))

    for y in range(filter_h):
        y_max = y + stride*out_h
        for x in range(filter_w):
            x_max = x + stride*out_w
            # y:y_max:stride walks from y up to y_max in steps of stride
            col[:, :, y, x, :, :] = img[:, :, y:y_max:stride, x:x_max:stride]

    # Flattened variant, one row per output position:
    # col = col.transpose(0, 4, 5, 1, 2, 3).reshape(N*out_h*out_w, -1)
    col = col.transpose(0, 4, 5, 1, 2, 3).reshape(N, out_h, out_w, -1)
    return col


# Input data of shape [batch_size, channel, height, width]
x = np.ones((10, 3, 5, 5))

# Filter size 1x1, stride 1, pad set to 0
result = im2col(x, 1, 1, stride=1, pad=0)

print("the result of conv2d difined by myself is:", result.shape)



# TensorFlow (1.x API): define the input data, the filter and the strides
# tf.nn.conv2d(input, filter, strides, padding)
input = tf.Variable(tf.random_normal([10,5,5,3]))
filter = tf.Variable(tf.random_normal([1,1,3,3]))

result_tf = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME')
print("the result of conv2d difined by tensorflow is:",result_tf)

result_tf = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME')

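The tf.nn.conv2d call above uses four arguments:

# Argument 1: the input data, with shape [batch, in_height, in_width, in_channels]
# Argument 2: the filter (kernel), with shape [filter_height, filter_width, in_channels, out_channels]. Note that in_channels must be the same in arguments 1 and 2, and out_channels is the number of output channels
# Argument 3: strides, the step size along each dimension of the input, corresponding to Step 3 above
# Argument 4: padding. With SAME, zeros are added so that at stride 1 the output has the same height and width as the input; with VALID, no padding is added

# Result: the two output shapes agree, so the check passes. Note that this compares shapes only: im2col just unfolds the input, and with a 1x1 filter its last dimension C*1*1 = 3 happens to equal out_channels

# Comparison of my conv2d and the conv2d defined in TensorFlow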
the result of conv2d difined by myself is: (10, 5, 5, 3)
the result of conv2d difined by tensorflow is: Tensor("Conv2D:0", shape=(10, 5, 5, 3), dtype=float32)
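The comparison above only checks shapes, since im2col by itself just rearranges the input. To complete the picture, here is a hedged sketch of how the unfolded matrix can be multiplied by the flattened filters to obtain the actual convolution values. It uses the flattened `reshape(N*out_h*out_w, -1)` variant of im2col and NumPy only. The function name `conv_forward` and the random test data are assumptions of this sketch.

```python
import numpy as np


def im2col(data, fh, fw, stride=1, pad=0):
    """Unfold (N, C, H, W) input into rows, one row per output position."""
    N, C, H, W = data.shape
    out_h = (H + 2 * pad - fh) // stride + 1
    out_w = (W + 2 * pad - fw) // stride + 1
    img = np.pad(data, [(0, 0), (0, 0), (pad, pad), (pad, pad)], "constant")
    col = np.zeros((N, C, fh, fw, out_h, out_w))
    for y in range(fh):
        for x in range(fw):
            col[:, :, y, x, :, :] = img[:, :, y:y + stride * out_h:stride,
                                        x:x + stride * out_w:stride]
    col = col.transpose(0, 4, 5, 1, 2, 3).reshape(N * out_h * out_w, -1)
    return col, out_h, out_w


def conv_forward(data, w, stride=1, pad=0):
    """Convolution as a matrix product; w has shape (FN, C, fh, fw)."""
    N = data.shape[0]
    FN, C, fh, fw = w.shape
    col, out_h, out_w = im2col(data, fh, fw, stride, pad)
    col_w = w.reshape(FN, -1).T  # (C*fh*fw, FN), same ordering as col rows
    out = col @ col_w            # (N*out_h*out_w, FN)
    return out.reshape(N, out_h, out_w, FN).transpose(0, 3, 1, 2)


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 5, 5))
w = rng.standard_normal((4, 3, 3, 3))
out = conv_forward(x, w)
print(out.shape)  # (2, 4, 3, 3)

# Spot check one output value against the multiply-and-sum definition.
expected = np.sum(x[0, :, 0:3, 0:3] * w[1])
print(np.allclose(out[0, 1, 0, 0], expected))  # True
```

The output here is in (N, C, H, W) order, so it would need a transpose before a value-by-value comparison with a TensorFlow result in (N, H, W, C) order.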

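Part 2: The pooling layer in a CNN (principle and source-code implementation)

Note: the book's source code has a problem in how it handles padding.

Steps: first unfold the data, then take the maximum of each row, and finally reshape the result into the output.

# The input shape (height, width) is (5, 5)

# The filter shape (height, width) is (2, 2), the stride is 2, and no padding is used, so the call succeeds

So the output shape is (2, 2), since (5 - 2)//2 + 1 = 2.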
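As a small worked example of these numbers, the following sketch applies 2x2 max pooling with stride 2 to a single 5x5 array using plain loops. The function name `max_pool2d_naive` is hypothetical and exists only for this illustration.

```python
import numpy as np


def max_pool2d_naive(x, pool_h, pool_w, stride):
    """Max pooling over a single 2-D array without padding."""
    H, W = x.shape
    out_h = (H - pool_h) // stride + 1
    out_w = (W - pool_w) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            # Largest value inside the current pooling window.
            out[i, j] = x[top:top + pool_h, left:left + pool_w].max()
    return out


x = np.arange(25).reshape(5, 5)
print(max_pool2d_naive(x, 2, 2, 2))
# [[ 6  8]
#  [16 18]]
```

With no padding, the last row and last column of the 5x5 input are never covered by a window, which is why the output is 2x2 rather than 3x3.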

import numpy as np
import  tensorflow as tf
def im2col(input_data, filter_h, filter_w, stride=1, pad=0):

    N, C, H, W = input_data.shape
    out_h = (H + 2*pad - filter_h)//stride + 1
    out_w = (W + 2*pad - filter_w)//stride + 1

    img = np.pad(input_data, [(0,0), (0,0), (pad, pad), (pad, pad)], 'constant')
    col = np.zeros((N, C, filter_h, filter_w, out_h, out_w))

    for y in range(filter_h):
        y_max = y + stride*out_h
        for x in range(filter_w):
            x_max = x + stride*out_w
            col[:, :, y, x, :, :] = img[:, :, y:y_max:stride, x:x_max:stride]

    col = col.transpose(0, 4, 5, 1, 2, 3).reshape(N*out_h*out_w, -1)
    print("**************",col.shape)
    return col

class Pooling:
    def __init__(self, pool_h, pool_w, stride=1, pad=0):
        self.pool_h = pool_h
        self.pool_w = pool_w
        self.stride = stride
        self.pad = pad

        self.x = None
        self.arg_max = None

    def forward(self, x):
        N, C, H, W = x.shape
        # Include pad so the output size matches what im2col computes
        out_h = (H + 2*self.pad - self.pool_h) // self.stride + 1
        out_w = (W + 2*self.pad - self.pool_w) // self.stride + 1

        col = im2col(x, self.pool_h, self.pool_w, self.stride, self.pad)
        col = col.reshape(-1, self.pool_h * self.pool_w)

        arg_max = np.argmax(col, axis=1)
        out = np.max(col, axis=1)
        out = out.reshape(N, out_h, out_w, C).transpose(0, 3, 1, 2)

        self.x = x
        self.arg_max = arg_max

        return out

x = np.ones((10, 3, 5, 5))
result = Pooling(2,2,2)
# print("resultresultresultresultresultresult",result)

my_pooling_result = result.forward(x)
print("*************pooling_result**************", my_pooling_result.shape)


x = np.ones((10, 5, 5, 3))
tf_pooling_result = tf.nn.max_pool(x, [1,2,2,1], [1,2,2,1],padding = 'VALID')
print("*************tf_pooling_result**************", tf_pooling_result.shape)

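Result: the two shapes differ only in where the channel axis sits. Channels come second in the NumPy version (N, C, H, W) and last in TensorFlow (N, H, W, C).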

'''
************** (40, 12)
*************pooling_result************** (10, 3, 2, 2)
*************tf_pooling_result************** (10, 2, 2, 3)
'''
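To compare the two results directly, one of them can be transposed into the other layout. The snippet below is a minimal NumPy sketch using a stand-in array with the same shape as the pooling output. The variable names are assumptions of this example.

```python
import numpy as np

# Stand-in for a pooling result in (N, C, H, W) order.
nchw = np.random.default_rng(0).standard_normal((10, 3, 2, 2))

# Move the channel axis to the end: (N, C, H, W) -> (N, H, W, C).
nhwc = nchw.transpose(0, 2, 3, 1)
print(nhwc.shape)  # (10, 2, 2, 3)

# And back again: (N, H, W, C) -> (N, C, H, W).
back = nhwc.transpose(0, 3, 1, 2)
print(np.array_equal(back, nchw))  # True
```

A transpose only reorders axes, so the element at `nchw[n, c, h, w]` is the same value as `nhwc[n, h, w, c]`.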
