2022.10.16 Weekly Report No. 4

杨幂臭脚丫子

Tags: deep learning, computer vision, cnn

Article link: https://blog.csdn.net/weixin_57523712/article/details/127177918

Contents

Abstract
I. What is a CNN?
II. Forward propagation
Sources:

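Abstract

This week I studied convolutional neural networks (CNNs). One problem remains: I only understood forward propagation and have not fully grasped the derivation of the backpropagation formulas. Progress was slow this week and I did not read any papers; I will improve on this next week.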

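I. What is a CNN?

1. Convolutional layer

CNN stands for Convolutional Neural Network, a type of feedforward neural network. It consists of one or more convolutional layers, pooling layers, and fully connected layers on top. Its core is the convolution kernel in the convolutional layer, also called a filter. A kernel extracts local features from the image by relating each pixel to its surrounding pixels. Examples are the smoothing filter and the vertical-edge and horizontal-edge filters, which respond to pixel relationships in the vertical and horizontal directions respectively.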
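To make the edge filters concrete, here is a minimal sketch. The kernel values are an assumption (the standard Sobel kernels, not the ones from the original figure), and `convolve_valid` is a hypothetical helper written only for this illustration.

```python
import numpy as np

# Assumed example kernels: standard Sobel filters, not taken from the original figure.
vertical_edge = np.array([[-1, 0, 1],
                          [-2, 0, 2],
                          [-1, 0, 1]])
horizontal_edge = vertical_edge.T


def convolve_valid(image, kernel):
    """Slide the kernel over the image without padding (cross-correlation, as used in CNNs)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


# A 5x5 image with a dark left part and a bright right part: one vertical edge.
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

v_response = convolve_valid(image, vertical_edge)
h_response = convolve_valid(image, horizontal_edge)
print(v_response)  # non-zero where the window covers the vertical edge
print(h_response)  # all zeros: the image has no horizontal edge
```

The vertical-edge kernel responds only where the window straddles the brightness change, while the horizontal-edge kernel stays at zero because every row of the image is identical.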

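(Figure: filters)
In short, a convolutional neural network is a neural network whose weights are shared across different positions.
Suppose the kernel weights are as follows:
(Figure: kernel weights)
With bias b, the first basic feature that the kernel extracts for a neuron is: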

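Another concept is the stride: how many cells the kernel moves at each step. As in the figure below, the first patch is 0, 1, 4, 5; the next local feature is extracted from 1, 2, 5, 6; and the last one from a, b, e, f.
For a 4x4 matrix, the result of local feature extraction is noticeably smaller, a 3x3 matrix. This is why another concept, zero padding, is introduced.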
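As a sketch of what the first feature looks like, assume a 2x2 kernel with weights named w_1 to w_4 (illustrative names, not taken from the figure) and write x_0, x_1, x_4, x_5 for the input values in the first patch. Without any activation function, the first feature would then be:

```latex
% Assumed notation: w_1..w_4 are the 2x2 kernel weights, b is the bias,
% x_0, x_1, x_4, x_5 are the input cells of the first (top-left) patch.
h_0 = w_1 x_0 + w_2 x_1 + w_3 x_4 + w_4 x_5 + b
```

The next feature reuses the same weights and bias on the patch 1, 2, 5, 6, which is what weight sharing means in practice.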
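(Figure: local feature extraction)
For example, convolving a 4x4 input image with a 3x3 filter produces a 2x2 output image. Often we want the output image to be the same size as the input image. To achieve this, we add zeros around the image so that the filter can be placed at more positions. A 3x3 filter needs 1 pixel of padding:
(Figure: zero padding)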
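The sizes quoted above follow from the usual output-size formula, (n + 2p - k) // s + 1. A small sketch, where `conv_output_size` is a hypothetical helper name and `np.pad` performs the zero padding:

```python
import numpy as np


def conv_output_size(n, k, stride=1, padding=0):
    """Side length of the output for an n x n input and a k x k kernel."""
    return (n + 2 * padding - k) // stride + 1


size_2x2_kernel = conv_output_size(4, 2)             # 2x2 kernel on a 4x4 input
size_3x3_kernel = conv_output_size(4, 3)             # 3x3 kernel, no padding
size_3x3_padded = conv_output_size(4, 3, padding=1)  # 3x3 kernel, 1 pixel of zero padding
print(size_2x2_kernel, size_3x3_kernel, size_3x3_padded)

# Zero padding itself: surround a 4x4 image with a 1-pixel border of zeros.
image = np.arange(16).reshape(4, 4)
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)
print(padded.shape)
```

With 1 pixel of padding the 3x3 filter sees a 6x6 array and produces a 4x4 output, the same size as the input.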
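This is only one kernel, but we need several kernels to extract different features from different angles. In other words, in the first layer a 2D image goes in and a 3D volume comes out: the original two dimensions plus one more dimension for the number of kernels.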

The following definitions are generally used:
(Figure: definitions)

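2. Pooling layer

Neighbouring pixels in an image tend to have similar values, so a convolutional layer usually produces similar values for neighbouring pixels in its output as well. Much of the information in the output of a convolutional layer is therefore redundant.
For this reason a pooling operation follows the convolution. All it does is reduce the size of its input by pooling values together. Pooling is usually done with a simple operation such as max, min, or average. Below is an example of a Max Pooling layer with a pool size of 2:
(Figure: max pooling with pool size 2)
Here the stride is 2, the same as the pool size. The 26x26x8 output of the convolutional layer is taken as the input of the pooling layer and turned into a 13x13x8 output.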
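A minimal sketch of 2x2 max pooling on a single feature map, using a reshape instead of explicit loops. `max_pool_2x2` and the sample values are made up for this illustration.

```python
import numpy as np


def max_pool_2x2(feature_map):
    """Max pooling with pool size 2 and stride 2 on a 2d array."""
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    trimmed = feature_map[:h2 * 2, :w2 * 2]   # drop an odd last row/column, if any
    blocks = trimmed.reshape(h2, 2, w2, 2)    # blocks[i, a, j, b] == trimmed[2*i + a, 2*j + b]
    return blocks.max(axis=(1, 3))            # max inside every 2x2 block


feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 1],
                        [0, 1, 5, 6],
                        [2, 2, 7, 8]])
pooled = max_pool_2x2(feature_map)
print(pooled)
# [[4 2]
#  [2 8]]

# A 26x26 map shrinks to 13x13, matching the sizes quoted above.
print(max_pool_2x2(np.zeros((26, 26))).shape)
```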

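3. Fully connected layer

Once enough features have been captured to recognise the image, the next step is classification. The fully connected layer (also called a feedforward layer) maps the final output into a linearly separable space. The cuboid produced at the end of a convolutional network is usually flattened into one long vector and fed into a fully connected layer, which works together with the output layer to classify.
Below, a softmax layer with 10 nodes, one per digit, is used as the last layer of the CNN. Every node in the layer is connected to every input. After the softmax transformation, the digit represented by the node with the highest probability becomes the output of the CNN.
(Figure: softmax output layer)
The softmax function is a generalisation of the logistic function. It "squashes" a K-dimensional vector z of arbitrary real numbers into another K-dimensional real vector σ(z) in which every element lies in (0, 1) and all elements sum to 1. It is widely used in multi-class classification.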
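The properties just listed can be checked with a few lines of NumPy. This is only a sketch: the function name `softmax` and the sample scores are made up, and the maximum is subtracted before exponentiating, a common trick for numerical stability that does not change the result.

```python
import numpy as np


def softmax(z):
    """Map a vector of real scores to probabilities that sum to 1."""
    z = np.asarray(z, dtype=float)
    shifted = z - np.max(z)      # stability shift; softmax is invariant to it
    exp = np.exp(shifted)
    return exp / np.sum(exp)


scores = [2.0, 1.0, 0.1]         # made-up scores for three classes
probs = softmax(scores)
print(probs, probs.sum())
print(np.argmax(probs))          # index of the most likely class
```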

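(Figure: introduction to softmax)
Softmax is only an intermediate step towards measuring how good the final prediction is, so we also need the concept of the cross-entropy loss. What softmax really does is help us quantify how confident we are in a prediction, which is useful when training and evaluating the CNN. More specifically, softmax lets us use the cross-entropy loss, which takes into account how sure we are of each prediction. Loss: L = −ln(p_c), where p_c is the predicted probability of the correct class c.
For example: p_c = 1 gives L = −ln(1) = 0; p_c = 0.8 gives L = −ln(0.8) = 0.223.
L measures the loss, so smaller is better. When p_c equals 1 the prediction is completely correct and the loss is 0; at only 80% the loss is 0.223.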
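The two loss values above can be reproduced directly. In this sketch `cross_entropy` is a hypothetical helper and the probability vector is invented for the example.

```python
import numpy as np


def cross_entropy(probs, label):
    """Cross-entropy loss for one sample: -ln of the probability of the correct class."""
    return float(-np.log(probs[label]))


probs = np.array([0.1, 0.8, 0.1])     # made-up softmax output for three classes

loss_confident = cross_entropy(probs, 1)   # correct class predicted with 80%
loss_wrong = cross_entropy(probs, 0)       # correct class predicted with only 10%
loss_perfect = cross_entropy(np.array([0.0, 1.0]), 1)
print(round(loss_confident, 3), round(loss_wrong, 3), loss_perfect)
```

The less probability the network assigns to the correct class, the larger the loss becomes.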

II. Forward propagation

1. Convolutional layer implementation:

# Convolution layer
import numpy as np
from keras.datasets import mnist


class Conv3x3:
  # A Convolution layer using 3x3 filters.

  def __init__(self, num_filters):                # num_filters: number of filters
    self.num_filters = num_filters

    # filters is a 3d array with dimensions (num_filters, 3, 3)
    # We divide by 9 to reduce the variance of our initial values
    self.filters = np.random.randn(num_filters, 3, 3) / 9                 # randomly initialized 3d filters array

  def iterate_regions(self, image):
    '''
    Generates all possible 3x3 image regions using valid padding.
    - image is a 2d numpy array
    '''
    h, w = image.shape

    for i in range(h - 2):
      for j in range(w - 2):
        im_region = image[i:(i + 3), j:(j + 3)]             # im_region: a 3x3 array holding the relevant image region
        yield im_region, i, j                               # on the usage of yield: https://www.icode9.com/content-1-1386849.html

  def forward(self, input):
    '''
    Performs a forward pass of the conv layer using the given input.
    Returns a 3d numpy array with dimensions (h, w, num_filters).
    - input is a 2d numpy array
    '''
    h, w = input.shape
    output = np.zeros((h - 2, w - 2, self.num_filters))

    for im_region, i, j in self.iterate_regions(input):
      output[i, j] = np.sum(im_region * self.filters, axis=(1, 2))
                        # self.filters has shape (num_filters, 3, 3); the 3x3 im_region broadcasts against every filter.
                        # axis=(1, 2) is a tuple of ints, so the sum runs over both 3x3 axes and keeps axis 0 (the filters).
                        # np.sum() therefore returns a 1d array of length num_filters; each element is one filter's convolution result.
                        # output[i, j] holds the convolution results for output pixel (i, j).
    return output


# Test
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

conv = Conv3x3(8)
output = conv.forward(train_images[0])
print(output.shape)

This runs the convolution on the first MNIST image and checks the shape of the result: the printed output is (26, 26, 8).

2. Pooling layer implementation:

# Pooling layer
class MaxPool2:
  # A Max Pooling layer using a pool size of 2.

  def iterate_regions(self, image):
    '''
    Generates non-overlapping 2x2 image regions to pool over.
    - image is a 3d numpy array with dimensions (h, w, num_filters)
    '''
    h, w, _ = image.shape
    new_h = h // 2                                   # the output is half the input size in each direction
    new_w = w // 2

    for i in range(new_h):
      for j in range(new_w):
        im_region = image[(i * 2):(i * 2 + 2), (j * 2):(j * 2 + 2)]        # 2x2 window; move two cells each time so regions do not overlap
        yield im_region, i, j

  def forward(self, input):
    '''
    Performs a forward pass of the maxpool layer using the given input.
    Returns a 3d numpy array with dimensions (h / 2, w / 2, num_filters).
    - input is a 3d numpy array with dimensions (h, w, num_filters)
    '''
    h, w, num_filters = input.shape
    output = np.zeros((h // 2, w // 2, num_filters))            # one pooled map per filter

    for im_region, i, j in self.iterate_regions(input):
      output[i, j] = np.amax(im_region, axis=(0, 1))               # take the max of each 2x2 region, separately for each filter

    return output


# Test (numpy and the MNIST data are already loaded above)
conv = Conv3x3(8)
pool = MaxPool2()
output = conv.forward(train_images[0])
output = pool.forward(output)
print(output.shape)

Max pooling keeps the largest value in each region. Checking the shape of the pooled result, the printed output is (13, 13, 8).

3. Fully connected layer implementation:

# Fully connected layer with softmax
class Softmax:
  # A standard fully-connected layer with softmax activation.

  def __init__(self, input_len, nodes):
    # We divide by input_len to reduce the variance of our initial values
    self.weights = np.random.randn(input_len, nodes) / input_len                     # input_len is height * width * depth of the input
    self.biases = np.zeros(nodes)

  def forward(self, input):
    '''
    Performs a forward pass of the softmax layer using the given input.
    Returns a 1d numpy array containing the respective probability values.
    - input can be any array with any dimensions.
    '''
    self.last_input_shape = input.shape

    input = input.flatten()                                                        # flatten() turns an a*b*c array into a 1d array with a*b*c elements

    self.last_input = input

    input_len, nodes = self.weights.shape

    totals = np.dot(input, self.weights) + self.biases
    self.last_totals = totals

    exp = np.exp(totals)                                                           # softmax: turn the totals into probabilities
    return exp / np.sum(exp, axis=0)


# The MNIST data is already loaded above; use only the first 1000 samples.
train_images = train_images[:1000]
train_labels = train_labels[:1000]

conv = Conv3x3(8)                  # 28x28x1 -> 26x26x8
pool = MaxPool2()                  # 26x26x8 -> 13x13x8
softmax = Softmax(13 * 13 * 8, 10) # 13x13x8 -> 10

Finally, the network is evaluated with the loss function:

def forward(image, label):
  '''
  Completes a forward pass of the CNN and calculates the accuracy and
  cross-entropy loss.
  - image is a 2d numpy array
  - label is a digit
  '''
  # We transform the image from [0, 255] to [-0.5, 0.5] to make it easier
  # to work with. This is standard practice.
  out = conv.forward((image / 255) - 0.5)
  out = pool.forward(out)
  out = softmax.forward(out)

  # Calculate cross-entropy loss and accuracy. np.log() is the natural log.
  loss = -np.log(out[label])
  acc = 1 if np.argmax(out) == label else 0

  return out, loss, acc



print('MNIST CNN initialized!')

# Train!
loss = 0
num_correct = 0
for i, (im, label) in enumerate(zip(train_images, train_labels)):
  if i % 100 == 99:
    print(
      '[Step %d] Past 100 steps: Average Loss %.3f | Accuracy: %d%%' %
      (i + 1, loss / 100, num_correct)
    )
    loss = 0
    num_correct = 0