Implementing a Convolutional Neural Network (CNN) from Scratch in Python


weixin_39520595


Article link: https://blog.csdn.net/weixin_39520595/article/details/111424231


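This article explains how to implement a convolutional neural network (CNN) from scratch in Python, including the convolutional, pooling, and softmax layers, and applies it to the MNIST dataset. It covers convolution, padding, pooling, and backpropagation in accessible terms. The simple CNN model reaches a test accuracy of about 78%.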


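1. Motivation

Image classification can be done with an ordinary neural network (NN), but images keep getting larger, and a plain NN then needs far too many trainable parameters. For example, a 224 x 224 x 3 image has 150,528 input values. With a hidden layer of 1024 units, the first layer alone needs 150,528 x 1024 ≈ 154 million weights, so the network becomes very large.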
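The arithmetic above can be checked with a few lines of Python. This is only a quick sketch: the variable names are made up for illustration, and the count covers weights only, ignoring biases.

```python
# Input size of a 224 x 224 RGB image (values taken from the text above).
height, width, channels = 224, 224, 3
hidden_units = 1024  # size of the first hidden layer

input_size = height * width * channels
weight_count = input_size * hidden_units  # one weight per (input, hidden unit) pair

print(f"{input_size:,} input values")          # 150,528 input values
print(f"{weight_count:,} first-layer weights")  # 154,140,672 first-layer weights
```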

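The other problem is that the position of a feature changes from image to image. A cat's face, for example, may sit in the top-left corner of one picture and the bottom-right corner of another, so it will not activate the same neurons.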

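2. Dataset

We use the MNIST handwritten digit dataset.

Each image in the dataset is a 28x28-pixel digit.

An ordinary neural network can also handle this dataset, because the images are small and the digits are centered. Real-world image classification problems are not this simple; MNIST is only a starting point here.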

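3. Convolutions

Compared with a plain NN, a CNN mainly adds convolutional layers. A convolutional layer contains a set of filters, and each filter is a 2D matrix.

[Figure: a 3x3 filter]

Convolving the input image with such a filter produces a new image. The steps are as follows.

First, overlay the filter on the image at some location, usually starting at the top-left corner.

Then, multiply each filter value by the image value underneath it.

Next, sum all the products to get the value of the corresponding pixel in the output image.

Finally, repeat this for every location.

[Figure: the convolution applied across the image]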
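The steps above can be sketched on a tiny image. This is a minimal illustration, not the layer implemented later in the post: the helper name `convolve_valid`, the 4x4 pixel values, and the filter values are all chosen here purely as an example.

```python
import numpy as np

def convolve_valid(image, kernel):
    """Slide `kernel` over `image` without padding; multiply element-wise and sum."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            region = image[i:i + kh, j:j + kw]   # patch currently under the filter
            out[i, j] = np.sum(region * kernel)  # element-wise product, then sum
    return out

# Illustrative 4x4 image and 3x3 filter (values are assumptions for the example).
image = np.array([[0, 50, 0, 29],
                  [0, 80, 31, 2],
                  [33, 90, 0, 75],
                  [0, 9, 0, 95]], dtype=float)
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)

result = convolve_valid(image, kernel)
print(result)
# [[  29. -192.]
#  [ -35.  -22.]]
```

A 4x4 input and a 3x3 filter leave only 2x2 valid positions, which is why the output is smaller than the input.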

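3.1 Is it useful?

Convolution can pick out specific kinds of lines in an image, such as vertical or horizontal ones.

[Figure: results of the vertical Sobel filter and the horizontal Sobel filter]

In other words, convolution helps us find image features such as edges.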
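To make the edge-detection claim concrete, here is a small sketch that applies the two Sobel filters to a synthetic image containing a single vertical edge. The test image and the helper `convolve_valid` are assumptions made for this example; only the Sobel coefficients are standard.

```python
import numpy as np

def convolve_valid(image, kernel):
    """Valid (no padding) sliding-window multiply-and-sum."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

vertical_sobel = np.array([[-1, 0, 1],
                           [-2, 0, 2],
                           [-1, 0, 1]], dtype=float)
horizontal_sobel = vertical_sobel.T  # [[-1,-2,-1],[0,0,0],[1,2,1]]

# Synthetic 6x6 image: dark left half (0), bright right half (1).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

vertical_response = convolve_valid(image, vertical_sobel)
horizontal_response = convolve_valid(image, horizontal_sobel)

print(vertical_response[0])    # [0. 4. 4. 0.]  strong response around the edge
print(horizontal_response[0])  # [0. 0. 0. 0.]  no horizontal edge in this image
```

The vertical filter responds only where the window straddles the brightness change, while the horizontal filter stays at zero because every row of the image is identical.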

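3.2 Padding

Padding the border of the image with zeros makes the output the same size as the input.

[Figure: zero padding around the input image]

This is called "same" padding. Not using any padding, as in the rest of this post, is called "valid" padding.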
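The difference between the two padding modes shows up directly in the output shapes. The sketch below is only an illustration: it uses `np.pad` for the zero border and a hypothetical averaging filter, neither of which appears in the original code.

```python
import numpy as np

def convolve_valid(image, kernel):
    """Valid (no padding) sliding-window multiply-and-sum."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.ones((28, 28))        # stand-in for an MNIST-sized image
kernel = np.ones((3, 3)) / 9.0   # averaging filter, chosen only for illustration

# "valid": no padding, the output shrinks by 2 in each dimension.
valid_out = convolve_valid(image, kernel)

# "same": add a 1-pixel border of zeros first, so the output keeps the input size.
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)
same_out = convolve_valid(padded, kernel)

print(valid_out.shape)  # (26, 26)
print(padded.shape)     # (30, 30)
print(same_out.shape)   # (28, 28)
```

With "same" padding the border outputs are computed partly from the added zeros, so for this all-ones image the corner value is 4/9 rather than 1.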

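3.3 Convolutional layer

A CNN contains convolutional layers, which use a set of filters to turn an input image into output images. The main parameter of a convolutional layer is the number of filters.

For the MNIST CNN, I use a convolutional layer with 8 filters, which turns the 28x28 input image into a 26x26x8 output volume.

Each of the 8 filters produces a 26x26 output, and the layer has only 3 x 3 (filter size) x 8 (nb_filters) = 72 weights.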
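These sizes follow from the usual rule for a convolution without padding. As a worked check, assuming an $n \times n$ input, an $f \times f$ filter, stride 1, and $k$ filters (the symbols $n$, $f$, and $k$ are introduced here for illustration only):

```latex
\begin{aligned}
n_{\text{out}} &= n - f + 1 = 28 - 3 + 1 = 26 \\
\text{weights} &= f \cdot f \cdot k = 3 \cdot 3 \cdot 8 = 72
\end{aligned}
```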

3.4 Convolutional layer implementation

To keep things simple, we use 3x3 filters. First, a class for the convolutional layer:

import numpy as np

class Conv3x3:
    # A Convolution layer using 3x3 filters.

    def __init__(self, num_filters):
        self.num_filters = num_filters

        # filters is a 3d array with dimensions (num_filters, 3, 3)
        # We divide by 9 to reduce the variance of our initial values
        self.filters = np.random.randn(num_filters, 3, 3) / 9

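The Conv3x3 class takes a single argument: the number of filters. The filters are initialized with NumPy's randn(). We divide by 9 because the initial values should be neither too large nor too small; see: Xavier Initialization.

Next, the actual implementation of the convolutional layer: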

class Conv3x3:
    # ...

    def iterate_regions(self, image):
        '''
        Generates all possible 3x3 image regions using valid padding.
        - image is a 2d numpy array
        '''
        h, w = image.shape

        for i in range(h - 2):
            for j in range(w - 2):
                im_region = image[i:(i + 3), j:(j + 3)]
                # Yield (im_region, i, j) as a tuple so the caller can
                # iterate over the regions later
                yield im_region, i, j

    def forward(self, input):
        '''
        Performs a forward pass of the conv layer using the given input.
        Returns a 3d numpy array with dimensions (h - 2, w - 2, num_filters).
        - input is a 2d numpy array
        '''
        # input is the image, i.e. the input data: 28x28
        # output is the result buffer: 26x26x8. It starts as zeros; any
        # initial value would do, since every entry is overwritten below.
        h, w = input.shape
        output = np.zeros((h - 2, w - 2, self.num_filters))

        for im_region, i, j in self.iterate_regions(input):
            # Convolution: element-wise multiply, then sum.
            # output[i, j] is a vector with one entry per filter (8 here).
            output[i, j] = np.sum(im_region * self.filters, axis=(1, 2))

        # Return the output so the next layer can use it as its input
        return output
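A quick way to sanity-check the layer is to run it on an array of the right shape and inspect the result. The snippet below repeats a condensed copy of the class so that it runs on its own, and uses a random 28x28 array as a stand-in for an MNIST image (an assumption; no dataset is loaded here).

```python
import numpy as np

class Conv3x3:
    """Condensed copy of the layer above, repeated so this snippet is self-contained."""

    def __init__(self, num_filters):
        self.num_filters = num_filters
        self.filters = np.random.randn(num_filters, 3, 3) / 9

    def iterate_regions(self, image):
        h, w = image.shape
        for i in range(h - 2):
            for j in range(w - 2):
                yield image[i:i + 3, j:j + 3], i, j

    def forward(self, input):
        h, w = input.shape
        output = np.zeros((h - 2, w - 2, self.num_filters))
        for im_region, i, j in self.iterate_regions(input):
            # (3, 3) region broadcasts against (num_filters, 3, 3) filters
            output[i, j] = np.sum(im_region * self.filters, axis=(1, 2))
        return output

rng = np.random.default_rng(0)
image = rng.random((28, 28))  # random stand-in for a 28x28 MNIST image

conv = Conv3x3(8)
out = conv.forward(image)
print(out.shape)  # (26, 26, 8)

# Spot check one entry: position (5, 7), filter 3.
expected = np.sum(image[5:8, 7:10] * conv.filters[3])
print(np.isclose(out[5, 7, 3], expected))  # True
```

The line `im_region * self.filters` relies on NumPy broadcasting: the single 3x3 region is multiplied against all 8 filters at once, and summing over axes 1 and 2 leaves one value per filter.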

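4. Pooling

Neighboring pixels in an image tend to have similar values, so much of the information in a convolutional layer's output is redundant. Pooling reduces this redundancy. Common variants are max, min, and average pooling.

[Figure: max pooling with a 2x2 pool size]

Pooling works like convolution but is simpler: it only takes the maximum of each region and writes it to the output. The pooling layer turns the 26x26x8 input into a 13x13x8 output.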
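As a small worked example of 2x2 max pooling, the sketch below pools a single 4x4 feature map down to 2x2. The input values are made up for illustration.

```python
import numpy as np

# Illustrative 4x4 feature map (values are assumptions for the example).
feature_map = np.array([[1, 3, 2, 0],
                        [5, 4, 1, 1],
                        [0, 2, 9, 7],
                        [6, 8, 3, 4]], dtype=float)

pooled = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        # Non-overlapping 2x2 block starting at row 2*i, column 2*j
        block = feature_map[2 * i:2 * i + 2, 2 * j:2 * j + 2]
        pooled[i, j] = np.max(block)

print(pooled)
# [[5. 2.]
#  [8. 9.]]
```

Each output value is the largest of four neighboring inputs, so the width and height are halved while the strongest responses are kept.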

4.1 Pooling layer implementation

import numpy as np

class MaxPool2:
    # A Max Pooling layer using a pool size of 2.

    def iterate_regions(self, image):
        '''
        Generates non-overlapping 2x2 image regions to pool over.
        - image is a 3d numpy array with dimensions (h, w, num_filters)
        '''
        # image: 26x26x8
        h, w, _ = image.shape
        new_h = h // 2
        new_w = w // 2

        for i in range(new_h):
            for j in range(new_w):
                im_region = image[(i * 2):(i * 2 + 2), (j * 2):(j * 2 + 2)]
                yield im_region, i, j

    def forward(self, input):
        '''
        Performs a forward pass of the maxpool layer using the given input.
        Returns a 3d numpy array with dimensions (h / 2, w / 2, num_filters).
        - input is a 3d numpy array with dimensions (h, w, num_filters)
        '''
        # input: the output of the conv layer, which is the pooling layer's input
        h, w, num_filters = input.shape
        output = np.zeros((h // 2, w // 2, num_filters))

        for im_region, i, j in self.iterate_regions(input):
            # Maximum over the 2x2 spatial region, separately for each filter
            output[i, j] = np.amax(im_region, axis=(0, 1))

        return output
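For comparison, the same pooling can be written without Python loops by reshaping the array. This is an alternative technique, not the approach used in this post; the function names `maxpool2_loop` and `maxpool2_reshape` are hypothetical, and the loop version is a condensed restatement of the forward pass above so the two can be compared.

```python
import numpy as np

def maxpool2_loop(x):
    """Loop-based 2x2 max pooling, mirroring the forward pass above."""
    h, w, c = x.shape
    out = np.zeros((h // 2, w // 2, c))
    for i in range(h // 2):
        for j in range(w // 2):
            out[i, j] = np.amax(x[i * 2:i * 2 + 2, j * 2:j * 2 + 2], axis=(0, 1))
    return out

def maxpool2_reshape(x):
    """Same result via reshape: split each spatial axis into (blocks, 2)."""
    h, w, c = x.shape
    # Drop a trailing odd row/column, as the loop version implicitly does.
    x = x[:h // 2 * 2, :w // 2 * 2]
    blocks = x.reshape(h // 2, 2, w // 2, 2, c)
    return blocks.max(axis=(1, 3))

rng = np.random.default_rng(0)
conv_output = rng.random((26, 26, 8))  # random stand-in for the conv layer output

pooled_loop = maxpool2_loop(conv_output)
pooled_fast = maxpool2_reshape(conv_output)

print(pooled_loop.shape)                         # (13, 13, 8)
print(np.array_equal(pooled_loop, pooled_fast))  # True
```

The loop version is easier to read and matches the rest of the post; the reshape version is usually faster on large inputs because the work happens inside NumPy.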

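5. Softmax

To complete our CNN, it needs to make actual predictions. This is done with softmax, which turns a set of numbers into a set of probabilities that sum to 1. See: Softmax function.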
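As a minimal sketch of the softmax function itself (separate from the layer class implemented later), the snippet below converts four arbitrary numbers into probabilities. The function name `softmax` and the input values are chosen for illustration; subtracting the maximum before exponentiating is a common numerical-stability step that does not change the result.

```python
import numpy as np

def softmax(values):
    """Turn a 1D array of numbers into probabilities that sum to 1."""
    shifted = values - np.max(values)  # avoids overflow in exp; result is unchanged
    exp = np.exp(shifted)
    return exp / np.sum(exp)

probs = softmax(np.array([-1.0, 0.0, 3.0, 5.0]))
print(np.round(probs, 3))  # [0.002 0.006 0.118 0.874]
print(probs.sum())
```

Larger inputs get disproportionately larger probabilities, and the ordering of the inputs is preserved.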

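5.1 Usage

We use a softmax layer with 10 nodes, one per digit, as the last layer of our CNN. This last layer is a fully connected layer whose activation function is softmax. After the softmax transform, the predicted digit is the node with the highest probability.

The 13x13x8 pooling output is flattened into a single vector of 1352 nodes, which is fully connected to the 10 output nodes, with softmax as the activation function.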
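The flatten, fully connected, softmax sequence described above can be sketched in a few lines. This is an illustration under stated assumptions, not the author's layer code: the pooling output is a random stand-in, and the weights use the same "divide by input length" scaling that the layer's constructor in section 5.3 uses.

```python
import numpy as np

rng = np.random.default_rng(0)

pooled = rng.random((13, 13, 8))  # random stand-in for the pooling layer output
input_len = pooled.size           # 13 * 13 * 8 = 1352
nodes = 10                        # one output node per digit

# Randomly initialized parameters (untrained, for illustration only).
weights = rng.standard_normal((input_len, nodes)) / input_len
biases = np.zeros(nodes)

flat = pooled.flatten()           # shape (1352,)
totals = flat @ weights + biases  # shape (10,): one score per digit

exp = np.exp(totals - np.max(totals))
probs = exp / np.sum(exp)         # softmax activation

predicted_digit = int(np.argmax(probs))
print(input_len)    # 1352
print(probs.shape)  # (10,)
```

Because the weights are random, the predicted digit here is meaningless; the point is only the shapes and the order of operations.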

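5.2 Cross-Entropy Loss

Cross-entropy measures the distance between two probability distributions. For details on the formula, see: "笔记 | 什么是Cross Entropy" (Notes: What Is Cross Entropy).

$$H(p, q) = -\sum_{x} p(x) \ln q(x)$$

where $p$ is the true probability distribution, $q$ is the predicted probability distribution, and $H(p, q)$ is the gap between the prediction and the truth.

In our problem, the true probability is 1 for the correct digit and 0 for every other digit, so the cross-entropy loss can be written as:

$$L = -\ln(q_c)$$

where $c$ is the correct class (here, the correct digit) and $q_c$ is the predicted probability of class $c$. The smaller $L$ is, the better.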
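The simplification from the general formula to the one-term form can be checked numerically. The sketch below uses a made-up three-class prediction, and the function names are hypothetical.

```python
import numpy as np

def cross_entropy_full(p_true, q_pred):
    """General form: H(p, q) = -sum_x p(x) * ln q(x)."""
    return -np.sum(p_true * np.log(q_pred))

def cross_entropy_onehot(q_pred, correct_class):
    """Simplified form for a one-hot true distribution: L = -ln(q_c)."""
    return -np.log(q_pred[correct_class])

q_pred = np.array([0.1, 0.7, 0.2])  # illustrative predicted probabilities
p_true = np.array([0.0, 1.0, 0.0])  # class 1 is the correct one

loss_full = cross_entropy_full(p_true, q_pred)
loss_good = cross_entropy_onehot(q_pred, 1)  # correct class has probability 0.7
loss_bad = cross_entropy_onehot(q_pred, 0)   # as if class 0 (probability 0.1) were correct

print(round(float(loss_good), 3))  # 0.357
print(round(float(loss_bad), 3))   # 2.303
```

Both forms give the same value when the true distribution is one-hot, and the loss grows quickly as the probability assigned to the correct class shrinks.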

5.3 Softmax layer implementation

import numpy as np

class Softmax:
    # A standard fully-connected layer with softmax activation.

    def __init__(self, input_len, nodes):
        # We divide by input_len to reduce the variance of our initial values
        # input_len: number of input nodes (the flattened pooling layer output)
        # nodes: number of output nodes, 10 in this example
        # Build the weight matrix from random values that are not too large
        self.weights = np.random.randn(input_len, nodes) / input_len
        self.biases = np.zeros(nodes)

    def forward(self, input):
        '''
        Performs a f
