Understanding Convolution in Depth: Implementing Convolution with numpy

Convolution

In deep learning, the convolutional neural network (CNN) is probably the most common concept; you could call it the "hello world" of AI. The article at https://www.jianshu.com/p/fc9175065d87 uses animated figures to explain nicely what convolution is.
In fact, classic image processing already swept a filter across an image in much the same way that convolution is computed in deep learning today; the difference is that in deep learning the kernel has to be learned.
This post will not go through the theory of convolution in detail. A programmer should act like a programmer, so I will go straight to the code and show how to implement convolution with numpy.

Implementing convolution with numpy

Basic definitions

Take image convolution in computer vision as an example. An image convolution generally involves:
Input: a 4-D array [B, H, W, C_in]
Kernels: a 4-D array [C_out, C_in, K, K] (orderings differ between frameworks; this is the ordering used by the code in this post)
Output: a 4-D array [B, H2, W2, C_out]

B: batch size, i.e. the number of input images
H, W: height and width of the input images
C_in: number of input channels; an RGB image, for example, has three channels, so C_in = 3
K: kernel width/height (usually width = height)
C_out: number of kernels
H2, W2: height and width of the output feature map

Why can H and H2 differ? Because the output size depends on the padding mode and related settings.
Convolution also involves a stride; this post fixes stride = 1, because once you understand the code here, the other cases are straightforward to implement yourself.

A single kernel over a single channel

Let us start with the low-dimensional case: one image, a single kernel, and a single input channel:
Input: [H, W]
Kernel: [K, K]
Output: [H2, W2]
That is, the process shown in the figure below:
[Figure: a single kernel sliding over a single-channel input with VALID padding]
In the figure above the input and output sizes differ because VALID padding is used. If we want the output to be the same size as the input, we need SAME padding, as shown below:
[Figure: the same convolution with SAME padding, so the output has the same size as the input]
A brief note on how the padding mode determines the output size:
With VALID padding the output shrinks; its size is
$(H - K + 1,\ W - K + 1)$
With SAME padding the output must have the same size as the input, so the input is padded, usually with zeros. Suppose the input is square with H = W = N and we pad p rows/columns on each side (top, bottom, left and right). Then
$(N + 2p - K + 1,\ N + 2p - K + 1) = (N, N)$
and therefore
$p = (K - 1) / 2$
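
As a quick sanity check of these formulas, here is a tiny helper (my own sketch, not part of the article's code) that computes the stride-1 output size for both padding modes:

def conv_output_size(n, k, padding="VALID"):
    # Output size of a stride-1 convolution on an n x n input with a k x k kernel
    if padding == "VALID":
        return n - k + 1      # the output shrinks
    elif padding == "SAME":
        return n              # padding p = (k - 1) / 2 on each side keeps the size
    raise ValueError("unknown padding mode")

print(conv_output_size(9, 3, "VALID"))  # 7
print(conv_output_size(9, 3, "SAME"))   # 9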

Alright, straight to the code; the steps are commented in detail:

def numpy_conv(inputs, filter, _result, padding="VALID"):
    # Single-channel convolution with a single kernel, stride = 1.
    # `_result` is only used to provide the output shape;
    # in SAME mode the caller has already zero-padded `inputs`.
    H, W = inputs.shape
    filter_size = filter.shape[0]

    # Output buffer, sized according to the padding mode
    result = np.zeros(_result.shape)

    # Slide the kernel over every valid position of the input, stride = 1;
    # note where the output coordinates start
    for r in range(0, H - filter_size + 1):
        for c in range(0, W - filter_size + 1):
            # Kernel-sized input patch
            cur_input = inputs[r:r + filter_size,
                        c:c + filter_size]
            # Element-wise product with the kernel
            cur_output = cur_input * filter
            # Sum all the values
            conv_sum = np.sum(cur_output)
            # Output value at the current position
            result[r, c] = conv_sum
    return result
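
As a quick usage example (the values are chosen arbitrarily for illustration and are not from the original post), a 3x3 kernel of ones over a 4x4 input simply sums each 3x3 patch:

import numpy as np

x = np.arange(16, dtype=np.float32).reshape(4, 4)   # toy 4x4 single-channel input
k = np.ones((3, 3), dtype=np.float32)               # 3x3 kernel of ones
out_shape = np.zeros((4 - 3 + 1, 4 - 3 + 1))        # VALID output is 2x2
print(numpy_conv(x, k, out_shape))                  # [[45. 54.] [81. 90.]]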

We have now finished the simplest case, convolution with a single kernel. Next we extend it to multiple kernels over multi-channel data.

Multiple kernels, multiple channels

The logic is the same as above, just with extra dimensions:
Input: [C_in, H, W]
Kernels: [C_out, C_in, K, K]
Output: [C_out, H2, W2]
The figure below shows a single kernel convolving a multi-channel input:
[Figure: a single kernel convolving a multi-channel input; each channel is convolved separately and the per-channel results are summed]
With multiple kernels you simply repeat that process once per kernel, each kernel producing one channel of the output. Once the idea is clear, it is just a matter of two more for loops:
(1) a loop over the input channels: convolve the kernel with each channel's data and accumulate the results;
(2) a loop over the kernels: run step (1) for every kernel, each yielding one output channel.
Simple, right?
Here is the code:

def _conv(inputs, filter, strides=[1, 1], padding="SAME"):
    C_in, H, W = inputs.shape
    filter_size = filter.shape[2]
    # C_out is the number of kernels, which is also the number of output channels
    C_out = filter.shape[0]
    # We assume the kernel's height equals its width
    '''
        Final output size
        SAME:
        h_out = h_in / stride
        w_out = w_in / stride

        VALID:
        h_out = ceil((h_in - filter_h + 1) / stride)
        w_out = ceil((w_in - filter_w + 1) / stride)

        Padding needed
        VALID: no padding
        SAME:
        pad_h = (h_out - 1) * stride + filter_h - h_in
        pad_h_top = pad_h / 2
        pad_h_bottom = pad_h - pad_h_top
        pad_w = (w_out - 1) * stride + filter_w - w_in
        pad_w_left = pad_w / 2
        pad_w_right = pad_w - pad_w_left
        '''
    if padding == "VALID":
        result = np.zeros(
            [C_out, int(np.ceil((H - filter_size + 1) / strides[0])), int(np.ceil((W - filter_size + 1) / strides[1]))],
            np.float32)
    else:
        result = np.zeros([C_out, int(H / strides[0]), int(W / strides[1])], np.float32)
        C, H_out, W_out = result.shape
        pad_h = (H_out - 1) * strides[0] + filter_size - H
        pad_top = int(pad_h / 2)
        pad_down = pad_h - pad_top

        pad_w = (W_out - 1) * strides[1] + filter_size - W
        pad_left = int(pad_w / 2)
        pad_right = pad_w - pad_left
        inputs = np.pad(inputs, ((0, 0), (pad_top, pad_down), (pad_left, pad_right)), 'constant',
                        constant_values=(0, 0))
    # Loop over the kernels (output channels)
    for channel_out in range(C_out):
        # Loop over the input channels
        for channel_in in range(C_in):
            # Data of the current input channel
            channel_data = inputs[channel_in]
            # Single-kernel, single-channel convolution as above, then accumulate;
            # result[0] is passed only to provide the per-channel output shape
            result[channel_out, :, :] += numpy_conv(channel_data, filter[channel_out][channel_in], result[0], padding)

    # print(result)
    return result

What about multiple images? Just run the above for each image and then stack/concatenate the results; it is only one more loop (a minimal sketch follows below).
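
A minimal sketch of that extra loop (my own addition, using the shape conventions above and np.stack to produce a [B, C_out, H2, W2] output):

def batch_conv(batch_inputs, filter, strides=[1, 1], padding="SAME"):
    # batch_inputs: [B, C_in, H, W]; run _conv on each image and stack the results
    outputs = [_conv(img, filter, strides, padding) for img in batch_inputs]
    return np.stack(outputs, axis=0)   # [B, C_out, H2, W2]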
Now you should have a complete picture of how convolution works!

Here is the complete code:

# -*- coding: utf-8 -*-
import numpy as np

def numpy_conv(inputs, filter, _result, padding="VALID"):
    # Single-channel convolution with a single kernel, stride = 1.
    # `_result` is only used to provide the output shape;
    # in SAME mode the caller has already zero-padded `inputs`.
    H, W = inputs.shape
    filter_size = filter.shape[0]

    # Output buffer, sized according to the padding mode
    result = np.zeros(_result.shape)

    # Slide the kernel over every valid position of the input, stride = 1;
    # note where the output coordinates start
    for r in range(0, H - filter_size + 1):
        for c in range(0, W - filter_size + 1):
            # Kernel-sized input patch
            cur_input = inputs[r:r + filter_size,
                        c:c + filter_size]
            # Element-wise product with the kernel
            cur_output = cur_input * filter
            # Sum all the values
            conv_sum = np.sum(cur_output)
            # Output value at the current position
            result[r, c] = conv_sum
    return result



def _conv(inputs, filter, strides=[1, 1], padding="SAME"):
    C_in, H, W = inputs.shape
    filter_size = filter.shape[2]
    # C_out is the number of kernels, which is also the number of output channels
    C_out = filter.shape[0]
    # We assume the kernel's height equals its width
    if padding == "VALID":
        result = np.zeros(
            [C_out, int(np.ceil((H - filter_size + 1) / strides[0])), int(np.ceil((W - filter_size + 1) / strides[1]))],
            np.float32)
    else:
        result = np.zeros([C_out, int(H / strides[0]), int(W / strides[1])], np.float32)
        C, H_out, W_out = result.shape
        pad_h = (H_out - 1) * strides[0] + filter_size - H
        pad_top = int(pad_h / 2)
        pad_down = pad_h - pad_top

        pad_w = (W_out - 1) * strides[1] + filter_size - W
        pad_left = int(pad_w / 2)
        pad_right = pad_w - pad_left
        inputs = np.pad(inputs, ((0, 0), (pad_top, pad_down), (pad_left, pad_right)), 'constant',
                        constant_values=(0, 0))
    # Loop over the kernels (output channels)
    for channel_out in range(C_out):
        # Loop over the input channels
        for channel_in in range(C_in):
            # Data of the current input channel
            channel_data = inputs[channel_in]
            # Single-kernel, single-channel convolution as above, then accumulate;
            # result[0] is passed only to provide the per-channel output shape
            result[channel_out, :, :] += numpy_conv(channel_data, filter[channel_out][channel_in], result[0], padding)

    # print(result)
    return result

if __name__ == '__main__':
    # Input: [C_in, H, W]
    inputs = np.zeros([3,9,9])
    for i in range(3):
        for j in range(9):
            for z in range(9):
                inputs[i][j][z] = i+j+z

    print("input:\n",inputs,"\n")

    # Kernels: [C_out, C_in, K, K]
    filter = np.zeros([2, 3, 3, 3])
    for i in range(2):
        for j in range(3):
            for x in range(3):
                for y in range(3):
                    filter[i][j][x][y] = i + j + x + y


    print("filter\n",filter,"\n")

    final_result = _conv(inputs, filter, strides=[1,1],padding="SAME")

    print("result\n",final_result,"\n")

How do I know whether my result is right or wrong? Next time we will use the TensorFlow API to verify it!
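
If you do not want to wait for that, one independent cross-check (my own sketch using scipy, not the TensorFlow verification promised above) is scipy.signal.correlate2d, which computes exactly this kind of sliding-window sum for stride 1; with mode='same' and zero fill it should match the SAME-padding result above for an odd kernel size:

from scipy.signal import correlate2d

expected = np.zeros_like(final_result)
for c_out in range(filter.shape[0]):
    for c_in in range(filter.shape[1]):
        # zero-filled 'same' correlation matches the zero padding used in _conv
        expected[c_out] += correlate2d(inputs[c_in], filter[c_out][c_in],
                                       mode='same', boundary='fill', fillvalue=0)

print(np.allclose(final_result, expected))  # expect True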
