Convolutional Neural Network: Learning and Practice

白云苍驹

Article link: https://blog.csdn.net/Sloudy/article/details/45956037

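Let's implement LeNet in Python as an example of recognizing handwritten digits.

The full code is available on GitHub: https://github.com/songjun54cm/MachineLearningPy

The structure of the LeNet model is shown in the figure below: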

_images/mylenet.png

The model involves the following important layers:

Convolutional Layer, Pooling Layer, Fully Connected Layer

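Convolutional Layer:

The input is a batch of data, stored as a four-dimensional array of shape (batch_size, channel_num, height, width).

The weights are a set of convolutional filters, stored as a four-dimensional array of shape (input_channel_num, output_feature_map_num, filter_height, filter_width).

The output is the convolution result for the batch, a four-dimensional array of shape (batch_size, output_feature_map_num, output_height, output_width).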
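
As a rough illustration of how these shapes relate, the sketch below computes the output shape for stride 1. The helper name `conv_output_shape` is hypothetical (the post's own `get_output_shape` is not shown), the 'valid' and 'same' rules follow the `mode` semantics of `scipy.signal.convolve2d`, and the example sizes are made up for illustration.

```python
def conv_output_shape(input_shape, weight_shape, padding_mode="valid"):
    """Hypothetical helper: output shape of a stride-1 convolutional layer.

    input_shape  = (batch_size, channel_num, height, width)
    weight_shape = (input_channel_num, output_feature_map_num,
                    filter_height, filter_width)
    """
    batch_size, channel_num, height, width = input_shape
    in_channels, out_maps, f_h, f_w = weight_shape
    if channel_num != in_channels:
        raise ValueError("input channels do not match the filter bank")
    if padding_mode == "valid":
        # The filter must fit entirely inside the input.
        out_h, out_w = height - f_h + 1, width - f_w + 1
    elif padding_mode == "same":
        # The output keeps the spatial size of the input.
        out_h, out_w = height, width
    else:
        raise ValueError("unsupported padding mode: %s" % padding_mode)
    return (batch_size, out_maps, out_h, out_w)


# Illustrative sizes: 64 grayscale 28x28 images, six 5x5 filters.
shape = conv_output_shape((64, 1, 28, 28), (1, 6, 5, 5), "valid")
print(shape)  # (64, 6, 24, 24)
```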

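Forward propagation:

The layer takes a batch of data as input and outputs the convolution result.

Mathematically, convolution is defined in the flipped form a[i]*b[n-i] (summed over i), whereas the "convolution" in a CNN slides the filter over the input without flipping it. So when calling a convolution routine from an existing numerical library, rotate the kernel by 180 degrees first and then call the convolution function. The forward pass is implemented as follows: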

    import numpy as np
    from scipy import signal

    def fprop(self, input_data):
        # TODO: strides other than 1 are not supported
        self.last_input = input_data

        convout = np.zeros(self.get_output_shape(input_data.shape))
        for n in range(convout.shape[0]):              # samples in the batch
            for f in range(convout.shape[1]):          # output feature maps
                for c in range(input_data.shape[1]):   # input channels
                    # Rotate the filter by 180 degrees so that convolve2d
                    # slides the unflipped filter over the input.
                    convout[n, f, :, :] += signal.convolve2d(
                        input_data[n, c, :, :],
                        np.rot90(self.W[c, f, :, :], 2),
                        mode=self.padding_mode)
        # Broadcast the per-feature-map bias over batch, height, and width.
        return convout + self.b[np.newaxis, :, np.newaxis, np.newaxis]
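
The 180-degree rotation used in the forward pass can be checked on a small example. The sketch below, with made-up random data, compares `signal.convolve2d` on a rotated kernel against `signal.correlate2d` and against a direct sliding-window sum; it is only a sanity check, not part of the original code.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
image = rng.standard_normal((6, 6))    # illustrative single-channel input
kernel = rng.standard_normal((3, 3))   # illustrative filter

# True convolution with the kernel rotated by 180 degrees ...
via_convolve = signal.convolve2d(image, np.rot90(kernel, 2), mode="valid")
# ... should match cross-correlation with the original kernel.
via_correlate = signal.correlate2d(image, kernel, mode="valid")

# Direct sliding-window evaluation for comparison.
direct = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        direct[i, j] = np.sum(image[i:i + 3, j:j + 3] * kernel)

same_result = (np.allclose(via_convolve, via_correlate)
               and np.allclose(via_convolve, direct))
print(same_result)
```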

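Backward propagation:

The backward pass has to compute two things: the gradient of the filters, and the gradient of the input data, which is passed on to the next layer in the backward direction.

The filter gradient is straightforward: slide output_grad (the derivative of the loss with respect to this layer's output) over input_data in the forward direction. With a convolution routine this requires rotating output_grad by 180 degrees.

The gradient of the input data is just as simple: convolve output_grad with the filter in the reverse direction, that is, as a true convolution without the 180-degree rotation.

The computation is implemented as follows: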

    def bprop(self, output_grad):
        # TODO: padding_mode 'full' is not supported
        if self.padding_mode == 'valid':
            input_bp_mode = 'full'
            param_bp_mode = 'valid'
            padding_input_data = self.last_input
        elif self.padding_mode == 'same':
            input_bp_mode = 'same'
            param_bp_mode = 'valid'
            # Zero-pad the input symmetrically (assumes odd filter sizes) so the
            # 'valid' convolution below yields a gradient with the shape of W.
            pad_h = self.W.shape[2] // 2
            pad_w = self.W.shape[3] // 2
            padding_input_data = np.pad(
                self.last_input,
                ((0, 0), (0, 0), (pad_h, pad_h), (pad_w, pad_w)),
                mode='constant')
        else:
            raise NotImplementedError('unsupported padding mode: %s' % self.padding_mode)
        input_grad = np.zeros(self.last_input.shape)
        self.dW = np.zeros(self.W.shape)
        for n in range(output_grad.shape[0]):
            for f in range(output_grad.shape[1]):
                for c in range(self.last_input.shape[1]):
                    # Gradient w.r.t. the input: convolution with the unrotated filter.
                    input_grad[n, c, :, :] += signal.convolve2d(output_grad[n, f, :, :],
                                                                self.W[c, f, :, :],
                                                                mode=input_bp_mode)
                    # Gradient w.r.t. the filter: rotate output_grad by 180 degrees.
                    self.dW[c, f, :, :] += signal.convolve2d(padding_input_data[n, c, :, :],
                                                             np.rot90(output_grad[n, f, :, :], 2),
                                                             mode=param_bp_mode)
        self.db = np.sum(output_grad, axis=(0, 2, 3))
        # Weight-decay term, kept as in the original; its sign has to match the
        # update rule of the optimizer, which is not shown here.
        self.dW -= self.weight_decay * self.W
        return input_grad
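
The two gradient rules used in bprop can be sanity-checked numerically. The following is a minimal sketch for a single channel and a single filter in 'valid' mode, with made-up random data and a surrogate loss chosen so that its gradient with respect to the layer output is a fixed array `g`. The helper `numeric_grad` is hypothetical and weight decay is left out.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 5))   # one input channel
w = rng.standard_normal((3, 3))   # one filter
g = rng.standard_normal((3, 3))   # upstream gradient, same shape as the output


def forward(x, w):
    # Same convention as fprop: convolve with the filter rotated by 180 degrees.
    return signal.convolve2d(x, np.rot90(w, 2), mode="valid")


def loss(x, w):
    # Scalar surrogate loss whose gradient w.r.t. the output is exactly g.
    return np.sum(forward(x, w) * g)


# Analytic gradients, following the same formulas as bprop in 'valid' mode.
dx = signal.convolve2d(g, w, mode="full")
dw = signal.convolve2d(x, np.rot90(g, 2), mode="valid")


def numeric_grad(f, a, eps=1e-6):
    # Central finite differences, perturbing one entry of `a` at a time.
    grad = np.zeros_like(a)
    for idx in np.ndindex(a.shape):
        old = a[idx]
        a[idx] = old + eps
        up = f()
        a[idx] = old - eps
        down = f()
        a[idx] = old
        grad[idx] = (up - down) / (2 * eps)
    return grad


dx_num = numeric_grad(lambda: loss(x, w), x)
dw_num = numeric_grad(lambda: loss(x, w), w)
print(np.allclose(dx, dx_num, atol=1e-5), np.allclose(dw, dw_num, atol=1e-5))
```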

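Pooling Layer:

The input is the output of the previous layer, and the output is the downsampled result. Here we use max pooling. For each feature map, an array of shape (out_height, out_width, 2) stores the coordinates of the maximum selected in each pooling window. The forward and backward passes are as follows:

Forward propagation: max pooling selects the largest value in each pooling window.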

    def fprop(self, input_data):
        self.last_data = input_data
        # Coordinates (y, x) of the maximum chosen for every output position.
        # np.int was removed from recent NumPy releases; the builtin int is used instead.
        self.last_switches = np.empty(self.get_output_shape(input_data.shape)+(2,), dtype=int)
        pool_out = np.zeros(self.get_output_shape(input_data.shape))

        pool_h_top = self.pool_h//2 - 1 + self.pool_h % 2 # if the height of the pool window is even, the center is nearer the top-left
        pool_h_bottom = self.pool_h//2 + 1
        pool_w_left = self.pool_w//2 - 1 + self.pool_w % 2 # if the width of the pool window is even, the center is nearer the top-left
        pool_w_right = self.pool_w//2 + 1

        for n in range(pool_out.shape[0]):
            for f in range(pool_out.shape[1]):
                for y_out in range(pool_out.shape[2]):
                    y = y_out * self.stride_y
                    y_min = max(y-pool_h_top, 0)
                    y_max = min(y+pool_h_bottom, input_data.shape[2])
                    for x_out in range(pool_out.shape[3]):
                        x = x_out * self.stride_x
                        x_min = max(x-pool_w_left, 0)
                        x_max = min(x+pool_w_right, input_data.shape[3])
                        region = input_data[n,f,y_min:y_max, x_min:x_max]
                        if self.mode=='max':
                            # Column-wise maxima first, then the best column.
                            max_0, argmax_0 = region.max(0), region.argmax(0)
                            max_1, argmax_1 = max_0.max(), max_0.argmax()
                            maxVal = max_1
                            max_pos_y, max_pos_x = argmax_0[argmax_1], argmax_1
                            pool_out[n,f,y_out,x_out] = maxVal
                            # Store the position in input coordinates for bprop.
                            self.last_switches[n,f,y_out,x_out,0] = max_pos_y + y_min
                            self.last_switches[n,f,y_out,x_out,1] = max_pos_x + x_min
                        else:
                            raise ValueError('Error Pooling Mode')
        return pool_out
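
To make the window selection concrete, here is a small sketch of max pooling on a single 4x4 feature map. It assumes a 2x2 window with stride 2 and made-up values, and it uses `np.unravel_index` to locate the maximum rather than the two-step argmax of the code above, so it is an illustration of the idea and not the post's implementation.

```python
import numpy as np

feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 5],
                        [0, 1, 7, 2],
                        [6, 2, 3, 1]], dtype=float)
pool, stride = 2, 2   # assumed window size and stride
out_h = (feature_map.shape[0] - pool) // stride + 1
out_w = (feature_map.shape[1] - pool) // stride + 1

pooled = np.zeros((out_h, out_w))
switches = np.zeros((out_h, out_w, 2), dtype=int)  # (row, col) of each maximum
for y_out in range(out_h):
    for x_out in range(out_w):
        y, x = y_out * stride, x_out * stride
        region = feature_map[y:y + pool, x:x + pool]
        # argmax returns a flat index; unravel it into (row, col) inside the window.
        dy, dx = np.unravel_index(region.argmax(), region.shape)
        pooled[y_out, x_out] = region[dy, dx]
        # Shift the window-local position back into input coordinates.
        switches[y_out, x_out] = (y + dy, x + dx)

print(pooled)
print(switches[0, 0], switches[1, 1])
```

The `switches` array plays the same role as `last_switches` in the layer: it records where each pooled value came from so that the backward pass can route gradients to those positions.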

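Backward propagation: the layer has no parameters, so the backward pass only needs to compute the gradient with respect to the input data. Each output gradient is routed to the input position that held the maximum of its pooling window.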

    def bprop(self, output_grad):
        input_grad = np.zeros(self.last_data.shape)
        for n in range(output_grad.shape[0]):
            for f in range(output_grad.shape[1]):
                for y_out in range(output_grad.shape[2]):
                    for x_out in range(output_grad.shape[3]):
                        # Accumulate rather than assign: with overlapping windows the
                        # same input position can be the maximum of several windows.
                        input_grad[n,f,self.last_switches[n,f,y_out,x_out,0],self.last_switches[n,f,y_out,x_out,1]] \
                            += output_grad[n,f,y_out,x_out]

        return input_grad
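
The gradient routing can also be written without explicit loops. The sketch below is one possible vectorized variant for a single feature map, using made-up switch positions and gradients; `np.add.at` accumulates values at repeated indices, which matters when pooling windows overlap.

```python
import numpy as np

# Hypothetical switches for a 4x4 input pooled down to 2x2 (one feature map):
# switches[y_out, x_out] = (row, col) of the maximum in the input.
switches = np.array([[[1, 0], [1, 3]],
                     [[3, 0], [2, 2]]])
output_grad = np.array([[0.1, 0.2],
                        [0.3, 0.4]])

input_grad = np.zeros((4, 4))
rows, cols = switches[..., 0], switches[..., 1]
# Unbuffered in-place addition: every output gradient is added at the
# position of its maximum, and all other input positions stay zero.
np.add.at(input_grad, (rows, cols), output_grad)
print(input_grad)
```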

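Fully Connected Layer:

The fully connected layer performs a simple matrix operation. Its input is an n×d two-dimensional array representing the n samples in the batch, where each sample is a d-dimensional vector.

The layer's parameter is a d×p matrix, which maps the n×d input feature matrix to an n×p output matrix.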

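Forward propagation:

    def fprop(self, input_data):
        self.last_input = input_data
        # (n, d) x (d, p) -> (n, p), with the bias broadcast over the batch
        return np.dot(input_data, self.W) + self.b

Backward propagation:

    def bprop(self, output_grad):
        # Weight gradient (d, p); the weight-decay sign is kept as in the original
        # and has to match the update rule of the optimizer, which is not shown here.
        self.dW = np.dot(self.last_input.T, output_grad) - self.weight_decay*self.W
        # Bias gradient: sum over the samples in the batch.
        self.db = np.sum(output_grad, axis=0)
        # Gradient w.r.t. the input (n, d), passed on to the previous layer.
        return np.dot(output_grad, self.W.T)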
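
The matrix shapes and the weight gradient of the fully connected layer can be checked with a small numerical experiment. This is a sketch under assumptions: the sizes n, d, p and all data are made up, weight decay is omitted, and the surrogate loss is chosen so that its gradient with respect to the layer output is a fixed array `g`.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, p = 4, 3, 2                   # assumed batch size, input and output dims
x = rng.standard_normal((n, d))
W = rng.standard_normal((d, p))
b = rng.standard_normal(p)
g = rng.standard_normal((n, p))     # upstream gradient dL/d(output)


def loss(W):
    # Surrogate loss whose gradient w.r.t. the layer output is exactly g.
    return np.sum((x @ W + b) * g)


dW = x.T @ g          # (d, n) @ (n, p) -> (d, p)
db = g.sum(axis=0)    # (p,)
dx = g @ W.T          # (n, p) @ (p, d) -> (n, d)

# Central finite differences on W, one entry at a time.
eps = 1e-6
dW_num = np.zeros_like(W)
for idx in np.ndindex(W.shape):
    step = np.zeros_like(W)
    step[idx] = eps
    dW_num[idx] = (loss(W + step) - loss(W - step)) / (2 * eps)

print(dW.shape, db.shape, dx.shape)
print(np.allclose(dW, dW_num, atol=1e-5))
```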
