Convolutional Neural Network -- Learning and Practice

Let's implement a LeNet example in Python for handwritten digit recognition.

The full code can be found on GitHub: https://github.com/songjun54cm/MachineLearningPy

The LeNet model structure is shown in the figure below:

[Figure: LeNet model architecture (mylenet.png)]

The model involves several important layers:

Convolutional Layer, Pooling Layer, Fullconnect Layer

Convolutional Layer:

The input is a batch of data: a four-dimensional array of shape (batch_size, channel_num, height, width).

The weights are a set of convolutional filters: a four-dimensional array of shape (input_channel_num, output_feature_map_num, filter_height, filter_width).

The output is the convolution result for this batch: a four-dimensional array of shape (batch_size, output_feature_map_num, output_height, output_width).
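
Under 'valid' padding with a stride of 1, the output spatial size follows directly from the input and filter sizes. As a quick reference, here is a minimal stand-alone sketch of the shape arithmetic; the function name get_output_shape mirrors the method called in the layer code below, but this version is only an illustration:

    def get_output_shape(input_shape, filter_shape, padding_mode='valid'):
        """Output shape of a stride-1 2D convolution layer.

        input_shape  : (batch_size, channel_num, height, width)
        filter_shape : (input_channel_num, output_feature_map_num, filter_height, filter_width)
        """
        batch_size, _, height, width = input_shape
        _, num_maps, fh, fw = filter_shape
        if padding_mode == 'valid':    # no padding: the filter must fit entirely inside the image
            out_h, out_w = height - fh + 1, width - fw + 1
        elif padding_mode == 'same':   # zero padding keeps the spatial size unchanged
            out_h, out_w = height, width
        else:
            raise ValueError('unsupported padding mode: %s' % padding_mode)
        return (batch_size, num_maps, out_h, out_w)

    # e.g. a batch of 4 single-channel 28x28 images convolved with 6 filters of size 5x5:
    print(get_output_shape((4, 1, 28, 28), (1, 6, 5, 5)))  # (4, 6, 24, 24)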

Forward pass:

    Given a batch of data as input, it outputs the convolution result.

    Mathematically, convolution is defined with the kernel reversed, as in a[i]*b[n-i], so when calling an off-the-shelf math library you need to rotate the convolution kernel by 180 degrees before invoking the convolution function. The forward pass is implemented as follows:

    def fprop(self, input_data):
        # assumes: import numpy as np; from scipy import signal
        # TODO: support strides other than 1
        self.last_input = input_data

        convout = np.zeros(self.get_output_shape(input_data.shape))
        for n in range(convout.shape[0]):              # samples in the batch
            for f in range(convout.shape[1]):          # output feature maps
                for c in range(input_data.shape[1]):   # input channels
                    # rotate the filter by 180 degrees so convolve2d performs
                    # the cross-correlation we actually want
                    convout[n, f, :, :] += signal.convolve2d(input_data[n, c, :, :],
                                                             np.rot90(self.W[c, f, :, :], 2),
                                                             mode=self.padding_mode)
        # broadcast the per-feature-map bias over the batch and spatial dimensions
        return convout + self.b[np.newaxis, :, np.newaxis, np.newaxis]
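
The 180-degree rotation above is easy to check in isolation: rotating the filter and calling scipy's convolve2d produces exactly the cross-correlation we actually want. A minimal sketch:

    import numpy as np
    from scipy import signal

    x = np.random.randn(5, 5)   # a single-channel input patch
    w = np.random.randn(3, 3)   # a single filter

    # convolution flips the kernel, so flipping it first (rot180) turns it into cross-correlation
    conv_out = signal.convolve2d(x, np.rot90(w, 2), mode='valid')
    corr_out = signal.correlate2d(x, w, mode='valid')
    print(np.allclose(conv_out, corr_out))  # True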

Backward pass:

    The backward pass needs to compute two things: the gradient of the filters, and the gradient of the input data, which is propagated back to the previous layer.

    Computing the filter gradient is simple: perform a forward convolution of the input data with output_grad (the derivative of the loss with respect to this layer's output), rotated by 180 degrees.

    Computing the input data gradient is also simple: convolve output_grad with the filter in the reverse direction (a 'full' convolution), without the 180-degree rotation.

The corresponding code is as follows:

    def bprop(self, output_grad):
        # TODO: support padding_mode 'full'
        if self.padding_mode == 'valid':
            input_bp_mode = 'full'
            param_bp_mode = 'valid'
            padding_input_data = self.last_input
        elif self.padding_mode == 'same':
            input_bp_mode = 'same'
            param_bp_mode = 'valid'
            # zero-pad the cached input so the 'valid' convolution below
            # yields a gradient with the same shape as the filters
            padding_size = self.W.shape[2]//2
            padding_input_data = np.zeros((self.last_input.shape[0],
                                            self.last_input.shape[1],
                                            self.last_input.shape[2]+self.W.shape[2]-1,
                                            self.last_input.shape[3]+self.W.shape[3]-1))
            padding_input_data[:,:,
                                padding_size:self.last_input.shape[2]+padding_size,
                                padding_size:self.last_input.shape[3]+padding_size] = self.last_input
        input_grad = np.zeros(self.last_input.shape)
        self.dW = np.zeros(self.W.shape)
        for n in range(output_grad.shape[0]):              # samples in the batch
            for f in range(output_grad.shape[1]):          # output feature maps
                for c in range(self.last_input.shape[1]):  # input channels
                    # gradient w.r.t. the input: convolve output_grad with the
                    # (un-rotated) filter in the reverse direction
                    input_grad[n, c, :, :] += signal.convolve2d(output_grad[n,f,:,:],
                                                                self.W[c,f,:,:],
                                                                mode=input_bp_mode)
                    # gradient w.r.t. the filter: convolve the (padded) input
                    # with the 180-degree-rotated output_grad
                    self.dW[c,f,:,:] += signal.convolve2d(padding_input_data[n,c,:,:],
                                                        np.rot90(output_grad[n,f,:,:], 2),
                                                        mode=param_bp_mode)
        self.db = np.sum(output_grad, axis=(0,2,3))
        self.dW -= self.weight_decay * self.W
        return input_grad
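
The two backward convolutions can be sanity-checked against numerical gradients. The sketch below treats a single channel and a single filter, and uses the sum of the layer output as a scalar loss, so the incoming output_grad is simply a matrix of ones; it is only a verification of the formulas, not part of the layer code:

    import numpy as np
    from scipy import signal

    rng = np.random.RandomState(0)
    x = rng.randn(6, 6)          # one input channel
    w = rng.randn(3, 3)          # one filter

    def loss(x, w):
        # forward pass of a single 'valid' convolution, summed into a scalar loss
        return signal.convolve2d(x, np.rot90(w, 2), mode='valid').sum()

    # analytic gradients, following bprop above (output_grad is all ones here)
    g = np.ones((4, 4))                                        # dL/dy
    dx = signal.convolve2d(g, w, mode='full')                  # dL/dx: 'full' conv, no rotation
    dw = signal.convolve2d(x, np.rot90(g, 2), mode='valid')    # dL/dW: 'valid' conv, rot180

    # numerical gradient of the input by central differences
    eps = 1e-6
    dx_num = np.zeros_like(x)
    for i, j in np.ndindex(x.shape):
        xp, xm = x.copy(), x.copy()
        xp[i, j] += eps
        xm[i, j] -= eps
        dx_num[i, j] = (loss(xp, w) - loss(xm, w)) / (2 * eps)

    print(np.allclose(dx, dx_num, atol=1e-5))  # True; the same check works for dw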

PoolingLayer:

The input is the output of the previous layer, and the output is the downsampled result; here we use max pooling. For each pooled output, an array with a trailing dimension of 2 stores the (y, x) coordinate of the element selected by the corresponding pooling window. The forward and backward passes are as follows:

Forward pass: max pooling selects the maximum value in each pooling window.

    def fprop(self, input_data):
        self.last_data = input_data
        # record, for every pooled output, the (y, x) position of the max element
        self.last_switches = np.empty(self.get_output_shape(input_data.shape)+(2,), dtype=int)
        pool_out = np.zeros(self.get_output_shape(input_data.shape))

        # window extent around the pooling centre; if the window size is even,
        # the centre is taken nearer to the top-left corner
        pool_h_top = self.pool_h//2 - 1 + self.pool_h % 2
        pool_h_bottom = self.pool_h//2 + 1
        pool_w_left = self.pool_w//2 - 1 + self.pool_w % 2
        pool_w_right = self.pool_w//2 + 1

        for n in range(pool_out.shape[0]):
            for f in range(pool_out.shape[1]):
                for y_out in range(pool_out.shape[2]):
                    y = y_out * self.stride_y
                    y_min = max(y-pool_h_top, 0)
                    y_max = min(y+pool_h_bottom, input_data.shape[2])
                    for x_out in range(pool_out.shape[3]):
                        x = x_out * self.stride_x
                        x_min = max(x-pool_w_left, 0)
                        x_max = min(x+pool_w_right, input_data.shape[3])
                        region = input_data[n,f,y_min:y_max, x_min:x_max]
                        if self.mode=='max':
                            # value and position of the maximum inside the window
                            max_0, argmax_0 = region.max(0), region.argmax(0)
                            max_1, argmax_1 = max_0.max(), max_0.argmax()
                            max_pos_y, max_pos_x = argmax_0[argmax_1], argmax_1
                            pool_out[n,f,y_out,x_out] = max_1
                            self.last_switches[n,f,y_out,x_out,0] = max_pos_y + y_min
                            self.last_switches[n,f,y_out,x_out,1] = max_pos_x + x_min
                        else:
                            raise ValueError('Error Pooling Mode')
        return pool_out
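
For the common case where the window size equals the stride (for example 2x2 windows with stride 2), the values produced by the loops above can also be obtained with a simple reshape, which makes the operation easy to see on a small example. This is only an illustration of the result, not the layer's implementation (the loops are still needed to record the switch positions):

    import numpy as np

    x = np.array([[1, 3, 2, 0],
                  [4, 2, 1, 5],
                  [7, 0, 3, 2],
                  [1, 6, 4, 4]], dtype=float)

    # split the 4x4 image into 2x2 blocks and take the maximum of each block
    pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
    print(pooled)
    # [[4. 5.]
    #  [7. 4.]]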

Backward pass: since this layer has no parameters, the backward pass only needs to compute the gradient with respect to the input data.

    def bprop(self, output_grad):
        input_grad = np.zeros(self.last_data.shape)
        for n in range(output_grad.shape[0]):
            for f in range(output_grad.shape[1]):
                for y_out in range(output_grad.shape[2]):
                    for x_out in range(output_grad.shape[3]):
                        # route the gradient back to the position that held the
                        # maximum; accumulate in case pooling windows overlap
                        input_grad[n,f,self.last_switches[n,f,y_out,x_out,0],self.last_switches[n,f,y_out,x_out,1]] \
                            += output_grad[n,f,y_out,x_out]

        return input_grad
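
The effect of bprop is easiest to see on the same small example: each output gradient is routed back to the position that produced the maximum, and every other input position receives zero. A minimal stand-alone sketch with hypothetical values, independent of the layer class:

    import numpy as np

    x = np.array([[1, 3, 2, 0],
                  [4, 2, 1, 5],
                  [7, 0, 3, 2],
                  [1, 6, 4, 4]], dtype=float)
    output_grad = np.array([[0.1, 0.2],
                            [0.3, 0.4]])

    # route each output gradient to the argmax of its 2x2 window, as the switches do
    input_grad = np.zeros_like(x)
    for y_out in range(2):
        for x_out in range(2):
            window = x[2*y_out:2*y_out+2, 2*x_out:2*x_out+2]
            dy, dx = np.unravel_index(window.argmax(), window.shape)
            input_grad[2*y_out + dy, 2*x_out + dx] += output_grad[y_out, x_out]
    print(input_grad)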

FullconnectLayer:

The fully connected layer performs a simple matrix operation. Its input is an n x d two-dimensional matrix representing the n samples in the batch, each a d-dimensional vector.

Its weights form a d x p matrix, so the n x d feature matrix is mapped to an n x p output matrix.

Forward pass: the code is as follows:

    def fprop(self, input_data):
        # cache the input for use in bprop
        self.last_input = input_data
        return np.dot(input_data, self.W) + self.b

Backward pass: the code is as follows:

    def bprop(self, output_grad):
        # gradient of the weights and bias, including the weight-decay term
        self.dW = np.dot(self.last_input.T, output_grad) - self.weight_decay*self.W
        self.db = np.sum(output_grad, axis=0)
        # gradient w.r.t. the input, passed back to the previous layer
        return np.dot(output_grad, self.W.T)
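
These lines follow from the matrix calculus for Y = XW + b: with a scalar loss, dW = X^T * dY, db sums dY over the batch, and the input gradient is dY * W^T. A small numerical check (weight decay set to zero and the sum of the outputs as the loss; purely illustrative):

    import numpy as np

    rng = np.random.RandomState(0)
    X = rng.randn(4, 3)    # batch of 4 samples, 3 features each
    W = rng.randn(3, 2)    # weights mapping 3 inputs to 2 outputs
    b = rng.randn(2)

    def loss(X, W, b):
        # forward pass followed by a scalar loss (sum of the outputs)
        return (np.dot(X, W) + b).sum()

    # analytic gradients as in bprop above (output_grad is all ones, weight decay omitted)
    output_grad = np.ones((4, 2))
    dW = np.dot(X.T, output_grad)
    dX = np.dot(output_grad, W.T)

    # numerical check of dW by central differences
    eps = 1e-6
    dW_num = np.zeros_like(W)
    for i, j in np.ndindex(W.shape):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        dW_num[i, j] = (loss(X, Wp, b) - loss(X, Wm, b)) / (2 * eps)
    print(np.allclose(dW, dW_num, atol=1e-5))  # True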
