Hand-Written Neural Networks in Python: Implementing Dropout


This post covers three implementations: a vanilla version, a more efficient (inverted) version, and an implementation as a network layer.

Although the dropout parameter is called "probability", it usually refers not to the probability of dropping a unit but to the probability of keeping it (perhaps because that reads more naturally in code). The convention is not fixed, though; just pick one and stay self-consistent.

 

Forward pass of the vanilla dropout network:

As a control, a predict variant that does not multiply by p is also included. As the amount or dimensionality of data grows, the outputs of the first two stay close to each other, while the version that skips the multiplication by p deviates badly: during training each activation is kept with probability p, so predict must multiply by p to match the training-time expectation, and skipping it inflates the activations layer after layer.

""" Vanilla Dropout: Not recommended implementation (see notes below) """
import numpy as np


p = 0.5  # probability of keeping a unit active. higher = less dropout


def train_step(X):
    """ X contains the data """

    # forward pass for example 3-layer neural network
    H1 = np.maximum(0, np.dot(W1, X) + b1)
    U1 = np.random.rand(*H1.shape) < p  # first dropout mask
    H1 *= U1  # drop!
    H2 = np.maximum(0, np.dot(W2, H1) + b2)
    U2 = np.random.rand(*H2.shape) < p  # second dropout mask
    H2 *= U2  # drop!
    out = np.dot(W3, H2) + b3

    return out

    # backward pass: compute gradients... (not shown)
    # perform parameter update... (not shown)


def predict(X):
    # ensembled forward pass
    H1 = np.maximum(0, np.dot(W1, X) + b1) * p  # NOTE: scale the activations
    H2 = np.maximum(0, np.dot(W2, H1) + b2) * p  # NOTE: scale the activations
    out = np.dot(W3, H2) + b3
    return out

def predict_without_multiply_p(X):  # for comparison: what happens if the scaling is skipped
    # ensembled forward pass
    H1 = np.maximum(0, np.dot(W1, X) + b1)  # NOTE: no scaling here
    H2 = np.maximum(0, np.dot(W2, H1) + b2)  # NOTE: no scaling here
    out = np.dot(W3, H2) + b3
    return out

W1 = np.random.randn(4,3)  # W1 @ X has shape (4, N)
b1 = np.random.randn(4,1)
W2 = np.random.randn(4,4)  # W2 @ H1 has shape (4, N)
b2 = np.random.randn(4,1)
W3 = np.random.randn(4,4)  # W3 @ H2 has shape (4, N)
b3 = np.random.randn(4,1)
if __name__ == '__main__':

    X = np.random.randn(3,1000000)
    y = train_step(X)
    print('in training phase, average value:',y.shape,y.mean())
    predict_y = predict(X)
    print('in predicting phase, average value:',predict_y.shape,predict_y.mean())
    predict_y = predict_without_multiply_p(X)  # oddly, the mean without multiplying by p can even come out smaller; it changes on every run because the weights are random, which shows the per-layer magnitudes are not strictly proportional to the final output
    print('in predicting phase(without multiply p), average value:',predict_y.mean())



 

in training phase, average value: (4, 1000000) 0.5332048355924359
in predicting phase, average value: (4, 1000000) 0.4632303379943579
in predicting phase(without multiply p), average value: 2.0510060393300087

The network above is logically fine, so where is the problem? Runtime performance: the multiplication by p happens at predict time, which adds inference overhead and slows prediction down. Since the only goal is to keep the activation scale consistent between training and prediction, the scaling can simply be moved into the training step.

 

Forward pass of the inverted dropout network:

A few caveats. Since we now divide by p, watch out for division by zero (although nobody would realistically set p = 0). Also, the definition of p differs between courses and authors. Here p is the keep probability, which arguably matches the code better, since we multiply by the mask. In Hung-yi Lee's course, if I remember correctly, p is the drop probability, so the mask condition would be inverted (> p) and the division would be by 1 - p rather than p, as sketched below. Either convention works as long as you stay self-consistent; the name does not matter.
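For illustration only, here is a minimal sketch of that opposite convention; the name p_drop and the toy activations are my own and do not appear in the original code:

import numpy as np

p_drop = 0.5  # probability of DROPPING a unit (opposite convention to the code below)

H1 = np.random.randn(4, 10)                                # some toy activations
U1 = (np.random.rand(*H1.shape) > p_drop) / (1 - p_drop)   # keep with prob 1 - p_drop, then rescale
H1 = H1 * U1                                               # same expected value as before dropout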

import numpy as np
from dropout_vanilla import train_step as vanilla_train
from dropout_vanilla import predict as vanilla_predict

p = 0.5  # probability of keeping a unit active. higher = less dropout
# here p is the keep probability; in Hung-yi Lee's course, where p is the drop probability, you would divide by 1 - p instead (slightly more roundabout, but equivalent)

def train_step(X):
    """ X contains the data """

    # forward pass for example 3-layer neural network
    H1 = np.maximum(0, np.dot(W1, X) + b1)
    U1 = (np.random.rand(*H1.shape) < p) / p  # first dropout mask
    H1 *= U1  # drop!
    H2 = np.maximum(0, np.dot(W2, H1) + b2)
    U2 = (np.random.rand(*H2.shape) < p) / p  # second dropout mask
    H2 *= U2  # drop!
    out = np.dot(W3, H2) + b3

    return out

    # backward pass: compute gradients... (not shown)
    # perform parameter update... (not shown)


def predict(X):
    # ensembled forward pass
    H1 = np.maximum(0, np.dot(W1, X) + b1)  # NOTE: no scaling needed at predict time
    H2 = np.maximum(0, np.dot(W2, H1) + b2)  # NOTE: no scaling needed at predict time
    out = np.dot(W3, H2) + b3
    return out

if __name__ == '__main__':
    np.random.seed(1)

    W1 = np.random.randn(4,3)  # W1 @ X has shape (4, N)
    b1 = np.random.randn(4,1)
    W2 = np.random.randn(4,4)  # W2 @ H1 has shape (4, N)
    b2 = np.random.randn(4,1)
    W3 = np.random.randn(4,4)  # W3 @ H2 has shape (4, N)
    b3 = np.random.randn(4,1)
    X = np.random.randn(3,1000000)
    y = train_step(X)
    print(y.shape)
    print('inverted train,average value:',y.mean())
    predict_y = predict(X)
    print('inverted predict,average value:',predict_y.mean())

    # I wanted to compare the expectations of the two versions directly, but the post-activation values are too noisy for a clean comparison; this is just a rough illustration that both keep the output magnitudes relatively stable
    y = vanilla_train(X)
    predict_y = vanilla_predict(X)
    print('vanilla train,average value:',y.mean())
    print('vanilla predict,average value:',predict_y.mean())
inverted train,average value: -0.19223531434253763
inverted predict,average value: -0.24786749397781124
vanilla train,average value: 0.0815151269153607
vanilla predict,average value: 0.13525566404177755
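The reason both phases stay on the same scale is that the inverted mask has an elementwise expectation of 1: E[(rand < p) / p] = p · (1/p) = 1. A quick sanity check of that fact (my own addition, not from the original post):

import numpy as np

p = 0.5
mask = (np.random.rand(1_000_000) < p) / p
print(mask.mean())  # ~1.0, so x * mask keeps the same expectation as x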

Of course, the code above is just a bare network; next, let's implement dropout as a network layer.

 

Dropout as a network layer:

Compared with other layers (especially batch normalization, https://blog.csdn.net/huqinweI987/article/details/103229158), this one is very simple.

 

import numpy as np

class Dropout:
    def __init__(self,dropout_ratio=0.5):  # here the parameter is the drop probability
        self.dropout_ratio = dropout_ratio
        self.mask = None
    def forward(self,x,is_train):
        if is_train:
            self.mask = np.random.rand(*x.shape) > self.dropout_ratio
            return x * self.mask
        else:
            return x * (1 - self.dropout_ratio)
    def backward(self,dout):
        return dout * self.mask

This first reference snippet is the vanilla style of implementation, with p being the drop probability. Below I rework it; let's look at both versions.

class Dropout:
    def __init__(self,keep_probability=0.5):  # here the parameter is the keep probability
        self.keep_probability = keep_probability
        self.mask = None
    def forward(self,x,is_train):
        if is_train:
            self.mask = np.random.rand(*x.shape) < self.keep_probability
            return x * self.mask
        else:
            return x * self.keep_probability
    def backward(self,dout):
        return dout * self.mask

Then the test-time performance optimization (the inverted version):

class Dropout:
    def __init__(self,keep_probability=0.5):  # here the parameter is the keep probability
        self.keep_probability = keep_probability
        self.mask = None
    def forward(self,x,is_train):
        if is_train:
            self.mask = np.random.rand(*x.shape) < self.keep_probability
            return x * self.mask / self.keep_probability
        else:
            return x
    def backward(self,dout):
        return dout * self.mask / self.keep_probability  # match the scaling applied in forward
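A quick usage sketch of this layer; the shapes and the keep probability of 0.8 are my own illustration, not from the original post:

import numpy as np

layer = Dropout(keep_probability=0.8)
x = np.random.randn(4, 5)
out_train = layer.forward(x, is_train=True)   # masked and rescaled by 1 / 0.8
out_test = layer.forward(x, is_train=False)   # returned unchanged
dx = layer.backward(np.ones_like(x))          # gradient flows only through the kept units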

Then simply drop this layer into the network structure.

        # build the layers
        activation_layer = {'sigmoid': Sigmoid, 'relu': Relu}
        self.layers = OrderedDict()
        for idx in range(1, self.hidden_layer_num+1):
            self.layers['Affine' + str(idx)] = Affine(self.params['W' + str(idx)],
                                                      self.params['b' + str(idx)])
            if self.use_batchnorm:
                self.params['gamma' + str(idx)] = np.ones(hidden_size_list[idx-1])
                self.params['beta' + str(idx)] = np.zeros(hidden_size_list[idx-1])
                self.layers['BatchNorm' + str(idx)] = BatchNormalization(self.params['gamma' + str(idx)], self.params['beta' + str(idx)])
                
            self.layers['Activation_function' + str(idx)] = activation_layer[activation]()
            
            if self.use_dropout:
                self.layers['Dropout' + str(idx)] = Dropout(dropout_ration)

        idx = self.hidden_layer_num + 1
        self.layers['Affine' + str(idx)] = Affine(self.params['W' + str(idx)], self.params['b' + str(idx)])

        self.last_layer = SoftmaxWithLoss()

Speaking of dropping it into the network structure: in the simplified network, the forward pass iterates over the layers in the following form:

for layer in self.layers.values():  # self.layers is an OrderedDict, so iterate over its values
    x = layer.forward(x)

Passing an extra argument through a loop like this is inconvenient, so just give is_train a default value of True; then you only need to pass False explicitly at test time.

class Dropout:
    def __init__(self,keep_probability=0.5):  # here the parameter is the keep probability
        self.keep_probability = keep_probability
        self.mask = None
    def forward(self,x,is_train=True):
        if is_train:
            self.mask = np.random.rand(*x.shape) < self.keep_probability
            return x * self.mask / self.keep_probability
        else:
            return x
    def backward(self,dout):
        return dout * self.mask / self.keep_probability  # match the scaling applied in forward
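As a rough sketch of that idea (my own, assuming the OrderedDict of layers built above and the Dropout class just defined), a prediction helper only needs to pass is_train=False to the Dropout layers:

def predict(layers, x, is_train=False):
    # hypothetical helper: only the Dropout layers care about the phase
    for layer in layers.values():
        if isinstance(layer, Dropout):
            x = layer.forward(x, is_train)
        else:
            x = layer.forward(x)
    return x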

 

Without dropout: after about 300 iterations the training set is essentially overfitted and test-set accuracy stalls.

With dropout (drop probability 0.2, i.e. keep probability 0.8): accuracy is still rising after 3000 iterations.
