A discussion of dropout in Keras

dropout in training and testing #5357


@wenouyang

In this link, devinplatt gives the following way to include dropout during training:

model = Sequential()
model.add(Dropout(0.5, input_shape=(20,)))
model.add(Dense(64, init='uniform'))

In this post, the author mentions that "Finally, if the training has finished, you'd use the complete network for testing (or in other words, you set the dropout probability to 0)."

In terms of the Keras implementation, does that mean we have to modify the line model.add(Dropout(0.5, input_shape=(20,))) after loading the trained weights?

@unrealwill

Hello,
By looking at the source code:
https://github.com/fchollet/keras/blob/master/keras/layers/core.py#L111
x = K.in_train_phase(dropped_inputs, lambda: x)

You can see that dropout is only applied in train phase.
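
In practical terms the answer to the question above is no: nothing has to be edited after loading weights, because predict and evaluate run in the test phase. A minimal sketch (using the same Keras 1-style API as the snippet at the top; the numbers are illustrative):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()
model.add(Dropout(0.5, input_shape=(20,)))
model.add(Dense(64, init='uniform'))
model.compile(optimizer='sgd', loss='mse')

# After fitting (or model.load_weights(...)), predictions are made with the
# Dropout layer inactive; the model definition stays exactly as written.
x = np.random.rand(3, 20)
print(model.predict(x).shape)  # (3, 64)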

@radekosmulski

That is correct - dropout should be applied during training (drop inputs with probability p), but there also needs to be a corresponding scaling of the weights at test time, as outlined in the referenced paper.

I suspect this is not happening at the moment; at least, the results I have gotten so far might indicate that there is an issue here. I will investigate further and see if I can provide an example.

@unrealwill
 

unrealwill commented on 24 Feb 2017  

Hello, @radekosmulski
This is not a problem. See issue fchollet#3305.
Keras uses inverted scaling during training (the activations that are kept are scaled up during training), so nothing needs to be rescaled at test time.
See:

def dropped_inputs():
  return K.dropout(x, self.p, noise_shape, seed=self.seed)

https://github.com/fchollet/keras/blob/master/keras/layers/core.py#L110
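
A quick worked example of the inverted scheme (not part of the original comment): with p = 0.5, a unit whose activation is a is kept with probability 0.5 and scaled to a / (1 - p) = 2a during training, so its expected training-time contribution is 0.5 * 2a = a. At test time the unit is always on and contributes a, so no extra scaling is needed and model.predict can use the weights as they are.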

@radekosmulski
 

radekosmulski commented on 24 Feb 2017  

Thank you for your reply, @unrealwill. I am new to Keras, so sorry if I am misunderstanding something. I still feel there is something unusual about model.predict and model.evaluate when dropout is used. Please see below:

import keras
import numpy as np

X = np.array(
    [[2, 1],
     [4, 2]])
y = np.array(
    [[5],
     [10]]
)

# Works as expected without dropout
model = keras.models.Sequential()
model.add(keras.layers.Dense(input_dim=2, output_dim=1))
model.compile(keras.optimizers.SGD(), loss='MSE')
model.fit(X, y, nb_epoch=10000, verbose=0)
model.evaluate(X, y) # => ~0

# With dropout
model = keras.models.Sequential()
model.add(keras.layers.Dense(input_dim=2, output_dim=1))
model.add(keras.layers.Dropout(0.5))
model.compile(keras.optimizers.SGD(), loss='MSE')
model.fit(X, y, nb_epoch=10000, verbose=0)
model.evaluate(X, y) # => converges to MSE of 15.625

model.predict(X) # => array([[ 2.5],
                 #          [ 5. ]], dtype=float32)

The MSE it converges to is due to the outputs being exactly half of what they should be: ((5 - 2.5)^2 + (10 - 5)^2) / 2 = (2.5^2 + 5^2) / 2 = 15.625.

@unrealwill
 

unrealwill commented on 24 Feb 2017  

@radekosmulski
The dropout noise introduces a bias because it is not symmetric noise.
Dropout shouldn't be added as the last layer (which we normally don't do anyway).
Because "mse" is convex, Jensen's inequality applies, and you are training the network to learn the bias of the noise.

The bias of the dropout can subsequently be removed by using a dense layer after the first layer (=> average result = 7.5).
And if you have more hidden cells (100), you average the noise out and get what you want.

import keras
import numpy as np

X = np.array(
    [[2, 1],
     [4, 2]])
y = np.array(
    [[5],
     [10]]
)

# Works as expected without dropout
model = keras.models.Sequential()
model.add(keras.layers.Dense(input_dim=2, output_dim=1))
model.compile(keras.optimizers.SGD(), loss='MSE')
model.fit(X, y, nb_epoch=10000, verbose=0)
print(model.evaluate(X, y))  # => ~0

# With dropout
model = keras.models.Sequential()
model.add(keras.layers.Dense(input_dim=2, output_dim=100))
model.add(keras.layers.Dropout(0.5))
model.add(keras.layers.Dense(1))
model.compile(keras.optimizers.adam(), loss='MSE')
model.fit(X, y, nb_epoch=100000, verbose=0)
print(model.evaluate(X, y))  # => converges to a small MSE (the bias is gone)

print(model.predict(X))  # => array([[ 4.91],
                         #           [ 9.96]], dtype=float32)
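
A quick numeric check of this bias argument for the first example, where Dropout(0.5) was the last layer (a sketch, not part of the thread): during training the output is either 0 or 2 * y_hat because of the inverted scaling, each with probability 0.5, so the expected squared error against a target y is 0.5 * (2 * y_hat - y)^2 + 0.5 * y^2, which is minimized at y_hat = y / 2, exactly the halved predictions seen earlier.

import numpy as np

y = 10.0
y_hat = np.linspace(0.0, 10.0, 1001)

# Expected training-time MSE when the prediction is either masked to 0 or doubled.
expected_mse = 0.5 * (2 * y_hat - y) ** 2 + 0.5 * y ** 2

print(y_hat[np.argmin(expected_mse)])  # 5.0, i.e. y / 2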
@radekosmulski

@unrealwill thank you very much for taking the time to reply, I really appreciate it. I understand now.


@spearsem
 

spearsem commented on 8 Dec 2017

@unrealwill There is another use case for dropout at test or inference time: to get a notion of uncertainty and variability in the network's predictions, you might take a given input and run predict on it many times, each time with a different randomly drawn set of dropped neurons.

Say you run predict 100 times for a single test input. The average of these runs will approximate what you get with no dropout - the 'expected value' over different weight configurations. And metrics like the standard deviation of these results give you a sense of the error bounds of your estimate (conditioned on assumptions about the validity of the underlying model structure).

In this sense, it would be very useful to have the ability to re-activate the dropout settings from training, but specifically during testing or regular inference.
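
For what it is worth, this can already be approximated with the backend's learning-phase mechanism. A hedged sketch (model and x_test stand in for your own trained Keras model and input batch):

import numpy as np
from keras import backend as K

# A forward-pass function that takes the learning phase as an explicit input,
# so dropout can be kept active at prediction time.
stochastic_predict = K.function([model.input, K.learning_phase()],
                                [model.output])

# 100 stochastic forward passes, each with a fresh random dropout mask.
samples = np.stack([stochastic_predict([x_test, 1])[0] for _ in range(100)])

mean_prediction = samples.mean(axis=0)  # approximates the dropout-off prediction
prediction_std = samples.std(axis=0)    # spread across masks, a rough uncertainty estimate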

The purpose of Dropout

During training, some features are randomly dropped, which helps prevent overfitting.

Dropout layer source code

The Dropout layer is defined in core.py under keras/layers:

class Dropout(Layer):
    '''Applies Dropout to the input. Dropout consists in randomly setting
    a fraction `p` of input units to 0 at each update during training time,
    which helps prevent overfitting.

    # Arguments
        p: float between 0 and 1. Fraction of the input units to drop.

    # References
        - [Dropout: A Simple Way to Prevent Neural Networks from Overfitting](http://www.cs.toronto.edu/~rsalakhu/papers/srivastava14a.pdf)
    '''
    def __init__(self, p, **kwargs):
        self.p = p
        if 0. < self.p < 1.:
            self.uses_learning_phase = True
        self.supports_masking = True
        super(Dropout, self).__init__(**kwargs)

    def call(self, x, mask=None):
        if 0. < self.p < 1.:
            x = K.in_train_phase(K.dropout(x, level=self.p), x)
        return x

    def get_config(self):
        config = {'p': self.p}
        base_config = super(Dropout, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
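
A small sketch (not from the original post; it assumes the Keras 1 API shown above, where the drop fraction is named p) of how the layer's configuration is serialized: the only Dropout-specific field recorded by get_config is p.

from keras.layers import Dropout

layer = Dropout(0.5)
print(layer.get_config())  # e.g. {'p': 0.5, 'name': 'dropout_1', ...}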

Analysis

Implementation

The call method delegates the actual work to K.dropout.
From the import at the top of the file, this K is the Keras backend. Keras has two backends: Theano and TensorFlow.
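
A quick way (a side note, not from the original article) to check which backend K resolves to; the active backend is configured in ~/.keras/keras.json or via the KERAS_BACKEND environment variable.

from keras import backend as K

print(K.backend())  # 'theano' or 'tensorflow'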

Theano's dropout

In theano_backend.py under keras/backend:

def dropout(x, level, seed=None):
    if level < 0. or level >= 1:
        raise Exception('Dropout level must be in interval [0, 1[.')
    if seed is None:
        seed = np.random.randint(10e6)
    rng = RandomStreams(seed=seed)
    retain_prob = 1. - level
    x *= rng.binomial(x.shape, p=retain_prob, dtype=x.dtype)
    x /= retain_prob
    return x

If no seed is given, this function draws one at random (an integer below 10e6); the user can also pass a seed explicitly via the seed argument.
A binomial (Bernoulli) mask of 0s and 1s is then generated with RandomStreams.
Multiplying x by this mask sets a fraction of the entries of x to 0, and dividing x by (1 - level) scales the surviving entries up so that the expected value of x is unchanged.

This is how (inverted) dropout is implemented.
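
As a quick sanity check of this rescaling (a plain-NumPy sketch, independent of the Theano code above), averaging many masked-and-rescaled copies of x recovers x itself:

import numpy as np

rng = np.random.RandomState(0)
x = np.array([2.0, 4.0, 6.0, 8.0])
level = 0.5
retain_prob = 1.0 - level

total = np.zeros_like(x)
n = 100000
for _ in range(n):
    # 0/1 mask drawn like rng.binomial above, then rescaled by 1 / retain_prob
    mask = rng.binomial(1, retain_prob, size=x.shape)
    total += x * mask / retain_prob

print(total / n)  # approximately [2. 4. 6. 8.]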

in_train_phase

The purpose of this function is to apply dropout during training and skip it at test time. It does this with a switch on the learning phase:

def switch(condition, then_expression, else_expression):
    '''condition: scalar tensor.
    '''
    return T.switch(condition, then_expression, else_expression)


def in_train_phase(x, alt):
    x = T.switch(_LEARNING_PHASE, x, alt)
    x._uses_learning_phase = True
    return x
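
To see the two branches in action, here is a minimal sketch (not from the original article; it drives the backend API directly and assumes a working Theano or TensorFlow backend):

import numpy as np
from keras import backend as K

x = K.placeholder(shape=(2, 4))
y = K.in_train_phase(K.dropout(x, level=0.5), x)

f = K.function([x, K.learning_phase()], [y])
data = np.ones((2, 4), dtype='float32')

print(f([data, 1])[0])  # learning phase 1 (training): some entries zeroed, the rest scaled to 2
print(f([data, 0])[0])  # learning phase 0 (testing): the input comes back unchanged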
Source: https://blog.csdn.net/taiji1985/article/details/51251628