Monte Carlo Dropout
There ain’t no such thing as a free lunch, at least according to the popular adage. Well, not anymore! Not when it comes to neural networks, that is to say. Read on to see how to improve your network’s performance with an incredibly simple yet clever trick called Monte Carlo Dropout.
Dropout
The magic trick we are about to introduce only works if your neural network has dropout layers, so let’s kick off by briefly introducing them. Dropout boils down to simply switching off some neurons at each training step. At each step, a different set of neurons is switched off. Mathematically speaking, each neuron has some probability p of being ignored, called the dropout rate. The dropout rate is typically set between 0 (no dropout) and 0.5 (approximately 50% of all neurons will be switched off). The exact value depends on the network type, layer size, and the degree to which the network overfits the training data.
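Here is a minimal sketch of what this looks like in code, using PyTorch (the article itself doesn’t prescribe a framework, and the layer sizes and rates below are just illustrative). Each Dropout layer zeroes out every incoming activation independently with probability p at each training step.

import torch
import torch.nn as nn

# A small classifier with dropout layers between the fully connected layers.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # dropout rate p = 0.5: roughly half the activations dropped per step
    nn.Linear(256, 64),
    nn.ReLU(),
    nn.Dropout(p=0.2),   # a lighter dropout rate for the smaller layer
    nn.Linear(64, 10),
)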
But why do this? Dropout is a regularization technique, that is, it helps prevent overfitting. With little data and/or a complex network, the model might memorize the training data and, as a result, work great on the data it has seen during training but deliver terrible results on new, unseen data. This is called overfitting, and dropout seeks to alleviate it.
How? There are two ways to understand why switching off some parts of the model might be beneficial. First, the information spreads out more evenly across the network. Think about a single neuron somewhere inside the network. There are a couple of other neurons that provide it with inputs. With dropout, each of these input sources can disappear at any time during training. Hence, our neuron cannot rely on just one or two inputs; it has to spread out its weights and pay attention to all of them. As a result, it becomes less sensitive to input changes, which helps the model generalize better.
The other explanation of dropout’s effectiveness is even more important from the point of view of our Monte Carlo trick. Since in every training iteration you randomly sample the neurons to be dropped out in each layer (according to that layer’s dropout rate), a different set of neurons is dropped out each time. Hence, the model’s architecture is slightly different at every step, and you can think of the outcome as an averaging ensemble of many different neural networks, each trained on just one batch of data.
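You can see this random-sub-network behaviour directly: in training mode, a dropout layer samples a fresh mask on every forward pass, so the same input flows through a slightly different “sub-network” each time. A small illustrative sketch (PyTorch, hypothetical values):

import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)
dropout.train()          # training mode: dropout is active

x = torch.ones(8)
print(dropout(x))        # one random mask of dropped activations
print(dropout(x))        # a different random mask on the next pass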
A final detail: dropout is only used during training. At inference time, that is, when we make predictions with our network, we typically don’t apply any dropout — we want to use all the trained neurons and connections.
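In framework terms, this is just the usual train/eval switch. A minimal sketch of standard inference behaviour, again assuming PyTorch: in eval mode the dropout layer becomes a no-op, so every neuron contributes and the output is deterministic.

import torch
import torch.nn as nn

dropout = nn.Dropout(p=0.5)
x = torch.ones(8)

dropout.train()
print(dropout(x))   # random: some activations zeroed, the rest scaled by 1/(1-p)

dropout.eval()
print(dropout(x))   # deterministic: dropout is inactive, input passes through unchanged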