Notations
- input $v$
- output $r$
- weight parameter $W \in \mathbb{R}^{d \times m}$
- activation function $a$
- mask $m$ for a vector and $M$ for a matrix
Dropout
- Randomly set activations of each layer to zero with probability $1-p$.
$$r = m \circ a(Wv), \qquad m_j \sim \text{Bernoulli}(p).$$
- Since many activation functions have the property that $a(0) = 0$, we equivalently have
$$r = a(m \circ Wv).$$
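To make the masking concrete, here is a minimal NumPy sketch of a training-time dropout forward pass; the function name `dropout_forward` and the choice of ReLU for $a$ are illustrative assumptions, not taken from the original paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(W, v, p=0.5):
    """Training-time dropout: keep each activation with probability p."""
    m = rng.binomial(1, p, size=W.shape[0])  # m_j ~ Bernoulli(p), one entry per output unit
    a = lambda x: np.maximum(x, 0.0)         # ReLU, which satisfies a(0) = 0
    r1 = m * a(W @ v)                        # r = m ∘ a(Wv)
    r2 = a(m * (W @ v))                      # equivalent form, since a(0) = 0
    assert np.allclose(r1, r2)
    return r1

W = rng.normal(size=(4, 3))  # W ∈ R^{d×m} with d = 4, m = 3
v = rng.normal(size=3)       # input v ∈ R^m
print(dropout_forward(W, v))
```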
DropConnect
- Randomly set the weights of each layer to zero with probability $1-p$.
$$r = a((M \circ W)v), \qquad M_{ij} \sim \text{Bernoulli}(p).$$
- Each $M_{ij}$ is drawn independently for each example during training.
- The memory requirement for the masks $M$ grows with the size of each mini-batch, so the implementation needs to be designed carefully (see the sketch below).
- overall model: $f(x;\theta,M)$, where $\theta = \{W_g, W, W_s\}$ ($W_g$ parameterizes the feature extractor producing $v$, and $s(\cdot\,; W_s)$ is the softmax output layer)
$$\begin{aligned} o = \mathbb{E}_M[f(x;\theta,M)] &= \sum_M p(M)\, f(x;\theta,M)\\ &= \frac{1}{|M|}\sum_M s(a((M \circ W)v); W_s) \quad \text{if } p = 0.5, \end{aligned}$$
since with $p = 0.5$ every mask is equally likely, i.e. $p(M) = 1/|M|$.
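Below is a minimal sketch of the DropConnect training-time forward pass for a mini-batch, which also illustrates why mask memory scales with batch size; `dropconnect_forward`, the ReLU activation, and the batch layout are illustrative assumptions rather than the paper's reference implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropconnect_forward(W, V, p=0.5):
    """Training-time DropConnect for a mini-batch V of shape (batch, m).

    A fresh mask M (same shape as W) is drawn per example, which is why
    mask memory grows with the mini-batch size."""
    batch = V.shape[0]
    M = rng.binomial(1, p, size=(batch,) + W.shape)  # M_ij ~ Bernoulli(p), per example
    u = np.einsum('bdm,bm->bd', M * W, V)            # u = (M ∘ W) v for each example
    return np.maximum(u, 0.0)                        # r = a((M ∘ W) v) with ReLU as a

W = rng.normal(size=(4, 3))             # W ∈ R^{d×m}
V = rng.normal(size=(8, 3))             # mini-batch of 8 inputs
print(dropconnect_forward(W, V).shape)  # (8, 4): one r per example
```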

- inference (test stage)
$$\begin{aligned} r &= \frac{1}{|M|} \sum_M a((M \circ W)v)\\ r &\approx \frac{1}{Z} \sum_{z=1}^Z r_z \\ &\approx \frac{1}{Z} \sum_{z=1}^Z a(u_z), \end{aligned}$$
where $u_z \sim \mathcal{N}\big(pWv,\; p(1-p)(W \circ W)(v \circ v)\big)$, and $Z$ denotes the number of random samples drawn from the Gaussian distribution.
Idea: each component of the pre-activation $(M \circ W)v$ is a sum of weighted Bernoulli random variables, which is approximated by a Gaussian random variable; this is partially justified by the central limit theorem.
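A minimal sketch of this Gaussian moment-matching inference step, again assuming ReLU for $a$; the function name and the value of $Z$ are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropconnect_inference(W, v, p=0.5, Z=1000):
    """Approximate r = E_M[a((M ∘ W) v)] by moment matching: sample the
    pre-activation u_z from a Gaussian instead of enumerating all masks."""
    mean = p * (W @ v)                        # E[(M ∘ W) v] = p W v
    var = p * (1 - p) * ((W * W) @ (v * v))   # Var[(M ∘ W) v], element-wise
    u = rng.normal(mean, np.sqrt(var), size=(Z, W.shape[0]))  # u_z ~ N(mean, var)
    return np.maximum(u, 0.0).mean(axis=0)    # (1/Z) Σ_z a(u_z) with ReLU as a

W = rng.normal(size=(4, 3))
v = rng.normal(size=3)
print(dropconnect_inference(W, v))
```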
$\textcolor{red}{\text{Limitation}}$: Both techniques are suitable for fully connected layers only.