Let me start with the paper: If resnets are the answer, then what is the question?
The trend in network design is clear: deeper networks achieve higher accuracy. Yet plainly stacked networks cannot be made arbitrarily deep, and the usual explanation is vanishing and exploding gradients. Since ResNet was introduced, deeper really has meant better. Why is that?
I had always assumed it was because ResNet's skip connections solve the vanishing gradient problem: during backpropagation, gradients shrink as they pass through each layer, so past a certain depth the deeper layers can no longer be trained properly.
But I now realize that batch normalization already addresses vanishing and exploding gradients. So what problem does ResNet actually solve?
The paper puts it this way:
Specifically, we show that the correlation between gradients in standard feedforward networks decays exponentially with depth resulting in gradients that resemble white noise. In contrast, the gradients in architectures with skip connections are far more resistant to shattering, decaying sublinearly.
This raises the question: If resnets are the solution, then what is the problem? We identify the shattered gradient problem: a previously unnoticed difficulty with gradients in deep rectifier networks that is orthogonal to vanishing and exploding gradients. The shattering gradients problem is that, as depth increases, gradients in standard feedforward networks increasingly resemble white noise. Resnets dramatically reduce the tendency of gradients to shatter
The rough idea: images have local structure, so gradients with respect to the input should be correlated too. But in standard feedforward networks this correlation decays exponentially with depth, until the gradients resemble white noise. Skip connections slow the decay dramatically, so the gradients stay far more correlated than in a plain network of the same depth.
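This decorrelation effect can be observed directly. Below is a minimal numpy sketch (my own toy setup, not the paper's exact experiment): a deep ReLU net with a scalar input, where we compute the gradient of the output with respect to the input at many nearby input points and measure how correlated neighbouring gradients are, with and without identity skip connections.

```python
import numpy as np

rng = np.random.default_rng(0)
width, depth = 100, 50

# He-scaled random weights plus biases (the biases make the ReLU
# activation pattern actually vary with the scalar input x).
Ws = [rng.normal(0, np.sqrt(2.0 / width), (width, width)) for _ in range(depth)]
bs = [rng.normal(0, 0.5, width) for _ in range(depth)]
w_in = rng.normal(0, 1.0, width)
b_in = rng.normal(0, 0.5, width)
w_out = rng.normal(0, np.sqrt(1.0 / width), width)

def grad_wrt_input(x, skip=False):
    """d(output)/d(input) of a deep ReLU net, computed by hand."""
    h = w_in * x + b_in
    hs = [h]
    for W, b in zip(Ws, bs):
        h = W @ np.maximum(h, 0) + b + (h if skip else 0)
        hs.append(h)
    g = w_out * (hs[-1] > 0)              # output = w_out . relu(h_last)
    for W, h_prev in zip(reversed(Ws), reversed(hs[:-1])):
        back = (W.T @ g) * (h_prev > 0)   # backprop through W and ReLU
        g = back + g if skip else back    # identity path adds g unchanged
    return float(w_in @ g)

xs = np.linspace(-2, 2, 256)
g_plain = np.array([grad_wrt_input(x) for x in xs])
g_skip = np.array([grad_wrt_input(x, skip=True) for x in xs])

def neighbour_corr(g):
    """Correlation between gradients at adjacent input points."""
    g = (g - g.mean()) / g.std()
    return float(np.mean(g[:-1] * g[1:]))

print(f"plain net : {neighbour_corr(g_plain):+.3f}")
print(f"with skips: {neighbour_corr(g_skip):+.3f}")
```

At depth 50 the plain net's gradient already looks noisy from one input point to the next, while the skip-connected net's gradient stays strongly correlated, which is the "shattering" contrast the paper describes.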
The paper also proposes a "looks linear" (LL) initialization that prevents shattering even without skip-connections.
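The core trick behind LL initialization is mirroring: combine a concatenated ReLU (CReLU) with weight blocks [W, -W], so that at initialization each layer computes an exactly linear map, since relu(h) - relu(-h) = h. A small numpy sketch of a single such layer (my own illustration of the idea, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
W = rng.normal(0, 1 / np.sqrt(n), (n, n))

def crelu(h):
    """Concatenated ReLU: [relu(h), relu(-h)]."""
    return np.concatenate([np.maximum(h, 0), np.maximum(-h, 0)])

# Mirrored ("looks linear") initialization: the layer weight is [W, -W],
# so at init  [W, -W] @ crelu(h) = W @ relu(h) - W @ relu(-h) = W @ h.
W_LL = np.concatenate([W, -W], axis=1)   # shape (n, 2n)

x = rng.normal(size=n)
layer_out = W_LL @ crelu(x)
assert np.allclose(layer_out, W @ x)     # exactly linear at initialization
```

Because the network starts out linear, its gradients start out perfectly correlated; nonlinearity only emerges as training moves the two halves of the weights apart.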
I think the paper is worth a read; it will deepen your understanding of ResNet, which remains a genuinely innovative architecture.
If I have gotten anything wrong, corrections are welcome.