ResNet is a network model built from residual blocks, so first let's introduce what a residual block (Residual block) is:
The part circled in red is one residual block. Compared with a plain network, the difference is that when computing a[l+2], the earlier activation a[l] is added in through a shortcut, so the stacked layers only need to learn the residual on top of a[l].
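The forward pass of one such block can be sketched in NumPy (a minimal illustration with made-up shapes and fully-connected layers; real ResNets use convolutions):

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def residual_block(a_l, W1, b1, W2, b2):
    """One residual block: two linear+ReLU layers plus a shortcut."""
    z1 = W1 @ a_l + b1      # first layer's pre-activation
    a1 = relu(z1)           # a[l+1]
    z2 = W2 @ a1 + b2       # z[l+2], the main path's output
    return relu(z2 + a_l)   # shortcut: add a[l] BEFORE the final activation

rng = np.random.default_rng(0)
n = 4
a_l = rng.standard_normal(n)
W1, W2 = rng.standard_normal((n, n)), rng.standard_normal((n, n))
b1, b2 = np.zeros(n), np.zeros(n)
print(residual_block(a_l, W1, b1, W2, b2).shape)  # (4,)
```

The key point is that a[l] is added to z[l+2] before the nonlinearity, giving a[l+2] = g(z[l+2] + a[l]) rather than the plain network's a[l+2] = g(z[l+2]).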
The residual network:
The left plot shows a plain network, the right one shows ResNet. In theory, training loss should keep falling as the number of layers grows; in practice the plain network behaves like the left plot, because as depth increases the optimization problem becomes harder and harder to solve.
Why does the loss of a very deep network go up?
A huge barrier to training them is vanishing gradients: very deep networks often have a gradient signal that goes to zero quickly, thus making gradient descent unbearably slow. More specifically, during gradient descent, as you backprop from the final layer back to the first layer, you are multiplying by the weight matrix on each step, and thus the gradient can decrease exponentially quickly to zero (or, in rare cases, grow exponentially quickly and "explode" to take very large values).
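The quoted point can be demonstrated numerically: backprop multiplies the gradient by a weight matrix at every layer, so with 50 layers and weights scaled slightly "small" the gradient norm collapses toward zero, while scaled slightly "large" it explodes. The scales 0.5 and 1.5 below are arbitrary demo values, not anything a real initializer would use:

```python
import numpy as np

rng = np.random.default_rng(0)
n, depth = 64, 50
norms = {}
for scale in (0.5, 1.5):
    grad = np.ones(n)  # stand-in for the gradient at the output layer
    for _ in range(depth):
        # one layer's weights; /sqrt(n) keeps the unscaled matrix norm ~1
        W = scale * rng.standard_normal((n, n)) / np.sqrt(n)
        grad = W.T @ grad  # one backprop step: multiply by the weight matrix
    norms[scale] = np.linalg.norm(grad)
    print(scale, norms[scale])
```

With scale 0.5 the final norm is around 0.5^50 of the initial one (effectively zero); with 1.5 it has grown by roughly 1.5^50, i.e. exploded.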
Open question: modern networks all use the ReLU activation; does vanishing gradient still occur?
Why do residual networks work?
As the number of layers grows, with a regularized loss function the network becomes more complex and regularization "squeezes" the parameters W toward 0. When the term marked with the green cross is 0, the block reduces to the identity, so the network with these two extra layers performs exactly as well as without them; and when that term is not 0, the layers have learned something useful on top of the identity, so accuracy improves.
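This "no worse than identity" argument can be checked directly (a sketch with hypothetical shapes, where weight decay has already driven the added layers' parameters to exactly 0):

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

rng = np.random.default_rng(1)
n = 4
a_l = relu(rng.standard_normal(n))      # a[l] is itself a ReLU output, so >= 0
W1 = np.zeros((n, n))                   # regularization pushed these to 0
W2, b2 = np.zeros((n, n)), np.zeros(n)
a1 = relu(W1 @ a_l)                     # first added layer outputs all zeros
a_l2 = relu(W2 @ a1 + b2 + a_l)         # z[l+2] = 0, shortcut passes a[l] through
print(np.allclose(a_l2, a_l))  # True
```

Because a[l] is already non-negative (it came out of a ReLU), relu(0 + a[l]) = a[l]: the two zeroed-out layers compute the identity and cannot hurt the network. A plain network has no such easy fallback, since learning the identity through W alone is hard.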
When a[l] and a[l+2] have different dimensions, adding a learned matrix on the shortcut, i.e. using Wt*a[l], solves the problem.
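A sketch of that projection shortcut (shapes are arbitrary; Wt follows the note's naming for the learned shortcut matrix):

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

rng = np.random.default_rng(2)
n_in, n_out = 4, 8                        # a[l] has 4 units, a[l+2] has 8
a_l = rng.standard_normal(n_in)
z_l2 = rng.standard_normal(n_out)         # main-path pre-activation z[l+2]
Wt = rng.standard_normal((n_out, n_in))   # learned projection for the shortcut
a_l2 = relu(z_l2 + Wt @ a_l)              # dimensions now match for the addition
print(a_l2.shape)  # (8,)
```

Without Wt the addition z[l+2] + a[l] would fail, since the two vectors have different lengths; Wt is trained along with the rest of the network.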
The overall ResNet architecture: