ResNet Study Notes

Structure of this post:

My reading notes

  1. ResNet building blocks

  2. ResNet's CIFAR-10 experiment architecture

Other material

  1. ResNet author Dr. Kaiming He's tutorial talk at ICML 2016

  2. Bottleneck

  3. Papers criticizing ResNet

Paper translation

 

---------------------------------------------------------------------------------------------------------------------

My reading notes

1. ResNet building blocks

What follows is my own understanding; if anything is wrong, corrections from the experts are welcome!

The figure shows some building blocks excerpted from ResNet-34. The paper's description of the two shortcut options, the identity shortcut and the projection shortcut, confused me for a long time. I kept puzzling over one thing: why, after passing through 3×3 convolutional layers, can the input and output still match?

Here is my understanding of the question:

First, for convenience, I divide the building blocks into two types:

a. The first type of building block (BB1), shown as the solid-line blocks in the figure above, keeps the output's spatial dimensions and number of feature maps the same as the input's;

b. The second type of building block (BB2), shown as the dotted-line blocks, halves the input's spatial dimensions and doubles the number of feature maps (the "/2" operation).

With the two types of building blocks distinguished, come read the paper carefully with me:

1. "The identity shortcuts (Eqn.(1)) can be directly used when the input and output are of the same dimensions (solid line shortcuts in Fig. 3)."


What? For BB1, just add them directly? The input has already gone through two 3×3 convolutions! How can they be added if the dimensions differ? Well, after consulting some references, I think the author left the intermediate details implicit. As I figure it, each 3×3 convolution uses stride 1 and padding 1, so the spatial size is preserved (output size = (input size + 2×1 − 3)/1 + 1 = input size), and the number of filters equals the number of input channels, so the output shape exactly matches the input. The intermediate process goes roughly as follows:
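To convince myself, here is a minimal Keras sketch of a BB1-style block (my own reconstruction, not the authors' code); the printed output shape confirms that the identity add is legal:

```python
from keras.layers import Input, Conv2D, BatchNormalization, Activation, Add
from keras.models import Model

inputs = Input((32, 32, 16))                          # e.g. a conv2_x feature map
x = Conv2D(16, 3, strides=1, padding='same')(inputs)  # 32x32x16 -> 32x32x16
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = Conv2D(16, 3, strides=1, padding='same')(x)       # spatial size still 32x32
x = BatchNormalization()(x)
out = Activation('relu')(Add()([x, inputs]))          # shapes match: direct add
print(Model(inputs, out).output_shape)                # (None, 32, 32, 16)
```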

 

Good, now it really "can be directly used". On with the paper:

2. "When the dimensions increase (dotted line shortcuts in Fig. 3), we consider two options: (A) The shortcut still performs identity mapping, with extra zero entries padded for increasing dimensions. This option introduces no extra parameter; (B) The projection shortcut in Eqn.(2) is used to match dimensions (done by 1×1 convolutions). For both options, when the shortcuts go across feature maps of two sizes, they are performed with a stride of 2."

 

For BB2, the author offers two options. (A) Keep the identity shortcut as in BB1 and zero-pad the extra dimensions; it just takes a lot of zeros, which is why the "Residual Networks" part of the Section 4.1 experiments (middle of the right column on p. 6) concludes that "B is slightly better than A. We argue that this is because the zero-padded dimensions in A indeed have no residual learning." (B) Use the projection shortcut of Eqn.(2): convolve the input with Ws, a 1×1 convolution with stride 2, so that the shortcut output has the same dimensions as the main path; then the addition goes through without a hitch. The process can be sketched as follows:
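Here is a matching Keras sketch of a BB2-style block with option (B) (again my own reconstruction of Eqn.(2), not the authors' code); the 1×1, stride-2 convolution plays the role of Ws:

```python
from keras.layers import Input, Conv2D, BatchNormalization, Activation, Add
from keras.models import Model

inputs = Input((32, 32, 16))
x = Conv2D(32, 3, strides=2, padding='same')(inputs)  # 32x32x16 -> 16x16x32
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = Conv2D(32, 3, strides=1, padding='same')(x)       # 16x16x32
x = BatchNormalization()(x)
shortcut = Conv2D(32, 1, strides=2)(inputs)           # Ws: 1x1 conv, stride 2
out = Activation('relu')(Add()([x, shortcut]))        # both paths now 16x16x32
print(Model(inputs, out).output_shape)                # (None, 16, 16, 32)
```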

 

2. ResNet's CIFAR-10 experiment architecture

Honestly, given my comprehension level, the paper's description of the CIFAR-10 experimental network structure is quite confusing! It took me a while to sort out. Taking the 20-layer ResNet (n = 3) as an example, the structure is shown in the table below:

| Layer | Output map size | 20-layer ResNet |
| --- | --- | --- |
| Conv1 | 32×32 | {3×3, 16} |
| Conv2_x | 32×32 | {3×3, 16; 3×3, 16} × 3 |
| Conv3_x | 16×16 | {3×3, 32; 3×3, 32} × 3 |
| Conv4_x | 8×8 | {3×3, 64; 3×3, 64} × 3 |
| InnerProduct | 1×1 | Average pooling, 10-d fc |
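To double-check my reading of the table, here is a Keras sketch of this 20-layer network (my own reconstruction; note that for CIFAR-10 the paper uses option (A) zero-padding shortcuts, whereas this sketch substitutes option (B)-style 1×1 projections at the dimension-increasing blocks, since plain Keras layers have no zero-padding shortcut):

```python
from keras.layers import (Input, Conv2D, BatchNormalization, Activation, Add,
                          GlobalAveragePooling2D, Dense)
from keras.models import Model

def block(x, filters, strides=1):
    # One building block: BB1 when strides == 1, BB2 when strides == 2.
    shortcut = x
    y = Conv2D(filters, 3, strides=strides, padding='same')(x)
    y = BatchNormalization()(y)
    y = Activation('relu')(y)
    y = Conv2D(filters, 3, strides=1, padding='same')(y)
    y = BatchNormalization()(y)
    if strides != 1:  # BB2: project the shortcut to the new dimensions
        shortcut = Conv2D(filters, 1, strides=strides)(x)
    return Activation('relu')(Add()([y, shortcut]))

inputs = Input((32, 32, 3))
x = Conv2D(16, 3, padding='same')(inputs)       # Conv1: {3x3, 16}, 32x32
x = Activation('relu')(BatchNormalization()(x))
for _ in range(3):                              # Conv2_x: 3 blocks, 32x32, 16 maps
    x = block(x, 16)
x = block(x, 32, strides=2)                     # Conv3_x: 3 blocks, 16x16, 32 maps
x = block(x, 32)
x = block(x, 32)
x = block(x, 64, strides=2)                     # Conv4_x: 3 blocks, 8x8, 64 maps
x = block(x, 64)
x = block(x, 64)
x = GlobalAveragePooling2D()(x)                 # average pooling
outputs = Dense(10, activation='softmax')(x)    # 10-d fully connected
model = Model(inputs, outputs)                  # 1 + 2*9 + 1 = 20 weight layers
```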

 

Other material

1. ResNet author Dr. Kaiming He's tutorial talk at ICML 2016

Below is Python code for an example ResNet with three inputs and three outputs:

```python
from keras.layers import Input, Conv2D, BatchNormalization, Activation, Add
from keras.models import Model

def res_block(inputs, filters, kernel_size=3, strides=1, padding='same'):
    # Identity residual block: two convolutions plus a shortcut add.
    # The Add() requires `filters` to equal the input channel count and
    # strides == 1, so dimension changes happen between blocks instead.
    x = Conv2D(filters, kernel_size, strides=strides, padding=padding)(inputs)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Conv2D(filters, kernel_size, strides=1, padding=padding)(x)
    x = BatchNormalization()(x)
    x = Add()([x, inputs])
    x = Activation('relu')(x)
    return x

def conv_bn_relu(x, filters, kernel_size, strides):
    # Strided convolution used between stages to change dimensions.
    x = Conv2D(filters, kernel_size, strides=strides, padding='same')(x)
    x = BatchNormalization()(x)
    return Activation('relu')(x)

def branch(inputs):
    # One full branch: a 7x7/2 stem, then six stages of two residual
    # blocks each, downsampling between stages.
    x = conv_bn_relu(inputs, 64, 7, 2)
    x = res_block(x, 64)
    x = res_block(x, 64)
    for filters in (128, 256, 512, 1024, 2048):
        x = conv_bn_relu(x, filters, 3, 2)
        x = res_block(x, filters)
        x = res_block(x, filters)
    return x

input_shape = (224, 224, 3)
input1 = Input(input_shape)
input2 = Input(input_shape)
input3 = Input(input_shape)

# Each input is processed by its own branch, so all three inputs
# actually contribute to the outputs.
output1 = branch(input1)
output2 = branch(input2)
output3 = branch(input3)

model = Model(inputs=[input1, input2, input3],
              outputs=[output1, output2, output3])
```

This example defines a ResNet model with three inputs and three outputs. Each input is an image of shape (224, 224, 3) that is passed through its own ResNet branch. Each branch contains multiple residual blocks and ends in a feature map with 2048 channels (a 4×4×2048 tensor for this input size, not a flat 2048-d vector; a global average pooling layer would be needed to reduce each output to one). The model returns three such outputs, one per input.