Inception v4 & Inception-ResNet: https://arxiv.org/abs/1602.07261
Keras code (unofficial-keras): https://github.com/titu1994/Inception-v4
Inception v4 & Inception-ResNet v1, v2
0. Preface
Inspired mainly by ResNet, the authors built on Inception v3 by introducing residual connections, yielding Inception-ResNet-v1 and Inception-ResNet-v2, and also revised the Inception modules themselves to propose the Inception v4 architecture. Experiments with Inception v4 showed that results comparable to Inception-ResNet-v2 can be reached even without residual connections.
Google considered their earlier architectural choices relatively conservative: structural changes were confined to individual network components so that the rest of the model stayed stable. This time they abandoned that design principle and adopted uniform Inception modules for grids of every scale. In the network diagrams below, any convolution without a 'V' suffix uses same padding, meaning the output grid size equals the input grid size (as in VGG's convolutions); convolutions marked 'V' use valid padding, so the output grid shrinks step by step.
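To make this convention concrete, here is a minimal tf.keras sketch; the 35×35×192 input shape is an arbitrary illustrative choice:

import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((1, 35, 35, 192))                  # NHWC input
same_conv = layers.Conv2D(192, (3, 3), padding='same')  # no "V": output grid == input grid
valid_conv = layers.Conv2D(192, (3, 3))                 # "V": default 'valid' padding shrinks the grid
print(same_conv(x).shape)   # (1, 35, 35, 192)
print(valid_conv(x).shape)  # (1, 33, 33, 192)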
Inception-ResNet-v1 performs on par with Inception v3, and Inception-ResNet-v2 on par with Inception v4, but in practice Inception v4 runs noticeably slower than Inception-ResNet-v2, perhaps because it has too many layers. Also, in the Inception-ResNet architectures, BN is applied only on top of the traditional layers, not on top of the summations.
1. Inception v4
Figure 9 shows the overall framework of Inception v4; taken in order, its modules are detailed in Figures 3, 4, 7, 5, 8, and 6.
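As a reading aid, here is a schematic pseudocode sketch of that ordering; stem, inception_a, reduction_a, inception_b, reduction_b, and inception_c are hypothetical placeholder helpers standing in for the corresponding figures, and the repetition counts (4, 7, 3) follow Figure 9 of the paper:

from tensorflow.keras import layers

def inception_v4(x, num_classes=1000):
    x = stem(x)                 # Figure 3: 299x299x3 -> 35x35x384
    for _ in range(4):
        x = inception_a(x)      # Figure 4: 35x35 grid
    x = reduction_a(x)          # Figure 7: 35x35 -> 17x17
    for _ in range(7):
        x = inception_b(x)      # Figure 5: 17x17 grid
    x = reduction_b(x)          # Figure 8: 17x17 -> 8x8
    for _ in range(3):
        x = inception_c(x)      # Figure 6: 8x8 grid
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(0.2)(x)  # the paper keeps units with probability 0.8
    return layers.Dense(num_classes, activation='softmax')(x)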
2. Inception-ResNet v1 & v2
2.1 Inception-ResNet v1 & v2 Network Structure Flowcharts
2.2 Inception-ResNet v1 Network Modules
Following the order of Figure 15, the modules are detailed in Figures 14, 10, 7, 11, 12, and 13.
2.3 Inception-ResNet v2 Network Modules
Following the order of Figure 15, the modules are detailed in Figures 3, 16, 7, 17, 18, and 19.
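For comparison with the Inception v4 sketch above, here is a similar placeholder sketch of the Inception-ResNet schedule from Figure 15, which is shared by v1 and v2 (they differ only in filter counts); the helpers are hypothetical, and the repetition counts (5, 10, 5) come from the paper:

def inception_resnet(x, num_classes=1000):
    x = stem(x)                    # v1: Figure 14; v2: Figure 3
    for _ in range(5):
        x = inception_resnet_a(x)  # v1: Figure 10; v2: Figure 16
    x = reduction_a(x)             # Figure 7 (shared; parameters from Table 1)
    for _ in range(10):
        x = inception_resnet_b(x)  # v1: Figure 11; v2: Figure 17
    x = reduction_b(x)             # v1: Figure 12; v2: Figure 18
    for _ in range(5):
        x = inception_resnet_c(x)  # v1: Figure 13; v2: Figure 19
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(0.2)(x)
    return layers.Dense(num_classes, activation='softmax')(x)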
2.4 Other Network Details
- Each Inception module is followed by a 1×1 convolution with linear activation (no ReLU), used to expand the channel count and thus compensate for the dimensionality reduction caused by the Inception module.
- In the Inception-ResNet architectures, BN is applied only on top of the traditional layers, not on top of the summations.
- Inception v4, Inception-ResNet v1, and v2 share the same Reduction-A structure and differ only in its parameters, as listed in Table 1.
- Inception-ResNet v1 and v2 have the same overall structure; they differ only in the number of filters.
- Once the number of filters exceeds 1000, the Inception-ResNet networks start to become unstable and "die" early: after a few tens of thousands of iterations, the layer before the average pooling begins to output mostly zeros. The authors' fix is to scale the residual down before merging it back in, which stabilizes training; the scale is typically chosen in [0.1, 0.3], as in Figure 20 (see the sketch after this list).
- In the original ResNet work, Kaiming He et al. observed a similar instability on CIFAR-10: training a very deep network there required a warm-up phase with a learning rate of 0.01 before raising it to 0.1. The authors here argue that with a very large number of filters, even an extremely low learning rate (0.00001) cannot make the model converge, and switching to a large learning rate afterwards easily destroys what was learned. Simply scaling down the residual output, by contrast, stabilizes learning, and this scaling costs little in final accuracy while aiding stable training.
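Here is a minimal sketch of the stabilization from Figure 20, assuming a hypothetical helper name scaled_residual_add; it scales the residual branch before the summation and, per the note above, places no BN on the summation itself:

import tensorflow as tf
from tensorflow.keras import layers

def scaled_residual_add(shortcut, residual, scale=0.1):
    # out = shortcut + scale * residual; scale is typically in [0.1, 0.3].
    # Note: no BatchNormalization on the summation, only on traditional layers.
    out = layers.Add()([shortcut, residual * scale])
    return layers.Activation('relu')(out)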
2.5 Experimental Conclusions
- Comparing Inception-ResNet-v1 with Inception v3: Inception-ResNet-v1 trains faster but ends up slightly worse than Inception v3.
- Comparing Inception-ResNet-v2 with Inception v4: Inception-ResNet-v2 trains faster and also achieves slightly better results than Inception v4. So the overall winner is Inception-ResNet-v2.
3. Inception-ResNet v2 Code
Implemented with TensorFlow 2.0 using tf.keras.
"""
Implementation of Inception-Residual Network v1 [Inception Network v4 Paper](http://arxiv.org/pdf/1602.07261v1.pdf) in Keras.
Some additional details:
[1] Each of the A, B and C blocks have a 'scale_residual' parameter.
The scale residual parameter is according to the paper. It is however turned OFF by default.
Simply setting 'scale=True' in the create_inception_resnet_v2() method will add scaling.
[2] There were minor inconsistencies with filter size in both B and C blocks.
In the B blocks: 'ir_conv' nb of filters is given as 1154, however input size is 1152.
This causes inconsistencies in the merge-add mode, therefore the 'ir_conv' filter size
is reduced to 1152 to match input size.
In the C blocks: 'ir_conv' nb of filter is given as 2048, however input size is 2144.
This causes inconsistencies in the merge-add mode, therefore the 'ir_conv' filter size
is increased to 2144 to match input size.
Currently trying to find a proper solution with original nb of filters.
[3] In the stem function, the last Convolutional2D layer has 384 filters instead of the original 256.
This is to correctly match the nb of filters in 'ir_conv' of the next A blocks.
"""
import os

# Suppress TF C++ logging; this must be set before tensorflow is imported.
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras import Sequential, layers
class BasicCon2D(keras.Model):
    """Basic convolution block: Conv2D + BatchNormalization + ReLU."""
    def __init__(self, filter_nums, **kwargs):
        super(BasicCon2D, self).__init__()
        # Bias is redundant before BatchNormalization, hence use_bias=False.
        self.conv = layers.Conv2D(filter_nums, use_bias=False, **kwargs)
        self.bn = layers.BatchNormalization()
        self.relu = layers.Activation('relu')

    def call(self, inputs, training=None):
        out = self.conv(inputs)
        # Forward the training flag so BN uses batch statistics only during training.
        out = self.bn(out, training=training)
        out = self.relu(out)
        return out
class InceptionStem(keras.Model):
    """Stem network of Inception-ResNet v2 (the input part of the model)."""
    def __init__(self):
        super(InceptionStem, self).__init__()
        self.conv = Sequential([
            BasicCon2D(32, kernel_size=(3, 3), strides=2),      # 299x299x3 -> 149x149x32
            BasicCon2D(32, kernel_size=(3, 3)),                 # -> 147x147x32
            BasicCon2D(64, kernel_size=(3, 3), padding='same')  # -> 147x147x64
        ])
        self.branch_pool1a = layers.MaxPool2D((3, 3), strides=2)