CNN Week 2: Residual Networks

Week 2

1 identity_block

The Problem of Very Deep Neural Networks

Neural networks keep getting deeper because deeper networks can represent very complex functions, and depth in itself is usually not harmful. But it introduces one particularly troublesome problem: vanishing gradients during training. In gradient descent, as you backpropagate from the last layer back to the first, you multiply by the weight matrix at every step, so the gradient can shrink exponentially toward zero (or, in rare cases, grow exponentially and explode).

As a result, the learning speed of the earlier layers drops very quickly as training proceeds. Residual networks were built to solve this problem.
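
A minimal numpy sketch of this effect (the depth and weight scale below are made up purely for illustration): repeatedly multiplying a gradient by weight matrices whose norm is below 1 shrinks it exponentially.

import numpy as np

np.random.seed(0)
grad = np.ones(64)           # gradient arriving at the last layer
W = 0.9 * np.eye(64)         # toy weight matrix with norm < 1
for layer in range(50):      # backpropagate through 50 layers
    grad = W.T @ grad
print(np.linalg.norm(grad))  # about 0.9**50 * 8 ~ 0.04: nearly vanished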

Building a Residual Network

Residual networks use skip connections (shortcuts) that let the gradient backpropagate directly to shallower layers.
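
In the course's notation, a skip connection over two layers computes

a^[l+2] = g( z^[l+2] + a^[l] )

Because the identity term a^[l] passes through unchanged, the gradient always has a direct path back to layer l, even when the main-path weights would shrink it.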

Steps:
First component:

  • The first Conv2D has F1 filters of size (1,1), stride (1,1), and 'valid' padding; its weights are initialized with random seed 0
  • The first BatchNorm normalizes along the channel axis
  • Finally, apply the ReLU activation

Second component:

  • The second Conv2D has F2 filters of size (f,f), stride (1,1), 'same' padding, and the same seed-0 initializer
  • The second BatchNorm
  • Apply ReLU

Third component:

  • The third Conv2D has F3 filters of size (1,1), stride (1,1), 'valid' padding, and the same seed-0 initializer
  • The third BatchNorm normalizes along the channel axis
  • No ReLU here

Final step:

  • Add the shortcut to the main-path output
  • Apply the ReLU activation; this step has no name and no hyperparameters
Exercise 1 - identity_block
import numpy as np
from tensorflow.keras.initializers import constant
import public_tests

np.random.seed(1)
# X1, X2, X3 are constant-filled inputs; the number after * is the fill value.
# Each has shape (1, 4, 4, 3): a batch containing one 4x4 image with 3 channels.
X1 = np.ones((1, 4, 4, 3)) * -1     
X2 = np.ones((1, 4, 4, 3)) * 1
X3 = np.ones((1, 4, 4, 3)) * 3

# concatenate() joins X1, X2, X3 along axis 0 (the default) into a (3, 4, 4, 3) batch;
# astype() casts every value to float32
X = np.concatenate((X1, X2, X3), axis = 0).astype(np.float32)


A3 = identity_block(X, f=2, filters=[4, 4, 3],
                   initializer=lambda seed=0:constant(value=1),
                   training=False)
                   
print('\033[1mWith training=False\033[0m\n')
A3np = A3.numpy()
# Average over the channel axis (axis 3); np.around rounds to 5 decimal places
print(np.around(A3.numpy()[:,(0,-1),:,:].mean(axis = 3), 5))
resume = A3np[:,(0,-1),:,:].mean(axis = 3)
print(resume[1, 1, 0])

print('\n\033[1mWith training=True\033[0m\n')
np.random.seed(1)
A4 = identity_block(X, f=2, filters=[3, 3, 3],
                   initializer=lambda seed=0:constant(value=1),
                   training=True)
print(np.around(A4.numpy()[:,(0,-1),:,:].mean(axis = 3), 5))

public_tests.identity_block_test(identity_block)

Code segment 2:

from tensorflow.keras.layers import Conv2D, BatchNormalization, Activation, Add
from tensorflow.keras.initializers import random_uniform

def identity_block(X, f, filters, training=True, initializer=random_uniform):
    
    # Unpack the number of filters for each component (4, 4, 3 in the test above)
    F1, F2, F3 = filters
    
    # Save the input tensor X so it can be added back later as the shortcut
    X_shortcut = X
    
    # First component of main path
    # Convolution, then batch normalization, then the ReLU non-linearity
    X = Conv2D(filters = F1, kernel_size = 1, strides = (1,1), padding = 'valid', kernel_initializer = initializer(seed=0))(X)
    X = BatchNormalization(axis = 3)(X, training = training) # Default axis
    X = Activation('relu')(X)
    
    ### START CODE HERE
    ## Second component of main path (3 lines)
    X = Conv2D(filters = F2,kernel_size = f, strides=(1,1),padding = 'same', kernel_initializer = initializer(seed=0))(X)
    X = BatchNormalization(axis = 3)(X,training = training)
    X = Activation('relu')(X)

    ## Third component of main path (2 lines)
    X = Conv2D(filters = F3, kernel_size = 1, strides = (1,1), padding = 'valid', kernel_initializer = initializer(seed=0))(X)
    X = BatchNormalization(axis = 3)(X,training = training)
    
    ## Final step: Add shortcut value to main path, and pass it through a RELU activation (2 lines)
    # Add() merges the main-path output with the saved shortcut, then apply ReLU
    X = Add()([X,X_shortcut])
    X = Activation('relu')(X)
    ### END CODE HERE

    return X

Analysis:

  1. First, three (1, 4, 4, 3) arrays holding different constant values are merged by concatenate() into a single (3, 4, 4, 3) batch, which is then passed into identity_block(). Below is a quick verification of what concatenate() does.

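The exact printout in the original screenshot is not recoverable, so here is a minimal sketch of the same check, verifying the shape and a few entries:

import numpy as np

X1 = np.ones((1, 4, 4, 3)) * -1
X2 = np.ones((1, 4, 4, 3)) * 1
X3 = np.ones((1, 4, 4, 3)) * 3
X = np.concatenate((X1, X2, X3), axis=0)

print(X.shape)        # (3, 4, 4, 3): the three inputs stack along axis 0
print(X[0, 0, 0, 0])  # -1.0, taken from X1
print(X[2, 0, 0, 0])  # 3.0, taken from X3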

  2. Implement the function by following the hints in the template; note that Add() must be called exactly in the form given, Add()([X, X_shortcut]).
  3. The function returns a TensorFlow tensor, so numpy() is used to convert it back to a NumPy array, as in the tiny example below.
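
import tensorflow as tf

t = tf.constant([[1.0, 2.0]])
a = t.numpy()       # EagerTensor -> numpy.ndarray
print(type(a), a)   # <class 'numpy.ndarray'> [[1. 2.]]
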
The Convolutional Block

The shortcut path contains a Conv2D layer (followed by a BatchNorm).

The steps match the identity block, except that this shortcut projection is inserted just before the final addition.

Exercise 2 - convolutional_block
from outputs import convolutional_block_output1, convolutional_block_output2
from tensorflow.python.framework.ops import EagerTensor
import tensorflow as tf

np.random.seed(1)
# X = np.random.randn(3, 4, 4, 6).astype(np.float32)  # hint given: float32 of shape (3, 4, 4, 6)
X1 = np.ones((1, 4, 4, 3)) * -1
X2 = np.ones((1, 4, 4, 3)) * 1
X3 = np.ones((1, 4, 4, 3)) * 3
# Concatenate into a (3, 4, 4, 3) batch
X = np.concatenate((X1, X2, X3), axis = 0).astype(np.float32)
# Call the function we implemented
A = convolutional_block(X, f = 2, filters = [2, 4, 6], training=False)

# Check that the output type is EagerTensor
assert type(A) == EagerTensor, "Use only tensorflow and keras functions"
assert tuple(tf.shape(A).numpy()) == (3, 2, 2, 6), "Wrong shape."
assert np.allclose(A.numpy(), convolutional_block_output1), "Wrong values when training=False."
print(A[0])

# Run again with training=True
B = convolutional_block(X, f = 2, filters = [2, 4, 6], training=True)
assert np.allclose(B.numpy(), convolutional_block_output2), "Wrong values when training=True."

print('\033[92mAll tests passed!')

Code segment 2:

from tensorflow.keras.initializers import glorot_uniform

def convolutional_block(X, f, filters, s = 2, training=True, initializer=glorot_uniform):
    
    # Retrieve Filters
    F1, F2, F3 = filters
    
    # Save the input value
    X_shortcut = X


    ##### MAIN PATH #####
    
    # First component of main path glorot_uniform(seed=0)
    X = Conv2D(filters = F1, kernel_size = 1, strides = (s, s), padding='valid', kernel_initializer = initializer(seed=0))(X)
    X = BatchNormalization(axis = 3)(X, training=training)
    X = Activation('relu')(X)

    ### START CODE HERE
    
    ## Second component of main path (3 lines)
    X = Conv2D(filters = F2, kernel_size = f, strides = (1, 1), padding='same', kernel_initializer = initializer(seed=0))(X) 
    X = BatchNormalization(axis = 3)(X, training=training)
    X = Activation('relu')(X)

    ## Third component of main path (2 lines)
    X = Conv2D(filters = F3, kernel_size = 1, strides = (1, 1), padding='valid', kernel_initializer = initializer(seed=0))(X)
    X = BatchNormalization(axis = 3)(X, training=training)
    
    ##### SHORTCUT PATH ##### (2 lines)
    X_shortcut = Conv2D(filters = F3, kernel_size = 1, strides = (s, s), padding='valid', kernel_initializer = initializer(seed=0))(X_shortcut)
    X_shortcut = BatchNormalization(axis = 3)(X_shortcut, training=training)
    
    ### END CODE HERE

    # Final step: Add shortcut value to main path (Use this order [X, X_shortcut]), and pass it through a RELU activation
    X = Add()([X, X_shortcut])
    X = Activation('relu')(X)
    
    return X

Analysis:
This is almost identical to the previous exercise; the only new part is the shortcut path:

X_shortcut = Conv2D(filters = F3, kernel_size = 1, strides = (s, s), padding='valid', kernel_initializer = initializer(seed=0))(X_shortcut)
X_shortcut = BatchNormalization(axis = 3)(X_shortcut, training=training)

Note that both layers take X_shortcut, i.e. the saved input image, rather than the main-path X.
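
To see why the projection is needed: with stride s > 1 the main path changes both the spatial size and the channel count, so the raw input can no longer be added to it. A minimal sketch (the layer parameters here are chosen only for illustration):

import tensorflow as tf

x = tf.ones((1, 4, 4, 3))
main = tf.keras.layers.Conv2D(6, 1, strides=2)(x)   # shape (1, 2, 2, 6)
# tf.keras.layers.Add()([main, x]) would fail: (1, 2, 2, 6) vs (1, 4, 4, 3)
proj = tf.keras.layers.Conv2D(6, 1, strides=2)(x)   # project the shortcut too
out = tf.keras.layers.Add()([main, proj])           # shapes now match
print(out.shape)                                    # (1, 2, 2, 6)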

Building Your First ResNet Model (50 layers)

The details of this ResNet-50 model are:
Zero-pad the input with pad = (3,3).
Stage 1:

  • 64 filters of shape (7,7), with a stride of 2
  • A BatchNorm layer normalizing the channel axis
  • MaxPooling with a (3,3) window and a (2,2) stride

Stage 2:

  • A convolutional block with f=3, filters [64,64,256], and stride 1
  • 2 identity blocks with filters [64,64,256] and f=3

Stage 3:

  • A convolutional block with f=3, filters [128,128,512], and stride 2
  • 3 identity blocks with filters [128,128,512] and f=3

Stage 4:

  • A convolutional block with f=3, filters [256, 256, 1024], and stride 2
  • 5 identity blocks with filters [256, 256, 1024] and f=3

Stage 5:

  • A convolutional block with f=3, filters [512, 512, 2048], and stride 2
  • 2 identity blocks with filters [512, 512, 2048] and f=3

Average pooling with a (2,2) window.
Flatten has no hyperparameters and no name.
The fully connected (Dense) layer uses a softmax activation. A rough shape trace of these stages is sketched below.
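
The trace below follows the spatial dimension for a (64, 64, 3) input, computed by hand from the strides above (worth cross-checking against model.summary() later):

def conv_out(n, k, s):           # output size of a 'valid' conv/pool
    return (n - k) // s + 1

n = 64 + 2 * 3                   # ZeroPadding2D((3, 3)): 64 -> 70
n = conv_out(n, 7, 2)            # Stage 1 conv 7x7, s=2:  70 -> 32
n = conv_out(n, 3, 2)            # MaxPool 3x3, s=2:       32 -> 15
n = conv_out(n, 1, 1)            # Stage 2, s=1:           15 -> 15 (256 channels)
n = conv_out(n, 1, 2)            # Stage 3, s=2:           15 -> 8  (512 channels)
n = conv_out(n, 1, 2)            # Stage 4, s=2:           8 -> 4   (1024 channels)
n = conv_out(n, 1, 2)            # Stage 5, s=2:           4 -> 2   (2048 channels)
n = conv_out(n, 2, 2)            # AveragePooling2D (2,2): 2 -> 1
print(n)                         # 1, so Flatten yields 2048 features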

Exercise 3 - ResNet50
from tensorflow.keras.layers import (Input, ZeroPadding2D, MaxPooling2D,
                                     AveragePooling2D, Flatten, Dense)
from tensorflow.keras.models import Model

def ResNet50(input_shape = (64, 64, 3), classes = 6):
    
    # Define the input as a tensor with shape input_shape
    X_input = Input(input_shape)
    
    # Zero-padding with pad = (3, 3)
    X = ZeroPadding2D((3, 3))(X_input)
    
    # Stage 1
    X = Conv2D(64, (7, 7), strides = (2, 2), kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3)(X)
    X = Activation('relu')(X)
    X = MaxPooling2D((3, 3), strides=(2, 2))(X)

    # Stage 2
    X = convolutional_block(X, f = 3, filters = [64, 64, 256], s = 1)
    X = identity_block(X, 3, [64, 64, 256])
    X = identity_block(X, 3, [64, 64, 256])

    ### START CODE HERE
    
    ## Stage 3 (4 lines)
    X = convolutional_block(X, f = 3, filters = [128,128,512], s = 2)
    X = identity_block(X, 3, [128,128,512])
    X = identity_block(X, 3, [128,128,512])
    X = identity_block(X, 3, [128,128,512])
    
    ## Stage 4 (6 lines)
    X = convolutional_block(X, f = 3, filters = [256, 256, 1024], s = 2)
    X = identity_block(X, 3, [256, 256, 1024])
    X = identity_block(X, 3, [256, 256, 1024])
    X = identity_block(X, 3, [256, 256, 1024])
    X = identity_block(X, 3, [256, 256, 1024])
    X = identity_block(X, 3, [256, 256, 1024])

    ## Stage 5 (3 lines)
    X = convolutional_block(X, f = 3, filters = [512, 512, 2048], s = 2) 
    X = identity_block(X, 3, [512, 512, 2048])
    X = identity_block(X, 3, [512, 512, 2048])

    ## AVGPOOL (1 line). Use "X = AveragePooling2D(...)(X)"
    X = AveragePooling2D(pool_size=(2,2))(X) 
    
    ### END CODE HERE

    # output layer
    X = Flatten()(X)
    X = Dense(classes, activation='softmax', kernel_initializer = glorot_uniform(seed=0))(X)
    
    
    # Create model
    model = Model(inputs = X_input, outputs = X)

    return model

Analysis:
Follow the stage-by-stage description above directly; only the filter sizes and strides change between stages.

model = ResNet50(input_shape = (64, 64, 3), classes = 6)
print(model.summary())

Display the per-layer parameters of the model:

from outputs import ResNet50_summary

model = ResNet50(input_shape = (64, 64, 3), classes = 6)

comparator(summary(model), ResNet50_summary)

Instantiate and compile the model.
The model is now ready; the next step is to load the training set and train it. A sketch of the data loading follows.
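
In the course notebook, X_train/Y_train come from the SIGNS hand-gesture dataset via helper functions in resnets_utils; a minimal sketch of that preprocessing, assuming load_dataset and convert_to_one_hot are available:

from resnets_utils import load_dataset, convert_to_one_hot

X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_dataset()
X_train = X_train_orig / 255.0                   # scale pixels to [0, 1]
X_test = X_test_orig / 255.0
Y_train = convert_to_one_hot(Y_train_orig, 6).T  # one-hot labels, shape (m, 6)
Y_test = convert_to_one_hot(Y_test_orig, 6).T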

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, Y_train, epochs = 10, batch_size = 32)

Evaluate the model on the test set and print the results:

preds = model.evaluate(X_test, Y_test)
print ("Loss = " + str(preds[0]))
print ("Test Accuracy = " + str(preds[1]))
