Step by Step
Convolution
- Complete zero_pad
```python
X_pad = np.pad(X, ((0, 0), (pad, pad), (pad, pad), (0, 0)), 'constant', constant_values=0)  # the second argument specifies how much padding to add on both sides of each dimension
```
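As a quick sanity check, here is a minimal sketch (with assumed input shapes) of what this padding does to a batch of images; only the height and width axes grow.

```python
import numpy as np

def zero_pad(X, pad):
    # Pad only the height and width axes; batch and channel axes are left untouched
    return np.pad(X, ((0, 0), (pad, pad), (pad, pad), (0, 0)), 'constant', constant_values=0)

x = np.random.randn(4, 3, 3, 2)   # 4 examples, 3x3 spatial size, 2 channels (assumed shapes)
x_pad = zero_pad(x, 2)
print(x.shape, x_pad.shape)       # (4, 3, 3, 2) (4, 7, 7, 2)
```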
- Complete conv_single_step
```python
### START CODE HERE ### (≈ 2 lines of code)
# Element-wise product between a_slice and W. Add bias.
s = np.multiply(a_slice_prev, W)
# Sum over all entries of the volume s
Z = np.sum(s) + float(b)
### END CODE HERE ###
```
- Multiply the filter element-wise with the slice it covers, sum the products, then add a bias (a quick usage sketch follows)
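A minimal usage sketch of that single step; the slice and filter shapes below are arbitrary assumptions for illustration.

```python
import numpy as np

def conv_single_step(a_slice_prev, W, b):
    # Element-wise product with the filter, sum over the whole volume, then add the scalar bias
    return np.sum(np.multiply(a_slice_prev, W)) + float(b)

np.random.seed(1)
a_slice_prev = np.random.randn(4, 4, 3)   # one (f, f, n_C_prev) slice of the padded input
W = np.random.randn(4, 4, 3)              # one filter
b = np.random.randn(1, 1, 1)              # bias for that filter
print(conv_single_step(a_slice_prev, W, b))   # a single scalar Z[i, h, w, c]
```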
- Complete conv_forward
- Extract the slice to be convolved according to the filter size and stride
- Convolve each slice and assign the result to the output
```python
### START CODE HERE ###
# Retrieve dimensions from A_prev's shape (≈1 line)
(m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape

# Retrieve dimensions from W's shape (≈1 line)
(f, f, n_C_prev, n_C) = W.shape

# Retrieve information from "hparameters" (≈2 lines)
stride = hparameters['stride']
pad = hparameters['pad']

# Compute the dimensions of the CONV output volume using the formula given above. Hint: use int() to floor. (≈2 lines)
# Size of the output matrix after convolution
n_H = int((n_H_prev - f + 2 * pad) / stride + 1)
n_W = int((n_W_prev - f + 2 * pad) / stride + 1)

# Initialize the output volume Z with zeros. (≈1 line)
Z = np.zeros((m, n_H, n_W, n_C))

# Create A_prev_pad by padding A_prev
A_prev_pad = zero_pad(A_prev, pad)

for i in range(m):                      # loop over all training examples
    a_prev_pad = A_prev_pad[i]          # select the ith padded training example
    for h in range(n_H):                # loop over vertical axis of the output volume
        for w in range(n_W):            # loop over horizontal axis of the output volume
            for c in range(n_C):        # loop over channels (= #filters) of the output volume

                # Find the corners of the current "slice" (≈4 lines)
                vert_start = h * stride
                vert_end = vert_start + f
                horiz_start = w * stride
                horiz_end = horiz_start + f

                # Select the slice to be convolved
                a_slice_prev = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]

                # Perform a single convolution step
                Z[i, h, w, c] = conv_single_step(a_slice_prev, W[:, :, :, c], b[:, :, :, c])
### END CODE HERE ###
```
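A tiny check of the output-size formula used above, with assumed values for the hyperparameters:

```python
# n_H = floor((n_H_prev - f + 2*pad) / stride) + 1, and likewise for n_W
n_H_prev, n_W_prev = 5, 7      # assumed input height/width
f, pad, stride = 3, 1, 2       # assumed filter size, padding, stride
n_H = int((n_H_prev - f + 2 * pad) / stride) + 1
n_W = int((n_W_prev - f + 2 * pad) / stride) + 1
print(n_H, n_W)                # 3 4
```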
Pooling
- Complete pool_forward
- Extract the slice to pool in the same way as in the convolution
- Then apply the pooling operation and assign the result
```python
### START CODE HERE ###
for i in range(m):                          # loop over the training examples
    for h in range(n_H):                    # loop on the vertical axis of the output volume
        for w in range(n_W):                # loop on the horizontal axis of the output volume
            for c in range(n_C):            # loop over the channels of the output volume

                # Find the corners of the current "slice" (≈4 lines)
                vert_start = h * stride
                vert_end = vert_start + f
                horiz_start = w * stride
                horiz_end = horiz_start + f

                # Use the corners to define the current slice on the ith training example of A_prev, channel c. (≈1 line)
                a_prev_slice = A_prev[i, vert_start:vert_end, horiz_start:horiz_end, c]

                # Compute the pooling operation on the slice. Use an if statement to differentiate the modes. Use np.max/np.mean.
                if mode == "max":
                    A[i, h, w, c] = np.max(a_prev_slice)
                elif mode == "average":
                    A[i, h, w, c] = np.average(a_prev_slice)
### END CODE HERE ###
```
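For a single window, the two modes reduce to np.max and np.average; a tiny illustration with an assumed 2x2 slice:

```python
import numpy as np

a_prev_slice = np.array([[1., 3.],
                         [2., 7.]])   # one assumed 2x2 window from a single channel
print(np.max(a_prev_slice))           # 7.0   -> "max" mode
print(np.average(a_prev_slice))       # 3.25  -> "average" mode
```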
Convolution Backpropagation
Backpropagation derivation
- First we compute the gradient of the loss with respect to the convolution layer's input. Let the layer's input be $A$ and its output be $Z$.
- $dA += \sum_{h=0}^{n_H} \sum_{w=0}^{n_W} W_c \times dZ_{hw}$
- Here $W_c$ is a filter, i.e. the matrix that is multiplied element-wise with the input slice.
- $dZ_{hw}$ denotes the gradient of the loss with respect to element $(h, w)$ of the convolution output.
- Note that the loss is a scalar and $Z_{hw}$ is a scalar, so the derivative of a scalar with respect to a scalar, $dZ_{hw}$, is also a scalar.
- For the data in the yellow region of the figure, $dA_{yellow} = W_c \times dZ_{00}$.
- As we traverse the convolution output matrix, each position contributes a gradient to $dA$; the total gradient is the sum of the contributions from all positions.
- Then compute $dW_c$
- $dW_c += \sum_{h=0}^{n_H} \sum_{w=0}^{n_W} a_{slice} \times dZ_{hw}$
- It is easy to see that the gradient with respect to $W$ simply replaces the $W_c$ in the $dA$ formula with $a_{slice}$.
- This is intuitive: for the yellow region in the figure, $Z_{hw} = W_c \times a_{slice}$ (an element-wise product followed by a sum), so the derivative of $Z_{hw}$ with respect to $W_c$ is $a_{slice}$, and its derivative with respect to $a_{slice}$ is $W_c$.
- Finally, compute $db$
- $db = \sum_h \sum_w dZ_{hw}$
- The difference from the cases above is that the derivative of $Z_{hw}$ with respect to $b$ is simply 1.
- Complete conv_backward
- With the derivation above, this function is straightforward to fill in; a tiny numerical sketch of the three update rules follows the code.
```python
### START CODE HERE ###
# Retrieve information from "cache"
(A_prev, W, b, hparameters) = cache

# Retrieve dimensions from A_prev's shape
(m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape

# Retrieve dimensions from W's shape
(f, f, n_C_prev, n_C) = W.shape

# Retrieve information from "hparameters"
stride = hparameters['stride']
pad = hparameters['pad']

# Retrieve dimensions from dZ's shape
(m, n_H, n_W, n_C) = dZ.shape

# Initialize dA_prev, dW, db with the correct shapes
dA_prev = np.zeros(A_prev.shape)
dW = np.zeros(W.shape)
db = np.zeros(b.shape)

# Pad A_prev and dA_prev
A_prev_pad = zero_pad(A_prev, pad)
dA_prev_pad = zero_pad(dA_prev, pad)

for i in range(m):                          # loop over the training examples

    # select ith training example from A_prev_pad and dA_prev_pad
    a_prev_pad = A_prev_pad[i]
    da_prev_pad = dA_prev_pad[i]

    for h in range(n_H):                    # loop over vertical axis of the output volume
        for w in range(n_W):                # loop over horizontal axis of the output volume
            for c in range(n_C):            # loop over the channels of the output volume

                # Find the corners of the current "slice"
                vert_start = h * stride
                vert_end = vert_start + f
                horiz_start = w * stride
                horiz_end = horiz_start + f

                # Use the corners to define the slice from a_prev_pad
                a_slice = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]

                # Update gradients for the window and the filter's parameters using the formulas given above
                da_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :] += W[:, :, :, c] * dZ[i, h, w, c]
                dW[:, :, :, c] += a_slice * dZ[i, h, w, c]
                db[:, :, :, c] += dZ[i, h, w, c]

    # Set the ith training example's dA_prev to the unpadded da_prev_pad (Hint: use X[pad:-pad, pad:-pad, :])
    dA_prev[i, :, :, :] = da_prev_pad[pad:-pad, pad:-pad]
### END CODE HERE ###
```
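The tiny numerical sketch mentioned above: the three update rules for a single example, a single filter, one channel, stride 1 and no padding. All shapes and values here are assumptions for illustration, not the assignment's test case.

```python
import numpy as np

np.random.seed(0)
f, n_H, n_W = 2, 3, 3
A = np.random.randn(n_H + f - 1, n_W + f - 1)   # 4x4 single-channel input slice
W_c = np.random.randn(f, f)                     # one 2x2 filter
dZ = np.random.randn(n_H, n_W)                  # upstream gradient on the 3x3 output

dA, dW_c, db = np.zeros_like(A), np.zeros_like(W_c), 0.0
for h in range(n_H):
    for w in range(n_W):
        a_slice = A[h:h + f, w:w + f]
        dA[h:h + f, w:w + f] += W_c * dZ[h, w]   # dA rule: scatter W_c scaled by dZ_hw
        dW_c += a_slice * dZ[h, w]               # dW rule: accumulate a_slice scaled by dZ_hw
        db += dZ[h, w]                           # db rule: just sum dZ
print(dA.shape, dW_c.shape, np.isclose(db, dZ.sum()))   # (4, 4) (2, 2) True
```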
Pooling Backpropagation
Max pooling
As the figure shows, in max pooling only the maximum value contributes to the output, so the gradient of the pooling output is produced entirely by the maximum input. We can therefore apply a mask to the gradient so that only the position of the maximum value receives it.
For a single pooling window, the output $Z$ and the input $A$ are related by $Z = \max(A)$.
```python
def create_mask_from_window(x):
    """
    Creates a mask from an input matrix x, to identify the max entry of x.

    Arguments:
    x -- Array of shape (f, f)

    Returns:
    mask -- Array of the same shape as window, contains a True at the position corresponding to the max entry of x.
    """

    ### START CODE HERE ### (≈1 line)
    mask = (x == np.max(x))
    ### END CODE HERE ###

    return mask
```
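A quick usage sketch, assuming the create_mask_from_window defined above is in scope; the 2x3 window here is an arbitrary example.

```python
import numpy as np

np.random.seed(1)
x = np.random.randn(2, 3)
mask = create_mask_from_window(x)   # True only at the position of the maximum of x
print(x)
print(mask)
```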
We apply the mask in the gradient computation:
```python
if mode == "max":
    a_prev_slice = a_prev[vert_start:vert_end, horiz_start:horiz_end, c]   # think of this as the blue region in the top-left of the figure
    mask = create_mask_from_window(a_prev_slice)                           # the mask for that region: 1 at the position of the 7, 0 everywhere else
    # The gradient of the pooling output is caused by the 7, so we multiply the upstream
    # gradient element-wise by the mask and let only the 7 receive it
    dA_prev[i, vert_start: vert_end, horiz_start: horiz_end, c] += mask * dA[i, h, w, c]
```
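A tiny pure-numpy illustration of the idea: the upstream gradient (assumed to be 5.0 here) is routed only to the position that held the maximum.

```python
import numpy as np

a_prev_slice = np.array([[1., 3.],
                         [2., 7.]])
mask = (a_prev_slice == np.max(a_prev_slice))
print(mask * 5.0)   # only the entry that was the max (the 7) receives the gradient
```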
Average pooling
For the top-left region of the figure, let the pooling output be $Z$ and the input be $A$. Then $Z = \frac{1}{4}\mathrm{sum}(A) = \frac{1}{4}(a_1 + a_2 + a_3 + a_4)$, i.e. the four input positions influence $Z$ equally. For example, if the 2 in the top-left corner becomes 6, the output becomes 5; if instead the 3 becomes 7, the output is likewise 5. So a change in $Z$ is produced jointly by the elements of $A$, each contributing equally. By the same formula, $dZ = \frac{1}{4}(da_1 + da_2 + da_3 + da_4)$.
In other words, the gradient of the pooling output (i.e. its change) is shared evenly among the pooled input elements.
Here we write a function that distributes this influence evenly and returns a mask:
```python
def distribute_value(dz, shape):
    """
    Distributes the input value in the matrix of dimension shape

    Arguments:
    dz -- input scalar
    shape -- the shape (n_H, n_W) of the output matrix for which we want to distribute the value of dz

    Returns:
    a -- Array of size (n_H, n_W) for which we distributed the value of dz
    """

    ### START CODE HERE ###
    # Retrieve dimensions from shape (≈1 line)
    (n_H, n_W) = shape

    # Compute the value to distribute on the matrix (≈1 line)
    average = dz / (n_H * n_W)

    # Create a matrix where every entry is the "average" value (≈1 line)
    a = np.zeros(shape) + average
    ### END CODE HERE ###

    return a
```
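A quick check, assuming the distribute_value defined above is in scope:

```python
a = distribute_value(2, (2, 2))
print(a)   # [[0.5 0.5]
           #  [0.5 0.5]]
```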
We apply this mask in the computation:
```python
elif mode == "average":
    da = dA[i, h, w, c]   # think of this as the gradient of the 4 on the right of the figure
    shape = [f, f]        # size of the window over which the gradient is spread
    # The change in the output 4 is shared evenly by the four elements of the top-left window
    dA_prev[i, vert_start: vert_end, horiz_start: horiz_end, c] += distribute_value(da, shape)
```
Application
TensorFlow requires you to create placeholders for the input data that will be fed into the model when the session is run. We now implement the function that creates these placeholders. Because we train on mini-batches, the number of input examples is not fixed, so we use None for that dimension. The input X has shape [None, n_H0, n_W0, n_C0], and the corresponding Y has shape [None, n_y].
```python
def create_placeholders(n_H0, n_W0, n_C0, n_y):
    """
    Creates the placeholders for the tensorflow session.

    Arguments:
    n_H0 -- scalar, height of an input image
    n_W0 -- scalar, width of an input image
    n_C0 -- scalar, number of channels of the input
    n_y -- scalar, number of classes

    Returns:
    X -- placeholder for the data input, of shape [None, n_H0, n_W0, n_C0] and dtype "float"
    Y -- placeholder for the input labels, of shape [None, n_y] and dtype "float"
    """

    ### START CODE HERE ### (≈2 lines)
    X = tf.placeholder(tf.float32, [None, n_H0, n_W0, n_C0])
    Y = tf.placeholder(tf.float32, [None, n_y])
    ### END CODE HERE ###

    return X, Y
```
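A quick sanity check of the placeholder shapes; this sketch assumes TensorFlow 1.x (as in the rest of this assignment) and the create_placeholders defined above.

```python
import tensorflow as tf

X, Y = create_placeholders(64, 64, 3, 6)
print("X = " + str(X))   # shape (?, 64, 64, 3)
print("Y = " + str(Y))   # shape (?, 6)
```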
Complete initialize_parameters
```python
# GRADED FUNCTION: initialize_parameters

def initialize_parameters():
    """
    Initializes weight parameters to build a neural network with tensorflow. The shapes are:
                        W1 : [4, 4, 3, 8]
                        W2 : [2, 2, 8, 16]
    Returns:
    parameters -- a dictionary of tensors containing W1, W2
    """

    tf.set_random_seed(1)   # so that your "random" numbers match ours

    ### START CODE HERE ### (approx. 2 lines of code)
    W1 = tf.get_variable("W1", [4, 4, 3, 8], initializer=tf.contrib.layers.xavier_initializer(seed=0))
    W2 = tf.get_variable("W2", [2, 2, 8, 16], initializer=tf.contrib.layers.xavier_initializer(seed=0))
    ### END CODE HERE ###

    parameters = {"W1": W1,
                  "W2": W2}

    return parameters
```
Complete forward_propagation
- tf.nn.conv2d(X, W1, strides=[1, s, s, 1], padding='SAME'): given the input X and a set of filters W1, this function convolves W1 over X; the strides argument [1, s, s, 1] gives the step taken along each dimension of the input (m, n_H_prev, n_W_prev, n_C_prev).
- tf.nn.max_pool(A, ksize=[1, f, f, 1], strides=[1, s, s, 1], padding='SAME'): given the input A, this function slides a window of size (f, f) with stride (s, s) over it and takes the maximum of each window.
- tf.nn.relu(Z1): computes the element-wise ReLU activation of Z1.
- tf.contrib.layers.flatten(P): given an input P, this function flattens each example into a 1-D vector and returns a tensor of shape (batch_size, k).
- tf.contrib.layers.fully_connected(F, num_outputs): given a flattened input F, this returns the output of a fully connected layer. The fully connected layer initializes its own weights and trains them together with the rest of the model, so we do not need to initialize them ourselves.
```python
# GRADED FUNCTION: forward_propagation

def forward_propagation(X, parameters):
    """
    Implements the forward propagation for the model:
    CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED

    Arguments:
    X -- input dataset placeholder, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "W2"
                  the shapes are given in initialize_parameters

    Returns:
    Z3 -- the output of the last LINEAR unit
    """

    # Retrieve the parameters from the dictionary "parameters"
    W1 = parameters['W1']
    W2 = parameters['W2']

    ### START CODE HERE ###
    # CONV2D: stride of 1, padding 'SAME'
    Z1 = tf.nn.conv2d(X, W1, strides=[1, 1, 1, 1], padding='SAME')
    # RELU
    A1 = tf.nn.relu(Z1)
    # MAXPOOL: window 8x8, stride 8, padding 'SAME'
    P1 = tf.nn.max_pool(A1, ksize=[1, 8, 8, 1], strides=[1, 8, 8, 1], padding='SAME')
    # CONV2D: filters W2, stride 1, padding 'SAME'
    Z2 = tf.nn.conv2d(P1, W2, strides=[1, 1, 1, 1], padding='SAME')
    # RELU
    A2 = tf.nn.relu(Z2)
    # MAXPOOL: window 4x4, stride 4, padding 'SAME'
    P2 = tf.nn.max_pool(A2, ksize=[1, 4, 4, 1], strides=[1, 4, 4, 1], padding='SAME')
    # FLATTEN
    P2 = tf.contrib.layers.flatten(P2)
    # FULLY-CONNECTED without non-linear activation function (do not call softmax).
    # 6 neurons in output layer. Hint: one of the arguments should be "activation_fn=None"
    Z3 = tf.contrib.layers.fully_connected(P2, 6, activation_fn=None)
    ### END CODE HERE ###

    return Z3
```
Complete compute_cost
tf.nn.softmax_cross_entropy_with_logits(logits=Z3, labels=Y) computes the softmax cross-entropy loss; it applies the softmax activation and computes the loss in a single call.
tf.reduce_mean() takes the mean of the losses over all the examples.
```python
# GRADED FUNCTION: compute_cost

def compute_cost(Z3, Y):
    """
    Computes the cost

    Arguments:
    Z3 -- output of forward propagation (output of the last LINEAR unit), of shape (6, number of examples)
    Y -- "true" labels vector placeholder, same shape as Z3

    Returns:
    cost - Tensor of the cost function
    """

    ### START CODE HERE ### (1 line of code)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=Z3, labels=Y))
    ### END CODE HERE ###

    return cost
```
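A hedged sketch of checking the graph end to end on random inputs in a TF 1.x session, assuming the functions defined above are in scope. The random Y below is only a shape check, not real one-hot labels.

```python
import numpy as np
import tensorflow as tf

tf.reset_default_graph()
with tf.Session() as sess:
    np.random.seed(1)
    X, Y = create_placeholders(64, 64, 3, 6)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    cost = compute_cost(Z3, Y)
    sess.run(tf.global_variables_initializer())
    a = sess.run(cost, {X: np.random.randn(4, 64, 64, 3), Y: np.random.randn(4, 6)})
    print("cost = " + str(a))
```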
Build and train the full model
- Create placeholders for X and Y
- Initialize the parameters
- Forward propagation
- Compute the cost
- Create an optimizer
- Train on mini-batches
```python
# GRADED FUNCTION: model

def model(X_train, Y_train, X_test, Y_test, learning_rate=0.009,
          num_epochs=100, minibatch_size=64, print_cost=True):
    """
    Implements a three-layer ConvNet in Tensorflow:
    CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED

    Arguments:
    X_train -- training set, of shape (None, 64, 64, 3)
    Y_train -- training labels, of shape (None, n_y = 6)
    X_test -- test set, of shape (None, 64, 64, 3)
    Y_test -- test labels, of shape (None, n_y = 6)
    learning_rate -- learning rate of the optimization
    num_epochs -- number of epochs of the optimization loop
    minibatch_size -- size of a minibatch
    print_cost -- True to print the cost every 100 epochs

    Returns:
    train_accuracy -- real number, accuracy on the train set (X_train)
    test_accuracy -- real number, testing accuracy on the test set (X_test)
    parameters -- parameters learnt by the model. They can then be used to predict.
    """

    ops.reset_default_graph()       # to be able to rerun the model without overwriting tf variables
    tf.set_random_seed(1)           # to keep results consistent (tensorflow seed)
    seed = 3                        # to keep results consistent (numpy seed)
    (m, n_H0, n_W0, n_C0) = X_train.shape
    n_y = Y_train.shape[1]
    costs = []                      # To keep track of the cost

    # Create Placeholders of the correct shape
    ### START CODE HERE ### (1 line)
    X, Y = create_placeholders(n_H0, n_W0, n_C0, n_y)
    ### END CODE HERE ###

    # Initialize parameters
    ### START CODE HERE ### (1 line)
    parameters = initialize_parameters()
    ### END CODE HERE ###

    # Forward propagation: Build the forward propagation in the tensorflow graph
    ### START CODE HERE ### (1 line)
    Z3 = forward_propagation(X, parameters)
    ### END CODE HERE ###

    # Cost function: Add cost function to tensorflow graph
    ### START CODE HERE ### (1 line)
    cost = compute_cost(Z3, Y)
    ### END CODE HERE ###

    # Backpropagation: Define the tensorflow optimizer. Use an AdamOptimizer that minimizes the cost.
    ### START CODE HERE ### (1 line)
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
    ### END CODE HERE ###

    # Initialize all the variables globally
    init = tf.global_variables_initializer()

    # Start the session to compute the tensorflow graph
    with tf.Session() as sess:

        # Run the initialization
        sess.run(init)

        # Do the training loop
        for epoch in range(num_epochs):

            minibatch_cost = 0.
            num_minibatches = int(m / minibatch_size)  # number of minibatches of size minibatch_size in the train set
            seed = seed + 1
            minibatches = random_mini_batches(X_train, Y_train, minibatch_size, seed)

            for minibatch in minibatches:

                # Select a minibatch
                (minibatch_X, minibatch_Y) = minibatch
                # IMPORTANT: The line that runs the graph on a minibatch.
                # Run the session to execute the optimizer and the cost, the feed_dict should contain a minibatch for (X, Y).
                ### START CODE HERE ### (1 line)
                _, temp_cost = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})
                ### END CODE HERE ###

                minibatch_cost += temp_cost / num_minibatches

            # Print the cost every epoch
            if print_cost == True and epoch % 5 == 0:
                print("Cost after epoch %i: %f" % (epoch, minibatch_cost))
            if print_cost == True and epoch % 1 == 0:
                costs.append(minibatch_cost)

        # plot the cost
        plt.plot(np.squeeze(costs))
        plt.ylabel('cost')
        plt.xlabel('iterations (per tens)')
        plt.title("Learning rate =" + str(learning_rate))
        plt.show()

        # Calculate the correct predictions
        predict_op = tf.argmax(Z3, 1)
        correct_prediction = tf.equal(predict_op, tf.argmax(Y, 1))

        # Calculate accuracy on the test set
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
        print(accuracy)
        train_accuracy = accuracy.eval({X: X_train, Y: Y_train})
        test_accuracy = accuracy.eval({X: X_test, Y: Y_test})
        print("Train Accuracy:", train_accuracy)
        print("Test Accuracy:", test_accuracy)

        return train_accuracy, test_accuracy, parameters
```
Tensor("Mean_1:0", shape=(), dtype=float32)
Train Accuracy: 0.86851853
Test Accuracy: 0.73333335
Finally, please give this post a like!
Residual Networks
Deep networks can fit very complex functions, but they are prone to the vanishing gradient problem.
Residual Networks add 'shortcut' connections so that gradients can propagate directly to earlier layers.
The Residual Network in this assignment is built mainly from two kinds of blocks:
- identity block
- The standard block in ResNets, used when the input and output dimensions are the same
- No convolution is applied on the 'shortcut' path
- Here we complete the identity_block function according to the design in the assignment
```python
# GRADED FUNCTION: identity_block

def identity_block(X, f, filters, stage, block):
    """
    Implementation of the identity block as defined in Figure 3

    Arguments:
    X -- input tensor of shape (m, n_H_prev, n_W_prev, n_C_prev)
    f -- integer, specifying the shape of the middle CONV's window for the main path
    filters -- python list of integers, defining the number of filters in the CONV layers of the main path
    stage -- integer, used to name the layers, depending on their position in the network
    block -- string/character, used to name the layers, depending on their position in the network

    Returns:
    X -- output of the identity block, tensor of shape (n_H, n_W, n_C)
    """

    # defining name basis
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    # Retrieve Filters
    F1, F2, F3 = filters

    # Save the input value. You'll need this later to add back to the main path.
    X_shortcut = X

    # First component of main path
    X = Conv2D(filters=F1, kernel_size=(1, 1), strides=(1, 1), padding='valid', name=conv_name_base + '2a', kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2a')(X)
    X = Activation('relu')(X)

    ### START CODE HERE ###

    # Second component of main path (≈3 lines)
    X = Conv2D(filters=F2, kernel_size=(f, f), strides=(1, 1), padding='same', name=conv_name_base + '2b', kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2b')(X)
    X = Activation('relu')(X)

    # Third component of main path (≈2 lines)
    X = Conv2D(filters=F3, kernel_size=(1, 1), strides=(1, 1), padding='valid', name=conv_name_base + '2c', kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2c')(X)

    # Final step: Add shortcut value to main path, and pass it through a RELU activation (≈2 lines)
    X = Add()([X, X_shortcut])
    X = Activation('relu')(X)

    ### END CODE HERE ###

    return X
```
- convolutional block
- Used when the input and output dimensions differ
- The only difference from the identity block above is that a convolution is also applied on the 'shortcut' path
- Here we complete the convolutional_block function according to the design in the assignment
```python
# GRADED FUNCTION: convolutional_block

def convolutional_block(X, f, filters, stage, block, s=2):
    """
    Implementation of the convolutional block as defined in Figure 4

    Arguments:
    X -- input tensor of shape (m, n_H_prev, n_W_prev, n_C_prev)
    f -- integer, specifying the shape of the middle CONV's window for the main path
    filters -- python list of integers, defining the number of filters in the CONV layers of the main path
    stage -- integer, used to name the layers, depending on their position in the network
    block -- string/character, used to name the layers, depending on their position in the network
    s -- Integer, specifying the stride to be used

    Returns:
    X -- output of the convolutional block, tensor of shape (n_H, n_W, n_C)
    """

    # defining name basis
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    # Retrieve Filters
    F1, F2, F3 = filters

    # Save the input value
    X_shortcut = X

    ##### MAIN PATH #####
    # First component of main path
    X = Conv2D(F1, (1, 1), strides=(s, s), name=conv_name_base + '2a', kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2a')(X)
    X = Activation('relu')(X)

    ### START CODE HERE ###

    # Second component of main path (≈3 lines)
    X = Conv2D(F2, (f, f), strides=(1, 1), padding='same', name=conv_name_base + '2b', kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2b')(X)
    X = Activation('relu')(X)

    # Third component of main path (≈2 lines)
    X = Conv2D(F3, (1, 1), strides=(1, 1), padding='valid', name=conv_name_base + '2c', kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2c')(X)

    ##### SHORTCUT PATH #### (≈2 lines)
    X_shortcut = Conv2D(F3, (1, 1), strides=(s, s), padding='valid', name=conv_name_base + '1', kernel_initializer=glorot_uniform(seed=0))(X_shortcut)
    X_shortcut = BatchNormalization(axis=3, name=bn_name_base + '1')(X_shortcut)

    # Final step: Add shortcut value to main path, and pass it through a RELU activation (≈2 lines)
    X = Add()([X, X_shortcut])
    X = Activation('relu')(X)

    ### END CODE HERE ###

    return X
```
- Finally, we use the two ResNet building blocks constructed above to build and train the whole network
- Complete the ResNet50 function
```python
# GRADED FUNCTION: ResNet50

def ResNet50(input_shape=(64, 64, 3), classes=6):
    """
    Implementation of the popular ResNet50 with the following architecture:
    CONV2D -> BATCHNORM -> RELU -> MAXPOOL -> CONVBLOCK -> IDBLOCK*2 -> CONVBLOCK -> IDBLOCK*3
    -> CONVBLOCK -> IDBLOCK*5 -> CONVBLOCK -> IDBLOCK*2 -> AVGPOOL -> TOPLAYER

    Arguments:
    input_shape -- shape of the images of the dataset
    classes -- integer, number of classes

    Returns:
    model -- a Model() instance in Keras
    """

    # Define the input as a tensor with shape input_shape
    X_input = Input(input_shape)

    # Zero-Padding
    X = ZeroPadding2D((3, 3))(X_input)

    # Stage 1
    X = Conv2D(64, (7, 7), strides=(2, 2), name='conv1', kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name='bn_conv1')(X)
    X = Activation('relu')(X)
    X = MaxPooling2D((3, 3), strides=(2, 2))(X)

    # Stage 2
    X = convolutional_block(X, f=3, filters=[64, 64, 256], stage=2, block='a', s=1)
    X = identity_block(X, 3, [64, 64, 256], stage=2, block='b')
    X = identity_block(X, 3, [64, 64, 256], stage=2, block='c')

    ### START CODE HERE ###

    # Stage 3 (≈4 lines)
    X = convolutional_block(X, f=3, filters=[128, 128, 512], stage=3, block='a', s=2)
    X = identity_block(X, 3, [128, 128, 512], stage=3, block='b')
    X = identity_block(X, 3, [128, 128, 512], stage=3, block='c')
    X = identity_block(X, 3, [128, 128, 512], stage=3, block='d')

    # Stage 4 (≈6 lines)
    X = convolutional_block(X, f=3, filters=[256, 256, 1024], stage=4, block='a', s=2)
    X = identity_block(X, 3, [256, 256, 1024], stage=4, block='b')
    X = identity_block(X, 3, [256, 256, 1024], stage=4, block='c')
    X = identity_block(X, 3, [256, 256, 1024], stage=4, block='d')
    X = identity_block(X, 3, [256, 256, 1024], stage=4, block='e')
    X = identity_block(X, 3, [256, 256, 1024], stage=4, block='f')

    # Stage 5 (≈3 lines)
    X = convolutional_block(X, f=3, filters=[512, 512, 2048], stage=5, block='a', s=2)
    X = identity_block(X, 3, [512, 512, 2048], stage=5, block='b')
    X = identity_block(X, 3, [512, 512, 2048], stage=5, block='c')

    # AVGPOOL (≈1 line). Use "X = AveragePooling2D(...)(X)"
    X = AveragePooling2D((2, 2), name='avg_pool')(X)

    ### END CODE HERE ###

    # output layer
    X = Flatten()(X)
    X = Dense(classes, activation='softmax', name='fc' + str(classes), kernel_initializer=glorot_uniform(seed=0))(X)

    # Create model
    model = Model(inputs=X_input, outputs=X, name='ResNet50')

    return model
```
- Train on the SIGNS dataset and obtain the final results (a sketch of the compile/fit/evaluate calls follows below):
- 120/120 [==============================] - 3s 24ms/step
- Loss = 0.5301783005396525
- Test Accuracy = 0.8666666626930237
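The results above come from compiling and fitting the model in Keras; a minimal sketch of those calls is shown here. The epoch count and batch size are assumptions, and X_train/Y_train/X_test/Y_test are the SIGNS splits loaded as in the assignment.

```python
model = ResNet50(input_shape=(64, 64, 3), classes=6)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, Y_train, epochs=2, batch_size=32)   # assumed training settings

preds = model.evaluate(X_test, Y_test)                 # evaluate on the 120 test examples
print("Loss = " + str(preds[0]))
print("Test Accuracy = " + str(preds[1]))
```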