[Andrew Ng ML Course - Course 2 - Week 3] Hyperparameter Tuning, Batch Normalization and Programming Frameworks - Programming Assignment

I am self-studying this course and found that the programming assignment for this week is almost impossible to find online, so after finishing it myself I decided to post it; this is also my first post. As a fellow beginner, I hope it is of some use. This assignment is a bit troublesome because the code in the Jupyter notebook Andrew provides is based on TensorFlow 1.x, while the TensorFlow I recently installed is version 2.1, and the two versions differ significantly. So if you have TensorFlow 2 installed and want to complete the assignment within the provided notebook, you need to make some modifications. The assignment follows.

import numpy as np
import h5py
import matplotlib.pyplot as plt
import tensorflow.compat.v1 as tf
from tensorflow.python.framework import ops
import tf_utils
import time
from tf_utils import load_dataset, random_mini_batches
%matplotlib inline
np.random.seed(1)

One change here: importing tensorflow.compat.v1 keeps the assignment's TensorFlow 1.x syntax working, so even with TensorFlow 2.x installed, most of the 1.x code can run unmodified.

tf.disable_eager_execution()  # disable eager mode
# the loss variable will be initialized and ready to be computed
y_hat = tf.constant(36, name='y_hat')  # Define y_hat constant. Set to 36.
y = tf.constant(39, name='y')          # Define y. Set to 39
loss = tf.Variable((y - y_hat)**2, name='loss')  # Create a variable for the loss
init = tf.global_variables_initializer()  # When init is run later (session.run(init)),
with tf.Session() as session:  # Create a session and print the output
    session.run(init)          # Initializes the variables
    print(session.run(loss))

TensorFlow 2.x runs in eager mode by default, which conflicts with the static-graph mode of 1.x. So, to let most of the assignment code run, we "downgrade" 2.x back to 1.x behavior by disabling eager execution, and TensorFlow then runs in static-graph mode. The snippet above prints 9, i.e. (39 - 36)^2.

a = tf.constant(2)
b = tf.constant(10)
c = tf.multiply(a,b)
tf.print(c)

Output:

<tf.Operation 'PrintV2' type=PrintV2>

In static-graph mode, tf.print(c) only builds a print op (hence the Operation repr shown above); no computation is executed unless you create a session and run it.

sess = tf.Session()
print(sess.run(c))

Output:

20
# Change the value of x in the feed_dict
x = tf.placeholder(tf.int64, name = 'x')
print(sess.run(2 * x, feed_dict = {x: 3}))
sess.close()

Output:

6

Some of the intermediate content is fairly simple, so I will not include it here.

Compute the cross-entropy cost $J$, defined as:

$$J = -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log \sigma(z^{[2](i)}) + \left(1 - y^{(i)}\right) \log\left(1 - \sigma(z^{[2](i)})\right) \right) \tag{1}$$

def cost(logits, labels):
    """
    Computes the cost using the sigmoid cross entropy
    Arguments:
    logits -- vector containing z, output of the last linear unit (before the final sigmoid activation)
    labels -- vector of labels y (1 or 0)
    Note: What we've been calling "z" and "y" in this class are respectively called "logits" and "labels"
    in the TensorFlow documentation. So logits will feed into z, and labels into y.
    Returns:
    cost -- runs the session of the cost (formula (1))
    """
    ### START CODE HERE ###
    # Create the placeholders for "logits" (z) and "labels" (y) (approx. 2 lines)
    z = tf.placeholder(tf.float32, name = "z")
    y = tf.placeholder(tf.float32, name = "y")
    # Use the loss function (approx. 1 line)
    cost = tf.nn.sigmoid_cross_entropy_with_logits(logits = z, labels = y)
    # Create a session (approx. 1 line). See method 1 above.
    sess = tf.Session()
    # Run the session (approx. 1 line).
    cost = sess.run(cost, feed_dict = {z: logits, y: labels})
    # Close the session (approx. 1 line). See method 1 above.
    sess.close()
    ### END CODE HERE ###
    return cost

Test:

logits = sigmoid(np.array([0.2,0.4,0.7,0.9]))
cost = cost(logits, np.array([0,0,1,1]))
print ("cost = " + str(cost))

Output:

cost = [1.0053872 1.0366409 0.4138543 0.39956614]
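As a sanity check, formula (1) can be reproduced directly in NumPy. Note that tf.nn.sigmoid_cross_entropy_with_logits applies sigmoid to its logits input internally, which is why sigmoid appears twice below. This is just a standalone verification sketch, not part of the assignment:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

z = sigmoid(np.array([0.2, 0.4, 0.7, 0.9]))  # the same values fed to cost() above
y = np.array([0, 0, 1, 1])
a = sigmoid(z)  # TF applies sigmoid to the logits again inside the loss
manual_cost = -(y * np.log(a) + (1 - y) * np.log(1 - a))
print(manual_cost)  # [1.0053872 1.0366409 0.4138543 0.39956614] -- matches the TF output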

Define the one_hot_matrix function:

def one_hot_matrix(labels, C):
    """
    Creates a matrix where the i-th row corresponds to the ith class number and the jth column
    corresponds to the jth training example. So if example j had a label i. Then entry (i,j)
    will be 1.
    Arguments:
    labels -- vector containing the labels
    C -- number of classes, the depth of the one hot dimension
    Returns:
    one_hot -- one hot matrix
    """
    ### START CODE HERE ###
    # Create a tf.constant equal to C (depth), name it 'C'. (approx. 1 line)
    C = tf.constant(C, name = "C")
    # Use tf.one_hot, be careful with the axis (approx. 1 line)
    one_hot_matrix = tf.one_hot(labels, C, axis = 0)
    # Create the session (approx. 1 line)
    sess = tf.Session()
    # Run the session (approx. 1 line)
    one_hot = sess.run(one_hot_matrix)
    # Close the session (approx. 1 line). See method 1 above.
    sess.close()
    ### END CODE HERE ###
    return one_hot

Test:

labels = np.array([[1,2,3,0,2,1]])
one_hot = one_hot_matrix(np.squeeze(labels), C = 4)
print ("one_hot = " + str(one_hot))
print(labels.shape,one_hot.shape)
Output:

one_hot = [[0. 0. 0. 1. 0. 0.]
 [1. 0. 0. 0. 0. 1.]
 [0. 1. 0. 0. 1. 0.]
 [0. 0. 1. 0. 0. 0.]]
(1, 6) (4, 6)
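For intuition, what tf.one_hot with axis = 0 does here is equivalent to indexing into an identity matrix and transposing. A quick NumPy sketch (not part of the assignment):

import numpy as np

labels = np.array([1, 2, 3, 0, 2, 1])
C = 4
one_hot_np = np.eye(C)[labels].T  # row j picks out class labels[j]; .T puts classes on axis 0
print(one_hot_np)  # same 4x6 matrix as the tf.one_hot output above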

Flatten the data:

# Flatten the training and test images
X_train_flatten = X_train_orig.reshape(X_train_orig.shape[0], -1).T
X_test_flatten = X_test_orig.reshape(X_test_orig.shape[0], -1).T
# Normalize image vectors
X_train = X_train_flatten/255.
X_test = X_test_flatten/255.
# Convert training and test labels to one hot matrices
Y_train = one_hot_matrix(np.squeeze(Y_train_orig), 6)
Y_test = one_hot_matrix(np.squeeze(Y_test_orig), 6)
# Y_train = Y_train.reshape(Y_train.shape[0], Y_train.shape[2]) # added myself
# Y_test = Y_test.reshape(Y_test.shape[0], Y_test.shape[2])     # added myself
# np.random.seed(0)
# permutation = list(np.random.permutation(1080))
# shuffled_X_train = X_train[:, permutation]
# shuffled_Y_train = Y_train[:, permutation]
print ("number of training examples = " + str(X_train.shape[1]))
print ("number of test examples = " + str(X_test.shape[1]))
print ("X_train shape: " + str(X_train.shape))
print ("Y_train shape: " + str(Y_train.shape))
print ("X_test shape: " + str(X_test.shape))
print ("Y_test shape: " + str(Y_test.shape))

One modification to the code provided in the notebook: in one_hot_matrix(np.squeeze(Y_train_orig), 6) and one_hot_matrix(np.squeeze(Y_test_orig), 6), the label arrays need np.squeeze to drop their extra dimension, otherwise the code later on will fail.
Output:

number of training examples = 1080
number of test examples = 120
X_train shape: (12288, 1080)
Y_train shape: (6, 1080)
X_test shape: (12288, 120)
Y_test shape: (6, 120)

Without this modification, Y_train's shape would be (6, 1, 1080) and Y_test's shape (6, 1, 120); the quick check below shows why.
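With axis = 0, tf.one_hot inserts the depth axis in front and keeps all remaining dimensions, so a (1, m) label array produces a (C, 1, m) tensor instead of (C, m). A standalone check:

labels_2d = np.array([[1, 2, 3, 0, 2, 1]])  # shape (1, 6), like Y_train_orig
with tf.Session() as s:
    print(s.run(tf.one_hot(labels_2d, 4, axis = 0)).shape)              # (4, 1, 6)
    print(s.run(tf.one_hot(np.squeeze(labels_2d), 4, axis = 0)).shape)  # (4, 6)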
The remaining pieces (create_placeholders, initialize_parameters, forward_propagation, compute_cost) are fairly simple: just fill in the code blocks following the formulas. They take up too much space to reproduce in full, but since model() below calls them, a minimal sketch follows before the model definition.
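Here is a minimal sketch of those helpers, assuming the standard assignment architecture (12288 -> 25 -> 12 -> 6); your filled-in versions may differ in details. Note that the original notebook uses tf.contrib.layers.xavier_initializer, but tf.contrib no longer exists in TensorFlow 2.x, so the sketch substitutes the compat.v1 equivalent tf.glorot_uniform_initializer:

def create_placeholders(n_x, n_y):
    # Placeholders with a flexible batch (second) dimension
    X = tf.placeholder(tf.float32, shape = [n_x, None], name = "X")
    Y = tf.placeholder(tf.float32, shape = [n_y, None], name = "Y")
    return X, Y

def initialize_parameters():
    # Xavier/Glorot initialization for weights, zeros for biases
    W1 = tf.get_variable("W1", [25, 12288], initializer = tf.glorot_uniform_initializer(seed = 1))
    b1 = tf.get_variable("b1", [25, 1], initializer = tf.zeros_initializer())
    W2 = tf.get_variable("W2", [12, 25], initializer = tf.glorot_uniform_initializer(seed = 1))
    b2 = tf.get_variable("b2", [12, 1], initializer = tf.zeros_initializer())
    W3 = tf.get_variable("W3", [6, 12], initializer = tf.glorot_uniform_initializer(seed = 1))
    b3 = tf.get_variable("b3", [6, 1], initializer = tf.zeros_initializer())
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2, "W3": W3, "b3": b3}

def forward_propagation(X, parameters):
    # LINEAR -> RELU -> LINEAR -> RELU -> LINEAR; softmax is folded into the cost
    Z1 = tf.add(tf.matmul(parameters["W1"], X), parameters["b1"])
    A1 = tf.nn.relu(Z1)
    Z2 = tf.add(tf.matmul(parameters["W2"], A1), parameters["b2"])
    A2 = tf.nn.relu(Z2)
    Z3 = tf.add(tf.matmul(parameters["W3"], A2), parameters["b3"])
    return Z3

def compute_cost(Z3, Y):
    # TF expects examples along the first axis, hence the transposes
    logits = tf.transpose(Z3)
    labels = tf.transpose(Y)
    return tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits = logits, labels = labels))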

def model(X_train, Y_train, X_test, Y_test, learning_rate = 0.0001,
          num_epochs = 1600, minibatch_size = 32, print_cost = True):
    """
    Implements a three-layer tensorflow neural network: LINEAR->RELU->LINEAR->RELU->LINEAR->SOFTMAX.
    Arguments:
    X_train -- training set, of shape (input size = 12288, number of training examples = 1080)
    Y_train -- training set labels, of shape (output size = 6, number of training examples = 1080)
    X_test -- test set, of shape (input size = 12288, number of test examples = 120)
    Y_test -- test set labels, of shape (output size = 6, number of test examples = 120)
    learning_rate -- learning rate of the optimization
    num_epochs -- number of epochs of the optimization loop
    minibatch_size -- size of a minibatch
    print_cost -- True to print the cost every 100 epochs
    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """
    ops.reset_default_graph()  # to be able to rerun the model without overwriting tf variables
    tf.set_random_seed(1)      # to keep consistent results
    seed = 3                   # to keep consistent results
    (n_x, m) = X_train.shape   # (n_x: input size, m : number of examples in the train set)
    n_y = Y_train.shape[0]     # n_y : output size
    costs = []                 # To keep track of the cost
    # Create Placeholders of shape (n_x, n_y)
    ### START CODE HERE ### (1 line)
    X, Y = create_placeholders(n_x, n_y)
    ### END CODE HERE ###
    # Initialize parameters
    ### START CODE HERE ### (1 line)
    parameters = initialize_parameters()
    ### END CODE HERE ###
    # Forward propagation: Build the forward propagation in the tensorflow graph
    ### START CODE HERE ### (1 line)
    Z3 = forward_propagation(X, parameters)
    ### END CODE HERE ###
    # Cost function: Add cost function to tensorflow graph
    ### START CODE HERE ### (1 line)
    cost = compute_cost(Z3, Y)
    ### END CODE HERE ###
    # Backpropagation: Define the tensorflow optimizer. Use an AdamOptimizer.
    ### START CODE HERE ### (1 line)
    # (I used plain gradient descent here rather than Adam; see the note after the results.)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate = learning_rate).minimize(cost)
    ### END CODE HERE ###
    # Initialize all the variables
    init = tf.global_variables_initializer()
    # Start the session to compute the tensorflow graph
    with tf.Session() as sess:
        # Run the initialization
        sess.run(init)
        # Do the training loop
        for epoch in range(num_epochs):
            epoch_cost = 0.  # Defines a cost related to an epoch
            num_minibatches = int(m / minibatch_size)  # number of minibatches of size minibatch_size in the train set
            seed = seed + 1
            minibatches = random_mini_batches(X_train, Y_train, minibatch_size, seed)
            for minibatch in minibatches:
                # Select a minibatch
                (minibatch_X, minibatch_Y) = minibatch
                # IMPORTANT: The line that runs the graph on a minibatch.
                # Run the session to execute the "optimizer" and the "cost", the feed_dict should contain a minibatch for (X,Y).
                ### START CODE HERE ### (1 line)
                _ , minibatch_cost = sess.run([optimizer, cost], feed_dict = {X: minibatch_X, Y: minibatch_Y})
                ### END CODE HERE ###
                epoch_cost += minibatch_cost / num_minibatches
            # Print the cost every epoch
            if print_cost == True and epoch % 100 == 0:
                print ("Cost after epoch %i: %f" % (epoch, epoch_cost))
            if print_cost == True and epoch % 5 == 0:
                costs.append(epoch_cost)
        # plot the cost
        plt.plot(np.squeeze(costs))
        plt.ylabel('cost')
        plt.xlabel('iterations (per tens)')
        plt.title("Learning rate =" + str(learning_rate))
        plt.show()
        # lets save the parameters in a variable
        parameters = sess.run(parameters)
        print ("Parameters have been trained!")
        # Calculate the correct predictions
        correct_prediction = tf.equal(tf.argmax(Z3), tf.argmax(Y))
        # Calculate accuracy on the test set
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
        print ("Train Accuracy:", accuracy.eval({X: X_train, Y: Y_train}))
        print ("Test Accuracy:", accuracy.eval({X: X_test, Y: Y_test}))
        return parameters
parameters = model(X_train, Y_train, X_test, Y_test)

Output:

Cost after epoch 0: 1.917426
Cost after epoch 100: 1.725384
Cost after epoch 200: 1.626670
Cost after epoch 300: 1.528626
Cost after epoch 400: 1.427116
Cost after epoch 500: 1.318802
Cost after epoch 600: 1.203437
Cost after epoch 700: 1.106668

The resulting cost curve:

[Figure: cost vs. iterations, title "Learning rate = 0.0001"]
One last note: for some reason the final accuracy I get does not match the numbers given in the assignment; mine is only around 80%, although the program runs without errors. Looking at it again, one likely cause is the optimizer: the comment in model() asks for an AdamOptimizer, but I used tf.train.GradientDescentOptimizer. Switching to Adam, sketched below, should bring the accuracy much closer to the expected values. If you see another problem, please let me know; corrections are welcome.
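The change is one line in model(), using the optimizer that the assignment's comment names:

# replaces the GradientDescentOptimizer line in model() above
optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(cost)

With the assignment's default hyperparameters, this should train noticeably faster than plain gradient descent.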
