Andrew Ng's Deep Learning Course - Course 1, Neural Networks and Deep Learning, Week 2: Logistic Regression Programming Assignment

Lucy@IshtarXu

Categories: Deep Learning, Python

Article link: https://blog.csdn.net/FelicityXu/article/details/119544175

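For now this post covers only the assignment 2-2 part. It mainly records my approach and code for the assignment, drawing on Andrew Ng's assignment and on material from various people online. The code comments in the second half have been translated; I have not yet had time to do the first half and will tidy it up later.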

1 - Packages

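numpy is the fundamental package for scientific computing with Python
h5py is a common package for interacting with datasets stored in H5 files
matplotlib is a well-known library for plotting graphs in Python
PIL and scipy are used to test the model with your own picture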

import numpy as np
import matplotlib.pyplot as plt
import h5py
import scipy
from PIL import Image
from scipy import ndimage
from lr_utils import load_dataset

%matplotlib inline

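2 - Overview of the Problem Set

Problem Statement: You are given a dataset ("data.h5") containing:

a training set of m_train images labeled as cat (y=1) or non-cat (y=0)
a test set of images labeled as cat or non-cat
each image is of shape (num_px, num_px, 3), where 3 is for the three channels (RGB). Thus, each image is square (height = num_px and width = num_px).

You will build a simple image-recognition algorithm that classifies pictures as cat or non-cat.
Load the dataset with the following code:

# Loading the data (cat/non-cat)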
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()

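The load_dataset function is defined as:

"""
  We added "_orig" to the names of the image datasets because we are going to preprocess them.
  After preprocessing, we will end up with train_set_x and test_set_x (the labels train_set_y and test_set_y do not need any preprocessing).
  The arrays are read with expressions such as train_dataset["train_set_x"][:] and train_dataset["train_set_y"][:].
  Note: the data are stored in HDF5 format.
"""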

def load_dataset():
    train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r") # read the h5 file; the training set has 209 images
    train_set_x_orig = np.array(train_dataset["train_set_x"][:]) # your train set features, shape (209, 64, 64, 3)
    train_set_y_orig = np.array(train_dataset["train_set_y"][:]) # your train set labels (y = 0 or 1), shape (209,)

    test_dataset = h5py.File('datasets/test_catvnoncat.h5', "r") # read the test set of 50 images
    test_set_x_orig = np.array(test_dataset["test_set_x"][:]) # your test set features, shape (50, 64, 64, 3)
    test_set_y_orig = np.array(test_dataset["test_set_y"][:]) # your test set labels, shape (50,)

    classes = np.array(test_dataset["list_classes"][:]) # the list of classes
    
    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0])) # reshape the training labels to (1, 209)
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))  # reshape the test labels to (1, 50)
    
    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes

We can visualize an example with the following code:

# Example of a picture
index = 25
plt.imshow(train_set_x_orig[index])
print ("y = " + str(train_set_y[:, index]) + ", it's a '" + classes[np.squeeze(train_set_y[:, index])].decode("utf-8") +  "' picture.")

Exercise: Find the values for:
- m_train (number of training examples)
- m_test (number of test examples)
- num_px (= height = width of a training image)
Remember that train_set_x_orig is a numpy array of shape (m_train, num_px, num_px, 3). For instance, you can access m_train by writing train_set_x_orig.shape[0].

### START CODE HERE ### (≈ 3 lines of code)
m_train = train_set_x_orig.shape[0]
m_test = test_set_x_orig.shape[0]
num_px = train_set_x_orig.shape[1]
### END CODE HERE ###

print ("Number of training examples: m_train = " + str(m_train))
print ("Number of testing examples: m_test = " + str(m_test))
print ("Height/Width of each image: num_px = " + str(num_px))
print ("Each image is of size: (" + str(num_px) + ", " + str(num_px) + ", 3)")
print ("train_set_x shape: " + str(train_set_x_orig.shape))
print ("train_set_y shape: " + str(train_set_y.shape))
print ("test_set_x shape: " + str(test_set_x_orig.shape))
print ("test_set_y shape: " + str(test_set_y.shape))

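For convenience, you should now reshape the images of shape (num_px, num_px, 3) into numpy arrays of shape (num_px ∗ num_px ∗ 3, 1). After this, our training (and test) dataset is a numpy array in which each column represents a flattened image.

Exercise: Reshape the training and test datasets so that images of size (num_px, num_px, 3) are flattened into single column vectors of shape (num_px ∗ num_px ∗ 3, 1).

A trick when you want to flatten a matrix X of shape (a,b,c,d) into a matrix X_flatten of shape (b * c * d, a) is to use: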

# Note the -1: it tells reshape to infer that dimension. NumPy fixes the other dimensions first, then sets the -1 dimension to (total number of elements) / (product of the other dimensions).
X_flatten = X.reshape(X.shape[0], -1).T      # X.T is the transpose of X
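
To see the trick at work, here is a minimal sketch on a tiny array. The sizes (4 "images" of 2 x 2 pixels) are made up for illustration and are not the assignment's data.

```python
import numpy as np

# Hypothetical toy batch: 4 "images" of 2 x 2 pixels with 3 channels.
X = np.arange(4 * 2 * 2 * 3).reshape(4, 2, 2, 3)

# Keep the first axis (one row per image), let -1 infer 2*2*3 = 12, then transpose.
X_flatten = X.reshape(X.shape[0], -1).T

print(X_flatten.shape)  # (12, 4): one column per image
# Column j holds the pixels of image j in their original (row-major) order.
print(np.array_equal(X_flatten[:, 1], X[1].ravel()))  # True

# Pitfall: this also has shape (12, 4), but it mixes pixels from different images.
X_wrong = X.reshape(-1, X.shape[0])
print(np.array_equal(X_wrong[:, 1], X[1].ravel()))  # False
```

The transpose matters: reshaping straight to (b * c * d, a) yields the right shape but scrambles the images, which is easy to miss because no error is raised.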

The exercise code is as follows:

# Reshape the training and test examples

### START CODE HERE ### (≈ 2 lines of code)
train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0],-1).T
# train_set_x_orig =>(209,64,64,3)
# train_set_x_orig.reshape(train_set_x_orig.shape[0],-1) =>(209,64*64*3)
# transpose => (64*64*3,209)
test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0],-1).T
### END CODE HERE ###

print ("train_set_x_flatten shape: " + str(train_set_x_flatten.shape))
print ("train_set_y shape: " + str(train_set_y.shape))
print ("test_set_x_flatten shape: " + str(test_set_x_flatten.shape))
print ("test_set_y shape: " + str(test_set_y.shape))
print ("sanity check after reshaping: " + str(train_set_x_flatten[0:5,0]))

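To represent color images, the red, green and blue (RGB) channels must be specified for each pixel, so each pixel value is actually a vector of three numbers ranging from 0 to 255.

One common preprocessing step in machine learning is to center and standardize the dataset, meaning that you subtract the mean of the whole numpy array from each example and then divide each example by the standard deviation of the whole numpy array. For picture datasets, however, it is simpler and more convenient to just divide every row of the dataset by 255 (the maximum value of a pixel channel), and it works almost as well.

Let's standardize our dataset.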

train_set_x = train_set_x_flatten/255.
test_set_x = test_set_x_flatten/255.
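
The two preprocessing options described above can be compared on a small random array. This is only a sketch: the array X below is a made-up stand-in for train_set_x_flatten, not the real data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for train_set_x_flatten: 12 pixel values x 5 images.
X = rng.integers(0, 256, size=(12, 5)).astype(np.uint8)

# Option 1: center and scale by the mean and standard deviation of the whole array.
X_standardized = (X - X.mean()) / X.std()

# Option 2 (used in this post): scale pixel values into [0, 1].
X_scaled = X / 255.

print(X_standardized.mean(), X_standardized.std())  # approximately 0 and 1
print(X_scaled.min(), X_scaled.max())                # both within [0, 1]
```

Either way the inputs end up on a small, comparable scale, which is what keeps the sigmoid from saturating at the start of training.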

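What you need to remember:

Common steps for preprocessing a new dataset are:

Figure out the dimensions and shapes of the problem (m_train, m_test, num_px, …)
Reshape the datasets so that each example is now a vector of size (num_px * num_px * 3, 1)
Standardize the data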

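3 - General Architecture of the learning algorithm

In this part, you will build logistic regression using a neural-network mindset.
Mathematical expression of the algorithm:
For one example $x^{(i)}$:
$z^{(i)} = w^T x^{(i)} + b$
$\hat{y}^{(i)} = a^{(i)} = \sigma(z^{(i)})$
$\mathcal{L}(a^{(i)}, y^{(i)}) = -y^{(i)}\log(a^{(i)}) - (1-y^{(i)})\log(1-a^{(i)})$
The cost is then computed by averaging the loss over all training examples:
$J = \frac{1}{m} \sum_{i=1}^m \mathcal{L}(a^{(i)}, y^{(i)})$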
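
As a worked example of the loss and the cost, the sketch below evaluates both for three made-up activations and labels (the values of A and Y are assumptions chosen for illustration).

```python
import numpy as np

# Hypothetical activations and labels for m = 3 examples.
A = np.array([[0.9, 0.2, 0.6]])
Y = np.array([[1, 0, 1]])

# Per-example loss: L(a, y) = -y*log(a) - (1 - y)*log(1 - a)
losses = -Y * np.log(A) - (1 - Y) * np.log(1 - A)

# The cost J is the mean of the per-example losses.
J = losses.mean()

print(losses)  # roughly [[0.105 0.223 0.511]]
print(J)       # roughly 0.280
```

Confident correct predictions (0.9 for a cat) contribute little to the cost, while hesitant ones (0.6 for a cat) contribute more.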

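Key steps:

Initialize the parameters of the model
Learn the parameters of the model by minimizing the cost
Use the learned parameters to make predictions
Analyze the results and conclude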

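4 - Building the parts of our algorithm

The main steps for building a neural network are (build a model() function that integrates the steps below):

Define the model structure (such as the number of input features)
Initialize the model's parameters
Loop:
- Calculate the current loss (forward propagation)
- Calculate the current gradient (backward propagation)
- Update the parameters (gradient descent)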

4.1 - Helper functions

Exercise: Implement the sigmoid() function in code:
$\sigma(w^T x + b) = \frac{1}{1 + e^{-(w^T x + b)}}$

# GRADED FUNCTION: sigmoid

def sigmoid(z):
    """
    Compute the sigmoid of z

    Arguments:
    z -- A scalar or numpy array of any size.

    Return:
    s -- sigmoid(z)
    """

    ### START CODE HERE ### (≈ 1 line of code)
    s = 1/(1+np.exp(-z))
    ### END CODE HERE ###
    
    return s

print ("sigmoid([0, 2]) = " + str(sigmoid(np.array([0,2]))))
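
One caveat with the one-line formula: for very negative z, np.exp(-z) overflows and NumPy emits a RuntimeWarning, even though the final result is still 0. The sketch below is one possible workaround, not part of the assignment; the name stable_sigmoid is hypothetical.

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid that only ever exponentiates non-positive numbers (hypothetical helper)."""
    z = np.asarray(z, dtype=float)
    e = np.exp(-np.abs(z))  # always in (0, 1], so it cannot overflow
    # For z >= 0: 1 / (1 + exp(-z)). For z < 0: exp(z) / (1 + exp(z)).
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

print(stable_sigmoid(np.array([0, 2])))          # [0.5        0.88079708]
print(stable_sigmoid(np.array([-1000., 1000.]))) # [0. 1.]
```

For the pixel data in this assignment the plain version is fine, since the inputs are scaled to [0, 1] and w starts at zero.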

4.2 - Initializing parameters

Exercise: Implement parameter initialization. You have to initialize w as a vector of zeros. (Hint: use np.zeros().)

# GRADED FUNCTION: initialize_with_zeros

def initialize_with_zeros(dim):
    """
    This function creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.
    
    Argument:
    dim -- size of the w vector we want (or number of parameters in this case)
    
    Returns:
    w -- initialized vector of shape (dim, 1)
    b -- initialized scalar (corresponds to the bias)
    """
    
    ### START CODE HERE ### (≈ 1 line of code)
    w = np.zeros((dim, 1))
    b = 0
    ### END CODE HERE ###

    assert(w.shape == (dim, 1))
    assert(isinstance(b, float) or isinstance(b, int))
    
    return w, b

dim = 2
w, b = initialize_with_zeros(dim)
print ("w = " + str(w))
print ("b = " + str(b))

For image inputs, w will be of shape (num_px × num_px × 3, 1).

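4.3 - Forward and backward propagation

Now that the parameters are initialized, you can perform the forward and backward propagation steps to learn them.
Exercise: Implement a function propagate() that computes the cost function and its gradient.
Hints:
Forward propagation:

You get X
You compute $A = \sigma(w^T X + b) = (a^{(1)}, a^{(2)}, ..., a^{(m-1)}, a^{(m)})$
You calculate the cost function: $J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(a^{(i)})+(1-y^{(i)})\log(1-a^{(i)})\right]$

Here are the two formulas you will be using: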
$\frac{\partial J}{\partial w} = \frac{1}{m}X(A-Y)^T\tag{7}$
$\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^m (a^{(i)}-y^{(i)})\tag{8}$
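
A quick way to convince yourself that formulas (7) and (8) are right is a numerical gradient check. The sketch below compares them against central finite differences on made-up data; the helper names cost_fn and analytic_grads are hypothetical and not part of the assignment.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_fn(w, b, X, Y):
    """Cost J for logistic regression (mean cross-entropy loss)."""
    A = sigmoid(np.dot(w.T, X) + b)
    return -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))

def analytic_grads(w, b, X, Y):
    """Gradients from formulas (7) and (8)."""
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    dw = np.dot(X, (A - Y).T) / m
    db = np.sum(A - Y) / m
    return dw, db

# Made-up toy problem: 2 features, 3 examples.
w = np.array([[0.1], [-0.2]])
b = 0.3
X = np.array([[1.0, 2.0, -1.0], [3.0, 0.5, -2.0]])
Y = np.array([[1, 0, 1]])

dw, db = analytic_grads(w, b, X, Y)

# Central differences: (J(theta + eps) - J(theta - eps)) / (2 * eps)
eps = 1e-6
dw_num = np.zeros_like(w)
for k in range(w.shape[0]):
    step = np.zeros_like(w)
    step[k, 0] = eps
    dw_num[k, 0] = (cost_fn(w + step, b, X, Y) - cost_fn(w - step, b, X, Y)) / (2 * eps)
db_num = (cost_fn(w, b + eps, X, Y) - cost_fn(w, b - eps, X, Y)) / (2 * eps)

print(np.allclose(dw, dw_num, atol=1e-6))  # True
print(np.isclose(db, db_num, atol=1e-6))   # True
```

If either check printed False, the usual suspects would be a missing 1/m factor or a missing transpose on (A - Y).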

# GRADED FUNCTION: propagate

def propagate(w, b, X, Y):
    """
    Implement the cost function and its gradient for the propagation explained above

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size (1, number of examples)

    Return:
    cost -- negative log-likelihood cost for logistic regression
    dw -- gradient of the loss with respect to w, thus same shape as w
    db -- gradient of the loss with respect to b, thus same shape as b
    
    Tips:
    - Write your code step by step for the propagation. np.log(), np.dot()
    """
    
    m = X.shape[1]
    
    # FORWARD PROPAGATION (FROM X TO COST)
    ### START CODE HERE ### (≈ 2 lines of code)
    A = sigmoid(np.dot(w.T, X) + b)            # compute activation
    cost = -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))         # compute cost
    ### END CODE HERE ###
    
    # BACKWARD PROPAGATION (TO FIND GRAD)
    ### START CODE HERE ### (≈ 2 lines of code)
    dw = 1/m * np.dot(X, (A - Y).T)
    db = 1/m * np.sum(A-Y)
    ### END CODE HERE ###

    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    cost = np.squeeze(cost) # np.squeeze removes axes of length 1 from the array's shape
    assert(cost.shape == ())
    
    grads = {"dw": dw,
             "db": db}
    
    return grads, cost

w, b, X, Y = np.array([[1],[2]]), 2, np.array([[1,2],[3,4]]), np.array([[1,0]])
grads, cost = propagate(w, b, X, Y)
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
print ("cost = " + str(cost))

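4.4 - Optimization

You have initialized your parameters.
You are also able to compute the cost function and its gradient.
Now, you want to update the parameters using gradient descent.

Exercise: Write down the optimization function. The goal is to learn 𝑤 and 𝑏 by minimizing the cost function 𝐽. For a parameter 𝜃, the update rule is 𝜃 = 𝜃 − 𝛼 𝑑𝜃, where 𝛼 is the learning rate.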

# GRADED FUNCTION: optimize

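def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False):
    """
    This function optimizes w and b by running a gradient descent algorithm
    
    Arguments:
        w  - weights, a numpy array of size (num_px * num_px * 3, 1)
        b  - bias, a scalar
        X  - data of shape (num_px * num_px * 3, number of examples)
        Y  - true "label" vector (containing 0 if non-cat, 1 if cat), of shape (1, number of examples)
        num_iterations  - number of iterations of the optimization loop
        learning_rate  - learning rate of the gradient descent update rule
        print_cost  - True to print the cost every 100 steps
    
    Returns:
        params  - dictionary containing the weights w and bias b
        grads  - dictionary containing the gradients of the cost with respect to the weights and bias
        costs  - list of the costs recorded during the optimization (every 100 iterations), used to plot the learning curve
    
    Tips:
    You basically need to write down two steps and iterate through them:
        1) Calculate the cost and the gradient for the current parameters, using propagate().
        2) Update the parameters using the gradient descent rule for w and b.
    """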
    costs = []
    
    for i in range(num_iterations):
        
        
        # Cost and gradient calculation (≈ 1-4 lines of code)
        ### START CODE HERE ### 
        grads, cost = propagate(w,b,X,Y)
        ### END CODE HERE ###
        
        # Retrieve derivatives from grads
        dw = grads["dw"]
        db = grads["db"]
        
        # update rule (≈ 2 lines of code)
        ### START CODE HERE ###
        w = w - learning_rate * dw
        b = b - learning_rate * db
        ### END CODE HERE ###
        
        # Record the costs
        if i % 100 == 0:
            costs.append(cost)
        
        # Print the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print ("Cost after iteration %i: %f" %(i, cost))
    
    params = {"w": w,
              "b": b}
    
    grads = {"dw": dw,
             "db": db}
    
    return params, grads, costs

params, grads, costs = optimize(w, b, X, Y, num_iterations= 100, learning_rate = 0.009, print_cost = False)

print ("w = " + str(params["w"]))
print ("b = " + str(params["b"]))
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))

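Exercise: The previous function outputs the learned w and b. We can use w and b to predict the labels for a dataset X. Implement the predict() function. There are two steps to computing predictions:

Calculate $\hat{Y} = A = \sigma(w^T X + b)$
Convert the entries of A into 0 (if activation <= 0.5) or 1 (if activation > 0.5), and store the predictions in a vector Y_prediction. If you wish, you can use an if/else statement in a for loop (though there is also a way to vectorize this).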

# GRADED FUNCTION: predict

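def predict(w, b, X):
    '''
    Predict whether the label is 0 or 1 using the learned logistic regression parameters (w, b)
    
    Arguments:
        w  - weights, a numpy array of size (num_px * num_px * 3, 1)
        b  - bias, a scalar
        X  - data of shape (num_px * num_px * 3, number of examples)
    
    Returns:
        Y_prediction  - a numpy array (vector) containing all predictions (0/1) for the examples in X
    '''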
    
    m = X.shape[1]
    Y_prediction = np.zeros((1,m))
    w = w.reshape(X.shape[0], 1)
    
    # Compute vector "A" predicting the probabilities of a cat being present in the picture
    ### START CODE HERE ### (≈ 1 line of code)
    A = sigmoid( np.dot(w.T, X) + b )
    ### END CODE HERE ###

    for i in range(A.shape[1]):
        
        # Convert probabilities A[0,i] to actual predictions p[0,i]
        ### START CODE HERE ### (≈ 4 lines of code)
        if A[0, i] <= 0.5:
            Y_prediction[0, i] = 0
        else:
            Y_prediction[0, i] = 1
        ### END CODE HERE ###
    
    assert(Y_prediction.shape == (1, m))
    
    return Y_prediction

print ("predictions = " + str(predict(w, b, X)))
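
The for loop in predict() can be replaced by a single comparison on the whole array. The sketch below shows one way to do it; predict_vectorized is a hypothetical name, and the parameters and inputs are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict_vectorized(w, b, X):
    """Hypothetical vectorized variant of predict(): 1 where A > 0.5, else 0."""
    w = w.reshape(X.shape[0], 1)
    A = sigmoid(np.dot(w.T, X) + b)
    # The boolean mask (A > 0.5) is converted to 0.0 / 1.0 in one step.
    return (A > 0.5).astype(float)

# Made-up parameters and inputs, for illustration only.
w = np.array([[0.5], [-1.0]])
b = 0.1
X = np.array([[2.0, -1.0, 0.0], [0.5, 1.0, -0.1]])

print(predict_vectorized(w, b, X))  # [[1. 0. 1.]]
```

The result has the same shape (1, m) and the same 0.5 threshold as the loop version, so it could be dropped into model() unchanged.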

What to remember: You've implemented several functions that:

- Initialize (w,b)
- Optimize the loss iteratively to learn parameters (w,b): computing the cost and its gradient, then updating the parameters using gradient descent
- Use the learned (w,b) to predict the labels for a given set of examples

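5 - Merge all functions into a model

You will now see how the overall model is structured by putting together all the building blocks (the functions implemented in the previous parts) in the right order.
Exercise: Implement the model function. Use the following notation:

Y_prediction_test for your predictions on the test set
Y_prediction_train for your predictions on the training set
w, costs, grads for the outputs of optimize()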

 # GRADED FUNCTION: model

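def model(X_train, Y_train, X_test, Y_test, num_iterations = 2000, learning_rate = 0.5, print_cost = False):
    """
    Builds the logistic regression model by calling the functions implemented previously
    
    Arguments:
        X_train  - training set, a numpy array of shape (num_px * num_px * 3, m_train)
        Y_train  - training labels, a numpy array (vector) of shape (1, m_train)
        X_test   - test set, a numpy array of shape (num_px * num_px * 3, m_test)
        Y_test   - test labels, a numpy array (vector) of shape (1, m_test)
        num_iterations  - hyperparameter: number of iterations used to optimize the parameters
        learning_rate  - hyperparameter: learning rate used in the update rule of optimize()
        print_cost  - set to True to print the cost every 100 iterations
    
    Returns:
        d  - dictionary containing information about the model.
    """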
    
    ### START CODE HERE ###
    
    # initialize parameters with zeros (≈ 1 line of code)
    w, b = initialize_with_zeros(X_train.shape[0])

    # Gradient descent (≈ 1 line of code)
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)
    
    # Retrieve parameters w and b from dictionary "parameters"
    w = parameters["w"]
    b = parameters["b"]
    
    # Predict test/train set examples (≈ 2 lines of code)
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)

    ### END CODE HERE ###

    # Print train/test Errors
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))

    
    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test, 
         "Y_prediction_train" : Y_prediction_train, 
         "w" : w, 
         "b" : b,
         "learning_rate" : learning_rate,
         "num_iterations": num_iterations}
    
    return d

Train the model:

d = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 2000, learning_rate = 0.005, print_cost = True)

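Comment: Training accuracy is close to 100%. This is a good sanity check: your model is working and has high enough capacity to fit the training data. Test accuracy is 70%. That is actually not bad for this simple model, given the small dataset we used and the fact that logistic regression is a linear classifier. But no worries, you will build an even better classifier next week!

Also, you can see that the model is clearly overfitting the training data. Later in this specialization you will learn how to reduce overfitting, for example by using regularization.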
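
The accuracy figures printed by model() come from the expression 100 - mean(|prediction - label|) * 100. Since predictions and labels are both 0 or 1, the absolute difference is 1 exactly where the model is wrong. A minimal sketch with made-up predictions and labels:

```python
import numpy as np

# Made-up predictions and labels for 4 examples (one mistake, at index 2).
Y_pred = np.array([[1.0, 0.0, 1.0, 1.0]])
Y_true = np.array([[1, 0, 0, 1]])

# |pred - true| is 1 for each mistake and 0 otherwise, so its mean is the error rate.
error_rate = np.mean(np.abs(Y_pred - Y_true))
accuracy = 100 - error_rate * 100

print(accuracy)                         # 75.0
print(np.mean(Y_pred == Y_true) * 100)  # 75.0, the same quantity computed directly
```

Comparing this number between the training set and the test set is what reveals the overfitting mentioned above.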

Using the code below (and changing the index variable), you can look at predictions on pictures of the test set.

# Example of a picture that was wrongly classified.
index = 1
plt.imshow(test_set_x[:,index].reshape((num_px, num_px, 3)))
print ("y = " + str(test_set_y[0,index]) + ", you predicted that it is a \"" + classes[int(d["Y_prediction_test"][0,index])].decode("utf-8") +  "\" picture.")  # cast to int: NumPy arrays cannot be indexed with a float

We can also plot the learning curve, that is, the cost recorded every 100 iterations:

# Plot learning curve (with costs)
costs = np.squeeze(d['costs'])
plt.plot(costs)
plt.ylabel('cost')
plt.xlabel('iterations (per hundreds)')
plt.title("Learning rate =" + str(d["learning_rate"]))
plt.show()

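Interpretation: You can see the cost decreasing, which shows that the parameters are being learned. However, you could train the model even more on the training set. Try increasing the number of iterations in the cell above and rerunning it. You might see that the training set accuracy goes up while the test set accuracy goes down. This is called overfitting.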

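6 - Further analysis (optional/ungraded exercise)

Congratulations on building your first image classification model. Let's analyze it further and examine possible choices for the learning rate 𝛼.

Choice of learning rate

Reminder: In order for gradient descent to work, you must choose the learning rate wisely. The learning rate 𝛼 determines how rapidly we update the parameters. If the learning rate is too large, we may "overshoot" the optimal value. Similarly, if it is too small, we will need too many iterations to converge to the best values. That is why it is crucial to use a well-tuned learning rate.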
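
The effect of the learning rate is easiest to see on a one-dimensional toy problem. The sketch below applies the update rule 𝜃 = 𝜃 − 𝛼 𝑑𝜃 to the made-up cost J(𝜃) = (𝜃 − 3)², whose minimum is at 𝜃 = 3. The function and the three learning rates are assumptions chosen for illustration; they are not the values used for the cat classifier.

```python
def gradient_descent(alpha, theta=0.0, steps=20):
    """Minimize the toy cost J(theta) = (theta - 3)**2 with a fixed learning rate."""
    for _ in range(steps):
        d_theta = 2 * (theta - 3)        # dJ/dtheta
        theta = theta - alpha * d_theta  # update rule: theta = theta - alpha * d_theta
    return theta

# Too small, reasonable, and too large learning rates on this toy cost.
for alpha in (0.001, 0.1, 1.1):
    theta = gradient_descent(alpha)
    print(alpha, theta, (theta - 3) ** 2)
```

After 20 steps the smallest rate has barely moved away from the starting point, the middle one is close to 3, and the largest one has overshot further on every step and diverged.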

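Let's compare the learning curve of our model with several choices of learning rate. Run the cell below; this should take about 1 minute. Feel free also to try values different from the three that the learning_rates variable is initialized with, and see what happens.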

learning_rates = [0.01, 0.001, 0.0001]
models = {}
for i in learning_rates:
    print ("learning rate is: " + str(i))
    models[str(i)] = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 1500, learning_rate = i, print_cost = False)
    print ('\n' + "-------------------------------------------------------" + '\n')

for i in learning_rates:
    plt.plot(np.squeeze(models[str(i)]["costs"]), label= str(models[str(i)]["learning_rate"]))

plt.ylabel('cost')
plt.xlabel('iterations (hundreds)')

legend = plt.legend(loc='upper center', shadow=True)
frame = legend.get_frame()
frame.set_facecolor('0.90')
plt.show()

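Interpretation:

Different learning rates give different costs and thus different prediction results.
If the learning rate is too large (0.01), the cost may oscillate up and down. It may even diverge (though in this example, using 0.01 still eventually ends up at a good value for the cost).
A lower cost does not mean a better model. You have to check whether there is possibly overfitting, which happens when the training accuracy is much higher than the test accuracy.
In deep learning, we usually recommend that you:
- Choose the learning rate that better minimizes the cost function.
- If your model overfits, use other techniques to reduce overfitting. (We will talk about this in later videos.)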

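7 - Complete code

Since I worked through the assignment in Jupyter, I got a bit lazy and did not consolidate the full code myself, so here is the complete code from a post by another author.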

import numpy as np
import matplotlib.pyplot as plt
import h5py
from lr_utils import load_dataset

train_set_x_orig , train_set_y , test_set_x_orig , test_set_y , classes = load_dataset()

m_train = train_set_y.shape[1] # number of images in the training set
m_test = test_set_y.shape[1] # number of images in the test set
num_px = train_set_x_orig.shape[1] # width and height of the training/test images (64 x 64)

# Take a look at what we have loaded
print ("Number of training examples: m_train = " + str(m_train))
print ("Number of test examples: m_test = " + str(m_test))
print ("Height/width of each image: num_px = " + str(num_px))
print ("Size of each image: (" + str(num_px) + ", " + str(num_px) + ", 3)")
print ("train_set_x_orig shape: " + str(train_set_x_orig.shape))
print ("train_set_y shape: " + str(train_set_y.shape))
print ("test_set_x_orig shape: " + str(test_set_x_orig.shape))
print ("test_set_y shape: " + str(test_set_y.shape))

# Flatten the training set and transpose it.
train_set_x_flatten  = train_set_x_orig.reshape(train_set_x_orig.shape[0],-1).T
# Flatten the test set and transpose it.
test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0], -1).T

print ("train_set_x_flatten shape: " + str(train_set_x_flatten.shape))
print ("train_set_y shape: " + str(train_set_y.shape))
print ("test_set_x_flatten shape: " + str(test_set_x_flatten.shape))
print ("test_set_y shape: " + str(test_set_y.shape))

train_set_x = train_set_x_flatten / 255
test_set_x = test_set_x_flatten / 255

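def sigmoid(z):
    """
    Arguments:
        z  - a scalar or numpy array of any size.

    Returns:
        s  - sigmoid(z)
    """
    s = 1 / (1 + np.exp(-z))
    return s

def initialize_with_zeros(dim):
    """
        This function creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.

        Arguments:
            dim  - size of the w vector we want (or number of parameters in this case)

        Returns:
            w  - initialized vector of shape (dim, 1)
            b  - initialized scalar (corresponds to the bias)
    """
    w = np.zeros(shape = (dim,1))
    b = 0
    # Use assertions to make sure the data are what we expect
    assert(w.shape == (dim, 1)) # w has shape (dim, 1)
    assert(isinstance(b, float) or isinstance(b, int)) # b is a float or an int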

    return (w , b)

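def propagate(w, b, X, Y):
    """
    Implement the cost function and its gradient for forward and backward propagation.
    Arguments:
        w  - weights, a numpy array of size (num_px * num_px * 3, 1)
        b  - bias, a scalar
        X  - data of shape (num_px * num_px * 3, number of examples)
        Y  - true "label" vector (containing 0 if non-cat, 1 if cat), of shape (1, number of examples)

    Returns:
        cost - negative log-likelihood cost for logistic regression
        dw  - gradient of the cost with respect to w, thus same shape as w
        db  - gradient of the cost with respect to b, thus same shape as b
    """
    m = X.shape[1]

    # Forward propagation
    A = sigmoid(np.dot(w.T,X) + b) # compute the activation A = sigmoid(w^T X + b)
    cost = (- 1 / m) * np.sum(Y * np.log(A) + (1 - Y) * (np.log(1 - A))) # compute the cost J

    # Backward propagation
    dw = (1 / m) * np.dot(X, (A - Y).T) # see formula (7)
    db = (1 / m) * np.sum(A - Y) # see formula (8)

    # Use assertions to make sure the data are correct
    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    cost = np.squeeze(cost)
    assert(cost.shape == ())

    # Create a dictionary that stores dw and db.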
    grads = {
                "dw": dw,
                "db": db
             }
    return (grads , cost)

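def optimize(w , b , X , Y , num_iterations , learning_rate , print_cost = False):
    """
    This function optimizes w and b by running a gradient descent algorithm

    Arguments:
        w  - weights, a numpy array of size (num_px * num_px * 3, 1)
        b  - bias, a scalar
        X  - data of shape (num_px * num_px * 3, number of examples)
        Y  - true "label" vector (containing 0 if non-cat, 1 if cat), of shape (1, number of examples)
        num_iterations  - number of iterations of the optimization loop
        learning_rate  - learning rate of the gradient descent update rule
        print_cost  - True to print the cost every 100 steps

    Returns:
        params  - dictionary containing the weights w and bias b
        grads  - dictionary containing the gradients of the cost with respect to the weights and bias
        costs  - list of the costs recorded during the optimization (every 100 iterations), used to plot the learning curve

    Tips:
    You basically need to write down two steps and iterate through them:
        1) Calculate the cost and the gradient for the current parameters, using propagate().
        2) Update the parameters using the gradient descent rule for w and b.
    """

    costs = []

    for i in range(num_iterations):

        grads, cost = propagate(w, b, X, Y)

        dw = grads["dw"]
        db = grads["db"]

        w = w - learning_rate * dw
        b = b - learning_rate * db

        # Record the cost
        if i % 100 == 0:
            costs.append(cost)
        # Print the cost
        if (print_cost) and (i % 100 == 0):
            print("Cost after iteration %i: %f" % (i,cost))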

    params  = {
                "w" : w,
                "b" : b }
    grads = {
            "dw": dw,
            "db": db } 
    return (params , grads , costs)

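def predict(w , b , X ):
    """
    Predict whether the label is 0 or 1 using the learned logistic regression parameters (w, b)

    Arguments:
        w  - weights, a numpy array of size (num_px * num_px * 3, 1)
        b  - bias, a scalar
        X  - data of shape (num_px * num_px * 3, number of examples)

    Returns:
        Y_prediction  - a numpy array (vector) containing all predictions (0/1) for the examples in X

    """

    m  = X.shape[1] # number of images
    Y_prediction = np.zeros((1,m)) 
    w = w.reshape(X.shape[0],1)

    # Compute the predicted probability that a cat appears in each picture
    A = sigmoid(np.dot(w.T , X) + b)
    for i in range(A.shape[1]):
        # Convert the probability A[0,i] into an actual prediction p[0,i]
        Y_prediction[0,i] = 1 if A[0,i] > 0.5 else 0
    # Sanity check on the output shape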
    assert(Y_prediction.shape == (1,m))

    return Y_prediction

def model(X_train , Y_train , X_test , Y_test , num_iterations = 2000 , learning_rate = 0.5 , print_cost = False):
    """
    Builds the logistic regression model by calling the functions implemented previously

    Arguments:
        X_train  - training set, a numpy array of shape (num_px * num_px * 3, m_train)
        Y_train  - training labels, a numpy array (vector) of shape (1, m_train)
        X_test   - test set, a numpy array of shape (num_px * num_px * 3, m_test)
        Y_test   - test labels, a numpy array (vector) of shape (1, m_test)
        num_iterations  - hyperparameter: number of iterations used to optimize the parameters
        learning_rate  - hyperparameter: learning rate used in the update rule of optimize()
        print_cost  - set to True to print the cost every 100 iterations

    Returns:
        d  - dictionary containing information about the model.
    """
    w , b = initialize_with_zeros(X_train.shape[0])

    parameters , grads , costs = optimize(w , b , X_train , Y_train,num_iterations , learning_rate , print_cost)

    # Retrieve parameters w and b from the dictionary "parameters"
    w , b = parameters["w"] , parameters["b"]

    # Predict the test/train set examples
    Y_prediction_test = predict(w , b, X_test)
    Y_prediction_train = predict(w , b, X_train)

    # Print the accuracy after training
    print("train accuracy:"  , format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100) ,"%")
    print("test accuracy:"  , format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100) ,"%")

    d = {
            "costs" : costs,
            "Y_prediction_test" : Y_prediction_test,
            "Y_prediction_train" : Y_prediction_train,
            "w" : w,
            "b" : b,
            "learning_rate" : learning_rate,
            "num_iterations" : num_iterations }
    return d

d = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 2000, learning_rate = 0.005, print_cost = True)

# Plot the learning curve
costs = np.squeeze(d['costs'])
plt.plot(costs)
plt.ylabel('cost')
plt.xlabel('iterations (per hundreds)')
plt.title("Learning rate =" + str(d["learning_rate"]))
plt.show()

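8 - Test with your own image (optional/ungraded exercise)

Add your image to the "images" folder in this Jupyter Notebook's directory
Change your image's name in the following code
Run the code and check whether the algorithm is right (1 = cat, 0 = non-cat)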

## START CODE HERE ## (PUT YOUR IMAGE NAME) 
my_image = "cat_in_iran.jpg"   # change this to the name of your image file 
## END CODE HERE ##

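# We preprocess the image to fit your algorithm.
fname = "images/" + my_image
# ndimage.imread and scipy.misc.imresize were removed from SciPy, so PIL is used instead.
image = np.array(Image.open(fname).convert("RGB"))
# Resize, flatten to a column vector, and scale to [0, 1] like the training data.
my_image = np.array(Image.fromarray(image).resize((num_px, num_px))).reshape((1, num_px*num_px*3)).T / 255.
my_predicted_image = predict(d["w"], d["b"], my_image)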

plt.imshow(image)
print("y = " + str(np.squeeze(my_predicted_image)) + ", your algorithm predicts a \"" + classes[int(np.squeeze(my_predicted_image)),].decode("utf-8") +  "\" picture.")