Deep Learning Notes (1): Logistic Regression for Binary Classification

hanker.han

Article link: https://blog.csdn.net/hanker_han/article/details/91661038

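Logistic regression is a generalized linear model, so it has much in common with multiple linear regression. In practice the most widely used form is binary logistic regression. The goal of training is to minimize the error between the labels of the training data and the model's predictions.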

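This post uses Cat (y=1) vs Non-cat (y=0) as the running example: given an image (high-dimensional data), we want an algorithm that estimates the probability that the image contains a cat, i.e. the probability that y=1: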

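Given x, compute ŷ = P(y=1|x), with 0 ≤ ŷ ≤ 1. We want a function that produces ŷ. The simplest choice is a linear fit, where w is an (n_x, 1) array and X is an (n_x, m) array whose columns are the individual examples x:
$$\hat{y} = w^T x + b$$
The ŷ produced by this function can be very large, and it can also be negative, whereas the value we need lies between 0 and 1. This is where the sigmoid function comes in: it constrains the range of the output. Its expression is:
$$\mathrm{sig}(t) = \frac{1}{1 + e^{-t}}$$
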
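Its graph is an S-shaped curve.

From the graph we can see:
When t is very large, sig(t) approaches 1.
When t is very small (large and negative), sig(t) approaches 0.
When t = 0, sig(t) = 0.5.
We can also see that as t becomes very large or very small, the curve flattens toward a horizontal line, so the gradient gets smaller and smaller. This can greatly increase the running time of gradient descent, which is worth keeping in mind for the learning algorithms that come later.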
We can therefore use the sigmoid function to constrain the output ŷ:
$$\hat{y} = \mathrm{sig}(w^T x + b)$$
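The behaviour of the sigmoid described above can be checked numerically. The following is a minimal sketch in plain Python; the helper names `sig` and `sig_slope` are my own, and the slope uses the standard identity that the derivative of the sigmoid is sig(t) * (1 - sig(t)).

```python
import math

def sig(t):
    """Sigmoid: maps any real t into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))

def sig_slope(t):
    """Derivative of the sigmoid, sig(t) * (1 - sig(t))."""
    s = sig(t)
    return s * (1.0 - s)

# The output saturates near 0 and 1, and the slope shrinks as |t| grows
for t in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(f"t = {t:6.1f}   sig(t) = {sig(t):.5f}   slope = {sig_slope(t):.5f}")
```

The slope is largest at t = 0 and falls off quickly on both sides, which is the flattening that slows gradient descent down.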

Cost Function

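What do we need to do to predict as accurately as possible whether a given image is a cat? In other words, we want the model output ŷ to be as close as possible to the true label y. For this we need a "cost function" as the yardstick.
A loss function measures the difference (the error) between one particular prediction ŷ(i) and its true value y(i). Squared error is a commonly used loss function:
$$L(\hat{y}^{(i)}, y^{(i)}) = \frac{1}{2} (\hat{y}^{(i)} - y^{(i)})^2$$
However, logistic regression generally does not use this loss. With it, the cost becomes a non-convex function of the parameters with many local optima, and gradient descent may then get stuck in a local optimum instead of finding the global one, as in the figure below:
[Figure: a non-convex cost function with several local optima]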
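So logistic regression generally uses the following log loss, from which the cost function is computed:
$$L(\hat{y}^{(i)}, y^{(i)}) = -\left( y^{(i)} \log \hat{y}^{(i)} + (1 - y^{(i)}) \log (1 - \hat{y}^{(i)}) \right)$$
$$J(w, b) = \frac{1}{m} \sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)})$$

As the formula shows, the cost function is simply the average of the loss over the training examples. Our goal is to make J as small as possible, so by fitting the training examples we look for the particular w and b that achieve this. Seen this way, logistic regression is a very small neural network.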
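To get a feel for the log loss, here is a small sketch in plain Python that evaluates the cost on made-up predictions. The labels and the three prediction lists are hypothetical values chosen for illustration, and the helper names `log_loss` and `cost` are my own.

```python
import math

def log_loss(y_hat, y):
    """Loss for one example: -(y*log(y_hat) + (1-y)*log(1-y_hat))."""
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

def cost(y_hats, ys):
    """Cost J: the average of the per-example losses."""
    return sum(log_loss(p, y) for p, y in zip(y_hats, ys)) / len(ys)

labels = [1, 0, 1, 0]                  # hypothetical labels
good = [0.9, 0.1, 0.8, 0.2]            # confident and correct
undecided = [0.5, 0.5, 0.5, 0.5]       # what w = 0, b = 0 would output
bad = [0.1, 0.9, 0.2, 0.8]             # confident and wrong

print(cost(good, labels), cost(undecided, labels), cost(bad, labels))
```

With every prediction at 0.5 the cost equals ln 2, about 0.693147, which is the value the training log further down reports at iteration 0, when w and b are still zero. Confident wrong predictions are penalized far more heavily than undecided ones.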

Gradient Descent

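As noted in the cost function section, our goal is to find the w and b that make J as small as possible. In the figure below, the minimum of J is the lowest point of the surface.
[Figure: surface plot of J(w, b)]
Plotting the cost function J(w, b) in three dimensions with w and b as axes shows that it is convex. To find suitable parameters, we first give w and b initial values, shown as the small red dot in the figure. In logistic regression the parameters are usually initialized to 0. Starting from that point, gradient descent moves downhill in the steepest direction so as to reach the lowest point as quickly as possible. That direction is the negative of the gradient at the current point.
[Figure: J plotted against w]
Take the figure above as an example. When w lies to the right of the optimum, the height increases as w increases, so the slope is positive, i.e. dw > 0. During iteration, w therefore keeps decreasing (w := w - α·dw).
When w lies to the left of the optimum, the height decreases as w increases, so the slope is negative, i.e. dw < 0. During iteration, w therefore keeps increasing (w := w - α·dw).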
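In the actual cost function, J depends on both w and b, so each iteration is:
$$w := w - \alpha \frac{\partial J(w, b)}{\partial w}, \qquad b := b - \alpha \frac{\partial J(w, b)}{\partial b}$$
Writing dw and db for these two partial derivatives, z, dw and db are computed as follows (these are the formulas used in the code later in this post):
$$z^{(i)} = w^T x^{(i)} + b, \qquad \hat{y}^{(i)} = \mathrm{sig}(z^{(i)})$$
$$dw = \frac{1}{m} \sum_{i=1}^{m} x^{(i)} (\hat{y}^{(i)} - y^{(i)}), \qquad db = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)})$$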
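The sign argument above can be tried out on a one-dimensional example. This is only a sketch: the cost J(w) = (w - 3)^2 is a hypothetical stand-in for the real cost, chosen because it is convex with a known minimum at w = 3, and the function names are my own.

```python
def J(w):
    """Hypothetical convex cost with its minimum at w = 3."""
    return (w - 3.0) ** 2

def dJ(w):
    """Slope of J at w."""
    return 2.0 * (w - 3.0)

def descend(w, alpha=0.1, steps=50):
    """Repeat the update w := w - alpha * dw."""
    for _ in range(steps):
        w = w - alpha * dJ(w)
    return w

# Start right of the minimum: the slope is positive, so w decreases
print(dJ(5.0), descend(5.0))
# Start left of the minimum: the slope is negative, so w increases
print(dJ(0.0), descend(0.0))
```

Both starting points end up close to w = 3, because the same update rule moves w in opposite directions depending on the sign of the slope.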

Python Implementation

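# Import the required packages
import numpy as np                     # numpy: the core library for scientific computing in Python
import matplotlib.pyplot as plt        # matplotlib: plotting library
import h5py                            # h5py: reads and writes HDF5 (.h5) files
import imageio                         # imageio: reads and writes image files
import skimage.transform               # skimage: image processing (resize is used further down)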

def load_dataset():
    """
    Load the training and test datasets.
    Training set file: datasets/train_catvnoncat.h5
    Test set file: datasets/test_catvnoncat.h5
    Labels: 1 for a cat, 0 otherwise.
    Each image has shape (num_px, num_px, 3): height and width are equal, and 3 is the number of RGB channels.
    The _orig suffix in train_set_x_orig and test_set_x_orig marks the raw data; the images are preprocessed
    later, and the preprocessed arrays are named train_set_x and test_set_x.
    Note: the data is stored in HDF5 format.
    """
    train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")
    train_set_x_orig = np.array(train_dataset["train_set_x"][:])  # your train set features
    train_set_y_orig = np.array(train_dataset["train_set_y"][:])  # your train set labels

    test_dataset = h5py.File('datasets/test_catvnoncat.h5', "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:])  # your test set features
    test_set_y_orig = np.array(test_dataset["test_set_y"][:])  # your test set labels

    classes = np.array(test_dataset["list_classes"][:])  # the list of classes

    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))

    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes

"""
Load the data. Each element of train_set_x_orig is one image, which can be displayed with the code below:
"""
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()

index = 20    # change index to view other images: 20 is a 'non-cat' picture, 25 is a 'cat' picture
plt.imshow(train_set_x_orig[index])   # draw the image
plt.show()   # show the figure
"""
Print the label in several forms
"""
print(train_set_y)
print(train_set_y[:, index])
print(np.squeeze(train_set_y[:, index]))
print(classes[np.squeeze(train_set_y[:, index])])
print("y = " + str(train_set_y[:, index]) + ", it's a '" + classes[np.squeeze(train_set_y[:, index])].decode("utf-8") +  "' picture.")
# y = [0], it's a 'non-cat' picture.

"""
Here we look at the number of training examples, the number of test examples, and the size of each image:
"""
m_train = train_set_x_orig.shape[0]
print("Number of training examples: m_train = " + str(m_train))
m_test = test_set_x_orig.shape[0]
print("Number of testing examples: m_test = " + str(m_test))
num_px = train_set_x_orig.shape[1]
print("Size of each image: num_px = " + str(num_px))

print("test_set_x_orig.shape = " + str(test_set_x_orig.shape))
# Training examples: 209, test examples: 50, image size: 64

"""
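Next, we flatten each image into a vector, i.e. one column of a matrix.
The whole training set then becomes a matrix with num_px*num_px*3 rows and m_train columns.
Meaning of -1 in reshape: NumPy infers that dimension from the array size and the remaining dimensions.
Reference: https://www.cnblogs.com/onemorepoint/p/9099312.html
train_set_x_orig has shape (a, b, c, d), where a is the number of examples and b, c, d are the dimensions of each image.
train_set_x_orig.reshape(train_set_x_orig.shape[0], -1) has shape (a, b*c*d), one row per image.
Since we want one column per image, we transpose the reshaped result:
train_set_x_orig.reshape(train_set_x_orig.shape[0], -1).T has shape (b*c*d, a).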
"""
train_set_x_orig_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1).T
test_set_x_orig_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0], -1).T
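The reshape-then-transpose step is easy to get wrong, so here is a small sketch on a made-up array. The array `images` is a hypothetical stand-in for train_set_x_orig (4 tiny "images" of 2x2 pixels with 3 channels), not the real dataset.

```python
import numpy as np

# Hypothetical stand-in for train_set_x_orig: 4 "images" of 2x2 pixels, 3 channels
images = np.arange(4 * 2 * 2 * 3).reshape(4, 2, 2, 3)

flat = images.reshape(images.shape[0], -1)   # (4, 12): one row per image
columns = flat.T                             # (12, 4): one column per image

print(flat.shape, columns.shape)
# Column j of the result holds exactly the pixels of image j
print(np.array_equal(columns[:, 1], images[1].ravel()))

# Pitfall: reshaping straight to (12, 4) gives the same shape,
# but each column then mixes pixels from different images
wrong = images.reshape(-1, images.shape[0])
print(np.array_equal(wrong, columns))
```

Reshaping first with the example axis kept in front and transposing afterwards is what keeps each image's pixels together in one column.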

"""
Next, we normalize the pixel values.
The raw values lie between 0 and 255, so the simplest approach is to divide by 255.
"""
train_set_x = train_set_x_orig_flatten / 255
test_set_x = test_set_x_orig_flatten / 255

"""
Next, we implement logistic regression in the following steps:
1. Define the model structure
2. Initialize the model's parameters (here w and b are initialized to 0)
3. Loop over the training iterations
    3.1 Calculate current loss (forward propagation): compute the cost function, the average loss over the training examples
    3.2 Calculate current gradient (backward propagation): compute the gradients dw and db
    3.3 Update parameters using the dw and db from the previous step: w = w - α*dw, b = b - α*db
4. Integrate them into one function we call model(), to be called from outside
"""

"""
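Step 1: implement the sigmoid function
"""
def sigmoid(z):
    """
    Sigmoid function
    :param z: a scalar or numpy array of any size
    See the sigmoid expression earlier in this post for the formula
    """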
    s = 1 / (1 + np.exp(-z))
    return s

"""
Step 2: initialize the model parameters
"""
def initialize_with_zero(dim):
    """
    Initialize the parameters to 0
    :param dim: shape (dim, 1) for w
    :return:
    w -- initialized vector of shape (dim, 1)
    b -- initialized scalar (corresponds to the bias)
    """
    w = np.zeros((dim, 1))
    b = 0

    assert(w.shape == (dim, 1))
    assert(isinstance(b, float) or isinstance(b, int))

    return w, b

"""
Step 3: forward and backward propagation
Compute Y_hat (called A in the code), the cost function J, and the gradients dw and db
"""
def propagate(w, b, X, Y):
    """
    Forward propagation (compute the cost function) and backward propagation (compute the gradients)
    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size (1, number of examples)

    Return:
    grads -- dictionary holding
        dw -- gradient of the cost with respect to w, thus same shape as w
        db -- gradient of the cost with respect to b, a scalar
    cost -- negative log-likelihood cost for logistic regression
    """
    # Number of examples m
    m = X.shape[1]

    # Forward propagation: activations A of shape (1, m), then the cost J
    A = sigmoid(np.dot(w.T, X) + b)
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m

    # Backward propagation: gradients dw and db
    dw = np.dot(X, (A - Y).T) / m
    db = np.sum(A - Y) / m

    assert(dw.shape == w.shape)
    assert(db.dtype == float)

    # Squeeze the cost down to a scalar
    cost = np.squeeze(cost)
    assert(cost.shape == ())

    grads = {"dw": dw,
             "db": db
             }

    return grads, cost
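One way to gain confidence in the dw and db formulas is to compare them with finite differences of the cost. The sketch below is self-contained: `cost_and_grads` repeats the same formulas as propagate(), and the toy values `w0`, `b0`, `X_toy`, `Y_toy` are hypothetical numbers picked for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_and_grads(w, b, X, Y):
    """Same formulas as propagate(): cost J, dw and db."""
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    dw = np.dot(X, (A - Y).T) / m
    db = np.sum(A - Y) / m
    return cost, dw, db

# Hypothetical toy data: 2 features, 3 examples
w0 = np.array([[0.5], [-0.3]])
b0 = 0.1
X_toy = np.array([[1.0, 2.0, -1.0], [3.0, 0.5, -2.0]])
Y_toy = np.array([[1, 0, 1]])

_, dw, db = cost_and_grads(w0, b0, X_toy, Y_toy)

# Central finite differences: (J(p + eps) - J(p - eps)) / (2 * eps)
eps = 1e-6
dw_num = np.zeros_like(w0)
for k in range(w0.shape[0]):
    step = np.zeros_like(w0)
    step[k, 0] = eps
    dw_num[k, 0] = (cost_and_grads(w0 + step, b0, X_toy, Y_toy)[0]
                    - cost_and_grads(w0 - step, b0, X_toy, Y_toy)[0]) / (2 * eps)
db_num = (cost_and_grads(w0, b0 + eps, X_toy, Y_toy)[0]
          - cost_and_grads(w0, b0 - eps, X_toy, Y_toy)[0]) / (2 * eps)

# Both differences should be tiny if the analytic gradients are right
print(np.max(np.abs(dw - dw_num)), abs(db - db_num))
```

If the analytic and numerical gradients disagree by more than rounding noise, the backward step has a bug.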

"""
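Step 4: update w and b, using gradient descent to move toward the minimum of the cost
num_iterations: number of gradient descent iterations; learning_rate: the learning rate, i.e. the parameter α
costs = [] records the cost values
"""
def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False):
    """
    Update the parameters: run gradient descent to find the best w and b
    :param w: weights w
    :param b: bias b
    :param X: input examples
    :param Y: labels
    :param num_iterations: number of gradient descent iterations
    :param learning_rate: the learning rate, i.e. the parameter α
    :param print_cost: whether to print the cost
    :return:
    params -- dictionary holding w and b
    grads -- dictionary holding dw and db from the last iteration
    costs -- list of the cost values recorded every 100 iterations
    """
    costs = []

    for i in range(num_iterations):
        # Compute the cost and the gradients
        grads, cost = propagate(w, b, X, Y)

        dw = grads["dw"]
        db = grads["db"]

        # Gradient descent update
        w = w - learning_rate * dw
        b = b - learning_rate * db

        # Record (and optionally print) the cost every 100 iterations
        if i % 100 == 0:
            costs.append(cost)
            if print_cost:
                print("Cost after iteration %i: %f" % (i, cost))

    # Build the result dictionaries once, after the loop
    params = {"w": w,
              "b": b}
    grads = {"dw": dw,
             "db": db}

    return params, grads, costs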

"""
Step 5: use the trained model to make predictions:
when the output is greater than 0.5 we predict a cat, otherwise not a cat.
"""
def predict(w, b, X):
    """
    :param w: learned weights w
    :param b: learned bias b
    :param X: data of size (num_px * num_px * 3, number of examples)
    :return:
    Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
    """

    # Number of examples
    m = X.shape[1]
    w = w.reshape(X.shape[0], 1)

    # Probability of "cat" for each example
    A = sigmoid(np.dot(w.T, X) + b)

    # Threshold at 0.5: 1.0 where A > 0.5, otherwise 0.0
    Y_prediction = (A > 0.5).astype(float)
    assert(Y_prediction.shape == (1, m))

    return Y_prediction

"""
Step 6: combine the functions above into one complete prediction model
"""
def model(X_train, Y_train, X_test, Y_test, num_iterations = 2000, learning_rate = 0.5, print_cost= False):
    """
    Build the complete prediction model
    :param X_train: training inputs, numpy array of shape (num_px * num_px * 3, m_train)
    :param Y_train: training labels, numpy array (vector) of shape (1, m_train)
    :param X_test: test inputs, numpy array of shape (num_px * num_px * 3, m_test)
    :param Y_test: test labels, numpy array (vector) of shape (1, m_test)
    :param num_iterations: number of gradient descent iterations
    :param learning_rate: the learning rate, i.e. the parameter α
    :param print_cost: whether to print the cost
    :return:
    d -- dictionary holding the results
    """

    w, b = initialize_with_zero(X_train.shape[0])

    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)

    w = parameters["w"]
    b = parameters["b"]

    # Predictions on the test set
    Y_prediction_test = predict(w, b, X_test)
    # Predictions on the training set
    Y_prediction_train = predict(w, b, X_train)

    # Print train/test accuracy
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))  # accuracy on the training set
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))  # accuracy on the test set

    d = {"costs": costs,
             "Y_prediction_test": Y_prediction_test,
             "Y_prediction_train" : Y_prediction_train,
             "w" : w,
             "b" : b,
             "learning_rate" : learning_rate,
             "num_iterations": num_iterations}

    return d
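The accuracy expression in model() may look odd at first. The sketch below, on made-up predictions and labels (`Y_pred_toy` and `Y_true_toy` are hypothetical), shows that it is the same as counting the fraction of matching entries.

```python
import numpy as np

# Hypothetical predictions and labels, both of shape (1, m) with entries 0 or 1
Y_pred_toy = np.array([[1., 0., 1., 1., 0.]])
Y_true_toy = np.array([[1, 0, 0, 1, 1]])

# Formula used in model(): |prediction - label| is 1 for a miss, 0 for a hit
accuracy_a = 100 - np.mean(np.abs(Y_pred_toy - Y_true_toy)) * 100

# Equivalent: percentage of positions where prediction and label agree
accuracy_b = np.mean(Y_pred_toy == Y_true_toy) * 100

print(accuracy_a, accuracy_b)
```

The equivalence relies on both arrays containing only 0 and 1; with raw probabilities instead of thresholded predictions the first formula would no longer be an accuracy.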

"""
Try the model out:
"""
d = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 2000, learning_rate = 0.005, print_cost = True)
print(str(d))

The output is:
Cost after iteration 0: 0.693147
Cost after iteration 100: 0.584508
Cost after iteration 200: 0.466949
Cost after iteration 300: 0.376007
Cost after iteration 400: 0.331463
Cost after iteration 500: 0.303273
Cost after iteration 600: 0.279880
Cost after iteration 700: 0.260042
Cost after iteration 800: 0.242941
Cost after iteration 900: 0.228004
Cost after iteration 1000: 0.214820
Cost after iteration 1100: 0.203078
Cost after iteration 1200: 0.192544
Cost after iteration 1300: 0.183033
Cost after iteration 1400: 0.174399
Cost after iteration 1500: 0.166521
Cost after iteration 1600: 0.159305
Cost after iteration 1700: 0.152667
Cost after iteration 1800: 0.146542
Cost after iteration 1900: 0.140872
train accuracy: 99.04306220095694 %
test accuracy: 70.0 %
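The printed results show that test accuracy reaches 70.0%.
On the training set, accuracy reaches 99%.
This indicates that the model is overfitting to some degree. No need to worry, though: we will deal with this in later posts.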

"""
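Different learning rates give different costs and different predictions.
If the learning rate is too large (e.g. 0.01), the cost may oscillate up and down, making the result unstable.
A small learning rate is not necessarily the best choice either: you also have to guard against overfitting, which, as in the example above, shows up as training accuracy higher than test accuracy.
So we need to find a suitable learning rate, which seems to be quite hard...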
"""
learning_rates = [0.01, 0.001, 0.0001]
models = {}
for i in learning_rates:
    print ("learning rate is: " + str(i))
    models[str(i)] = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 1500, learning_rate = i, print_cost = False)
    print ('\n' + "-------------------------------------------------------" + '\n')

for i in learning_rates:
    plt.plot(np.squeeze(models[str(i)]["costs"]), label= str(models[str(i)]["learning_rate"]))

plt.ylabel('cost')
plt.xlabel('iterations')

legend = plt.legend(loc='upper center', shadow=True)
frame = legend.get_frame()
frame.set_facecolor('0.90')
plt.show()

The output is:
learning rate is: 0.01
train accuracy: 99.52153110047847 %
test accuracy: 68.0 %

learning rate is: 0.001
train accuracy: 88.99521531100478 %
test accuracy: 64.0 %

learning rate is: 0.0001
train accuracy: 68.42105263157895 %
test accuracy: 36.0 %
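The effect of the step size can be reproduced on a much simpler problem. This is only an illustrative sketch on the hypothetical cost J(w) = (w - 3)^2, not the cat classifier; the three values of alpha are chosen to show slow progress, fast convergence, and divergence on this toy cost and do not correspond to the rates used above.

```python
def cost_history(alpha, steps=20, w=0.0):
    """Gradient descent on the hypothetical cost J(w) = (w - 3)^2."""
    history = []
    for _ in range(steps):
        history.append((w - 3.0) ** 2)
        w = w - alpha * 2.0 * (w - 3.0)
    return history

slow = cost_history(alpha=0.01)     # small steps: cost falls, but slowly
good = cost_history(alpha=0.3)      # moderate steps: cost falls quickly
too_big = cost_history(alpha=1.1)   # overshoots the minimum: cost grows

for name, h in (("0.01", slow), ("0.3", good), ("1.1", too_big)):
    print(f"alpha = {name}: first cost {h[0]:.3f}, last cost {h[-1]:.3g}")
```

On this toy cost each step multiplies the distance to the minimum by (1 - 2·alpha), so a rate above 1 makes every step land further away than the previous one.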

"""
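Use an image of your own, rather than one from the training or test set,
and predict whether it is a cat.
PS:
The code in this post was run on Python 3.6; Professor Andrew Ng's sample code may have been written for an earlier environment,
so some of the image-handling calls may raise errors.
The original used ndimage.imread and scipy.misc.imresize, which newer SciPy releases no longer provide; I changed them to imageio.imread and skimage.transform.resize,
so the imported libraries differ slightly. If you run into the same problem, you can refer to what I did here.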
"""
my_image = "LogReg_kiank.jpg"   # change this to the name of your image file

fname = "images/" + my_image
image = np.array(imageio.imread(fname))  # read the image (assumed to be RGB, 3 channels)
# Resize to (num_px, num_px) and flatten into one column; resize also rescales the pixel values to [0, 1]
my_image = skimage.transform.resize(image, (num_px,num_px)).reshape((1, num_px*num_px*3)).T
my_predicted_image = predict(d["w"], d["b"], my_image)  # predict

plt.imshow(image)
plt.show()
print("y = " + str(np.squeeze(my_predicted_image)) + ", your algorithm predicts a \"" +
      classes[int(np.squeeze(my_predicted_image))].decode("utf-8") +  "\" picture.")

The image used is LogReg_kiank.jpg.

The output is:
y = 1.0, your algorithm predicts a "cat" picture.

References

Andrew Ng, Neural Networks and Deep Learning, NetEase Cloud Classroom
https://blog.csdn.net/wangzj1205/article/details/77971488

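This post is my first set of study notes on Professor Andrew Ng's Deep Learning course series. It is meant only for my own later review and has no other purpose. Since this is my first contact with artificial intelligence, some parts may contain mistakes; if you spot any, please do not hesitate to point them out. This post also draws on many other bloggers' posts, which helped my own study a great deal, and I am very grateful to them. I will keep working through Professor Ng's Deep Learning courses and expect to keep posting study notes.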