逻辑回归简单练手及遇到的问题记录

_Warren

已于 2022-03-09 15:20:13 修改

阅读量1k

点赞数

文章标签：逻辑回归算法机器学习 python

于 2022-03-04 19:34:25 首次发布

本文链接：https://blog.csdn.net/qq_45155983/article/details/123281207

版权

该博客介绍了如何使用梯度下降和高级优化算法（如fmin_tnc）来预测大学生的录取情况。通过读取数据、数据可视化、提取特征并初始化参数，然后分别实现手动梯度下降和调用高级优化算法进行模型训练。在手动梯度下降中，由于未对数据进行归一化处理，导致了计算错误，但经过数据预处理后，两种方法得到了一致的预测结果。文章强调了数据预处理的重要性，并展示了两种方法的预测曲线，证明了它们的有效性。

摘要由CSDN通过智能技术生成

目的：根据分数预测某个大学生是否能够被大学录取。

方法：调用高级优化算法或者自行编写梯度下降函数（自行选择学习率和迭代次数）

数据：ex2data1.txt

一、读入数据。

1.1 引入工具包

import numpy as np
import pandas as pd
import scipy.optimize as opt
import matplotlib.pyplot as plt

1.2 导入数据

 path = '../data/exc_2/ex2data1.txt'
 data = pd.read_csv(path, header=None, names=["Exam 1", 'Exam 2', "Admitted"])
 print(data.head())

1.3 数据可视化展示：


positive = data[data['Admitted'].isin([1])]
negative = data[data['Admitted'].isin([0])]

fig, ax = plt.subplots(figsize=(12, 8))
ax.scatter(positive['Exam 1'], positive['Exam 2'], s=50,c='b', marker='o', label='Admitted')
ax.scatter(negative['Exam 1'], negative['Exam 2'], s=50, c='r', marker='x', label='Not Admitted')
ax.legend()
ax.set_xlabel('Exam 1 Score')
ax.set_ylabel('Exam 2 Score')
plt.show()

1.4 提取数据x，y，并初始化theta

data.insert(0, "Ones", 1)
x = np.matrix(data2.iloc[:, :-1])
y = np.matrix(data2.iloc[:, -1])
y = y.reshape(y.shape[1], 1)
theta = np.matrix(np.zeros(3))

二、高级优化算法

若我们直接调用高级优化函数，可以很方便的得到最小值，同时不需要指定迭代次数和学习率。只需要编写计算梯度和代价函数即可。（注意：不更新theta，仅仅编写计算梯度的函数）

根据公式：

代价函数：

梯度：

这里之所以是这样的形式，是因为我们的h_theta（x）是sigmoid函数。

也就是 g(z) = 1 / (1 + e^z)

所以我们要定义下列函数：

1. sigmoid

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

2. 计算梯度

def gradient2(theta, x, y):
    theta = np.matrix(theta)
    x = np.matrix(x)
    y = np.matrix(y)
    parameters = int(theta.shape[1])
    # print(parameters)
    gap = sigmoid(x * theta.T) - y
    grad = np.zeros(parameters)
    for i in range(parameters):
        term = np.multiply(gap, x[:, i])
        grad[i] = np.sum(term) / len(x)
    # term = np.multiply(gap, x)
    # print(term)
    # print("term的形状：", term.shape)
    return grad

3. 代价函数

def cost_tmp(theta, x, y):
    theta = np.matrix(theta)
    x = np.matrix(x)
    y = np.matrix(y)
    term = np.multiply(-y, np.log(sigmoid(x * theta.T))) - np.multiply((1 - y), np.log(1 - sigmoid(x * theta.T)))
    return np.sum(term) / len(x)

直接调用fmin_tnc可获得下列结果：

result = opt.fmin_tnc(func=cost_tmp, x0=theta, fprime=gradient2, args=(x, y))
print("the func cost is: ", cost_tmp(result[0], x, y))
print("the func g is: ", result[0])

将其可视化：

    plotting_x1 = np.linspace(30, 100, 100)
    plotting_h1 = (- result[0][0] - result[0][1] * plotting_x1) / result[0][2]

    fig, ax = plt.subplots(figsize=(12, 8))
    ax.plot(plotting_x1, plotting_h1, 'y', label='Prediction')
    ax.scatter(positive['Exam 1'], positive['Exam 2'], s=50, c='b', marker='o', label='Admitted')
    ax.scatter(negative['Exam 1'], negative['Exam 2'], s=50, c='r', marker='x', label='Not Admitted')
    ax.legend()
    ax.set_xlabel('Exam 1 Score')
    ax.set_ylabel('Exam 2 Score')
    plt.show()

------------------------------------- 下面是手动实现梯度下降的方法（和遇到的问题） ------------------------------------

cost函数和sigmoid和上面一样。不同的是，在计算梯度时，我们要更新theta的值

根据图的公式，可写出下列代码。

要注意这里的 h_theta（x）与之前线性回归的不一样。

def TestDecend(theta, x, y, times, alpha):
    # 这里是 梯度下降，那么我们 应用公式：
    # theta_0: 应该是： theta_0 = theta_0 - (alpha * ((差值 * 1)) ) /m       （m是样本个数）
    # theta_1: theta_1 = theta_1 - ( alpha * (差值 * 对应的 x_i) ) / m
    numoftheta = int(theta.shape[1])
    temp = np.matrix(np.zeros(theta.shape))
    ct = np.zeros(times)

    for i in range(times):
        # 计算差值
        error = sigmoid(x * theta.T) - y  # 在此theta下，得出的预测值与 真实值y 的 差值。
        # 开始计算theta_0, theta_1.
        for j in range(numoftheta):  # 参数的个数
            # 开始更新 theta
            term = np.multiply(error, x[:, j])  # 上述公式。差值 * 1 或者 差值 * x_i
            temp[0, j] = theta[0, j] - (alpha * np.sum(term)) / len(x)

        theta = temp
        ct[i] = cost(theta, x, y)
        # print("此次迭代的cost更新为：", cost[i])
    return theta, ct

如果我们直接，不对数据处理，调用自己编写的梯度下降。即：

    times = 15000
    alpha = 0.08
    g, c = TestDecend(theta, x, y, times, alpha)  # 之前遇到的问题就是没有办法 和上述的实现一样的功能。原因就是没有归一化

    g = np.matrix.tolist(g)
    print("the testDescen g is: ", g, type(g))

    print("the cost is: ", c)

会出现，在cost计算的时候，除数为零。导致的错误，且可以看到cost会出现nan值。

我们观察最终的曲线和数据的拟合情况。是完全有偏差的（欠拟合的）：

当时，这个问题困惑了我很久。按道理来说我们的梯度下降函数并没有问题。怎么会出现这种情况呢？

后面，注意到我们data读入的时候，exam1 和 exam2 数据的范围是有差别的。所以我们需要将其归一化。所以，读入数据时，我们要这样处理。

#  若要实现手动的梯度下降，一定要将数据归一化。（admitted除外，不然会出现 除数为零，或者完全不能拟合的情况）
data2.iloc[:, :-1] = (data2.iloc[:, :-1] -     data2.iloc[:, :-1].mean()) / data2.iloc[:, :-1].std()
print(data2.head())

我们再调用一次，对比他和高级优化函数的结果是否一致：

    result = opt.fmin_tnc(func=cost_tmp, x0=theta, fprime=gradient2, args=(x, y))
    print("the func cost is: ", cost_tmp(result[0], x, y))
    print("the func g is: ", result[0])

    #
    times = 15000
    alpha = 0.08
    g, c = TestDecend(theta, x, y, times, alpha)  # 之前遇到的问题就是没有办法 和上述的实现一样的功能。原因就是没有归一化

    g = np.matrix.tolist(g)
    print("the testDescen g is: ", g, type(g))

    print("the cost is: ", c)

可以发现这样就大致相同了。同时我们将数据可视化。

可以看见，拟合程度与高级优化算法的效果一致。

3.9日补充：在计算gradient的时候，用了循环，但是其实，可以不需要，我们直接将其向量化。利用线性代数和传播的机制。可把gradient函数改写成如下：

def gradientDecVectorize(theta, x, y):
    theta = np.matrix(theta)
    x = np.matrix(x)
    y = np.matrix(y)
    parameters = int(theta.shape[1])
    grad = np.zeros(parameters)
    error = sigmoid(x * theta.T) - y
    grad = (error.T * x) / len(x)
    return grad