Introduction to Machine Learning (5): Regression Algorithms in Practice, Part 1

Latest recommended article published on 2024-01-09 12:18:03

__Fang Wei__

Category: Machine Learning. Tags: machine learning, logistic regression, hands-on practice

Article link: https://blog.csdn.net/rookie_wei/article/details/83479700

--------Wei Fang (韦访) 20181023

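1. Overview

In this installment we use logistic regression to solve a practical problem. Suppose we have historical data for 100 students: their scores on two exams and whether each of them passed the assessment. We will build a logistic regression model that predicts, from a future student's two exam scores, whether that student will pass.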

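2. Looking at the data

The sample data can be downloaded from: https://download.csdn.net/download/rookie_wei/10749893

Let's first look at the raw data file. It has only 3 columns and 100 rows. The first two columns are the scores of the two exams, and the third column indicates whether the student passed the assessment (0 means not passed, 1 means passed). The first row is already data; there is no header. To view the data more intuitively, we can plot it in a coordinate system. The code is as follows: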

#encoding:utf-8
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# The first row is already data, so header must be set to None
# Column names are added to make later processing easier
lrData = pd.read_csv('LogiReg_data.csv', header=None, names=['Score1', 'Score2', 'PASS'])
print(lrData.shape)

# Separate the rows that passed from the rows that did not pass
passData = lrData[lrData['PASS'] == 1]
notPassData = lrData[lrData['PASS'] == 0]

# Plot the data
xy = plt.subplot()
xy.scatter(passData['Score1'], passData['Score2'], c='r', marker='o', label='Pass')
xy.scatter(notPassData['Score1'], notPassData['Score2'], c='b', marker='x', label='Not Pass')
xy.legend()
xy.set_xlabel('Score1')
xy.set_ylabel('Score2')
plt.show()

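Running it prints the shape of the data, (100, 3), and shows a scatter plot with the passing students drawn as red circles and the others as blue crosses.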

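3. Designing the model

From the previous installment we know that the goal of training is to find three θ parameters (θ0, θ1, θ2) and a threshold; the threshold is then used to decide whether a result counts as a pass.

The modules we need to implement are:

Sigmoid: the function that maps a value to a probability

Model: returns the prediction

Cost: computes the loss for given parameters

Gradient: computes the gradient direction of each parameter

Descent: updates the parameters

Accuracy: computes the accuracy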

4. Implementing the modules

Sigmoid function:

The formula is:

$$g(z) = \frac{1}{1 + e^{-z}}$$

The code is very simple:

def sigmoid(z):
    return 1 / (1 + np.exp(-z))
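
As a quick sanity check of the function above, the following sketch evaluates sigmoid on a few made-up inputs. It only illustrates two well-known properties: sigmoid(0) is 0.5, and sigmoid(-z) equals 1 - sigmoid(z). The variable names z and p are chosen here for illustration.

```python
import numpy as np

def sigmoid(z):
    # Map any real value (or array of values) into the open interval (0, 1)
    return 1 / (1 + np.exp(-z))

# Made-up inputs: a negative value, zero, and a positive value
z = np.array([-5.0, 0.0, 5.0])
p = sigmoid(z)
print(p)

# Symmetry check: sigmoid(-z) should equal 1 - sigmoid(z)
print(np.allclose(sigmoid(-z), 1 - p))
```

Because the output always lies strictly between 0 and 1, it can be read as the probability of the positive class.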

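Model function:

The formula is:

$$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}, \qquad \theta^T x = \theta_0 + \theta_1 x_1 + \theta_2 x_2$$

This code is also simple: pass in X and the θ parameters and feed the product into the sigmoid function above. Each row of X is one sample and theta has shape (1, 3), so the product is X times the transpose of theta:

def model(X, theta):
    return sigmoid(np.dot(X, theta.T))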
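
The shapes are easy to get wrong here, so the sketch below checks them on two made-up rows. It assumes, as in the complete listing at the end of this article, that X has one sample per row with a leading 1 for the bias and that theta has shape (1, 3); X_demo and theta_demo are illustrative names.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def model(X, theta):
    # X: (n, 3) rows of [1, score1, score2]; theta: (1, 3); result: (n, 1)
    return sigmoid(np.dot(X, theta.T))

# Two made-up samples, each with a leading 1 for the bias term
X_demo = np.array([[1.0, 0.5, -0.2],
                   [1.0, -1.0, 0.3]])
theta_demo = np.zeros([1, 3])

probs = model(X_demo, theta_demo)
print(probs.shape)    # one probability per sample
print(probs.ravel())  # every probability is 0.5 while theta is all zeros
```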

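Loss function:

The log-likelihood is:

$$l(\theta) = \sum_{i=1}^{n} \left[ y_i \log h_\theta(x_i) + (1 - y_i) \log\left(1 - h_\theta(x_i)\right) \right]$$

To turn maximizing the expression above into a gradient descent problem, we introduce the average loss:

$$J(\theta) = -\frac{1}{n} l(\theta)$$

To keep the code clear, we first compute

$$y_i \log h_\theta(x_i)$$

then compute

$$(1 - y_i) \log\left(1 - h_\theta(x_i)\right)$$

and finally sum them up. The code is as follows: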

def cost(X, y, theta):
    left = np.multiply(y, np.log(model(X, theta)))
    right = np.multiply(1 - y, np.log(1 - model(X, theta)))
    return - np.sum(left + right) / len(X)
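
A handy way to check the cost function: when theta is all zeros, every prediction is 0.5, so the average loss must equal ln 2 (about 0.693) regardless of the data. The sketch below verifies this on three made-up samples. Note that y is kept as an (n, 1) column so that it lines up with the (n, 1) output of model; X_demo and y_demo are illustrative names.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def model(X, theta):
    return sigmoid(np.dot(X, theta.T))

def cost(X, y, theta):
    left = np.multiply(y, np.log(model(X, theta)))
    right = np.multiply(1 - y, np.log(1 - model(X, theta)))
    return -np.sum(left + right) / len(X)

# Made-up samples (leading 1 is the bias column) and labels as a column vector
X_demo = np.array([[1.0, 0.2, 0.7],
                   [1.0, -0.4, 0.1],
                   [1.0, 1.3, -0.9]])
y_demo = np.array([[1.0], [0.0], [1.0]])

initial_cost = cost(X_demo, y_demo, np.zeros([1, 3]))
print(initial_cost, np.log(2))
```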

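Computing the gradient:

Taking the partial derivative of the average loss with respect to each parameter gives:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta(x_i) - y_i \right) x_{ij}$$

The implementation is as follows:

# Function that computes the gradient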
def gradient(X, y, theta):
    grad = np.zeros(theta.shape)
    error = (model(X, theta) - y).ravel()
    # Compute the gradient for every theta
    for i in range(len(theta.ravel())):
        term = np.multiply(error, X[:,i])
        grad[0, i] = np.sum(term) / len(X)
    return grad
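
The loop over the parameters can also be written as a single matrix product. The sketch below is one possible vectorized variant (gradient_vectorized is a name introduced here, not part of the original code) and checks on made-up data that it returns the same values as the loop version.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def model(X, theta):
    return sigmoid(np.dot(X, theta.T))

def gradient(X, y, theta):
    # Loop version from the article
    grad = np.zeros(theta.shape)
    error = (model(X, theta) - y).ravel()
    for i in range(len(theta.ravel())):
        term = np.multiply(error, X[:, i])
        grad[0, i] = np.sum(term) / len(X)
    return grad

def gradient_vectorized(X, y, theta):
    # error is (n, 1); error.T (1, n) times X (n, 3) gives a (1, 3) gradient
    error = model(X, theta) - y
    return np.dot(error.T, X) / len(X)

# Made-up data: 5 samples with a bias column of ones
rng = np.random.default_rng(0)
X_demo = np.hstack([np.ones((5, 1)), rng.normal(size=(5, 2))])
y_demo = np.array([[1.0], [0.0], [1.0], [1.0], [0.0]])
theta_demo = np.array([[0.1, -0.2, 0.3]])

g_loop = gradient(X_demo, y_demo, theta_demo)
g_vec = gradient_vectorized(X_demo, y_demo, theta_demo)
print(np.allclose(g_loop, g_vec))
```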

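Shuffle function:

What is a shuffle function? It simply puts the data in random order, just like shuffling a deck of cards. The point is to keep the order in which the data was collected (or scraped) from hurting the training result. For example, if the data was collected in order of ranking, some batches might contain only very high scores and others only very low ones, so we shuffle the data before training. The code is as follows:

# Shuffle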
def shuffle(data):
    np.random.shuffle(data)
    columns = data.shape[1]
    X = data[:, 0:columns-1]
    Y = data[:, columns-1:]
    return X, Y

We can write a small example to check that the data really gets shuffled:

data = np.array([
    [1, 2, 3],
    [11, 12, 13],
    [21, 22, 23],
    [31, 32, 33],
    [41, 42, 43],
])

x, y = shuffle(data)
print(x)
print('------')
print(y)

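Running it prints the rows in random order, with the first two columns of each row in x and the last column in y.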
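
One detail worth knowing is that np.random.shuffle reorders the array in place, so the data passed to shuffle is itself modified. If the original order must be preserved, a variant based on a permuted copy can be used. The sketch below is such a variant; shuffle_copy and the seeded generator are assumptions made for this illustration, not part of the original code.

```python
import numpy as np

def shuffle_copy(data, rng):
    # rng.permutation returns a shuffled copy of the rows; data is left untouched
    shuffled = rng.permutation(data)
    return shuffled[:, :-1], shuffled[:, -1:]

data = np.array([
    [1, 2, 3],
    [11, 12, 13],
    [21, 22, 23],
    [31, 32, 33],
    [41, 42, 43],
])
original = data.copy()

X_demo, y_demo = shuffle_copy(data, np.random.default_rng(0))
print(X_demo)
print('------')
print(y_demo)
print(np.array_equal(data, original))  # True: the input keeps its order
```

Rows stay intact in both versions: features and label are shuffled together, never independently.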

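Gradient descent:

As mentioned in the previous installment, there are three kinds of gradient descent: batch gradient descent, stochastic gradient descent, and mini-batch gradient descent. We will compare their results shortly. Training cannot go on forever, so we also need a stopping strategy. There are generally three: stop after a fixed number of iterations, stop based on the change in the loss, and stop based on the size of the gradient.

# Stopping strategies
# Stop after a fixed number of iterations
STOP_COUNT = 1
# Stop based on the loss value
STOP_LOSS = 2
# Stop based on the gradient
STOP_GRAD = 3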
def stopTrain(type, value, threshold):
    if type == STOP_COUNT:
        return value > threshold
    elif type == STOP_LOSS:
        return abs(value[-1] - value[-2]) < threshold
    elif type == STOP_GRAD:
        return np.linalg.norm(value) < threshold
    else:
        return False

# Gradient descent
# data: data  theta: coefficients  stopType: stopping strategy  batchSize: batch size
# threshold: threshold  learning_rate: step size / learning rate
def gradientDescent(data, theta, stopType, batchSize, threshold, learning_rate):
    # Iteration count
    iterTime = 0
    # Batch index
    batchIndex = 0
    # Shuffle the data
    X, y = shuffle(data)
    # Loss history
    errors = [cost(X, y, theta)]

    while True:
        # Compute the gradient
        grad = gradient(X[batchIndex:batchIndex+batchSize], y[batchIndex:batchIndex+batchSize], theta)

        # Update the parameters
        theta = theta - learning_rate * grad
        # Advance the batch index
        batchIndex += batchSize
        # Increment the iteration count
        iterTime += 1
        # Record the loss
        errors.append(cost(X, y, theta))
        if batchIndex >= len(X):
            batchIndex = 0
            # Reshuffle the data
            X, y = shuffle(data)

        if stopType == STOP_COUNT:
            value = iterTime
        elif stopType == STOP_LOSS:
            value = errors
        elif stopType == STOP_GRAD:
            value = grad
        else:
            stopType = STOP_COUNT
            value = iterTime

        if stopTrain(stopType, value, threshold):
            break
    return theta, iterTime, errors

# Start training
def train(data, theta, stopType, batchSize, threshold, learning_rate):
    theta, iterTime, errors = gradientDescent(data, theta, stopType, batchSize, threshold, learning_rate)
    print('theta:{}'.format(theta))
    print('iterTime:{}'.format(iterTime))
    print('last loss:{}'.format(errors[-1]))
    xy = plt.subplot()
    xy.plot(np.arange(len(errors)), errors, 'r')
    xy.set_xlabel('Iter Last Loss:{}'.format(errors[-1]))
    xy.set_ylabel('Cost')
    plt.show()
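
To make the three stopping rules concrete, here is a small self-contained check of stopTrain on made-up values (the numbers are illustrative and are not taken from any training run). Note that with STOP_COUNT the test is value > threshold, so the loop performs threshold + 1 updates before it stops.

```python
import numpy as np

STOP_COUNT = 1
STOP_LOSS = 2
STOP_GRAD = 3

def stopTrain(type, value, threshold):
    if type == STOP_COUNT:
        return value > threshold
    elif type == STOP_LOSS:
        return abs(value[-1] - value[-2]) < threshold
    elif type == STOP_GRAD:
        return np.linalg.norm(value) < threshold
    else:
        return False

# Iteration count: stops only once the counter exceeds the threshold
count_at_limit = stopTrain(STOP_COUNT, 10000, 10000)
count_past_limit = stopTrain(STOP_COUNT, 10001, 10000)

# Loss: stops when the last two recorded losses differ by less than the threshold
loss_flat = stopTrain(STOP_LOSS, [0.70, 0.65, 0.6499995], 1e-6)
loss_moving = stopTrain(STOP_LOSS, [0.70, 0.65, 0.60], 1e-6)

# Gradient: stops when the norm of the gradient falls below the threshold
grad_small = stopTrain(STOP_GRAD, np.array([[0.001, 0.002, 0.002]]), 0.005)
grad_large = stopTrain(STOP_GRAD, np.array([[0.03, 0.04, 0.0]]), 0.005)

print(count_at_limit, count_past_limit)
print(loss_flat, loss_moving)
print(grad_small, grad_large)
```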

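5. Comparing the stopping strategies

All the main modules are done, so let's first compare the stopping strategies. To rule out the influence of the gradient descent variant, we use the same one throughout: batch gradient descent, where each batch is the entire data set (batchSize=100).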
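
In gradientDescent the batchSize argument alone decides which variant runs: 100 gives batch gradient descent, 1 gives stochastic gradient descent, and anything in between gives mini-batch gradient descent. The following sketch counts how many parameter updates one pass over the 100 samples takes for each setting; updates_per_pass is a helper introduced here for illustration.

```python
import math

def updates_per_pass(n_samples, batch_size):
    # Each update consumes batch_size rows; the last batch may be smaller
    return math.ceil(n_samples / batch_size)

settings = [('batch', 100), ('stochastic', 1), ('mini-batch', 30)]
for name, batch_size in settings:
    print(name, updates_per_pass(100, batch_size))
```

With batchSize=30 the slices are rows 0-29, 30-59, 60-89 and the remaining 90-99, so the fourth update uses only 10 rows before the data is reshuffled.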

Fixed number of iterations:

Set the number of iterations to 10000. The code is as follows:

if __name__ == '__main__':
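    # The first row is already data, so header must be set to None
    # Column names are added to make later processing easier
    lrData = pd.read_csv('LogiReg_data.csv', header=None, names=['Score1', 'Score2', 'PASS'])
    # theta0 is only a bias term, which is equivalent to x0 = 1, so insert a column of ones
    lrData.insert(0, 'Ones', 1)
    # Convert the data to a NumPy array (DataFrame.as_matrix() was removed in pandas 1.0)
    lrData = lrData.to_numpy()
    # Initialize theta
    theta = np.zeros([1, 3])
    # Start training and plot the loss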
    train(lrData, theta=theta, stopType=STOP_COUNT, batchSize=100, threshold=10000, learning_rate=0.001)

Output:

As you can see, the loss is about 0.585002.

Stopping based on the loss value:

if __name__ == '__main__':
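    # The first row is already data, so header must be set to None
    # Column names are added to make later processing easier
    lrData = pd.read_csv('LogiReg_data.csv', header=None, names=['Score1', 'Score2', 'PASS'])
    # theta0 is only a bias term, which is equivalent to x0 = 1, so insert a column of ones
    lrData.insert(0, 'Ones', 1)
    # Convert the data to a NumPy array (DataFrame.as_matrix() was removed in pandas 1.0)
    lrData = lrData.to_numpy()
    # Initialize theta
    theta = np.zeros([1, 3])
    # Start training and plot the loss
    # Stop after a fixed number of iterations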
    # train(lrData, theta=theta, stopType=STOP_COUNT, batchSize=100, threshold=10000, learning_rate=0.001)
    train(lrData, theta=theta, stopType=STOP_LOSS, batchSize=100, threshold=0.000001, learning_rate=0.001)

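Output:

This time it ran 109902 iterations and the loss is about 0.376925, much smaller than last time. If training continued, the loss should decrease further.

Stopping based on the gradient:

The code is as follows: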

if __name__ == '__main__':
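    # The first row is already data, so header must be set to None
    # Column names are added to make later processing easier
    lrData = pd.read_csv('LogiReg_data.csv', header=None, names=['Score1', 'Score2', 'PASS'])
    # theta0 is only a bias term, which is equivalent to x0 = 1, so insert a column of ones
    lrData.insert(0, 'Ones', 1)
    # Convert the data to a NumPy array (DataFrame.as_matrix() was removed in pandas 1.0)
    lrData = lrData.to_numpy()
    # Initialize theta
    theta = np.zeros([1, 3])
    # Start training and plot the loss
    # Stop after a fixed number of iterations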
    # train(lrData, theta=theta, stopType=STOP_COUNT, batchSize=100, threshold=10000, learning_rate=0.001)
    # train(lrData, theta=theta, stopType=STOP_LOSS, batchSize=100, threshold=0.000001, learning_rate=0.001)
    train(lrData, theta=theta, stopType=STOP_GRAD, batchSize=100, threshold=0.005, learning_rate=0.001)

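Output:

This time it ran 1078181 iterations and the loss dropped to 0.222549.

Next, let's compare the results of the different gradient descent variants. To compare their running times, we modify the train function so that it prints the elapsed time. The modified code is as follows:

import time
# Start training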
def train(data, theta, stopType, batchSize, threshold, learning_rate):
    cur_time = time.time()
    theta, iterTime, errors = gradientDescent(data, theta, stopType, batchSize, threshold, learning_rate)
    sub_time = time.time() - cur_time
    print('theta:{}'.format(theta))
    print('iterTime:{}'.format(iterTime))
    print('sub_time:{}'.format(sub_time))
    print('last loss:{}'.format(errors[-1]))
    xy = plt.subplot()
    xy.plot(np.arange(len(errors)), errors, 'r')
    xy.set_xlabel('Iter Last Loss:{}'.format(errors[-1]))
    xy.set_ylabel('Cost')
    plt.show()

Batch gradient descent:

The code is as follows:

if __name__ == '__main__':
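    # The first row is already data, so header must be set to None
    # Column names are added to make later processing easier
    lrData = pd.read_csv('LogiReg_data.csv', header=None, names=['Score1', 'Score2', 'PASS'])
    # theta0 is only a bias term, which is equivalent to x0 = 1, so insert a column of ones
    lrData.insert(0, 'Ones', 1)
    # Convert the data to a NumPy array (DataFrame.as_matrix() was removed in pandas 1.0)
    lrData = lrData.to_numpy()
    # Initialize theta
    theta = np.zeros([1, 3])
    # Start training and plot the loss
    # Stop after a fixed number of iterations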
    # train(lrData, theta=theta, stopType=STOP_COUNT, batchSize=100, threshold=10000, learning_rate=0.001)
    # train(lrData, theta=theta, stopType=STOP_LOSS, batchSize=100, threshold=0.000001, learning_rate=0.001)
    # train(lrData, theta=theta, stopType=STOP_GRAD, batchSize=100, threshold=0.005, learning_rate=0.001)
    train(lrData, theta=theta, stopType=STOP_COUNT, batchSize=100, threshold=100000, learning_rate=0.001)

Output:

The loss is about 0.387388, and the run took 14.78 seconds.

Stochastic gradient descent:

The code is as follows:

if __name__ == '__main__':
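    # The first row is already data, so header must be set to None
    # Column names are added to make later processing easier
    lrData = pd.read_csv('LogiReg_data.csv', header=None, names=['Score1', 'Score2', 'PASS'])
    # theta0 is only a bias term, which is equivalent to x0 = 1, so insert a column of ones
    lrData.insert(0, 'Ones', 1)
    # Convert the data to a NumPy array (DataFrame.as_matrix() was removed in pandas 1.0)
    lrData = lrData.to_numpy()
    # Initialize theta
    theta = np.zeros([1, 3])
    # Start training and plot the loss
    # Stop after a fixed number of iterations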
    # train(lrData, theta=theta, stopType=STOP_COUNT, batchSize=100, threshold=10000, learning_rate=0.001)
    # train(lrData, theta=theta, stopType=STOP_LOSS, batchSize=100, threshold=0.000001, learning_rate=0.001)
    # train(lrData, theta=theta, stopType=STOP_GRAD, batchSize=100, threshold=0.005, learning_rate=0.001)
    # train(lrData, theta=theta, stopType=STOP_COUNT, batchSize=100, threshold=100000, learning_rate=0.001)
    train(lrData, theta=theta, stopType=STOP_COUNT, batchSize=1, threshold=100000, learning_rate=0.001)

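As you can see, stochastic gradient descent fluctuates a lot and the final result is poor. Let's try a smaller learning rate. The code is as follows: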

if __name__ == '__main__':
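    # The first row is already data, so header must be set to None
    # Column names are added to make later processing easier
    lrData = pd.read_csv('LogiReg_data.csv', header=None, names=['Score1', 'Score2', 'PASS'])
    # theta0 is only a bias term, which is equivalent to x0 = 1, so insert a column of ones
    lrData.insert(0, 'Ones', 1)
    # Convert the data to a NumPy array (DataFrame.as_matrix() was removed in pandas 1.0)
    lrData = lrData.to_numpy()
    # Initialize theta
    theta = np.zeros([1, 3])
    # Start training and plot the loss
    # Stop after a fixed number of iterations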
    # train(lrData, theta=theta, stopType=STOP_COUNT, batchSize=100, threshold=10000, learning_rate=0.001)
    # train(lrData, theta=theta, stopType=STOP_LOSS, batchSize=100, threshold=0.000001, learning_rate=0.001)
    # train(lrData, theta=theta, stopType=STOP_GRAD, batchSize=100, threshold=0.005, learning_rate=0.001)
    # train(lrData, theta=theta, stopType=STOP_COUNT, batchSize=100, threshold=100000, learning_rate=0.001)
    train(lrData, theta=theta, stopType=STOP_COUNT, batchSize=1, threshold=100000, learning_rate=0.000001)

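Output:

This is much better than the previous result. Stochastic gradient descent trains quickly, but it is less stable and needs a very small learning rate.

Mini-batch gradient descent:

The code is as follows: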

if __name__ == '__main__':
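    # The first row is already data, so header must be set to None
    # Column names are added to make later processing easier
    lrData = pd.read_csv('LogiReg_data.csv', header=None, names=['Score1', 'Score2', 'PASS'])
    # theta0 is only a bias term, which is equivalent to x0 = 1, so insert a column of ones
    lrData.insert(0, 'Ones', 1)
    # Convert the data to a NumPy array (DataFrame.as_matrix() was removed in pandas 1.0)
    lrData = lrData.to_numpy()
    # Initialize theta
    theta = np.zeros([1, 3])
    # Start training and plot the loss
    # Stop after a fixed number of iterations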
    # train(lrData, theta=theta, stopType=STOP_COUNT, batchSize=100, threshold=10000, learning_rate=0.001)
    # train(lrData, theta=theta, stopType=STOP_LOSS, batchSize=100, threshold=0.000001, learning_rate=0.001)
    # train(lrData, theta=theta, stopType=STOP_GRAD, batchSize=100, threshold=0.005, learning_rate=0.001)
    # train(lrData, theta=theta, stopType=STOP_COUNT, batchSize=100, threshold=100000, learning_rate=0.001)
    # train(lrData, theta=theta, stopType=STOP_COUNT, batchSize=1, threshold=100000, learning_rate=0.001)
    train(lrData, theta=theta, stopType=STOP_COUNT, batchSize=30, threshold=100000, learning_rate=0.001)

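Output:

It takes about half the time of batch gradient descent and slightly more than stochastic gradient descent, but the loss still fluctuates considerably.

Let's try preprocessing the data: for each feature (each column), subtract its mean and then divide by its standard deviation. As a result, the values of every feature (column) are centered around 0 and have a variance of 1.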
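
The standardization just described can be sketched with plain NumPy. The scores below are made up for illustration. The sketch uses the population standard deviation (ddof=0), which is also what sklearn.preprocessing.scale uses.

```python
import numpy as np

# Made-up exam scores: one row per student, one column per exam
scores = np.array([[30.0, 80.0],
                   [60.0, 70.0],
                   [90.0, 45.0],
                   [50.0, 65.0]])

mean = scores.mean(axis=0)   # per-column mean
std = scores.std(axis=0)     # per-column standard deviation (ddof=0)
scaled = (scores - mean) / std

print(scaled.mean(axis=0))   # approximately 0 for each column
print(scaled.std(axis=0))    # approximately 1 for each column
```

After scaling, both features live on the same scale, so a single learning rate works equally well for every parameter.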

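First we need to install the scikit-learn library (the package is published on PyPI as scikit-learn and imported as sklearn) by running:

pip install scikit-learn

Then add the import "from sklearn import preprocessing as pp" at the top of the file. The code is as follows: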

if __name__ == '__main__':
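    # The first row is already data, so header must be set to None
    # Column names are added to make later processing easier
    lrData = pd.read_csv('LogiReg_data.csv', header=None, names=['Score1', 'Score2', 'PASS'])
    # theta0 is only a bias term, which is equivalent to x0 = 1, so insert a column of ones
    lrData.insert(0, 'Ones', 1)
    # Convert the data to a NumPy array (DataFrame.as_matrix() was removed in pandas 1.0)
    lrData = lrData.to_numpy()
    # Preprocess the data; the first column is always 1, so it is not scaled
    lrData[:, 1:3] = pp.scale(lrData[:, 1:3])
    # Initialize theta
    theta = np.zeros([1, 3])
    # Start training and plot the loss
    # Stop after a fixed number of iterations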
    # train(lrData, theta=theta, stopType=STOP_COUNT, batchSize=100, threshold=10000, learning_rate=0.001)
    # train(lrData, theta=theta, stopType=STOP_LOSS, batchSize=100, threshold=0.000001, learning_rate=0.001)
    # train(lrData, theta=theta, stopType=STOP_GRAD, batchSize=100, threshold=0.005, learning_rate=0.001)
    # train(lrData, theta=theta, stopType=STOP_COUNT, batchSize=100, threshold=100000, learning_rate=0.001)
    # train(lrData, theta=theta, stopType=STOP_COUNT, batchSize=1, threshold=100000, learning_rate=0.000001)
    train(lrData, theta=theta, stopType=STOP_COUNT, batchSize=30, threshold=100000, learning_rate=0.001)

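Output:

Now the curve is very smooth, and a good loss is reached in a short time with relatively few iterations. With only 100,000 iterations, the loss is lower than what the earlier run reached after about a million iterations, and the training time is also far shorter. Clearly, preprocessing the data is very effective.

6. Prediction accuracy

Finally, let's see how accurate the trained model is. We use the mini-batch gradient descent setup above together with data preprocessing. The code is as follows:

# Apply the threshold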
def predict(X, theta):
    return [1 if x >= 0.5 else 0 for x in model(X, theta)]

def test(X, theta):
    tmpX = X[:,:3]
    tmpY = X[:,3]
    pred = predict(tmpX, theta)
    correct = [1 if ((a == 1 and b == 1) or (a == 0 and b == 0)) else 0 for (a, b) in zip(pred, tmpY)]
    # Percentage of correct predictions (valid for any number of samples)
    accuracy = 100.0 * sum(correct) / len(correct)
    print('accuracy = {0}%'.format(accuracy))
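
Accuracy is simply the share of predictions that match the true labels. As a minimal sketch, it can also be computed in vectorized form; accuracy_percent is a helper introduced here for illustration, and the labels are made up.

```python
import numpy as np

def accuracy_percent(pred, actual):
    # Compare predictions and labels element-wise; the mean of the matches is the accuracy
    pred = np.asarray(pred).ravel()
    actual = np.asarray(actual).ravel()
    return 100.0 * np.mean(pred == actual)

acc_partial = accuracy_percent([1, 0, 1, 1], [1, 0, 0, 1])
acc_perfect = accuracy_percent([1, 0], [1, 0])
print(acc_partial)  # 75.0
print(acc_perfect)  # 100.0
```

Dividing by the number of samples keeps the result correct for data sets of any size, including the case where every prediction is right.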

Modify the train function so that it returns theta. The code is as follows:

# Start training
def train(data, theta, stopType, batchSize, threshold, learning_rate):
    cur_time = time.time()
    theta, iterTime, errors = gradientDescent(data, theta, stopType, batchSize, threshold, learning_rate)
    sub_time = time.time() - cur_time
    print('theta:{}'.format(theta))
    print('iterTime:{}'.format(iterTime))
    print('sub_time:{}'.format(sub_time))
    print('last loss:{}'.format(errors[-1]))
    xy = plt.subplot()
    xy.plot(np.arange(len(errors)), errors, 'r')
    xy.set_xlabel('Iter Last Loss:{}'.format(errors[-1]))
    xy.set_ylabel('Cost')
    # plt.show()
    return theta

Then call the test function:

if __name__ == '__main__':
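    # The first row is already data, so header must be set to None
    # Column names are added to make later processing easier
    lrData = pd.read_csv('LogiReg_data.csv', header=None, names=['Score1', 'Score2', 'PASS'])
    # theta0 is only a bias term, which is equivalent to x0 = 1, so insert a column of ones
    lrData.insert(0, 'Ones', 1)
    # Convert the data to a NumPy array (DataFrame.as_matrix() was removed in pandas 1.0)
    lrData = lrData.to_numpy()
    scaled_data = lrData.copy()
    # Preprocess the data; the first column is always 1, so it is not scaled
    lrData[:, 1:3] = pp.scale(lrData[:, 1:3])
    # Initialize theta
    theta = np.zeros([1, 3])
    # Start training and plot the loss
    # Stop after a fixed number of iterations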
    # train(lrData, theta=theta, stopType=STOP_COUNT, batchSize=100, threshold=10000, learning_rate=0.001)
    # train(lrData, theta=theta, stopType=STOP_LOSS, batchSize=100, threshold=0.000001, learning_rate=0.001)
    # train(lrData, theta=theta, stopType=STOP_GRAD, batchSize=100, threshold=0.005, learning_rate=0.001)
    # train(lrData, theta=theta, stopType=STOP_COUNT, batchSize=100, threshold=100000, learning_rate=0.001)
    # train(lrData, theta=theta, stopType=STOP_COUNT, batchSize=1, threshold=100000, learning_rate=0.000001)
    theta = train(lrData, theta=theta, stopType=STOP_COUNT, batchSize=30, threshold=100000, learning_rate=0.001)
    test(lrData, theta)

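Output:

The accuracy is 90%. I tried increasing the number of iterations to 1,000,000 and reducing the learning rate, and the lowest loss reached was 0.203497. This is probably because the training set is not large enough; after all, there are only 100 samples to train on.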
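
Since the model is trained on standardized scores, a new student's raw scores have to be standardized with the mean and standard deviation of the training data before they are fed to the model. The sketch below illustrates the idea. The training scores and theta_demo are hypothetical values invented for this example, not results from the training runs above, and predict_new is a helper introduced here.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical training scores and parameters, for illustration only
train_scores = np.array([[35.0, 78.0],
                         [60.0, 86.0],
                         [79.0, 75.0],
                         [45.0, 56.0]])
mean = train_scores.mean(axis=0)
std = train_scores.std(axis=0)
theta_demo = np.array([[1.0, 2.0, 2.0]])

def predict_new(score1, score2):
    # Standardize with the training statistics, then prepend the bias input 1
    scaled = (np.array([score1, score2]) - mean) / std
    x = np.concatenate([[1.0], scaled])
    prob = sigmoid(np.dot(x, theta_demo.T))[0]
    # Same 0.5 threshold as the predict function
    return prob, int(prob >= 0.5)

p_hi, label_hi = predict_new(90, 95)
p_lo, label_lo = predict_new(20, 30)
print(p_hi, label_hi)
print(p_lo, label_lo)
```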

The complete code is as follows:

#encoding:utf-8
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import time
from sklearn import preprocessing as pp

# sigmoid function
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# model function
def model(X, theta):
    return sigmoid(np.dot(X, theta.T))

# Loss function (cost)
def cost(X, y, theta):
    left = np.multiply(y, np.log(model(X, theta)))
    right = np.multiply(1 - y, np.log(1 - model(X, theta)))
    return -np.sum(left + right) / len(X)

# Function that computes the gradient
def gradient(X, y, theta):
    grad = np.zeros(theta.shape)
    error = (model(X, theta) - y).ravel()
    # Compute the gradient for every theta
    for i in range(len(theta.ravel())):
        term = np.multiply(error, X[:,i])
        grad[0, i] = np.sum(term) / len(X)
    return grad


# Shuffle
def shuffle(data):
    np.random.shuffle(data)
    columns = data.shape[1]
    X = data[:, 0:columns-1]
    Y = data[:, columns-1:]
    return X, Y
#
# data = np.array([
#     [1, 2, 3],
#     [11, 12, 13],
#     [21, 22, 23],
#     [31, 32, 33],
#     [41, 42, 43],
# ])
#
# x, y = shuffle(data)
# print(x)
# print('------')
# print(y)

# Stopping strategies
# Stop after a fixed number of iterations
STOP_COUNT = 1
# Stop based on the loss value
STOP_LOSS = 2
# Stop based on the gradient
STOP_GRAD = 3


def stopTrain(type, value, threshold):
    if type == STOP_COUNT:
        return value > threshold
    elif type == STOP_LOSS:
        return abs(value[-1] - value[-2]) < threshold
    elif type == STOP_GRAD:
        return np.linalg.norm(value) < threshold
    else:
        return False



# Gradient descent
# data: data  theta: coefficients  stopType: stopping strategy  batchSize: batch size
# threshold: threshold  learning_rate: step size / learning rate
def gradientDescent(data, theta, stopType, batchSize, threshold, learning_rate):
    # Iteration count
    iterTime = 0
    # Batch index
    batchIndex = 0
    # Shuffle the data
    X, y = shuffle(data)
    # Loss history
    errors = [cost(X, y, theta)]

    while True:
        # Compute the gradient
        grad = gradient(X[batchIndex:batchIndex+batchSize], y[batchIndex:batchIndex+batchSize], theta)

        # Update the parameters
        theta = theta - learning_rate * grad
        # Advance the batch index
        batchIndex += batchSize
        # Increment the iteration count
        iterTime += 1
        # Record the loss
        errors.append(cost(X, y, theta))
        if batchIndex >= len(X):
            batchIndex = 0
            # Reshuffle the data
            X, y = shuffle(data)

        if stopType == STOP_COUNT:
            value = iterTime
        elif stopType == STOP_LOSS:
            value = errors
        elif stopType == STOP_GRAD:
            value = grad
        else:
            stopType = STOP_COUNT
            value = iterTime

        if stopTrain(stopType, value, threshold):
            break
    return theta, iterTime, errors

# Start training
def train(data, theta, stopType, batchSize, threshold, learning_rate):
    cur_time = time.time()
    theta, iterTime, errors = gradientDescent(data, theta, stopType, batchSize, threshold, learning_rate)
    sub_time = time.time() - cur_time
    print('theta:{}'.format(theta))
    print('iterTime:{}'.format(iterTime))
    print('sub_time:{}'.format(sub_time))
    print('last loss:{}'.format(errors[-1]))
    xy = plt.subplot()
    xy.plot(np.arange(len(errors)), errors, 'r')
    xy.set_xlabel('Iter Last Loss:{}'.format(errors[-1]))
    xy.set_ylabel('Cost')
    # plt.show()
    return theta

# Apply the threshold
def predict(X, theta):
    return [1 if x >= 0.5 else 0 for x in model(X, theta)]

def test(X, theta):
    tmpX = X[:,:3]
    tmpY = X[:,3]
    pred = predict(tmpX, theta)
    correct = [1 if ((a == 1 and b == 1) or (a == 0 and b == 0)) else 0 for (a, b) in zip(pred, tmpY)]
    # Percentage of correct predictions (valid for any number of samples)
    accuracy = 100.0 * sum(correct) / len(correct)
    print('accuracy = {0}%'.format(accuracy))
#
if __name__ == '__main__':
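    # The first row is already data, so header must be set to None
    # Column names are added to make later processing easier
    lrData = pd.read_csv('LogiReg_data.csv', header=None, names=['Score1', 'Score2', 'PASS'])
    # theta0 is only a bias term, which is equivalent to x0 = 1, so insert a column of ones
    lrData.insert(0, 'Ones', 1)
    # Convert the data to a NumPy array (DataFrame.as_matrix() was removed in pandas 1.0)
    lrData = lrData.to_numpy()
    scaled_data = lrData.copy()
    # Preprocess the data; the first column is always 1, so it is not scaled
    lrData[:, 1:3] = pp.scale(lrData[:, 1:3])
    # Initialize theta
    theta = np.zeros([1, 3])
    # Start training and plot the loss
    # Stop after a fixed number of iterations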
    # train(lrData, theta=theta, stopType=STOP_COUNT, batchSize=100, threshold=10000, learning_rate=0.001)
    # train(lrData, theta=theta, stopType=STOP_LOSS, batchSize=100, threshold=0.000001, learning_rate=0.001)
    # train(lrData, theta=theta, stopType=STOP_GRAD, batchSize=100, threshold=0.005, learning_rate=0.001)
    # train(lrData, theta=theta, stopType=STOP_COUNT, batchSize=100, threshold=100000, learning_rate=0.001)
    # train(lrData, theta=theta, stopType=STOP_COUNT, batchSize=1, threshold=100000, learning_rate=0.000001)
    theta = train(lrData, theta=theta, stopType=STOP_COUNT, batchSize=100, threshold=1000000, learning_rate=0.001)
    test(lrData, theta)
