Machine Learning: Logistic Regression Principles and a Python Implementation for Predicting School Admission

Latest recommended article published on 2023-03-05 20:55:23

志存高远脚踏实地

Category: Machine Learning. Tags: logistic regression, logistic regression code implementation, logistic regression in Python

Article link: https://blog.csdn.net/weixin_44451032/article/details/99705691

Logistic Regression

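Goal: build a logistic regression model that predicts from a person's scores on two exams whether the school will admit them, then compute the model's accuracy.
The data used here is shown below; leave a comment if you need a copy for study.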

Preparing the data

# Import modules
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Read the data
data = pd.read_csv('data/LogiReg_data.csv')
data.head()

	34.62365962451697	78.0246928153624	0
0	30.286711	43.894998	0
1	35.847409	72.902198	0
2	60.182599	86.308552	1
3	79.032736	75.344376	1
4	45.083277	56.316372	0

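As the output shows, the raw file has no header row, so pandas has used the first record as the column names. The next step sets the column names explicitly.

# The raw file has no header row, so set the column names explicitly
data = pd.read_csv('data/LogiReg_data.csv',names = ['Exam_one','Exam_two','Admitted'])
data.head()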

	Exam_one	Exam_two	Admitted
0	34.623660	78.024693	0
1	30.286711	43.894998	0
2	35.847409	72.902198	0
3	60.182599	86.308552	1
4	79.032736	75.344376	1
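
The effect of the names argument can be reproduced on a tiny in-memory file. The sketch below is only an illustration: the csv_text string and the use of io.StringIO are assumptions standing in for data/LogiReg_data.csv, and the three rows are rounded values made up for the demo.

```python
import io
import pandas as pd

# Three headerless rows standing in for data/LogiReg_data.csv (illustrative only).
csv_text = "34.62,78.02,0\n30.29,43.89,0\n60.18,86.31,1\n"

# Default behaviour: the first record is consumed as the header row.
df_default = pd.read_csv(io.StringIO(csv_text))

# Passing names= (with header=None made explicit) keeps every record as data.
df_named = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=["Exam_one", "Exam_two", "Admitted"],
)

print(df_default.shape)  # one row fewer than the file contains
print(df_named.shape)    # all three rows kept
```

Spelling out header=None alongside names makes the intent explicit, which is the form the complete program later in the post uses.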

Check the shape of the data

# Check the shape of the data
data.shape

(100, 3)

Draw a scatter plot of the admitted and not-admitted applicants

# Scatter plot of the admitted and not-admitted applicants
admitted = data[data['Admitted'] == 1]
not_admitted = data[data['Admitted'] == 0]
plt.figure(figsize=(10,6),dpi = 150)
plt.scatter(admitted['Exam_one'],admitted['Exam_two'],s = 30,c = 'b',label = 'Admitted')
plt.scatter(not_admitted['Exam_one'],not_admitted['Exam_two'],s = 30,marker='x',c = 'r',label = 'Not_Admitted')
plt.xlabel('Exam_one  Score')
plt.ylabel('Exam_two  Score')
plt.legend()

<matplotlib.legend.Legend at 0x1a809d74748>

[Figure: scatter plot of Exam_one score against Exam_two score, with admitted and not-admitted applicants marked separately (output_6_1.png)]

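Building the model: logistic regression

Prediction function: $\begin{aligned}h_\theta(x) = g(\theta^Tx) =\frac{1}{1+e^{-\theta^Tx}}\end{aligned}$

where $\begin{aligned}\theta_0x_0+\theta_1x_1+\theta_2x_2 + \dots+\theta_nx_n = \sum_{i = 0}^{n}\theta_ix_i =\theta^Tx\end{aligned}$, with $x_0 = 1$.
Because the raw data has only two feature columns, that is, only two factors influence the outcome, the parameter vector $\theta$ consists of just the bias term $\theta_0$, the coefficient $\theta_1$ of $x_1$, and the coefficient $\theta_2$ of $x_2$.

Goal: build a classifier (solve for the three parameters $\theta_0$, $\theta_1$, $\theta_2$).

Then set a threshold and decide the admission result from it: a predicted probability at or above the threshold counts as admitted.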

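Modules to implement

sigmoid : maps a value to a probability
model : returns the predicted value
loss : computes the loss for the given parameters
gradient : computes the gradient with respect to each parameter
descent : performs the parameter updates
accuracy: computes the accuracy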

The sigmoid function

$g(z) = \frac{1}{1+e^{-z}}$

# Define the sigmoid function
def sigmoid(x):
    return 1.0/(1+np.exp(-x))
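
A quick numerical check makes the shape of this function concrete. The snippet below is a minimal sketch that repeats the same definition so it runs on its own; the three input values are arbitrary choices for illustration.

```python
import numpy as np

def sigmoid(z):
    # Logistic function: maps any real number into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-5.0, 0.0, 5.0])   # arbitrary sample inputs
p = sigmoid(z)
print(p)                          # small value, exactly 0.5, value close to 1

# Symmetry check: sigmoid(-z) equals 1 - sigmoid(z).
print(np.allclose(sigmoid(-z), 1.0 - p))
```

An input of 0 maps to exactly 0.5, which is why 0.5 is the natural threshold between the two classes.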

Prediction function
$\begin{aligned}h_\theta(x) = g(\theta^Tx) =\frac{1}{1+e^{-\theta^Tx}}\end{aligned}$
$\begin{array}{ccc} \begin{pmatrix}\theta_{0} & \theta_{1} & \theta_{2}\end{pmatrix} & \times & \begin{pmatrix}1\\ x_{1}\\ x_{2} \end{pmatrix}\end{array}=\theta_{0}+\theta_{1}x_{1}+\theta_{2}x_{2}$
Define the function model, which returns the predicted probability of admission

# Define the function model, which returns the predicted probability of admission
def model(X,theta):
    return sigmoid(np.dot(X,theta.T))

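The raw data still lacks a column of ones (the $x_0$ that multiplies the bias term), so the next step inserts a column whose entries are all 1 and checks the result.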

data.insert(0,'Ones',1)
data.head()

	Ones	Exam_one	Exam_two	Admitted
0	1	34.623660	78.024693	0
1	1	30.286711	43.894998	0
2	1	35.847409	72.902198	0
3	1	60.182599	86.308552	1
4	1	79.032736	75.344376	1

The column of ones has been inserted successfully.

# Convert the DataFrame to a NumPy array for the calculations below, and set X and y
ori_data = data.values
X = ori_data[:,:3]
y = ori_data[:,3:4]
X[:5],y[0:5]

(array([[ 1.        , 34.62365962, 78.02469282],
        [ 1.        , 30.28671077, 43.89499752],
        [ 1.        , 35.84740877, 72.90219803],
        [ 1.        , 60.18259939, 86.3085521 ],
        [ 1.        , 79.03273605, 75.34437644]]), array([[0.],
        [0.],
        [0.],
        [1.],
        [1.]]))

# Initialise theta as a (1, 3) row vector, with zeros as placeholders
theta = np.zeros([1,3])
theta

array([[0., 0., 0.]])
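
It may help to trace the array shapes through model before going further. The sketch below repeats the two definitions so it runs on its own; X_demo and theta_demo are hypothetical stand-ins for the real X and theta.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model(X, theta):
    # X: (n, 3), theta: (1, 3), so theta.T: (3, 1) and the result is (n, 1).
    return sigmoid(np.dot(X, theta.T))

# Two hypothetical rows: a leading 1 for the bias term, then two exam scores.
X_demo = np.array([[1.0, 34.6, 78.0],
                   [1.0, 60.2, 86.3]])
theta_demo = np.zeros([1, 3])

probs = model(X_demo, theta_demo)
print(probs.shape)    # one probability per row, kept as a column
print(probs.ravel())  # every entry is 0.5 while theta is all zeros
```

With theta at zero the model has no preference yet, so every applicant gets probability 0.5.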

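Loss function

The goal is to make the likelihood as large as possible, which is a gradient ascent problem. Gradient descent is the more usual formulation, so we negate the log-likelihood and minimise it instead.

Negated log-likelihood for a single sample:
$D(h_\theta(x), y) = -y\log(h_\theta(x)) - (1-y)\log(1-h_\theta(x))$
Average loss over the $n$ samples:
$J(\theta)=\frac{1}{n}\sum_{i=1}^{n} D(h_\theta(x_i), y_i)$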

X.shape,y.shape,theta.shape

((100, 3), (100, 1), (1, 3))

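Define the loss function

def loss(X,y,theta):
    # The two per-sample terms of the negated log-likelihood
    left = np.multiply(-y,np.log(model(X,theta)))
    right = np.multiply(1-y,np.log(1-model(X,theta)))
    # Average over the number of samples rather than a hard-coded 100
    return np.sum(left - right)/len(X)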

Compute the initial loss

loss(X,y,theta)

0.6931471805599453

The current loss is 0.6931471805599453; the next steps try to reduce it.
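
This starting value is not arbitrary. With all parameters at zero every predicted probability is 0.5, so each sample contributes -log(0.5) = log 2 to the loss whatever its label. A small check on hypothetical labels (y_demo is made up for illustration):

```python
import numpy as np

y_demo = np.array([0.0, 1.0, 1.0, 0.0, 1.0])   # hypothetical labels
h = np.full_like(y_demo, 0.5)                  # predictions when theta is all zeros

# Per-sample negated log-likelihood, then the mean over the samples.
per_sample = -y_demo * np.log(h) - (1 - y_demo) * np.log(1 - h)
mean_loss = per_sample.mean()

print(mean_loss, np.log(2))   # the two numbers agree
```

Any loss below log 2 therefore means the model is doing better than an uninformed 50/50 guess.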

Computing the gradient

$\frac{\partial J}{\partial \theta_j}=-\frac{1}{n}\sum_{i=1}^n (y_i - h_\theta (x_i))x_{ij}$
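
The formula follows from the loss by the chain rule. The sketch below fills in the intermediate steps, writing $h_i$ as shorthand for $h_\theta(x_i)$ (a notation introduced here for brevity) and using the standard identity $g'(z) = g(z)(1-g(z))$ for the sigmoid.

```latex
\begin{aligned}
\frac{\partial h_i}{\partial \theta_j}
  &= h_i (1 - h_i)\, x_{ij} \\
\frac{\partial D(h_i, y_i)}{\partial \theta_j}
  &= \left( -\frac{y_i}{h_i} + \frac{1 - y_i}{1 - h_i} \right) h_i (1 - h_i)\, x_{ij} \\
  &= \bigl( -y_i (1 - h_i) + (1 - y_i)\, h_i \bigr)\, x_{ij} \\
  &= (h_i - y_i)\, x_{ij} \\
\frac{\partial J}{\partial \theta_j}
  &= \frac{1}{n} \sum_{i=1}^{n} (h_i - y_i)\, x_{ij}
   = -\frac{1}{n} \sum_{i=1}^{n} (y_i - h_i)\, x_{ij}
\end{aligned}
```

The last line is why the code works with the error term model(X, theta) - y: it is the same expression with the minus sign absorbed.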

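Define a function to compute the gradient

# Define a function to compute the gradient
def gradient(X, y, theta):
    grad = np.zeros(theta.shape)
    # Flatten the (n, 1) error to (n,) so it multiplies X[:,j] element-wise;
    # without ravel() the product broadcasts to an (n, n) matrix and inflates the sums
    error = (model(X, theta)- y).ravel()
    for j in range(X.shape[1]): # for each parameter
        term = np.multiply(error, X[:,j])
        grad[0, j] = np.sum(term) / len(X)
    return grad

Compute the initial gradient

# Initial gradient
gradient(X,y,theta)

array([[ -0.1       , -12.00921659, -11.26284221]])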
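
The per-parameter loop can also be written as a single matrix product. The sketch below compares the two on synthetic data; gradient_loop, gradient_vectorized and the random inputs are illustrative names and values, not part of the original program. It also shows the shape pitfall that arises when an (n, 1) column is multiplied by an (n,) vector.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_loop(X, y, theta):
    # Per-parameter loop; the error is flattened so it multiplies X[:, j] element-wise.
    grad = np.zeros(theta.shape)
    error = (sigmoid(X @ theta.T) - y).ravel()
    for j in range(X.shape[1]):
        grad[0, j] = np.sum(error * X[:, j]) / len(X)
    return grad

def gradient_vectorized(X, y, theta):
    # The same quantity as one matrix product: (1/n) * (h - y)^T X, shape (1, 3).
    error = sigmoid(X @ theta.T) - y          # shape (n, 1)
    return (error.T @ X) / len(X)

rng = np.random.default_rng(0)               # synthetic data, not the admissions file
X_demo = np.hstack([np.ones((8, 1)), rng.normal(size=(8, 2))])
y_demo = rng.integers(0, 2, size=(8, 1)).astype(float)
theta_demo = rng.normal(size=(1, 3))

g_loop = gradient_loop(X_demo, y_demo, theta_demo)
g_vec = gradient_vectorized(X_demo, y_demo, theta_demo)
print(np.allclose(g_loop, g_vec))

# Shape pitfall: an (n, 1) column times an (n,) vector broadcasts to (n, n).
pitfall_shape = (np.ones((8, 1)) * np.ones(8)).shape
print(pitfall_shape)
```

Summing such an (n, n) product gives the sum of the errors times the sum of the column, which is far larger in magnitude than the true gradient component.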

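Next comes the gradient descent itself.

Gradient Descent

Before running gradient descent, shuffle the whole dataset into a new random order. This guards against artefacts of how the data was collected and recorded, for example the first ten records all being women and the last ten all men.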

# Shuffle the data into a new random order
def shuffleData(data):
    np.random.shuffle(data)
    X = data[:,:3]
    y = data[:,3:4]
    return X, y

data.values[0:5]

array([[ 1.        , 34.62365962, 78.02469282,  0.        ],
       [ 1.        , 30.28671077, 43.89499752,  0.        ],
       [ 1.        , 35.84740877, 72.90219803,  0.        ],
       [ 1.        , 60.18259939, 86.3085521 ,  1.        ],
       [ 1.        , 79.03273605, 75.34437644,  1.        ]])

X,y = shuffleData(ori_data)
X[0:5],y[0:5]

(array([[ 1.        , 45.08327748, 56.31637178],
        [ 1.        , 52.04540477, 69.43286012],
        [ 1.        , 68.46852179, 85.5943071 ],
        [ 1.        , 99.31500881, 68.77540947],
        [ 1.        , 51.04775177, 45.82270146]]), array([[0.],
        [1.],
        [1.],
        [1.],
        [0.]]))
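
Shuffling the combined array before splitting it is what keeps each label attached to its own features. The sketch below checks this on a hypothetical five-row array; it uses a seeded np.random.default_rng generator instead of np.random.shuffle purely so the run is reproducible, which is a substitution made for the demo.

```python
import numpy as np

# Hypothetical 5-row array: bias column, one feature, and a label that is 1
# exactly when the feature is >= 3, so alignment is easy to verify.
demo = np.array([[1.0, 1.0, 0.0],
                 [1.0, 2.0, 0.0],
                 [1.0, 3.0, 1.0],
                 [1.0, 4.0, 1.0],
                 [1.0, 5.0, 1.0]])

rng = np.random.default_rng(42)   # seeded generator so the order is reproducible
rng.shuffle(demo)                 # in place; permutes whole rows (axis 0)

X_demo = demo[:, :2]
y_demo = demo[:, 2:3]

# Every label still matches its own feature after shuffling.
aligned = np.array_equal(y_demo.ravel(), (X_demo[:, 1] >= 3).astype(float))
print(aligned)
```

Shuffling X and y separately would break this pairing, so the split has to come after the shuffle, as in shuffleData.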

With everything defined, the complete program follows.

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import time

# Read the file and add column names
data = pd.read_csv('data/LogiReg_data.csv',header=None,names=['Exam1','Exam2','Admitted'])
# Draw the scatter plot
plt.figure(figsize=(10,6),dpi = 100)
plt.scatter(data[data['Admitted'] == 1]['Exam1'],data[data['Admitted'] == 1]['Exam2'],s = 30,c = 'b',label = 'Admitted')
plt.scatter(data[data['Admitted'] == 0]['Exam1'],data[data['Admitted'] == 0]['Exam2'],s = 30,c = 'r',label = 'Not Admitted')
plt.xlabel('Exam1 Score')
plt.ylabel('Exam2 Score')
plt.legend()
plt.show()
# Insert a column of ones for the bias term
data.insert(0,'Ones',1)
# print(data)
# Define the sigmoid function
def sigmoid(z):
    return 1.0/(1+np.exp(-z))
# Define the prediction model
def model(X,theta):
    return  sigmoid(np.dot(X,theta.T))
# Define the loss function
def loss(X,y,theta):
    left = -np.multiply(y,np.log(model(X,theta)))
    right = np.multiply(1-y,np.log(1-model(X,theta)))
    return np.sum(left - right)/len(X)
# Compute the gradient
def gradient(X,y,theta):
    grad = np.zeros(theta.shape)
    error = (model(X, theta) - y).ravel()
    for j in range(X.shape[1]):
        term = np.multiply(error, X[:, j])
        # Divide by the batch size, not a hard-coded 100, so mini-batches are averaged correctly
        grad[0, j] = np.sum(term) / len(X)
    return grad
# Define the shuffle function, i.e. put the rows into a new random order
def shuffleData(data):
    np.random.shuffle(data)
    X = data[:,:3]
    y = data[:,3:4]
    return X,y
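# Three stopping criteria
# STOP_ITER: stop once a set number of iterations has been reached
# STOP_COST: stop when the change in the loss between iterations becomes very small
# STOP_GRAD: stop when the gradient becomes very small
# stopType: which criterion to use; value: the quantity compared with threshold to decide whether to stop
STOP_ITER = 0
STOP_COST = 1
STOP_GRAD = 2
def stopCriterion(stopType, value, threshold):
    # The three stopping strategies
    if stopType == STOP_ITER:        return value > threshold
    elif stopType == STOP_COST:      return abs(value[-1]-value[-2]) < threshold
    elif stopType == STOP_GRAD:      return np.linalg.norm(value) < threshold
    # np.linalg.norm returns the L2 (Euclidean) norm: the square root of the sum of squared elements, i.e. the vector's distance from the origin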
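# Assign X and y
ori_data = data.values
X = ori_data[:,:3]
y = ori_data[:,3:4]
# Create theta
theta = np.zeros([1,3])
# print(X,y)
# Compute the current loss
print(loss(X,y,theta))
# Compute the current gradient
# print(gradient(X,y,theta))
# Gradient descent
# Parameters: data: the data; theta: the parameters; stopType: the stopping criterion
# learning_rate: the learning rate; thresh: the threshold; batchSize: selects batch, stochastic, or mini-batch gradient descent
n = 100  # number of samples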
def descent(data, theta, batchSize, stopType, thresh, learning_rate):
    # Solve by gradient descent
    init_time = time.time()
    i = 0  # iteration count
    k = 0  # start index of the current batch
    X, y = shuffleData(data)
    grad = np.zeros(theta.shape)  # current gradient
    costs = [loss(X, y, theta)]  # loss history
    while True:
        grad = gradient(X[k:k + batchSize], y[k:k + batchSize], theta)
        k += batchSize  # advance by one batch
        if k >= n:
            k = 0
            X, y = shuffleData(data)  # reshuffle
        theta = theta - learning_rate * grad  # parameter update
        costs.append(loss(X, y, theta))  # record the new loss
        i += 1
        if stopType == STOP_ITER:
            value = i
        elif stopType == STOP_COST:
            value = costs
        elif stopType == STOP_GRAD:
            value = grad
        if stopCriterion(stopType, value, thresh): break
    print(costs[-1])
    return theta, i - 1, costs, grad, time.time() - init_time
# Full-batch descent (batchSize=100), stopping when the gradient norm falls below 0.0005
theta,iterations,costs,grad,dur = descent(ori_data,theta,100,STOP_GRAD,0.0005,0.001)
# Apply the threshold: a probability of at least 0.5 counts as admitted
def predict(X, theta):
    return [1 if x >= 0.50 else 0 for x in model(X, theta)]
scaled_X = ori_data[:, :3]  # note: no feature scaling is applied, despite the name
y = ori_data[:, 3]
predictions = predict(scaled_X, theta)
correct = [1 if ((a == 1 and b == 1) or (a == 0 and b == 0)) else 0 for (a, b) in zip(predictions, y)]
# Percentage of correct predictions (division, not the modulo operator)
accuracy = 100.0 * sum(correct) / len(correct)
print ('accuracy = {0}%'.format(accuracy))
plt.plot(np.arange(len(costs)),costs)
plt.title('Iterations:{},learning_rate:{},accuracy:{}'.format(iterations,0.001,accuracy))
plt.show()
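
The thresholding and accuracy steps at the end of the program can also be expressed with array operations. The following is a minimal sketch, not a replacement for the code above: predict_labels and accuracy_percent are hypothetical helper names, and the probabilities and labels are made up for the example.

```python
import numpy as np

def predict_labels(probs, threshold=0.5):
    # Hypothetical helper: turn probabilities into 0/1 labels.
    return (np.asarray(probs).ravel() >= threshold).astype(int)

def accuracy_percent(pred, truth):
    # Share of matching labels, expressed as a percentage.
    return 100.0 * np.mean(np.asarray(pred) == np.asarray(truth).ravel())

probs_demo = np.array([[0.9], [0.2], [0.5], [0.4]])   # made-up model outputs
truth_demo = np.array([1, 0, 1, 1])                   # made-up true labels

pred_demo = predict_labels(probs_demo)
acc_demo = accuracy_percent(pred_demo, truth_demo)
print(pred_demo, acc_demo)   # three of the four labels match
```

Taking the mean of a boolean comparison gives the fraction of correct predictions directly, so the result stays right for any number of samples.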