Machine Learning: Logistic Regression (Part 4), Daily Exam Practice

Latest recommended article published on 2024-03-04 20:02:05

繁华三千东流水

Category: Machine Learning Project Practice. Tags: machine learning, logistic regression, regularization

Article link: https://blog.csdn.net/qq872890060/article/details/96101508

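Problem Requirements

1. Complete each of the following requirements.
Write the following program in Python.
A tomato classification training set (sample.txt) and test set (test.txt) are provided, where x1, x2, and x3 are the inspection parameters of each fruit (x1 is water content, x2 is size, x3 is weight) and Y is the class label (1 for a good fruit, 0 for a bad fruit).

2. Implement a logistic regression model in Python and use it to predict the test set. The specific requirements are:

Read the datasets
Implement the Sigmoid function and plot it
Implement the logistic regression cost function, with regularization
Implement gradient descent, printing the cost function value during iteration
Fit the regression model by gradient descent, use the resulting model to predict the test set, and compute the accuracy
Plot the 0-1 class distribution of the logistic regression using the two features X2 and X3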
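
The post does not include sample.txt or test.txt. Judging by the problem statement and the `np.loadtxt(..., delimiter=',')` calls in the code, each row is presumably `x1,x2,x3,y` separated by commas. The following is a minimal sketch that writes stand-in files in that assumed layout so the script can be run end to end. The function name `make_dataset`, the value ranges, and the labeling rule are all hypothetical and say nothing about the real data.

```python
import numpy as np

def make_dataset(n_rows, seed):
    """Return an (n_rows, 4) array with columns x1, x2, x3, y (all values synthetic)."""
    rng = np.random.default_rng(seed)
    water = rng.uniform(80.0, 95.0, n_rows)    # x1: water content, hypothetical range
    size = rng.uniform(4.0, 9.0, n_rows)       # x2: size, hypothetical range
    weight = rng.uniform(60.0, 200.0, n_rows)  # x3: weight, hypothetical range
    # Hypothetical labeling rule: a noisy linear score thresholded at zero
    score = 0.3 * (water - 87.5) + 1.0 * (size - 6.5) + 0.02 * (weight - 130.0)
    y = (score + rng.normal(0.0, 0.5, n_rows) > 0).astype(float)
    return np.column_stack([water, size, weight, y])

if __name__ == '__main__':
    # Write comma-separated files that np.loadtxt(..., delimiter=',') can read back
    np.savetxt('sample.txt', make_dataset(200, seed=0), delimiter=',', fmt='%.4f')
    np.savetxt('test.txt', make_dataset(50, seed=1), delimiter=',', fmt='%.4f')
```

Because the labels come from an invented rule, any accuracy obtained on these files only shows that the pipeline runs, not how the model would do on the real exam data.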

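Problem Analysis

The project can be completed in three parts:
Part 1: data loading and preprocessing
Part 2: model training, with regularization added to prevent overfitting
Part 3: plotting the results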
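
For reference, Part 2 can be summarized by the standard L2-regularized logistic regression formulation, written here with the symbols the code uses (`theta`, `alpha`, `lamda`, `m`). This is the textbook form, given as a reference sketch; the notation $\tilde{\theta}$ (theta with its bias entry set to zero) is introduced only for this summary.

```latex
% Sigmoid and hypothesis
g(z) = \frac{1}{1 + e^{-z}}, \qquad h_\theta(x) = g(\theta^{T} x)

% Regularized cost (the bias term \theta_0 is not penalized)
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m}
    \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\bigl(1 - h_\theta(x^{(i)})\bigr) \right]
    + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^{2}

% Gradient descent update, with \tilde{\theta} = (0, \theta_1, \dots, \theta_n)^{T}
\theta := \theta - \alpha \cdot \frac{1}{m} \left( X^{T} (h - y) + \lambda \tilde{\theta} \right)
```

The data term is scaled by 1/m and the penalty by 1/(2m), so that differentiating the penalty gives exactly the $\frac{\lambda}{m}\tilde{\theta}$ term in the update.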

The code is as follows:

import numpy as np
from matplotlib import pyplot as plt
# Use a CJK-capable font and render the minus sign correctly
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
# Load the data
data_train = np.loadtxt(r'sample.txt',delimiter=',')
data_test = np.loadtxt(r'test.txt',delimiter=',')


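# Data preprocessing function
def preprocess(data,mu=None,sigma=None):
    # Split features and labels
    X = data[:,:-1]
    y = data[:,-1]
    # Feature scaling: standardize with the given statistics,
    # or compute them from this data when none are passed in
    if mu is None:
        mu = np.mean(X,axis=0)
        sigma = np.std(X,axis=0,ddof=1)
    X = (X - mu) / sigma
    # Prepend the bias column and turn y into a column vector
    X = np.c_[np.ones(len(X)),X]
    y = np.c_[y]
    # Return the processed data together with the scaling statistics
    return X,y,mu,sigma


# Preprocess the training set, then scale the test set with the training statistics
X_train,y_train,mu,sigma = preprocess(data_train)
X_test,y_test,_,_ = preprocess(data_test,mu,sigma)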


# Sigmoid function
def g(z):
    h = 1.0/(1+np.exp(-z))
    return h


# Logistic regression model
def model(X,theta):
    z = np.dot(X,theta)
    h = g(z)
    return h


# Cost function: mean cross-entropy (scaled by 1/m) plus the regularization term R
def costFunction(h,y,R):
    m = len(h)
    J0 = (-1/m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
    J = J0 + R
    return J


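# Gradient descent
def gradeDesc(X,y,alpha=0.01,iter_num=20000,lamda=0.0):
    # Get the data dimensions
    m,n = X.shape
    # Initialize theta
    theta = np.zeros((n,1))
    # Initialize the cost history
    J_history = np.zeros(iter_num)
    # Run gradient descent
    for i in range(iter_num):
        h = model(X,theta)
        # Regularization term (the bias theta[0] is not penalized)
        theta_r = theta.copy()
        theta_r[0] = 0
        R = (lamda/(2 * m)) * np.sum(np.square(theta_r))
        J_history[i] = costFunction(h,y,R)
        # Print the cost periodically during iteration
        if i % 1000 == 0:
            print('iteration',i,'cost:',J_history[i])
        # Compute the gradient
        deltaTheta = (1/m) * (np.dot(X.T,h-y) + lamda * theta_r)
        # Update theta, scaled by the learning rate alpha
        theta -= alpha * deltaTheta
    # Training finished; return the cost history and the trained parameters
    return J_history,theta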


# Call the gradient descent function to obtain the trained theta
J_history,theta = gradeDesc(X_train,y_train,alpha=0.01,iter_num=10000,lamda=15)
# Pass the trained theta and the test data to the model to get the test-set predictions
y_test_h = model(X_test,theta)


# Accuracy function
def score(h,y):
    # Threshold the predicted probabilities at 0.5 and compare with the labels
    pred = np.where(h >= 0.5,1,0)
    # Return the fraction of correct predictions
    return np.mean(pred == y)


# Compute the accuracy of this prediction
print('Accuracy:',score(y_test_h,y_test))


# Plotting

# Plot the sigmoid function
x = np.linspace(-10,10,500)
y = g(x)
plt.title('Sigmoid curve')
plt.plot(x,y)
plt.plot(0,g(0),marker='x')
plt.show()

# Plot the cost function history
plt.title('Cost function')
plt.plot(J_history)
plt.show()

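# Plot the 0-1 distribution over features X2 and X3 (columns 2 and 3; column 0 is the bias)
plt.title('Actual vs. predicted classes')
# Actual labels first
plt.scatter(X_test[y_test[:,0] == 0,2],X_test[y_test[:,0] == 0,3],label='actual: class 0')
plt.scatter(X_test[y_test[:,0] == 1,2],X_test[y_test[:,0] == 1,3],label='actual: class 1')
# Then the predictions, using the same 0.5 threshold as score() and a different marker
plt.scatter(X_test[y_test_h[:,0] < 0.5,2],X_test[y_test_h[:,0] < 0.5,3],marker='x',label='predicted: class 0')
plt.scatter(X_test[y_test_h[:,0] >= 0.5,2],X_test[y_test_h[:,0] >= 0.5,3],marker='x',label='predicted: class 1')
plt.legend()
plt.show()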
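
One way to sanity-check a hand-written cost and gradient pair like the one in the script above is to compare the analytic gradient with a central finite-difference estimate. The sketch below is self-contained and runs on random stand-in data. `cost`, `grad`, and `numeric_grad` are hypothetical helper names, not functions from the script, and the cost assumes the standard 1/m cross-entropy scaling with an unpenalized bias.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y, lam):
    """L2-regularized cross-entropy; the bias theta[0] is not penalized."""
    m = len(y)
    h = sigmoid(X @ theta)
    data_term = -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)) / m
    return data_term + lam / (2 * m) * np.sum(theta[1:] ** 2)

def grad(theta, X, y, lam):
    """Analytic gradient of cost() with respect to theta."""
    m = len(y)
    theta_r = theta.copy()
    theta_r[0] = 0.0
    return (X.T @ (sigmoid(X @ theta) - y) + lam * theta_r) / m

def numeric_grad(theta, X, y, lam, eps=1e-6):
    """Central finite-difference estimate of the gradient, one coordinate at a time."""
    g = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step.flat[j] = eps
        g.flat[j] = (cost(theta + step, X, y, lam) - cost(theta - step, X, y, lam)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
X = np.c_[np.ones(30), rng.normal(size=(30, 3))]      # bias column plus 3 random features
y = (rng.uniform(size=(30, 1)) > 0.5).astype(float)   # random 0/1 labels
theta = rng.normal(scale=0.5, size=(4, 1))            # arbitrary point at which to check

max_diff = np.max(np.abs(grad(theta, X, y, 15.0) - numeric_grad(theta, X, y, 15.0)))
print('max abs difference:', max_diff)
```

When the cost and the gradient agree, the printed difference is tiny compared with the gradient itself. A cost whose data term is scaled differently from its gradient (for example 1/(2m) against 1/m) shows up as a clearly non-negligible difference.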