Machine Learning: Experiment 1

Latest recommended article published on 2024-08-21 09:04:49

Qutter

Tags: machine learning, logistic regression, artificial intelligence

Article link: https://blog.csdn.net/qq_44458671/article/details/121237834

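This post examines the logistic regression algorithm, including its use in binary and multiclass classification. Multiclass problems are handled with the softmax function, the parameters are estimated by maximum likelihood, and L2 regularization is introduced. In the experiment, the author trains a model on the iris dataset with LogisticRegression from the sklearn library and plots ROC curves to evaluate its performance.

Summary generated by CSDN using AI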

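Experiment 1: Logistic Regression

I. Objectives

Deepen the understanding of the logistic regression algorithm.
Learn how to design a binary classifier based on logistic regression and a multiclass classifier based on softmax.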

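II. Principles

First fit a decision boundary (not necessarily linear; it may also be polynomial), then link that boundary to class probabilities, which gives the class probability in the binary case.
The idea and theoretical basis of maximum likelihood estimation.
Evaluation metrics for logistic regression.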
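As a minimal sketch of the first two points, the snippet below maps a linear decision function to a probability with the sigmoid and evaluates the negative log-likelihood that maximum likelihood estimation minimizes. The names `sigmoid` and `neg_log_likelihood` and the toy data are assumptions made for illustration; they are not part of the experiment code.

```python
import numpy as np


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def neg_log_likelihood(theta, X, y):
    """Average negative log-likelihood of labels y in {0, 1}."""
    p = sigmoid(X @ theta)  # P(y = 1 | x) for every sample
    eps = 1e-12             # guard against log(0)
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))


# Toy data (assumed for illustration): a bias column plus one feature.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0, 0, 1, 1])

theta_zero = np.zeros(2)           # every sample gets probability 0.5
theta_good = np.array([0.0, 3.0])  # decision boundary at x = 0

loss_zero = neg_log_likelihood(theta_zero, X, y)
loss_good = neg_log_likelihood(theta_good, X, y)
print(loss_zero, loss_good)
```

Parameters that place the boundary between the two classes give a lower negative log-likelihood than the all-zero parameters, which is what maximizing the likelihood rewards.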

III. Classification Steps

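Read in the data to be classified (dataset: iris_data), preprocess its format, and split it into a training set and a test set.
Perform multiclass classification on the iris data, which can be done with softmax: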
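$\mathrm{Softmax}_k(x_i) = \frac{e^{\theta_k^\top x_i}}{\sum_{j=1}^{K} e^{\theta_j^\top x_i}}$
where $x_i$ is the feature vector of the $i$-th sample, $\theta_k$ is the parameter vector of class $k$, and $K$ is the number of classes. The $\mathrm{Softmax}$ function turns the $K$ class scores $\theta_j^\top x_i$ into a probability distribution whose entries lie in $[0, 1]$ and sum to $1$.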
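Add an $L_2$ regularization term to the objective function.
Use maximum likelihood estimation to derive the gradient with respect to the unknown parameters $\theta$.
Apply the gradient descent update iteratively until the objective function converges or the preset number of iterations is reached.
Evaluate classification quality with the AUC metric and plot the corresponding result.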
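The softmax, $L_2$ penalty, gradient, and gradient descent steps above can be sketched in NumPy as follows. This is an illustrative sketch, not the code used in the experiment: the names `softmax_rows` and `fit_softmax`, the hyperparameters `lr`, `lam`, `n_steps`, and `tol`, and the synthetic three-class data (standing in for the iris features so that only NumPy is needed) are all assumptions. The gradient of the regularized average negative log-likelihood used here is $X^\top (P - Y)/n + \lambda\theta$, where $P$ holds the softmax probabilities and $Y$ the one-hot labels.

```python
import numpy as np


def softmax_rows(scores):
    """Row-wise softmax; subtracting the row max avoids overflow."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def fit_softmax(X, y, n_classes, lr=0.1, lam=0.01, n_steps=2000, tol=1e-8):
    """Gradient descent on the L2-regularized negative log-likelihood."""
    n, d = X.shape
    theta = np.zeros((d, n_classes))
    Y = np.eye(n_classes)[y]  # one-hot labels
    prev_loss = np.inf
    for _ in range(n_steps):
        P = softmax_rows(X @ theta)
        nll = -np.mean(np.log(P[np.arange(n), y] + 1e-12))
        loss = nll + 0.5 * lam * np.sum(theta ** 2)
        if abs(prev_loss - loss) < tol:  # objective has converged
            break
        prev_loss = loss
        grad = X.T @ (P - Y) / n + lam * theta
        theta -= lr * grad
    return theta, loss


# Synthetic stand-in data (assumption): three Gaussian blobs, 50 points each.
rng = np.random.default_rng(0)
centers = np.array([[-3.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
X = np.vstack([c + rng.normal(size=(50, 2)) for c in centers])
y = np.repeat(np.arange(3), 50)
X = np.hstack([np.ones((X.shape[0], 1)), X])  # bias column

# Shuffle, then use 70% for training and 30% for testing.
idx = rng.permutation(len(y))
train, test = idx[:105], idx[105:]

theta, final_loss = fit_softmax(X[train], y[train], n_classes=3)
probs = softmax_rows(X[test] @ theta)
accuracy = np.mean(probs.argmax(axis=1) == y[test])
print(final_loss, accuracy)
```

For simplicity the penalty in this sketch also covers the bias row of `theta`; leaving the bias unpenalized is a common alternative.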

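AUC is the area under the ROC curve, that is, the area between the curve and the x-axis; its value lies in [0, 1].

The ROC curve plots FPR on the x-axis and TPR on the y-axis.

TPR is the true positive rate: the proportion of actually positive samples that are correctly classified as positive, $\frac{TP}{P} = \frac{TP}{TP+FN}$.

FPR is the false positive rate: the proportion of actually negative samples that are incorrectly classified as positive, $\frac{FP}{N} = \frac{FP}{FP+TN}$.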
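To make these definitions concrete, the sketch below builds ROC points by sweeping the decision threshold over the scores and integrates them with the trapezoid rule. The helper names `roc_points` and `auc_trapezoid` and the four-sample toy data are assumptions for illustration, and the code assumes no two samples share the same score.

```python
import numpy as np


def roc_points(y_true, scores):
    """Return FPR and TPR arrays obtained by lowering the threshold step by step."""
    order = np.argsort(-scores)     # highest score first
    y_sorted = y_true[order]
    tps = np.cumsum(y_sorted == 1)  # TP count when the top-k samples are called positive
    fps = np.cumsum(y_sorted == 0)  # FP count for the same cut
    tpr = np.concatenate([[0.0], tps / tps[-1]])  # TP / (TP + FN)
    fpr = np.concatenate([[0.0], fps / fps[-1]])  # FP / (FP + TN)
    return fpr, tpr


def auc_trapezoid(fpr, tpr):
    """Area under the piecewise-linear ROC curve."""
    return float(np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2.0))


# Toy labels and scores (assumed for illustration).
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

fpr, tpr = roc_points(y_true, scores)
area = auc_trapezoid(fpr, tpr)
print(fpr, tpr, area)  # area is 0.75 for this toy data
```

For a multiclass problem such as iris, the same computation is typically applied once per class in a one-vs-rest fashion, using that class's predicted probability as the score.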

IV. Code and Results

The experiment code is as follows:

import re
from itertools import cycle

import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler, label_binarize


def pre_process(path="./iris_data.txt"):
    """Parse the provided iris_data.txt into a (150, 4) float array.

    The first line is skipped as a header; on every other line the first
    number is treated as a row index and the next four as the features.
    """
    data_ = []
    with open(path, "r") as f:
        for count, line in enumerate(f, start=1):
            if count == 1:  # skip the header line
                continue
            res = re.findall(r'-?\d+\.?\d*e?-?\d*?', line)
            data_.extend(float(v) for v in res[1:5])  # res[0] is the row index
    data = np.array(data_).reshape(150, 4)
    print(data)
    return data


# The dataset already ships with sklearn; pre_process can be used instead
# to parse the provided iris_data.txt file.
def softmax():
    iris_data = load_iris()
    x = iris_data['data']
    y = iris_data['target']
    x = StandardScaler().fit_transform(x)  # standardize the features
    y = LabelEncoder().fit_transform(y)  # encode labels as integers
    x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=66, test_size=0.3)

    # C is the inverse of the L2 regularization strength; choose it by cross-validation.
    model = GridSearchCV(LogisticRegression(), param_grid={'C': [1, 10, 20, 50]})
    model.fit(x_train, y_train)
    print(model.score(x_test, y_test))  # accuracy on the test set

    # ROC curves need per-class scores rather than hard predictions.
    y_score = model.predict_proba(x_test)
    n_classes = y_score.shape[1]
    y_test_bin = label_binarize(y_test, classes=list(range(n_classes)))

    # One-vs-rest ROC curve and AUC for each class.
    fpr, tpr, roc_auc = {}, {}, {}
    for i in range(n_classes):
        fpr[i], tpr[i], _ = roc_curve(y_test_bin[:, i], y_score[:, i])
        roc_auc[i] = auc(fpr[i], tpr[i])

    lw = 2
    colors = cycle(['blue', 'orange', 'red'])
    for i, color in zip(range(n_classes), colors):
        plt.plot(fpr[i], tpr[i], color=color, lw=lw,
                 label='ROC curve of class {0} (area = {1:0.2f})'
                       ''.format(i, roc_auc[i]))
    plt.plot([0, 1], [0, 1], 'k--', lw=lw)  # chance line for reference
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Results:The higher the better')
    plt.legend(loc='lower right')
    plt.show()


if __name__ == '__main__':
    softmax()

Plotting the ROC curve

The resulting ROC curves are as follows:
