Machine Learning Experiment: Multiple Linear Regression Analysis

Latest recommended article published on 2024-06-11 15:38:43

Chimpanzee1

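Categories: Python Knowledge System, Artificial Intelligence Knowledge System. Tags: machine learning, python, artificial intelligence, data analysis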

Article link: https://blog.csdn.net/playboygogogo/article/details/109717791


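1. Objectives:
Learn to minimize the cost function with the least-squares method, learn gradient descent, and understand overfitting and the ways to counter it.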
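To make the first two objectives concrete, here is a minimal sketch that minimizes the same squared-error cost twice: once by batch gradient descent and once with the closed-form least-squares solver. The data are synthetic, and the function name `gradient_descent`, the learning rate, and the iteration count are all assumptions chosen for illustration, not part of the original experiment.

```python
import numpy as np

def gradient_descent(X, y, lr=0.1, n_iters=5000):
    """Minimize J(w) = ||Xw - y||^2 / (2m) by batch gradient descent."""
    m, n = X.shape
    w = np.zeros(n)
    for _ in range(n_iters):
        grad = X.T @ (X @ w - y) / m  # gradient of J at the current w
        w -= lr * grad
    return w

# Synthetic data (assumption): y = 2 + 3x plus small Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=50)
y = 2 + 3 * x + rng.normal(scale=0.05, size=50)
X = np.column_stack([np.ones_like(x), x])  # bias column + one feature

w_gd = gradient_descent(X, y)
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)  # closed-form least squares

print('gradient descent:', w_gd)
print('least squares:   ', w_ls)
```

Both routes should land on essentially the same weights; gradient descent simply gets there iteratively, which matters when the feature matrix is too large for a direct solve.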
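2. Requirements and environment
Fit a curve to the data in SOH5 with a high-order polynomial; recast the polynomial fit as a multiple linear regression problem and solve it; use your experimental results to explain overfitting.
Compare the results across different data sizes, hyperparameters, and polynomial degrees.
Any language may be used, for example MATLAB or Python.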
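The phrase "recast the polynomial fit as a multiple linear regression problem" can be sketched as follows: every power of x becomes its own feature column, and ordinary least squares is then solved on that expanded matrix. The data here are synthetic (SOH5.mat is not used), and the chosen degree and coefficients are assumptions for illustration only.

```python
import numpy as np

# Synthetic data (assumption): a noisy cubic
rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 40)
y = 1.0 - 2.0 * x + 0.5 * x**3 + rng.normal(scale=0.01, size=x.size)

degree = 3
# Each power of x is one column of the design matrix: [1, x, x^2, x^3]
X = np.vander(x, degree + 1, increasing=True)

# Ordinary least squares on the expanded features
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# np.polyfit solves the same problem; it returns the highest power first
coef_polyfit = np.polyfit(x, y, degree)[::-1]

print('lstsq on expanded features:', coef)
print('np.polyfit:                ', coef_polyfit)
```

The model stays linear in its coefficients even though it is nonlinear in x, which is why a linear regression solver is enough.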

import scipy.io as sio
import numpy as np
import matplotlib.pyplot as plt
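plt.rcParams['font.sans-serif'] = ['SimHei']  # font with CJK glyphs; needed only if plot text contains Chinese
plt.rcParams['axes.unicode_minus'] = False  # render the minus sign correctly with this font
# Load the .mat file
data = sio.loadmat('C:/Users/Administrator/Documents/Tencent Files/1506698498/FileRecv/SOH5.mat')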

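# Basic preprocessing of the .mat contents
y = data['soh'].reshape(-1, 1)  # dependent variable as a column vector
x = np.arange(1, len(y) + 1).reshape(-1, 1)  # independent variable: sample index 1..N (N = 168 here)
# Scatter plot of the raw data
plt.scatter(x, y)
plt.title('Scatter plot of the dataset')
plt.show()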

[Figure 1: scatter plot of the dataset]

# Split into training and test sets
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=666)

# Pipeline wrapper from sklearn
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression,Ridge

# Grid search
from sklearn.model_selection import GridSearchCV
# Evaluation metric
from sklearn.metrics import r2_score

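# Build the pipeline: polynomial feature expansion followed by linear regression
pipe_reg = Pipeline([
    ('poly', PolynomialFeatures()),
    ('lin_reg', LinearRegression())
])
# Parameter grid: polynomial degrees 0 through 8
degree = {
    'poly__degree': np.arange(0, 9)
}

# Create the grid search (default 5-fold cross-validation, scored by R2)
grid = GridSearchCV(pipe_reg, param_grid=degree)
# Fit the grid search on the training set
grid.fit(X_train, y_train)

# Inspect the results
print('Best CV score:', grid.best_score_, 'Best parameters:', grid.best_params_)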

test_predict = grid.predict(X_test)
print('R2_score:', r2_score(y_test, test_predict), 'Test score:', grid.score(X_test, y_test))
# Plot the fitted curve
y_predict = grid.predict(x)
plt.scatter(x, y_predict)
plt.title('Fitted curve')
plt.show()

[Figure 2: fitted curve]
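The grid search reports only the winning degree, but the per-degree cross-validation scores are also available in `cv_results_`. The sketch below shows one way to list them. It runs on synthetic cubic data rather than SOH5, and the variable names `x_demo`, `y_demo`, and `search`, the degree range 1 to 8, and the shuffled `KFold` are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic cubic data (assumption): y = x^3 - x plus noise
rng = np.random.default_rng(0)
x_demo = np.linspace(-1, 1, 120).reshape(-1, 1)
y_demo = (x_demo**3 - x_demo).ravel() + rng.normal(scale=0.02, size=120)

pipe = Pipeline([
    ('poly', PolynomialFeatures()),
    ('lin_reg', LinearRegression()),
])
# x_demo is sorted, so shuffle the folds to avoid scoring pure extrapolation
cv = KFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipe, param_grid={'poly__degree': list(range(1, 9))}, cv=cv)
search.fit(x_demo, y_demo)

# One row per candidate degree: mean and spread of the validation R2
results = search.cv_results_
for params, mean, std in zip(results['params'],
                             results['mean_test_score'],
                             results['std_test_score']):
    print('degree', params['poly__degree'], 'mean R2', round(mean, 4), 'std', round(std, 4))

print('best degree:', search.best_params_['poly__degree'])
```

Looking at the whole table, rather than only `best_params_`, shows how flat the score is near the optimum and whether a simpler degree would do almost as well.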

# Fitted curve overlaid on the dataset
plt.scatter(x, y)
plt.scatter(x, y_predict)
plt.title('Fitted curve vs. dataset')
plt.show()

[Figure 3: fitted curve overlaid on the dataset]

# Compare results across data sizes, hyperparameters, and polynomial degrees.
def func(degree, test_size):
    m_train, m_test, n_train, n_test = train_test_split(x, y, test_size=test_size, random_state=666)
    # Build the pipeline
    pipe_reg = Pipeline([
        ('poly', PolynomialFeatures(degree=degree)),
        ('lin_reg', LinearRegression())
    ])
    # Single-value grid: GridSearchCV is used here only to cross-validate this degree
    param_grid = {'poly__degree': [degree]}
    grid = GridSearchCV(pipe_reg, param_grid=param_grid)
    grid.fit(m_train, n_train)

    # Inspect the results
    print('Mean CV score:', grid.best_score_, 'Parameters:', grid.best_params_)

    # Evaluate on the test split created inside this function
    test_predict = grid.predict(m_test)
    r2 = r2_score(n_test, test_predict)
    test_score = grid.score(m_test, n_test)
    print('R2_score:', r2, 'Test score:', test_score)
    # Fitted curve overlaid on the dataset
    plt.scatter(x, y)
    plt.scatter(x, grid.predict(x))
    plt.title('degree: {}, test_size: {}  fitted curve vs. dataset'.format(degree, test_size))
    plt.show()
    return r2, test_score
func(7, 0.3)

[Figure 4: degree 7, test_size 0.3, fitted curve overlaid on the dataset]

func(3, 0.2)

[Figure 5: degree 3, test_size 0.2, fitted curve overlaid on the dataset]
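This experiment uses R2 as the model evaluation metric. R2, the coefficient of determination, is the proportion of the total variation in the dependent variable that the regression on the independent variable explains. The better the model, the closer R2 is to 1; the worse the model, the closer R2 is to 0 (on held-out data it can even be negative).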
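As a quick check of that definition, R2 can be computed by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares, and compared with `r2_score`. The numbers below are made-up toy values, not results from the experiment.

```python
import numpy as np
from sklearn.metrics import r2_score

# Toy values (assumption), unrelated to SOH5
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

ss_res = np.sum((y_true - y_pred) ** 2)         # variation left unexplained
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
r2_manual = 1 - ss_res / ss_tot

print('manual R2:  ', r2_manual)
print('sklearn R2: ', r2_score(y_true, y_pred))

# Always predicting the mean gives exactly 0; a worse predictor goes negative
r2_mean = r2_score(y_true, np.full_like(y_true, y_true.mean()))
r2_bad = r2_score(y_true, -y_true)
print('mean predictor:', r2_mean, 'bad predictor:', r2_bad)
```

The last two lines show why "near 0" is not the floor: a model that does worse than predicting the mean scores below zero.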
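Figure 1 shows the scatter distribution of the dataset in detail.
A Pipeline wraps PolynomialFeatures, which generates polynomial and interaction features, together with LinearRegression; GridSearchCV then searches over the polynomial degree, giving Figure 2. The best grid-search score (the mean cross-validated R2) is 0.994245516830313, the best parameter is degree=7, the R2_score is 0.9935388482684945, and the test-set score is 0.9935388482684945. The last two values are identical because grid.score also reports R2 for this pipeline.
Figure 3 shows that the degree-7 polynomial fits the vast majority of the data.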
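A good fit on the data the model has seen does not by itself rule out overfitting; the usual check is to compare training and test R2 as the degree grows. The sketch below does this on a small synthetic noisy sine sample, so the data, the degree list, and the split are assumptions for illustration and the numbers it prints say nothing about SOH5.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Small noisy sample (assumption): few points make overfitting easy to provoke
rng = np.random.default_rng(42)
x_demo = np.sort(rng.uniform(-1, 1, size=30)).reshape(-1, 1)
y_demo = np.sin(np.pi * x_demo).ravel() + rng.normal(scale=0.2, size=30)

Xtr, Xte, ytr, yte = train_test_split(x_demo, y_demo, test_size=0.3, random_state=0)

degrees = (1, 3, 7, 15)
scores = {}
for degree in degrees:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(Xtr, ytr)
    # Store (training R2, test R2) for this degree
    scores[degree] = (model.score(Xtr, ytr), model.score(Xte, yte))
    print('degree', degree, 'train R2', round(scores[degree][0], 4),
          'test R2', round(scores[degree][1], 4))
```

Training R2 can only go up as terms are added, so the signal to watch is the gap between the two columns: a test score that stalls or drops while the training score keeps rising is the signature of overfitting.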
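We then vary the degree and the amount of training data for comparison.
In Figure 4, degree=7 and test_size=0.3 (30% of the data held out for testing): the training (cross-validation) score is 0.993487731203949, the R2_score is 0.9928531372916021, and the test score is 0.9928531372916021.
In Figure 5, degree=3 and test_size=0.2: the training (cross-validation) score is 0.9920175901046212, the R2_score is 0.987927653599221, and the test score is 0.987927653599221.
Comparing Figures 2, 4, and 5 confirms that the results are consistent: the degree-7 model scores about 0.993 on the test set under both splits, while the degree-3 model scores slightly lower, at about 0.988.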
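The objectives also mention ways to counter overfitting, and the code imports Ridge without using it. One common option is L2 regularization, sketched below: a high-degree polynomial pipeline is fitted with increasing `alpha`, and the size of the coefficient vector is printed. The synthetic data, the degree, the `alpha` values, the added StandardScaler step, and the helper name `fit_ridge` are assumptions for illustration, not the author's method.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic noisy sine data (assumption)
rng = np.random.default_rng(7)
x_demo = rng.uniform(-1, 1, size=(40, 1))
y_demo = np.sin(np.pi * x_demo).ravel() + rng.normal(scale=0.2, size=40)

def fit_ridge(alpha, degree=9):
    """Fit polynomial features -> standardization -> ridge regression."""
    model = Pipeline([
        ('poly', PolynomialFeatures(degree=degree, include_bias=False)),
        ('scale', StandardScaler()),  # put all powers on a comparable scale
        ('ridge', Ridge(alpha=alpha)),
    ])
    return model.fit(x_demo, y_demo)

alphas = (1e-3, 1e-1, 10.0, 1000.0)
coef_norms = {}
for alpha in alphas:
    model = fit_ridge(alpha)
    coef_norms[alpha] = float(np.linalg.norm(model.named_steps['ridge'].coef_))
    print('alpha', alpha, 'coefficient norm', round(coef_norms[alpha], 4),
          'train R2', round(model.score(x_demo, y_demo), 4))
```

Larger `alpha` shrinks the coefficients and smooths the curve at the cost of some training accuracy; in practice `alpha` would be tuned with GridSearchCV in the same way the degree is tuned above, for example with a `'ridge__alpha'` entry in the parameter grid.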
