Linear Regression and Model Evaluation


数分诶


Tags: linear algebra, python

Article link: https://blog.csdn.net/weixin_45496778/article/details/105619944


Linear regression model

Using the iris dataset, we regress petal width on petal length.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt
# Load and prepare the data
iris = load_iris()
x = iris.data[:, 2].reshape(-1, 1)  # petal length (feature)
y = iris.data[:, 3].reshape(-1, 1)  # petal width (target)
# np.random.seed(0) returns None, so pass the seed itself as random_state
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)
# Fit the model
lr = LinearRegression()
lr.fit(x_train, y_train)
# Predict on the test set
y_hat = lr.predict(x_test)

Inspect the fitted weight (slope) and intercept

print("Weight:", lr.coef_)
print("Intercept:", lr.intercept_)

Weight: [[0.41871246]]
Intercept: [-0.37545151]
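To make the meaning of the weight and intercept concrete, here is a minimal sketch that rebuilds the predictions by hand as w * x + b and compares them with predict(). It refits the same model so that it runs on its own; the names w, b, manual_pred and matches are introduced here for illustration and are not part of the original post.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

iris = load_iris()
x = iris.data[:, 2].reshape(-1, 1)  # petal length
y = iris.data[:, 3].reshape(-1, 1)  # petal width
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=0
)

lr = LinearRegression().fit(x_train, y_train)

# With a 2-D target, coef_ has shape (1, 1) and intercept_ has shape (1,).
w = lr.coef_[0, 0]
b = lr.intercept_[0]

# Apply y_hat = w * x + b by hand and compare with predict().
manual_pred = w * x_test + b
matches = np.allclose(manual_pred, lr.predict(x_test))
print("w =", w, "b =", b, "manual == predict:", matches)
```

With a single feature, the model is just a straight line, so the weight is its slope and the intercept is the predicted petal width at a petal length of zero.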

Plot the fitted line

%matplotlib inline
plt.rcParams["font.size"] = 16

fig, ax = plt.subplots(1, 2)
fig.set_size_inches(16, 7)
# Left panel: training data with the fitted line
ax[0].scatter(x_train, y_train, s=15, c="b", label="training set")
ax[0].plot(x_train, lr.predict(x_train), c="g", label="fitted line")
# Right panel: test data with the same fitted line
ax[1].scatter(x_test, y_test, s=15, c="r", label="test set")
ax[1].plot(x_test, lr.predict(x_test), c="g", label="fitted line")
for axn in ax:
    axn.legend()
    axn.set_xlabel("petal length")
    axn.set_ylabel("petal width")
plt.show()

Next, compare the predicted values with the true values on the test set.

plt.figure(figsize=(17, 6))
plt.plot(y_test, c="g", marker="o", label="true value")
plt.plot(y_hat, c="r", marker="*", ls="--", label="predicted value")
plt.legend()
plt.ylabel("value")
plt.show()


Evaluating the linear model

Once the linear model is fitted, the next step is to assess how good it is. The following metrics are commonly used:

MSE(mean_squared_error)
RMSE(root_mean_squared_error)
MAE(mean_absolute_error)
$R^2$

MSE

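MSE (mean_squared_error) is the mean squared error: the average, over all samples, of the squared difference between the true value and the predicted value. With m samples, true values $y^{(i)}$ and predictions $\hat{y}^{(i)}$:
$$MSE = \frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)} - \hat{y}^{(i)}\right)^2$$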
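The definition can be checked directly with NumPy. The sketch below uses a small set of made-up values (y_true and y_pred are illustrative, not taken from the iris data) and compares the hand-computed result with mean_squared_error.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Toy values chosen for illustration only; they are not from the iris data.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.5, 5.0])

# MSE = (1/m) * sum((y - y_hat)^2)
# Errors are -0.5, 0, 0.5, -1, so the squared errors sum to 1.5.
mse_manual = np.mean((y_true - y_pred) ** 2)
mse_sklearn = mean_squared_error(y_true, y_pred)
print(mse_manual, mse_sklearn)  # both 0.375
```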

RMSE

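RMSE (root_mean_squared_error) is the square root of the MSE, which puts the error back in the same units as the target. With m samples, true values $y^{(i)}$ and predictions $\hat{y}^{(i)}$:
$$RMSE = \sqrt{MSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)} - \hat{y}^{(i)}\right)^2}$$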

MAE

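MAE (mean_absolute_error) is the mean absolute error: the average (not the sum) of the absolute errors over all samples. With m samples, true values $y^{(i)}$ and predictions $\hat{y}^{(i)}$:
$$MAE = \frac{1}{m}\sum_{i=1}^{m}\left|y^{(i)} - \hat{y}^{(i)}\right|$$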
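As a rough illustration of how MAE and RMSE behave, the sketch below computes both by hand on made-up values and then repeats the calculation with one large error added. The arrays and the helper names mae and rmse are assumptions for this example only.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

def mae(y_true, y_pred):
    """Mean of the absolute errors."""
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    """Square root of the mean squared error."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Toy values chosen for illustration only.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.5, 5.0])
# Same predictions, except the last one is off by 4 instead of 1.
y_pred_outlier = np.array([1.5, 2.0, 2.5, 8.0])

mae_manual = mae(y_true, y_pred)    # 0.5
rmse_manual = rmse(y_true, y_pred)  # sqrt(0.375)
mae_out = mae(y_true, y_pred_outlier)    # 1.25
rmse_out = rmse(y_true, y_pred_outlier)  # sqrt(4.125)

print("MAE :", mae_manual, "->", mae_out)
print("RMSE:", rmse_manual, "->", rmse_out)
print("sklearn MAE:", mean_absolute_error(y_true, y_pred))
```

RMSE is never smaller than MAE, and because the errors are squared before averaging, a single large error moves RMSE more than it moves MAE.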

$R^2$

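$R^2$ is the coefficient of determination, a score for goodness of fit: the higher the value, the better the model fits. On the training set, for a least-squares fit with an intercept, $R^2$ lies in $[0, 1]$; on the test set (unseen data) it lies in $(-\infty, 1]$. The formula is
$$R^2 = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum_{i=1}^{m}\left(y^{(i)} - \hat{y}^{(i)}\right)^2}{\sum_{i=1}^{m}\left(y^{(i)} - \bar{y}\right)^2}$$
where TSS (total sum of squares) is the sum of squared differences between the true values and their mean $\bar{y}$, which is m times the variance (TSS / m is the variance), and RSS (residual sum of squares) is the sum of squared errors over all samples, which is m times the MSE (RSS / m is the MSE).
The formula shows that when every predicted value equals the true value, RSS = 0 and $R^2$ = 1, which is the ideal model. A model that always predicts the mean gives RSS = TSS and $R^2$ = 0, and a model that does worse than the mean gives a negative $R^2$.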
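The three cases (perfect predictions, predicting the mean, and doing worse than the mean) can be reproduced with a few lines of NumPy. This is a sketch on made-up values; r2_manual is a helper written for this example, and the result is compared with r2_score.

```python
import numpy as np
from sklearn.metrics import r2_score

def r2_manual(y_true, y_pred):
    """R^2 = 1 - RSS / TSS, computed directly from the definition."""
    rss = np.sum((y_true - y_pred) ** 2)
    tss = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - rss / tss

# Toy values chosen for illustration only.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
perfect = y_true.copy()                         # RSS = 0
mean_only = np.full_like(y_true, y_true.mean()) # RSS = TSS
reversed_pred = y_true[::-1].copy()             # worse than the mean

r2_perfect = r2_manual(y_true, perfect)         # 1.0
r2_mean = r2_manual(y_true, mean_only)          # 0.0
r2_bad = r2_manual(y_true, reversed_pred)       # 1 - 20/5 = -3.0

print(r2_perfect, r2_mean, r2_bad)
print(r2_score(y_true, reversed_pred))          # same value as r2_bad
```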

Python implementation:

"""Regression model evaluation"""
print("MSE (mean_squared_error):", mean_squared_error(y_test, y_hat))
print("RMSE (root mean_squared_error):", np.sqrt(mean_squared_error(y_test, y_hat)))
print("MAE (mean_absolute_error):", mean_absolute_error(y_test, y_hat))
print("Training R^2 (r2_score):", r2_score(y_train, lr.predict(x_train)))
print("Test R^2 (r2_score):", r2_score(y_test, y_hat))
print("Training R^2 (lr.score):", lr.score(x_train, y_train))
print("Test R^2 (lr.score):", lr.score(x_test, y_test))

Note: r2_score and lr.score both return $R^2$, but they take different arguments. r2_score(y_true, y_pred) compares the true values with predictions you have already computed, while lr.score(X, y) takes the features and the true values and calls predict internally.

MSE (mean_squared_error): 0.05335352448031869
RMSE (root mean_squared_error): 0.23098381865472456
MAE (mean_absolute_error): 0.1677835275546856
Training R^2 (r2_score): 0.9381656942757268
Test R^2 (r2_score): 0.8956126694950287
Training R^2 (lr.score): 0.9381656942757268
Test R^2 (lr.score): 0.8956126694950287
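The difference in arguments between the two ways of getting $R^2$ can be shown side by side. The sketch below refits the same model so that it runs on its own; the variable names score_via_method, score_via_function and score_swapped are introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

iris = load_iris()
x = iris.data[:, 2].reshape(-1, 1)  # petal length
y = iris.data[:, 3].reshape(-1, 1)  # petal width
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=0
)
lr = LinearRegression().fit(x_train, y_train)
y_hat = lr.predict(x_test)

# lr.score takes (features, true values) and predicts internally.
score_via_method = lr.score(x_test, y_test)
# r2_score takes (true values, predicted values), in that order.
score_via_function = r2_score(y_test, y_hat)
# Swapping the arguments treats the predictions as the ground truth,
# so the result is generally a different number.
score_swapped = r2_score(y_hat, y_test)

print(score_via_method, score_via_function, score_swapped)
```

The first two values agree; the third is there only to show that the argument order of r2_score matters, since TSS is computed from whichever array is passed first.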
