Python Data Modeling: Regression

Latest recommended article published on 2024-08-05 17:42:48

小天资源

Categories: Python, Data Analysis, Data Modeling

Article link: https://blog.csdn.net/qq_42169061/article/details/106135094

Implementing Linear Regression in Python

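Introduction: Linear regression is usually one of the first techniques people pick up when learning predictive modeling. In this technique the dependent variable is continuous, the independent variables can be continuous or discrete, and the regression line is linear. Linear regression uses a best-fit straight line (the regression line) to establish a relationship between the dependent variable (Y) and one or more independent variables (X).

Types: simple linear regression / multiple linear regression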
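To make "best-fit line" concrete: in the ordinary least-squares sense, it is the line that minimizes the sum of squared vertical distances to the data points. The sketch below assumes that criterion (it is the one scikit-learn's LinearRegression uses) and works on hypothetical toy data that is not part of the article. It only illustrates the closed-form solution for a single predictor.

```python
import numpy as np

# Hypothetical toy data lying exactly on y = 1 + 2x (not from the article).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x

# Closed-form ordinary least squares for a single predictor:
#   slope     = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))**2)
#   intercept = mean(y) - slope * mean(x)
x_mean, y_mean = x.mean(), y.mean()
slope = ((x - x_mean) * (y - y_mean)).sum() / ((x - x_mean) ** 2).sum()
intercept = y_mean - slope * x_mean

print('slope = %.4f, intercept = %.4f' % (slope, intercept))
```

Because the toy points lie exactly on a line, the formula recovers that line's slope and intercept. With noisy data it returns the line with the smallest total squared error.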

Simple linear regression

Import libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

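Generate the data and draw a scatter plot

rng = np.random.RandomState(1)
xtrain = 10 * rng.rand(30)
ytrain = 8 + 4 * xtrain + rng.rand(30)
# np.random.RandomState → seeded random number generator; the same seed always produces the same sequence
# Generate random data x and y
# Underlying relation: y = 8 + 4*x, plus uniform noise in [0, 1)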

fig = plt.figure(figsize =(12,3))
ax1 = fig.add_subplot(1,2,1)
plt.scatter(xtrain,ytrain,marker = '.',color = 'k')
plt.grid()
plt.title('Scatter plot of the sample data')
# Draw the scatter plot

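Train the model

from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(xtrain[:,np.newaxis],ytrain)
# LinearRegression → linear regression estimator that fits a straight line to the data
# model.fit(x,y) → fit the line; the arguments are x and y
# x[:,np.newaxis] → reshape the array to (n,1)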
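The reshaping step matters because scikit-learn estimators expect the feature matrix to be two-dimensional, with shape (n_samples, n_features). A minimal sketch of what `[:, np.newaxis]` does, using a hypothetical five-element array rather than the article's data:

```python
import numpy as np

x = np.arange(5.0)            # shape (5,): a flat 1-D array
x_col = x[:, np.newaxis]      # shape (5, 1): 5 samples, 1 feature
x_col2 = x.reshape(-1, 1)     # equivalent; -1 lets NumPy infer the row count

print(x.shape, x_col.shape, x_col2.shape)
```

Both spellings produce the same column array, so which one to use is a matter of taste.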

Generate the test data

xtest = np.linspace(0,10,1000)
ytest = model.predict(xtest[:,np.newaxis])
# Create the test data xtest and compute ytest from the fitted line
# model.predict → predict

Plot the fitted line

ax2 = fig.add_subplot(1,2,2)
plt.scatter(xtrain,ytrain,marker = '.',color = 'k')
plt.plot(xtest,ytest,color = 'r')
plt.grid()
plt.title('Linear regression fit')
# Draw the scatter plot and the fitted regression line

Print the line's parameters and equation

print('Slope a: %.4f' % model.coef_[0])
print('Intercept b: %.4f' % model.intercept_)
print('Fitted regression line:\ny = %.4fx + %.4f' %(model.coef_[0],model.intercept_))
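One detail is worth keeping in mind when reading the printed parameters. The noise term `rng.rand(30)` is uniform on [0, 1), so its mean is about 0.5 rather than 0, and the fitted intercept should land near 8.5 rather than 8. The sketch below reproduces the data with the same seed and, as a stand-in for LinearRegression, fits the line with `np.polyfit`. That substitution is an assumption made to keep the example dependent on NumPy only; it is not the article's method.

```python
import numpy as np

# Same seed and data-generating recipe as in the article.
rng = np.random.RandomState(1)
xtrain = 10 * rng.rand(30)
ytrain = 8 + 4 * xtrain + rng.rand(30)

# np.polyfit with degree 1 returns [slope, intercept] of the least-squares line.
# It is used here as a stand-in for LinearRegression.
slope, intercept = np.polyfit(xtrain, ytrain, 1)

print('slope = %.4f' % slope)
print('intercept = %.4f (the noise mean of about 0.5 shifts it above 8)' % intercept)
```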

Error analysis

Create sample data and fit the model

rng = np.random.RandomState(8)
xtrain = 10 * rng.rand(15)
ytrain = 8 + 4 * xtrain + rng.rand(15) * 30
model.fit(xtrain[:,np.newaxis],ytrain)
xtest = np.linspace(0,10,1000)
ytest = model.predict(xtest[:,np.newaxis])

Draw the error lines

plt.plot(xtest,ytest,color = 'r',linestyle = '--')  # fitted line
plt.scatter(xtrain,ytrain,marker = '.',color = 'k')  # scatter plot of the sample data
ytest2 = model.predict(xtrain[:,np.newaxis])  # y values on the fitted line at the sample x values
plt.scatter(xtrain,ytest2,marker = 'x',color = 'g')   # scatter plot of ytest2
plt.plot([xtrain,xtrain],[ytrain,ytest2],color = 'gray')  # error lines
plt.grid()
plt.title('Errors')
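Each gray error line in the plot corresponds to one residual: the observed y minus the fitted y at the same x. As a rough numerical companion to the figure, the sketch below rebuilds the same data and computes the residuals directly. It uses `np.polyfit` in place of LinearRegression, which is an assumption made to keep the example NumPy-only.

```python
import numpy as np

rng = np.random.RandomState(8)
xtrain = 10 * rng.rand(15)
ytrain = 8 + 4 * xtrain + rng.rand(15) * 30

# Least-squares line via np.polyfit (a stand-in for LinearRegression).
slope, intercept = np.polyfit(xtrain, ytrain, 1)
fitted = slope * xtrain + intercept

# Residual = observed y minus fitted y: the signed length of each error line.
residuals = ytrain - fitted

print('largest absolute residual: %.4f' % np.abs(residuals).max())
print('sum of residuals: %.2e' % residuals.sum())  # close to zero up to rounding
```

For a least-squares line with an intercept, the residuals sum to zero up to floating-point rounding, so the points above and below the line balance out.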

Multiple linear regression

Create the data

rng = np.random.RandomState(5)  
xtrain = 10 * rng.rand(150,4)
ytrain = 20 + np.dot(xtrain ,[1.5,2,-4,3])
df = pd.DataFrame(xtrain, columns = ['b1','b2','b3','b4'])
df['y'] = ytrain
pd.plotting.scatter_matrix(df[['b1','b2','b3','b4']],figsize=(10,6),
                 diagonal='kde',
                 alpha = 0.5,
                 range_padding=0.1)
print(df.head())

Create the model and fit the data

model = LinearRegression()
model.fit(df[['b1','b2','b3','b4']],df['y'])
# Fit the multiple regression

print('Slopes a:' ,model.coef_)
print('Intercept b: %.4f' % model.intercept_)
print('Fitted regression function:\ny = %.1fx1 + %.1fx2 + %.1fx3 + %.1fx4 + %.1f' 
      % (model.coef_[0],model.coef_[1],model.coef_[2],model.coef_[3],model.intercept_))
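Because this data set is generated without any noise term, a least-squares fit should recover the generating coefficients (1.5, 2, -4, 3) and the intercept 20 almost exactly. The sketch below checks this with `np.linalg.lstsq` on a design matrix that has a leading column of ones. The use of `lstsq` and the name `design` are choices made for this illustration, not the article's method.

```python
import numpy as np

rng = np.random.RandomState(5)
xtrain = 10 * rng.rand(150, 4)
ytrain = 20 + np.dot(xtrain, [1.5, 2, -4, 3])

# Prepend a column of ones so the first fitted weight plays the role of the intercept.
design = np.column_stack([np.ones(len(xtrain)), xtrain])

# np.linalg.lstsq solves min ||design @ w - ytrain||^2.
w, _, _, _ = np.linalg.lstsq(design, ytrain, rcond=None)
intercept, coef = w[0], w[1:]

print('intercept:', np.round(intercept, 4))
print('coefficients:', np.round(coef, 4))
```

With real, noisy data the recovered values would only approximate the underlying ones.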

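Evaluating a linear regression model

Several metrics are used to validate a regression model:

SSE (sum of squared errors): the sum of squares due to error
MSE (mean squared error): SSE averaged over the samples
RMSE (root mean squared error): the square root of MSE
R-square (coefficient of determination): the share of the variance in y explained by the model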
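Written out as formulas, the four metrics above take the following form. The notation is an assumption introduced here for illustration: y_i is an observed value, \hat{y}_i is the model's prediction for it, \bar{y} is the mean of the observed values, and n is the number of samples.

```latex
\begin{aligned}
\mathrm{SSE}  &= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \\
\mathrm{MSE}  &= \frac{\mathrm{SSE}}{n} \\
\mathrm{RMSE} &= \sqrt{\mathrm{MSE}} \\
\mathrm{SST}  &= \sum_{i=1}^{n} (y_i - \bar{y})^2 \\
R^2           &= 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}
\end{aligned}
```

For a least-squares fit with an intercept, evaluated on the data it was fitted to, SST = SSR + SSE with SSR = \sum_i (\hat{y}_i - \bar{y})^2, so R^2 can equivalently be computed as SSR / SST.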

Create the data

from sklearn import metrics

rng = np.random.RandomState(1)  
xtrain = 10 * rng.rand(30)
ytrain = 8 + 4 * xtrain + rng.rand(30) * 3

Fit the regression


model = LinearRegression()
model.fit(xtrain[:,np.newaxis],ytrain)

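Compute the RMSE, the MSE, and the coefficient of determination

ytest = model.predict(xtrain[:,np.newaxis])  # predictions for the training inputs
mse = metrics.mean_squared_error(ytrain,ytest)  # mean squared error
rmse = np.sqrt(mse)  # root mean squared error


ssr = ((ytest - ytrain.mean())**2).sum()  # sum of squared differences between the predictions and the mean of the original data
print("ssr", ssr)
sst = ((ytrain - ytrain.mean())**2).sum()  # sum of squared differences between the original data and its mean
print("sst", sst)
r2 = ssr / sst # coefficient of determination
print("r2", r2)

r2 = model.score(xtrain[:,np.newaxis],ytrain)  # coefficient of determination, as computed by scikit-learn
print("MSE: %.5f" % mse)
print("RMSE: %.5f" % rmse)
print("R-square: %.5f" % r2)
# R-square is very close to 1, so the linear regression model fits the data well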
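The two ways of obtaining R-square here (SSR / SST, and the model's own score) agree because, for a least-squares line with an intercept evaluated on its training data, the total sum of squares splits into SSR + SSE. The sketch below checks that identity numerically on the same data. It uses `np.polyfit` as a stand-in for LinearRegression and computes every metric by hand. Both choices are assumptions made to keep the example NumPy-only.

```python
import numpy as np

rng = np.random.RandomState(1)
xtrain = 10 * rng.rand(30)
ytrain = 8 + 4 * xtrain + rng.rand(30) * 3

# Least-squares line via np.polyfit (a stand-in for LinearRegression).
slope, intercept = np.polyfit(xtrain, ytrain, 1)
ypred = slope * xtrain + intercept

sse = ((ytrain - ypred) ** 2).sum()          # error sum of squares
ssr = ((ypred - ytrain.mean()) ** 2).sum()   # regression sum of squares
sst = ((ytrain - ytrain.mean()) ** 2).sum()  # total sum of squares

mse = sse / len(ytrain)
rmse = np.sqrt(mse)
r2_from_sse = 1 - sse / sst
r2_from_ssr = ssr / sst

print('MSE: %.5f, RMSE: %.5f' % (mse, rmse))
print('R-square: %.5f (1 - SSE/SST) vs %.5f (SSR/SST)' % (r2_from_sse, r2_from_ssr))
```

The identity does not hold in general for predictions on held-out data or for models fitted without an intercept. In those cases 1 - SSE/SST is the safer definition.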

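Python data modeling series:

- Python Data Modeling: Regression
- Python Data Modeling: Classification
- Python Data Modeling: Principal Component Analysis
- Python Data Modeling: K-means Clustering
- Python Data Modeling: Monte Carlo Simulation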
