Linear Regression in Practice: Analyzing Automobile Fuel Efficiency

Latest recommended article published on 2023-05-23 13:14:36

上课不要摸鱼江


Category: Machine Learning. Tags: python, machine learning, visualization, data analysis

Article link: https://blog.csdn.net/qq_43653405/article/details/107632107


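This is a record of a hands-on machine learning exercise using linear regression. The topic is an analysis of automobile fuel efficiency, and it is fairly simple. I did it in Jupyter under Anaconda, with the goal of deepening my understanding of the process.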

Model analysis

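import pandas as pd
import matplotlib.pyplot as plt

# The raw data file has no header row, so the column names are supplied explicitly
columns = ["mpg", "cylinders", "displacement", "horsepower", "weight", "acceleration", "model year", "origin", "car name"]
# The file is whitespace-delimited; sep=r"\s+" replaces the deprecated delim_whitespace=True
cars = pd.read_table("auto-mpg.data", sep=r"\s+", names=columns)  # read the data
print(cars.head(5))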
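To see how the whitespace-delimited read behaves without needing the data file, here is a minimal sketch that parses two rows from an in-memory string. The rows are typed in the layout of the auto-mpg file and should be treated as illustrative; `sample_text` and `sample` are names introduced here, not from the original post.

```python
import io
import pandas as pd

columns = ["mpg", "cylinders", "displacement", "horsepower", "weight",
           "acceleration", "model year", "origin", "car name"]

# Two illustrative rows in the same layout as auto-mpg.data:
# whitespace-separated numbers followed by a double-quoted car name.
sample_text = (
    '18.0 8 307.0 130.0 3504. 12.0 70 1 "chevrolet chevelle malibu"\n'
    '15.0 8 350.0 165.0 3693. 11.5 70 1 "buick skylark 320"\n'
)

# sep=r"\s+" splits on runs of whitespace; the quoted car name stays one field
sample = pd.read_csv(io.StringIO(sample_text), sep=r"\s+", names=columns)
print(sample[["mpg", "weight", "car name"]])
```

The quoted name survives as a single column even though it contains spaces, which is why nine column names are enough.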


# With the data loaded, visualize it first to look for patterns
fig = plt.figure()
ax1 = fig.add_subplot(2, 1, 1)
ax2 = fig.add_subplot(2, 1, 2)
cars.plot("weight", "mpg", kind='scatter', ax=ax1)  # x: weight, y: mpg, drawn as a scatter plot
cars.plot("acceleration", "mpg", kind='scatter', ax=ax2)
plt.show()


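from sklearn.linear_model import LinearRegression  # import the model class
lr = LinearRegression(fit_intercept=True)  # create the model
lr.fit(cars[["weight"]], cars["mpg"])  # train the model: fit(inputs, outputs), where outputs are the label values
predictions = lr.predict(cars[["weight"]])  # predict; here on the training inputs, new data is passed the same way
# Compare the predictions with the true values
print(predictions[0:5])
print(cars["mpg"][0:5])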

[19.41852276 17.96764345 19.94053224 19.96356207 19.84073631]
0 18.0
1 15.0
2 18.0
3 16.0
4 17.0
Name: mpg, dtype: float64
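For a single feature, the line that `lr.fit` finds has a well-known closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the means. The sketch below works this out in plain Python on a small, made-up, exactly linear data set (the numbers are hypothetical and chosen only so the result is easy to check by hand).

```python
# Closed-form ordinary least squares for one feature.
# The data is hypothetical and exactly linear, for illustration only.
weights = [2.0, 2.5, 3.0, 3.5, 4.0]    # weight in thousands of lb (made up)
mpg = [32.0, 28.0, 24.0, 20.0, 16.0]   # fuel efficiency (made up)

n = len(weights)
mean_x = sum(weights) / n
mean_y = sum(mpg) / n

# Sum of cross-deviations and sum of squared deviations of x
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(weights, mpg))
s_xx = sum((x - mean_x) ** 2 for x in weights)

slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Predictions from the fitted line
predicted = [intercept + slope * x for x in weights]
print(slope, intercept)
```

On a fitted scikit-learn model the corresponding values are available as `lr.coef_[0]` and `lr.intercept_`.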

# A visual comparison is more intuitive
plt.scatter(cars["weight"], cars["mpg"], c='red')  # true values
plt.scatter(cars["weight"], predictions, c='blue')  # predicted values
plt.show()


Model evaluation

Having a model is not enough; it also needs to be evaluated. The metric used here is MSE, the mean squared error.
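For reference, these are the standard definitions of the metrics computed in this section. The symbols are notation chosen here, not taken from the original post: $y_i$ is the true mpg of car $i$, $\hat{y}_i$ is the model's prediction for it, and $n$ is the number of cars.

```latex
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2,
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}
```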

# Measure how good the model is, using MSE (mean squared error)
from sklearn.metrics import mean_squared_error  # import the MSE function
lr = LinearRegression()
lr.fit(cars[["weight"]], cars["mpg"])
predictions = lr.predict(cars[["weight"]])
mse = mean_squared_error(cars["mpg"], predictions)  # arguments: true label values, then predicted values
print(mse)

18.780939734628397

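# The square root of MSE (RMSE) is usually reported, because it is in the same units as the target (mpg) and easier to interpret
mse = mean_squared_error(cars["mpg"], predictions)
rmse = mse ** 0.5
print(rmse)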

4.333698159150957
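As a check on what these two numbers mean, the same calculation can be done by hand. The sketch below uses only the first five predictions and true values printed earlier in this post, so its results differ from the full-dataset figures above; it is meant to show the arithmetic, not to reproduce them.

```python
import math

# First five predictions and true mpg values, copied from the printed output earlier in this post
predicted = [19.41852276, 17.96764345, 19.94053224, 19.96356207, 19.84073631]
actual = [18.0, 15.0, 18.0, 16.0, 17.0]

# MSE: average of the squared differences between true and predicted values
squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
mse = sum(squared_errors) / len(squared_errors)

# RMSE: square root of MSE, expressed in mpg rather than mpg squared
rmse = math.sqrt(mse)
print(mse, rmse)
```

On these five cars the model is off by a little under 3 mpg in the root-mean-square sense, compared with about 4.33 mpg over the whole data set.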
