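Problem:
A set of Advertising experimental data contains product sales for 200 different markets. Each sales figure is paired with the advertising spend on 3 media: TV, radio, and newspaper. The main attributes are: (1) Number: the row index of the dataset; (2) TV: advertising spend on television; (3) radio: advertising spend on radio; (4) newspaper: advertising spend on newspapers; (5) sales: product sales.
Build a multiple regression model of the relationship between TV, radio, and newspaper advertising spend and product sales, then use the model for prediction, evaluate it, and optimize it.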
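For reference, the model the task asks for can be written as a single linear equation. The symbols $\beta_0,\dots,\beta_3$ (intercept and coefficients) and $\varepsilon$ (error term) are notation assumed here for illustration; they do not appear in the original problem statement.

```latex
\[
\text{sales} = \beta_0 + \beta_1\,\text{TV} + \beta_2\,\text{radio} + \beta_3\,\text{newspaper} + \varepsilon
\]
```

Fitting the model means estimating $\beta_0,\dots,\beta_3$ from the training data; in scikit-learn these estimates end up in `model.intercept_` and `model.coef_`.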
# -*- coding: utf-8 -*- #
"""
@Project :MachineLearning_exp
@File :MultipleRegression_adver.py
@Author :ZAY
@Time :2023/3/16 11:18
@Annotation : " "
"""
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression # linear regression model
from sklearn.model_selection import train_test_split # splits the data into training and test sets so the model can be evaluated
from sklearn.metrics import mean_squared_error # mean squared error (MSE)
data = pd.read_csv(".//Data//Advertising.csv", index_col = 0)
x = data[['TV','radio','newspaper']]
y = data["sales"]
plt.scatter(data["TV"],data["sales"],color='magenta')
plt.xlabel('TV')
plt.ylabel('sales')
plt.title('TV-sales_scatter')
plt.show()
plt.scatter(data["radio"],data["sales"],color='magenta')
plt.xlabel('radio')
plt.ylabel('sales')
plt.title('radio-sales_scatter')
plt.show()
plt.scatter(data["newspaper"],data["sales"],color='magenta')
plt.xlabel('newspaper')
plt.ylabel('sales')
plt.title('newspaper-sales_scatter')
plt.show()
# No random_state is set, so the default 75%/25% split differs on every run
x_train,x_test,y_train,y_test = train_test_split(x,y)
print("Training samples: " + str(len(x_train)) + " Test samples: " + str(len(x_test)))
# Initialize the model
model = LinearRegression()
# Train the model
model.fit(x_train,y_train)
print("Coefficients: " + str(model.coef_) + " Intercept: " + str(model.intercept_))
# Predict on the test set and compute the mean squared error (y_true first, then y_pred)
print(mean_squared_error(y_test,model.predict(x_test)))
print("Multiple linear regression formula: y=" + str(round(model.coef_[0],3)) + "TV+" + str(round(model.coef_[1],3)) + "radio+" + str(round(model.coef_[2],3)) + "newspaper+" + str(round(model.intercept_,3)))
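The script above reports only the test-set MSE. As a minimal sketch of a slightly fuller evaluation, the block below also computes RMSE and R². Because Advertising.csv is not included here, it runs on a synthetic stand-in: the `demo` DataFrame, its generating coefficients, and the fixed `random_state=0` are all assumptions made for illustration, not the author's data or settings.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for Advertising.csv (assumption: the real file is not available here).
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "radio": rng.uniform(0, 50, n),
    "newspaper": rng.uniform(0, 100, n),
})
# Hypothetical generating process, chosen only so there is a signal to fit.
demo["sales"] = 3.0 + 0.045 * demo["TV"] + 0.19 * demo["radio"] + rng.normal(0, 1.5, n)

x = demo[["TV", "radio", "newspaper"]]
y = demo["sales"]
# A fixed random_state makes the split, and therefore the metrics, reproducible.
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

model = LinearRegression().fit(x_train, y_train)
y_pred = model.predict(x_test)

mse = mean_squared_error(y_test, y_pred)
rmse = float(np.sqrt(mse))      # same units as sales
r2 = r2_score(y_test, y_pred)   # share of test-set variance explained by the model
print(f"MSE: {mse:.3f}  RMSE: {rmse:.3f}  R^2: {r2:.3f}")
```

To try this on the real data, replace `demo` with the DataFrame loaded from the CSV; the rest of the calls stay the same.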
Further optimization:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
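from sklearn.linear_model import LinearRegression # linear regression model
from sklearn.model_selection import train_test_split # splits the data into training and test sets so the model can be evaluated
from sklearn.metrics import mean_squared_error # mean squared error (MSE)
# Load the data again so this block can also run on its own
data = pd.read_csv(".//Data//Advertising.csv", index_col = 0)
# Further optimization: drop newspaper, the independent variable with the smallest effect on sales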
x = data[['TV','radio']]
y = data["sales"]
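# No random_state is set, so the default 75%/25% split differs on every run
x_train,x_test,y_train,y_test = train_test_split(x,y)
print("Training samples: " + str(len(x_train)) + " Test samples: " + str(len(x_test)))
# Initialize the model
model = LinearRegression()
# Train the model
model.fit(x_train,y_train)
print("Coefficients: " + str(model.coef_) + " Intercept: " + str(model.intercept_))
# Predict on the test set and compute the mean squared error (y_true first, then y_pred)
print(mean_squared_error(y_test,model.predict(x_test)))
print("Multiple linear regression formula: y=" + str(round(model.coef_[0],3)) + "TV+" + str(round(model.coef_[1],3)) + "radio+" + str(round(model.intercept_,3)))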
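Both scripts call `train_test_split` without a `random_state`, so the three-feature and two-feature models are scored on different random splits and their MSE values are not directly comparable. One way to compare the feature sets on equal footing is k-fold cross-validation. The block below is a sketch of that idea, not the author's method: the synthetic `demo` data, the helper name `cv_mse`, and the choice of 5 folds are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for Advertising.csv (assumption: the real file is not available here).
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "radio": rng.uniform(0, 50, n),
    "newspaper": rng.uniform(0, 100, n),
})
# Hypothetical generating process in which newspaper has no effect on sales.
demo["sales"] = 3.0 + 0.045 * demo["TV"] + 0.19 * demo["radio"] + rng.normal(0, 1.5, n)

def cv_mse(features, folds=5):
    """Mean cross-validated MSE of a linear model using the given feature columns."""
    scores = cross_val_score(
        LinearRegression(),
        demo[features],
        demo["sales"],
        scoring="neg_mean_squared_error",  # scikit-learn reports MSE negated
        cv=folds,
    )
    return float(-scores.mean())

# Both feature sets are evaluated on exactly the same folds.
mse_all = cv_mse(["TV", "radio", "newspaper"])
mse_two = cv_mse(["TV", "radio"])
print(f"CV MSE with newspaper:    {mse_all:.3f}")
print(f"CV MSE without newspaper: {mse_two:.3f}")
```

If dropping newspaper leaves the cross-validated MSE roughly unchanged or lower, the simpler two-feature model is a reasonable choice.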
Scatter plots: [figures: TV-sales_scatter, radio-sales_scatter, newspaper-sales_scatter]
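Experimental results:
Training samples: 150 Test samples: 50
Coefficients: [0.04593059 0.18103463 0.00396352] Intercept: 2.976290441694104
2.268518487412258
Multiple linear regression formula: y=0.046TV+0.181radio+0.004newspaper+2.976
Training samples: 150 Test samples: 50
Coefficients: [0.04286292 0.19087586] Intercept: 3.337527710795799
4.860875731732358
Multiple linear regression formula: y=0.043TV+0.191radio+3.338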
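The fitted formula can be used directly for prediction. As a small worked example, the sketch below evaluates the rounded three-feature formula from the results above. The function name `predict_sales` and the input budgets are hypothetical, chosen only to illustrate the arithmetic.

```python
def predict_sales(tv, radio, newspaper):
    """Evaluate the rounded three-feature formula reported in the results."""
    return 0.046 * tv + 0.181 * radio + 0.004 * newspaper + 2.976

# Hypothetical budgets, chosen only for illustration.
pred = predict_sales(tv=100, radio=20, newspaper=10)
# 0.046*100 + 0.181*20 + 0.004*10 + 2.976 = 4.6 + 3.62 + 0.04 + 2.976
print(round(pred, 3))
```

By hand this comes to about 11.236. With the actual fitted model, `model.predict` on a row holding the same three values does the same calculation using the unrounded coefficients.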
If you need the experimental data, send me a private message and I will do my best to provide it!