Handling Overfitting & Underfitting in Polynomial Regression

qq_38404903

Last modified: 2023-10-25 20:03:00

Tags: regression, data mining, artificial intelligence

First published: 2023-10-25 19:47:01

Article link: https://blog.csdn.net/qq_38404903/article/details/134011526

## Underfitting
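When a trained model turns out to underfit, raise the degree of the polynomial used to build the fitted curve. The degree has to be balanced: too high and the model overfits, too low and it underfits.

	y = w*x + b: first-degree polynomial
	y = w1*x^2 + w2*x + b: second-degree polynomial
	y = w1*x^3 + w2*x^2 + w3*x + b: third-degree polynomial
	...
	y = w1*x^n + w2*x^(n-1) + ... + wn*x + b: n-th degree polynomial

The digits after w are subscripts that distinguish the weights; they are not numeric factors.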
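The n-th degree form is still linear in the weights: once the powers of x are computed as separate columns, fitting it is ordinary linear regression. Here is a minimal sketch of that expansion, assuming NumPy and scikit-learn are installed. The helper `expand_powers` is a hypothetical name used only for illustration, and its columns are in ascending order (x, x^2, ..., x^n), which is the reverse of how the weights are numbered in the formulas.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures


def expand_powers(x, degree):
    """Return the columns [x, x^2, ..., x^degree] for a 1-D sequence x.

    Hypothetical helper, written only to show what the expansion does.
    """
    column = np.asarray(x, dtype=float).reshape(-1, 1)
    return np.hstack([column ** k for k in range(1, degree + 1)])


x = [6, 8, 10, 14, 18]

# Hand-made expansion up to the third power
manual = expand_powers(x, degree=3)

# The same expansion done by scikit-learn (no column of ones)
sk = PolynomialFeatures(degree=3, include_bias=False).fit_transform(
    np.reshape(x, (-1, 1))
)

print(manual)
print(np.allclose(manual, sk))
```

Each row holds one sample and each column one power of x, so a linear model that gets this matrix learns one weight per power.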
		
```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error as MSE
from sklearn.metrics import r2_score as R2
from sklearn.preprocessing import PolynomialFeatures
import matplotlib.pyplot as plt

x_train = [[6], [8], [10], [14], [18]]
y_train = [[7], [9], [13], [17.5], [18]]

# Degree 1: plain linear regression on the raw feature
linner = LinearRegression()
linner.fit(x_train, y_train)
# Predictions of the fitted model on the training samples
pre_price = linner.predict(x_train)
print(pre_price)
plt.scatter(x_train, y_train)
plt.plot(x_train, pre_price)
plt.show()
mse_1 = MSE(y_train, pre_price)
r2_1 = R2(y_train, pre_price)
# Scores of the model with the original samples left untouched
print(mse_1, r2_1)

# Degree 2: build a second-degree polynomial regression model.
# PolynomialFeatures expands each sample into its powers up to `degree`.
# include_bias is usually set to False, because LinearRegression already
# fits the intercept b itself.
p2 = PolynomialFeatures(degree=2, include_bias=False)
p2_x_train = p2.fit_transform(x_train)
# With the default include_bias=True, p2_x_train would carry a leading
# column of ones:
# [[  1.   6.  36.]
#  [  1.   8.  64.]
#  [  1.  10. 100.]
#  [  1.  14. 196.]
#  [  1.  18. 324.]]
# With include_bias=False only the columns x and x^2 remain:
# [[  6.  36.]
#  [  8.  64.]
#  [ 10. 100.]
#  [ 14. 196.]
#  [ 18. 324.]]
print(p2_x_train)
# The features now include the squared term, so a linear model fitted on
# them produces a second-degree curve
linner2 = LinearRegression()
linner2.fit(p2_x_train, y_train)
y_pred2 = linner2.predict(p2_x_train)
y_mes_2 = MSE(y_train, y_pred2)
y_r2_2 = R2(y_train, y_pred2)
plt.scatter(x_train, y_train)
plt.plot(x_train, y_pred2)
print(y_mes_2, y_r2_2)
plt.show()

# Degree 3: build a third-degree polynomial regression model
p3 = PolynomialFeatures(degree=3, include_bias=False)
p3_train = p3.fit_transform(x_train)
print(p3_train)

linner3 = LinearRegression()
linner3.fit(p3_train, y_train)
y_pred3 = linner3.predict(p3_train)
plt.scatter(x_train, y_train)
plt.plot(x_train, y_pred3)
plt.show()
MSE3 = MSE(y_train, y_pred3)
R2_3 = R2(y_train, y_pred3)
print(MSE3, R2_3)
```

The output is shown below. Each pair of scores is the MSE and R2 on the training data, for degree 1, 2 and 3 in turn:

```
[[ 7.82327586]
 [ 9.77586207]
 [11.72844828]
 [15.63362069]
 [19.5387931 ]]
1.7495689655172406 0.9100015964240102
[[  6.  36.]
 [  8.  64.]
 [ 10. 100.]
 [ 14. 196.]
 [ 18. 324.]]
0.3568763326226019 0.9816421639597427
[[   6.   36.  216.]
 [   8.   64.  512.]
 [  10.  100. 1000.]
 [  14.  196. 2744.]
 [  18.  324. 5832.]]
0.10234962406015022 0.9947351016429964
```
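These scores are all computed on the training samples, and for least-squares fits the training error can only go down as the degree goes up, so on their own they cannot show where overfitting starts. One common way to check is to hold out part of the data and compare the two errors. The sketch below assumes synthetic data (a noisy sine curve generated with a fixed seed, not taken from this article) and a hypothetical helper named `train_test_mse`. The degrees 1, 3 and 9 are arbitrary picks.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures


def train_test_mse(degree, x_tr, x_te, y_tr, y_te):
    """Fit a polynomial regression of the given degree.

    Returns (training MSE, held-out MSE). Hypothetical helper for illustration.
    """
    model = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        LinearRegression(),
    )
    model.fit(x_tr, y_tr)
    return (
        mean_squared_error(y_tr, model.predict(x_tr)),
        mean_squared_error(y_te, model.predict(x_te)),
    )


# Synthetic data, assumed only for this sketch: y = sin(2x) plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 3, size=(40, 1))
y = np.sin(2 * x).ravel() + rng.normal(scale=0.2, size=40)

x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.3, random_state=0)

results = {d: train_test_mse(d, x_tr, x_te, y_tr, y_te) for d in (1, 3, 9)}
for degree, (mse_train, mse_test) in results.items():
    print(f"degree={degree}  train MSE={mse_train:.4f}  test MSE={mse_test:.4f}")
```

A degree whose training error is low while its held-out error is clearly higher is a sign of overfitting. High error on both sets points to underfitting.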


## Overfitting

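When the fitted curve follows the training samples too closely, the model predicts poorly on new data. The remedy is regularization: penalize large weights so that the weights, those of the high-degree terms in particular, are shrunk toward zero.

The API to use is:

	from sklearn.linear_model import Ridge

Ridge has one parameter to set and one attribute to read:

	1. alpha: the regularization strength. The larger it is, the smaller the weights of the high-degree terms and the flatter the fitted curve. Typical values to try are decimals between 0 and 1 or integers from 1 to 10.
	2. coef_: the regression coefficients, available after the model has been fitted.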
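To see what alpha does, here is a minimal sketch that fits a third-degree polynomial with Ridge on the five training samples used earlier, for a few values of alpha, and reads back coef_. The helper name `ridge_coefficients`, the choice of degree 3 and the alpha values are assumptions made for illustration. The targets are written as a flat list so that coef_ comes back one-dimensional.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

x_train = [[6], [8], [10], [14], [18]]
y_train = [7, 9, 13, 17.5, 18]


def ridge_coefficients(alpha, degree=3):
    """Fit polynomial features + Ridge and return the fitted coef_.

    Hypothetical helper for illustration.
    """
    model = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        Ridge(alpha=alpha),
    )
    model.fit(x_train, y_train)
    # make_pipeline names each step after its lowercased class name
    return model.named_steps["ridge"].coef_


coef_norms = {}
for alpha in (0.1, 1.0, 10.0):
    coef = ridge_coefficients(alpha)
    # Overall size of the weight vector for this alpha
    coef_norms[alpha] = float(np.linalg.norm(coef))
    print(f"alpha={alpha}: coef_={coef}, norm={coef_norms[alpha]:.6f}")
```

The norm of the coefficient vector shrinks as alpha grows, which is the effect described above. In practice alpha is usually chosen by comparing scores on held-out data, not training scores.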