Machine Learning: Polynomial Regression

BingLZg

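Copyright notice: this is an original article by the author, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.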

Article link: https://blog.csdn.net/Bing_bing_bing_/article/details/87374741

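7. Polynomial Regression and Model Selection

7.1 Polynomial Regression

7.1.1 The Idea Behind Polynomial Regression

In practice, many datasets follow a nonlinear relationship. Linear regression can still be fitted to such data, but the fit is very poor.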

# Program 7-1
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

np.random.seed(123456)
# 150 samples of y = x^2 + 2x + 3 with Gaussian noise (standard deviation 3)
x = np.random.uniform(-5, 5, size=150)
y = 1 * (x**2) + 2 * x + 3 + np.random.normal(0, 3, size=150)
X = x.reshape(-1, 1)  # sklearn expects a 2-D feature matrix
print(y.shape)

# Fit a straight line to the nonlinear data
lin_reg = LinearRegression()
lin_reg.fit(X, y)
y_predict = lin_reg.predict(X)

plt.scatter(X, y)
plt.plot(X, y_predict, color='r')
plt.show()

Output:

For y = ax^2 + bx + c, we can treat x and x^2 as two separate features and then solve the problem with ordinary linear regression.

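# Program 7-2
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

np.random.seed(123456)
X = np.random.uniform(-5, 5, size=150).reshape(-1, 1)
# When combining a (column) vector with a matrix, reshape the vector to 2-D first.
# Adding a (150, 1) matrix and a (150,) vector broadcasts to shape (150, 150),
# because the element in each row of the matrix is added to the whole vector.
y = 1 * (X**2) + 2 * X + 3 + np.random.normal(0, 3, size=150).reshape(-1, 1)
print(y.shape)

# Stack x and x^2 side by side as two feature columns
X2 = np.hstack([X, X**2])
print(X2.shape)

lin_reg = LinearRegression()
lin_reg.fit(X2, y)
print(lin_reg.coef_)
print(lin_reg.intercept_)
y_predict = lin_reg.predict(X2)

plt.scatter(X, y)
# Sort by x so the curve is drawn from left to right. The argsort result is
# flattened to 1-D so that fancy indexing keeps the (150, 1) shape of y_predict.
plt.plot(np.sort(X, axis=0), y_predict[np.argsort(X, axis=0).reshape(-1,)], color='r')
plt.show()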

Output:

(150, 1)
(150, 2)
[[1.91743701 1.05227471]]
[2.67113733]
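The broadcasting pitfall mentioned in the comments of Program 7-2 can be checked in isolation. The following is a minimal sketch with made-up arrays (the names `col`, `vec`, `wrong`, and `right` are illustrative and do not appear in the original program):

```python
import numpy as np

col = np.zeros((150, 1))   # column "matrix", shape (150, 1)
vec = np.ones(150)         # 1-D vector, shape (150,)

# Broadcasting pairs the element in every row of `col` with the whole of `vec`.
wrong = col + vec
# Reshaping the vector to a column first keeps the sum element-wise.
right = col + vec.reshape(-1, 1)

print(wrong.shape)  # (150, 150)
print(right.shape)  # (150, 1)
```

This is why the noise term in Program 7-2 is reshaped with `reshape(-1, 1)` before it is added to `X**2`.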

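Polynomial regression is still, at its core, multiple linear regression; it simply adds extra feature columns to the dataset. For data that follows y = ax^2 + bx + c, we treat x and x^2 as features and solve for the coefficients b and a and the intercept c.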
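To make the "still linear regression" point concrete, here is a small sketch that skips sklearn and solves the same problem as a plain linear least-squares system with NumPy. It assumes noise-free data so that the recovered values can be compared directly with the true ones; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-5, 5, size=50)
y = 1 * x**2 + 2 * x + 3          # noise-free quadratic, for an exact check

# Design matrix with columns [1, x, x^2]: the model is linear in (c, b, a),
# even though it is quadratic in x.
A = np.column_stack([np.ones_like(x), x, x**2])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
c, b, a = coef

print(a, b, c)
```

The unknowns enter the model only as multipliers of fixed columns, which is what makes ordinary least squares applicable.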

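7.1.2 Polynomial Regression with sklearn

sklearn does not provide a dedicated polynomial regression estimator. Instead, sklearn.preprocessing provides the PolynomialFeatures class, which generates polynomial features.

PolynomialFeatures takes a degree parameter; degree = 2 generates polynomial features up to second order. If the dataset has a single feature x1, it produces x1^0 = 1, x1^1, and x1^2, that is, 2 additional columns. If the dataset has two features x1 and x2, it produces 1, x1, x2, x1^2, x1*x2, and x2^2, that is, 4 additional columns.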
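The two-feature case can be verified on a single hand-picked sample. This is a minimal sketch; the sample values 2 and 3 are arbitrary and chosen only so that each generated column is easy to recognize.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0, 3.0]])           # one sample with x1 = 2, x2 = 3
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

# Columns are ordered as 1, x1, x2, x1^2, x1*x2, x2^2
print(X_poly)                        # [[1. 2. 3. 4. 6. 9.]]
print(poly.n_output_features_)       # 6
```

Two input columns become six output columns, so four columns are added.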

# Program 7-3
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

np.random.seed(123456)
X = np.random.uniform(-5, 5, size=150).reshape(-1, 1)
# When combining a (column) vector with a matrix, reshape the vector to 2-D first.
# Adding a (150, 1) matrix and a (150,) vector broadcasts to shape (150, 150),
# because the element in each row of the matrix is added to the whole vector.
y = 1 * (X**2) + 2 * X + 3 + np.random.normal(0, 3, size=150).reshape(-1, 1)

# Generate the degree-2 feature columns 1, x, x^2
pol_fea = PolynomialFeatures(degree=2)
pol_fea.fit(X)
X2 = pol_fea.transform(X)
print(X[:5])
print(X2[:5])

lin_reg = LinearRegression()
lin_reg.fit(X2, y)
print(lin_reg.coef_)
print(lin_reg.intercept_)
y_predict = lin_reg.predict(X2)

plt.scatter(X, y)
# Sort by x so the curve is drawn from left to right. The argsort result is
# flattened to 1-D so that fancy indexing keeps the (150, 1) shape of y_predict.
plt.plot(np.sort(X, axis=0), y_predict[np.argsort(X, axis=0).reshape(-1,)], color='r')
plt.show()

Output:

[[-3.73030167]
 [ 4.66717838]
 [-2.39523994]
 [ 3.97236524]
 [-1.23250284]]
[[ 1.         -3.73030167 13.91515055]
 [ 1.          4.66717838 21.78255408]
 [ 1.         -2.39523994  5.73717438]
 [ 1.          3.97236524 15.77968563]
 [ 1.         -1.23250284  1.51906325]]
[[0.         1.91743701 1.05227471]]
[2.67113733]

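From the coefficients and the intercept we get y = 1.05227471x^2 + 1.91743701x + 2.67113733, which is very close to the true relationship y = x^2 + 2x + 3. The leading 0 in coef_ belongs to the constant column of ones, because LinearRegression fits the intercept separately.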
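If the redundant constant column is not wanted, PolynomialFeatures accepts include_bias=False. The sketch below is not part of the original program; it assumes noise-free data so the recovered coefficients can be compared exactly with the true ones.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, size=(50, 1))
y = (X**2 + 2 * X + 3).ravel()       # noise-free, so the fit should be exact

poly = PolynomialFeatures(degree=2, include_bias=False)
X2 = poly.fit_transform(X)           # columns: x, x^2 (no column of ones)

lin_reg = LinearRegression().fit(X2, y)
print(lin_reg.coef_, lin_reg.intercept_)
```

With the constant column dropped, coef_ holds only the coefficients of x and x^2, and the constant term appears solely in intercept_.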

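7.1.3 Wrapping the Steps in a Pipeline

The sklearn.pipeline module provides Pipeline, which bundles processing steps that are always applied in the same order into a single object.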

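# Program 7-4
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(123456)
X = np.random.uniform(-5, 5, size=150).reshape(-1, 1)
# When combining a (column) vector with a matrix, reshape the vector to 2-D first.
# Adding a (150, 1) matrix and a (150,) vector broadcasts to shape (150, 150),
# because the element in each row of the matrix is added to the whole vector.
y = 1 * (X**2) + 2 * X + 3 + np.random.normal(0, 3, size=150).reshape(-1, 1)

from sklearn.linea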
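The listing above breaks off in the source. As a minimal sketch of the idea in this section, and not the author's original Program 7-4, the two steps from Program 7-3 could be chained as follows. The step names "poly" and "lin_reg" and the variable name `poly_reg` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

np.random.seed(123456)
X = np.random.uniform(-5, 5, size=150).reshape(-1, 1)
y = 1 * (X**2) + 2 * X + 3 + np.random.normal(0, 3, size=150).reshape(-1, 1)

# Each step is a (name, estimator) pair; the names are hypothetical labels.
poly_reg = Pipeline([
    ("poly", PolynomialFeatures(degree=2)),   # generate 1, x, x^2
    ("lin_reg", LinearRegression()),          # fit on the generated columns
])

# fit() and predict() take the raw X; the pipeline applies the feature step itself.
poly_reg.fit(X, y)
y_predict = poly_reg.predict(X)

print(poly_reg.named_steps["lin_reg"].coef_)
print(poly_reg.named_steps["lin_reg"].intercept_)
```

The benefit is that new data passes through the same feature transformation automatically, so there is no separate X2 to keep in sync with the model.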
