Python and Machine Learning, Chapter 8: Polynomial Regression and Model Generalization (Part 2): Polynomial Regression and Pipeline in scikit-learn

Latest recommended article published on 2023-12-26 18:03:36

把小兔打哭

Category: Python and Machine Learning. Tags: machine learning, Pipeline

Article link: https://blog.csdn.net/Dear_leslie/article/details/96144913

A dataset with a single feature

Generating the data

In [92]: import numpy as np
    ...: import matplotlib.pyplot as plt
    
In [93]: x = np.random.uniform(-3,3,size=100)
    ...: X = x.reshape(-1,1)
    ...: y = 0.5 * x**2 + x + 2 + np.random.normal(0,1,size=100)

Getting polynomial features with scikit-learn

In [94]: from sklearn.preprocessing import PolynomialFeatures
# degree is the highest power of the original features to add to the dataset
In [95]: poly = PolynomialFeatures(degree=2)
    ...: poly.fit(X)
    ...: X2 = poly.transform(X)

In [97]: X[:5,:]
Out[97]: 
array([[-2.19518176],
       [ 2.77476822],
       [ 0.48917013],
       [ 0.39238714],
       [-0.23027698]])
# The expanded, higher-dimensional features: the columns are 1, x, and x^2
In [98]: X2[:5,:]
Out[98]: 
array([[ 1.        , -2.19518176,  4.81882296],
       [ 1.        ,  2.77476822,  7.69933869],
       [ 1.        ,  0.48917013,  0.23928742],
       [ 1.        ,  0.39238714,  0.15396766],
       [ 1.        , -0.23027698,  0.05302749]])
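
As a cross-check on what the transform produces, the three columns can be rebuilt by hand with NumPy. This is only a minimal sketch: the seed and the names `rng`, `X_demo`, `manual`, and `sk` are illustrative assumptions, not part of the original session.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Illustrative data: one feature, five samples (the seed is arbitrary)
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(5, 1))

# Columns built by hand: x^0 (all ones), x^1, x^2
manual = np.hstack([np.ones_like(X_demo), X_demo, X_demo ** 2])

# The same expansion through scikit-learn
sk = PolynomialFeatures(degree=2).fit_transform(X_demo)

# Compare the two results element by element
print(np.allclose(manual, sk))
```

The leading column of ones is the bias term. Since `LinearRegression` fits its own intercept by default, that column carries no extra information; passing `include_bias=False` to `PolynomialFeatures` drops it.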

Train and predict with linear regression to obtain a nonlinear curve

In [99]: from sklearn.linear_model import LinearRegression
    ...: lin_reg2 = LinearRegression()
    ...: lin_reg2.fit(X2,y)
    ...: y_predict2 = lin_reg2.predict(X2)
In [100]: plt.scatter(x,y)
     ...: plt.plot(np.sort(x),y_predict2[np.argsort(x)],color='r')
     
In [101]: lin_reg2.coef_
Out[101]: array([0.        , 0.99420228, 0.55194625])

In [102]: lin_reg2.intercept_
Out[102]: 1.8569459632147785
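
The fitted values (about 0.99 for x, 0.55 for x^2, and an intercept of about 1.86) are close to the 1, 0.5, and 2 used to generate the data. As an independent check, `np.polyfit` solves the same least-squares problem directly, so the two results should agree. The sketch below assumes freshly generated data with an arbitrary seed, so its numbers will differ from the session above; the `_demo` names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Regenerate data of the same form (the seed is an arbitrary assumption)
rng = np.random.default_rng(42)
x_demo = rng.uniform(-3, 3, size=100)
y_demo = 0.5 * x_demo**2 + x_demo + 2 + rng.normal(0, 1, size=100)

# Polynomial features followed by ordinary linear regression
X2_demo = PolynomialFeatures(degree=2).fit_transform(x_demo.reshape(-1, 1))
reg = LinearRegression().fit(X2_demo, y_demo)

# np.polyfit returns coefficients from the highest power down: [a2, a1, a0]
a2, a1, a0 = np.polyfit(x_demo, y_demo, deg=2)

print("LinearRegression:", reg.coef_, reg.intercept_)
print("np.polyfit      :", a1, a2, a0)
```

The first entry of `coef_` belongs to the constant column. It comes out as 0 because the intercept is fitted separately.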

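A dataset with two features

With two features x1 and x2, degree=2 adds three second-degree columns: x1^2, x1*x2, and x2^2. Together with the constant column and the original x1 and x2, that makes six columns.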

In [103]: X = np.arange(1,11).reshape(-1,2)
In [104]: X.shape
Out[104]: (5, 2)
In [105]: X
Out[105]: 
array([[ 1,  2],
       [ 3,  4],
       [ 5,  6],
       [ 7,  8],
       [ 9, 10]])
       
In [106]: poly = PolynomialFeatures(degree=2)
     ...: poly.fit(X)
     ...: X2 = poly.transform(X)
     
In [107]: X2.shape
Out[107]: (5, 6)
In [108]: X2
Out[108]: 
array([[  1.,   1.,   2.,   1.,   2.,   4.],
       [  1.,   3.,   4.,   9.,  12.,  16.],
       [  1.,   5.,   6.,  25.,  30.,  36.],
       [  1.,   7.,   8.,  49.,  56.,  64.],
       [  1.,   9.,  10.,  81.,  90., 100.]])

With degree=3, the same two features expand to 10 columns.
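
These counts follow a simple rule: with n input features and degree d, the number of output columns (constant column included) is the binomial coefficient C(n + d, d). The sketch below checks that rule against the transformer's output shape; `counts` and `X_demo` are illustrative names, and `math.comb` requires Python 3.8 or later.

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X_demo = np.arange(1, 11).reshape(-1, 2)
n_features = X_demo.shape[1]

counts = {}
for degree in (2, 3):
    # Number of monomials of total degree <= `degree` in `n_features` variables
    expected = comb(n_features + degree, degree)
    actual = PolynomialFeatures(degree=degree).fit_transform(X_demo).shape[1]
    counts[degree] = (expected, actual)
    print(degree, expected, actual)
```

The column count grows quickly with both the degree and the number of features, so a high degree on a wide dataset can become expensive.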

Pipeline

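Data passed to the pipeline goes through the three steps defined in it, in order: polynomial feature generation, standardization, and linear regression.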

In [115]: from sklearn.preprocessing import StandardScaler
     ...: from sklearn.pipeline import Pipeline
     ...: poly_reg = Pipeline([
     ...:     ("poly",PolynomialFeatures(degree=2)),
     ...:     ("std_scaler",StandardScaler()),
     ...:     ("lin_reg",LinearRegression())
     ...: ])
In [116]: X = x.reshape(-1,1)  # back to the single-feature data; X was reassigned in the two-feature example
     ...: poly_reg.fit(X,y)
     ...: y_predict = poly_reg.predict(X)
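
To make the "three steps in order" concrete, the sketch below fits a pipeline and then repeats the same steps by hand. It assumes freshly generated data with an arbitrary seed; the names `pipe`, `X_new`, and the other `_demo` variables are illustrative and not from the original session.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Data of the same form as in the single-feature example (arbitrary seed)
rng = np.random.default_rng(0)
x_demo = rng.uniform(-3, 3, size=100)
X_demo = x_demo.reshape(-1, 1)
y_demo = 0.5 * x_demo**2 + x_demo + 2 + rng.normal(0, 1, size=100)

pipe = Pipeline([
    ("poly", PolynomialFeatures(degree=2)),
    ("std_scaler", StandardScaler()),
    ("lin_reg", LinearRegression()),
])
pipe.fit(X_demo, y_demo)

# The same three steps, chained by hand
poly = PolynomialFeatures(degree=2)
scaler = StandardScaler()
reg = LinearRegression()
X_poly = poly.fit_transform(X_demo)
X_scaled = scaler.fit_transform(X_poly)
reg.fit(X_scaled, y_demo)

# New data only goes through transform (never fit) before predict
X_new = np.array([[-1.0], [0.0], [2.0]])
manual_pred = reg.predict(scaler.transform(poly.transform(X_new)))
pipe_pred = pipe.predict(X_new)

print(np.allclose(manual_pred, pipe_pred))
```

Individual steps remain accessible by name after fitting, for example `pipe.named_steps["lin_reg"].coef_`. Note that these coefficients apply to the standardized features, so they are not directly comparable to the ones fitted on the raw polynomial features.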