10. Machine Learning: Ridge Regression with Regularization (Notes)

1. Overfitting and Underfitting

Question: the model fits the training data well and the training error is small, so why does it run into problems on the test set?

Underfitting:
(figure)

Overfitting:
(figure)

Analysis of Figure 1
After training, the model knows that a swan has wings and a long beak, and it naively assumes that anything with these features is a swan. Because it has learned too few swan features, its decision criterion is too coarse and it cannot identify swans accurately.

Analysis of Figure 2
The machine learns the features of swans from these pictures. After training it knows that a swan has wings, a long curved beak, a long and slightly curved neck, and a body shaped roughly like a "2" that is a bit larger than a duck. At this point the machine can basically tell swans apart from other animals. Unfortunately, all of the available swan pictures happen to be of white swans, so after training the machine concludes that all swans have white feathers; when it later sees a swan with black feathers, it decides that it is not a swan.


Training a linear model on such data can turn it into a more complex model (it goes from linear to nonlinear, i.e. from a straight line to a curve).
(figure)
In the figure above, the first plot is clearly underfitting, the third is overfitting, and the second is just right.
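
To make the three cases concrete, here is a minimal sketch, assuming a synthetic noisy sine dataset and polynomial degrees 1, 4, and 15 (my own illustrative choices, not the data behind the figure), that fits increasingly complex models to the same points:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

# Synthetic 1-D data: a noisy sine curve (illustrative only)
rng = np.random.RandomState(0)
x = np.sort(rng.uniform(0, 1, 30)).reshape(-1, 1)
y = np.sin(2 * np.pi * x).ravel() + rng.normal(0, 0.1, 30)

x_new = np.linspace(0, 1, 100).reshape(-1, 1)
y_new = np.sin(2 * np.pi * x_new).ravel()

for degree in (1, 4, 15):  # roughly: underfit / reasonable fit / overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x, y)
    print("degree", degree,
          "train MSE", round(mean_squared_error(y, model.predict(x)), 4),
          "test MSE", round(mean_squared_error(y_new, model.predict(x_new)), 4))

Typically the degree-1 model has high error on both sets (underfitting), while the degree-15 model has near-zero training error but a much larger error on the new points (overfitting); the middle degree does best on the held-out data.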

1.1 Causes of underfitting and how to fix it

• Cause:
• The model learns too few features of the data

• Fix:
• Increase the number of features

1.2 Causes of overfitting and how to fix it

• Cause:
• There are too many original features, some of them noisy; the model becomes overly complex because it tries to accommodate every training data point

• Fixes:
• Feature selection: remove strongly correlated features (hard to do well)
• Cross-validation, so that all of the data takes part in training (this only detects overfitting, it does not remove it; see the sketch after this list)
• Regularization (introduced below)
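
As a follow-up to the cross-validation point above, a minimal sketch (same kind of synthetic data as in the earlier sketch, my own illustrative choice) showing how a large gap between the training score and the cross-validated score exposes overfitting:

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Same kind of noisy 1-D data as in the previous sketch
rng = np.random.RandomState(0)
x = np.sort(rng.uniform(0, 1, 30)).reshape(-1, 1)
y = np.sin(2 * np.pi * x).ravel() + rng.normal(0, 0.1, 30)

model = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
model.fit(x, y)

# A large gap between the training score and the cross-validated score
# is the typical signature of overfitting.
print("train R^2:", model.score(x, y))
print("5-fold CV R^2:", cross_val_score(model, x, y, cv=5).mean())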


1.3 L2 regularization

(figure)
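
The figure is not reproduced here; for reference, a common textbook form of an L2-regularized (ridge) cost function, which is what this section refers to, is:

J(w) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_w(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} w_j^2

The added term \lambda \sum_j w_j^2 penalizes large weights: the larger \lambda, the more the weights are shrunk toward zero and the smoother (simpler) the fitted curve. The exact constants (for example the 1/2m factor) vary between sources; scikit-learn's Ridge minimizes the equivalent form ||Xw - y||^2 + alpha * ||w||^2, with alpha playing the role of \lambda.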

2. Linear regression with regularization: Ridge

• sklearn.linear_model.Ridge

(figure)

How does changing the regularization strength affect the result?
(figure)
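
The plot is not reproduced here; as a substitute, a minimal sketch (the alpha grid and the synthetic data from make_regression are my own illustrative choices) that shows the coefficient vector shrinking as alpha grows:

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.datasets import make_regression

# Small synthetic regression problem (illustrative only)
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)

for alpha in (0.01, 1.0, 10.0, 100.0, 1000.0):
    rd = Ridge(alpha=alpha)
    rd.fit(X, y)
    # The norm of the coefficient vector shrinks as alpha increases
    print("alpha =", alpha, " ||coef|| =", round(float(np.linalg.norm(rd.coef_)), 3))

A very small alpha behaves almost like plain LinearRegression, while a very large alpha drives all coefficients toward zero (underfitting), so alpha has to be tuned.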

2.1 Comparing LinearRegression and Ridge

Code:

from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression, SGDRegressor, Ridge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error

# Linear regression for house price prediction
# Load the data (note: load_boston was removed in scikit-learn 1.2)
lb = load_boston()

# Split the data into a training set and a test set
x_train, x_test, y_train, y_test = train_test_split(lb.data, lb.target, test_size=0.25)

print(y_test.shape)  # 1-D array
# Standardize (both the feature values and the target values need standardizing)
std_x = StandardScaler()

x_train = std_x.fit_transform(x_train)
x_test = std_x.transform(x_test)

# Target values
std_y = StandardScaler()

y_train = std_y.fit_transform(y_train.reshape(-1, 1))  # since sklearn 0.19 the array passed in must be 2-D

y_test = std_y.transform(y_test.reshape(-1, 1))

# Estimator prediction
# 1) Normal-equation solution (LinearRegression)
lr = LinearRegression()
lr.fit(x_train, y_train)
print(lr.coef_)
# Predict the prices of the test-set houses
y_predict = lr.predict(x_test)
y_predict = std_y.inverse_transform(y_predict)  # undo the standardization
print('House prices:', y_predict)

print("Normal-equation mean squared error:", mean_squared_error(std_y.inverse_transform(y_test), y_predict))

# 2) Gradient descent for house price prediction (SGDRegressor)
sgd = SGDRegressor()
sgd.fit(x_train, y_train)  # SGDRegressor expects a 1-D target; newer scikit-learn warns here, pass y_train.ravel() to silence it

print(sgd.coef_)

# Predict the prices of the test-set houses
y_sgd_predict = std_y.inverse_transform(sgd.predict(x_test))  # newer scikit-learn requires reshaping the 1-D predictions to (-1, 1) first

print("Gradient descent predicted price for each house in the test set:", y_sgd_predict)
print("Gradient descent mean squared error:", mean_squared_error(std_y.inverse_transform(y_test), y_sgd_predict))


# 3) Ridge regression
rd = Ridge()
rd.fit(x_train, y_train)
print(rd.coef_)
# Predict the prices of the test-set houses
y_predict = rd.predict(x_test)
y_predict = std_y.inverse_transform(y_predict)  # undo the standardization
print('House prices:', y_predict)

print("Ridge regression mean squared error:", mean_squared_error(std_y.inverse_transform(y_test), y_predict))

Results:

(127,)
[[-0.10567608  0.12800059 -0.02081012  0.08019896 -0.20419646  0.26203745
  -0.0143011  -0.34425272  0.32636026 -0.24074745 -0.21175963  0.08813659
  -0.40618423]]
House prices: [[14.77076723]
 [13.27831203]
 [16.60033542]
 [25.26172339]
 [20.62467259]
 [ 6.16434782]
 [13.28311046]
 [22.87709636]
 [17.31795715]
 [18.66015137]
 [21.0218563 ]
 [24.84294609]
 [20.81495002]
 [13.43407714]
 [19.57565738]
 [19.3089579 ]
 [22.6531201 ]
 [35.55047995]
 [28.65242938]
 [23.23185524]
 [21.56498689]
 [32.01154171]
 [18.65673742]
 [22.35284797]
 [25.31174824]
 [30.75292852]
 [34.37896357]
 [37.47214832]
 [ 6.73466295]
 [35.32260072]
 [17.29205635]
 [24.04981707]
 [26.49537869]
 [19.39206106]
 [20.30202713]
 [23.78966036]
 [17.51792953]
 [17.11736623]
 [15.6918776 ]
 [27.67409696]
 [37.15818615]
 [26.99110797]
 [34.99126767]
 [14.76681021]
 [34.43555452]
 [16.52073494]
 [21.84037719]
 [16.68374204]
 [ 2.41539153]
 [14.31783444]
 [28.6838572 ]
 [15.57163765]
 [21.21643664]
 [25.27575566]
 [25.37553741]
 [20.83692649]
 [17.85187079]
 [24.64831743]
 [15.73134754]
 [42.77785003]
 [20.40416974]
 [33.31204004]
 [38.85798562]
 [11.48432283]
 [27.76057343]
 [12.06166884]
 [20.86579503]
 [27.24237201]
 [19.61373115]
 [24.52191849]
 [31.33421453]
 [26.77816914]
 [16.75660085]
 [34.50518995]
 [25.14802216]
 [25.90921364]
 [21.57533282]
 [ 8.48949894]
 [21.59651502]
 [ 9.69458833]
 [22.22350557]
 [24.57534372]
 [24.28353498]
 [34.33310039]
 [29.96124903]
 [14.70893927]
 [22.30509421]
 [19.22276018]
 [ 2.96171808]
 [35.76179714]
 [12.00840098]
 [15.11632841]
 [13.23431461]
 [13.9356148 ]
 [18.72450404]
 [32.50044844]
 [31.9595019 ]
 [21.6475679 ]
 [18.26864955]
 [22.91285212]
 [19.22742578]
 [ 4.15097764]
 [32.40218537]
 [19.73426363]
 [13.61425393]
 [32.71529794]
 [19.9715399 ]
 [23.94900489]
 [32.38243401]
 [32.10231901]
 [31.68013488]
 [16.64492959]
 [25.67337496]
 [17.50075022]
 [23.06085794]
 [17.06145436]
 [17.07135798]
 [29.03759939]
 [26.82782037]
 [34.1179701 ]
 [17.72120486]
 [29.36134957]
 [13.47418262]
 [25.43552433]
 [18.74529461]
 [26.64585416]
 [42.28879618]]
Normal-equation mean squared error: 19.943775280232305
[-0.08193998  0.07953498 -0.07694884  0.09692202 -0.13142005  0.30038964
 -0.01906175 -0.27527616  0.16869863 -0.0634908  -0.20211355  0.09301433
 -0.39414702]
Gradient descent predicted price for each house in the test set: [15.39218891 13.52798063 16.47874323 25.06860083 20.22602808  6.26771315
 13.43347791 23.51282623 17.74938116 18.77260822 21.03522778 24.72813256
 21.64328198 13.28995688 20.15220852 19.65319754 23.8877247  36.02024967
 27.69980762 24.15177507 20.92014599 31.92950466 19.17057437 22.73044775
 25.49973931 31.39313354 33.30013821 37.49444783  6.62350093 35.51710287
 17.58961995 24.05900694 25.99117215 19.93798484 20.70995703 23.21038327
 17.52540607 17.99853656 16.16881702 26.95125066 37.49791199 27.0166394
 35.22614528 15.35325779 32.80594904 17.07976506 24.6797388  17.13262971
  1.50530288 14.55552582 29.26293297 13.29204548 18.91253258 25.38577115
 25.51541994 21.15984203 17.56395625 24.556492   16.5073547  41.09975136
 20.31642047 32.10630265 38.6040282  11.68399041 28.03813137 11.54805457
 21.0233875  26.32102522 19.2644227  24.12210071 30.72340628 26.53815011
 16.64108217 35.16727989 25.11923725 25.22226935 21.21523941  8.52212096
 21.10075899 10.12849153 24.10399226 24.26465393 23.77298872 34.51866174
 30.94224501 14.85166827 21.9526793  19.78448065  5.01258647 35.92316777
 12.98575522 14.25323648 13.39858078 11.51172671 19.10362107 32.6881197
 31.55462351 19.36427109 18.83545012 22.93005441 19.77607529  3.77376201
 32.519765   17.36803532 13.82339068 32.06590057 21.23638127 23.44861165
 32.36442134 30.23762236 32.14463003 16.58802876 25.55767847 19.130649
 22.67014251 16.77307442 16.98033589 28.80486746 26.48238797 33.26111273
 17.92842635 29.21141036 15.86024054 24.7808194  18.21191731 26.26559592
 42.72435392]
Gradient descent mean squared error: 20.404166344036355
[[-0.10424982  0.12529391 -0.02529284  0.08079958 -0.20029906  0.26336097
  -0.01482654 -0.33993608  0.31510062 -0.22903615 -0.21089159  0.08823867
  -0.40448616]]
House prices: [[14.78924732]
 [13.30010965]
 [16.57696075]
 [25.24707526]
 [20.61309215]
 [ 6.17940565]
 [13.28115626]
 [22.87471525]
 [17.31990407]
 [18.68278409]
 [21.03853641]
 [24.83759427]
 [20.82609359]
 [13.41756242]
 [19.57966149]
 [19.33891951]
 [22.68080588]
 [35.53031648]
 [28.60084297]
 [23.29566985]
 [21.5399288 ]
 [31.97179546]
 [18.69967699]
 [22.38224565]
 [25.33065901]
 [30.81274563]
 [34.32007772]
 [37.43897018]
 [ 6.73579572]
 [35.28928948]
 [17.28658948]
 [24.05025822]
 [26.44959675]
 [19.39504212]
 [20.29661989]
 [23.77114472]
 [17.53343315]
 [17.15705308]
 [15.70188588]
 [27.6129232 ]
 [37.14202271]
 [27.00277016]
 [34.9780761 ]
 [14.78041933]
 [34.33971773]
 [16.54984676]
 [22.03162035]
 [16.68645277]
 [ 2.38729393]
 [14.35814778]
 [28.72701499]
 [15.43021136]
 [21.06146923]
 [25.28772018]
 [25.34809906]
 [20.86630923]
 [17.8549852 ]
 [24.66236145]
 [15.76838931]
 [42.6644899 ]
 [20.37563767]
 [33.24864358]
 [38.81903503]
 [11.52924678]
 [27.78827352]
 [12.02936785]
 [20.88694035]
 [27.20015279]
 [19.60763795]
 [24.5060948 ]
 [31.28905906]
 [26.72864739]
 [16.77370858]
 [34.53611675]
 [25.14759987]
 [25.87097919]
 [21.53370603]
 [ 8.49458801]
 [21.54398626]
 [ 9.71478109]
 [22.3594711 ]
 [24.54208078]
 [24.2675291 ]
 [34.35532187]
 [30.0270077 ]
 [14.72148164]
 [22.30424844]
 [19.22587415]
 [ 3.11957867]
 [35.77944566]
 [12.06742078]
 [15.06110648]
 [13.2325652 ]
 [13.79110675]
 [18.72137177]
 [32.45196325]
 [31.91784709]
 [21.49110079]
 [18.27479433]
 [22.92200478]
 [19.24883539]
 [ 4.14542685]
 [32.42578572]
 [19.57750059]
 [13.63308605]
 [32.66999885]
 [20.00646844]
 [23.93030843]
 [32.39244989]
 [32.00070472]
 [31.70563306]
 [16.66581832]
 [25.64800136]
 [17.6386729 ]
 [23.04127692]
 [17.05655024]
 [17.05296158]
 [29.02411803]
 [26.81445658]
 [34.05924831]
 [17.74815383]
 [29.37237632]
 [13.58364839]
 [25.39631465]
 [18.6970774 ]
 [26.61423726]
 [42.27200545]]
Ridge regression mean squared error: 19.958165933908848
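
All three models end up with a similar MSE on this split, and Ridge with the default alpha=1.0 stays very close to the plain normal-equation solution. To actually choose the regularization strength, one would usually cross-validate over several alpha values; a minimal sketch (the alpha grid is my own choice, and it reuses x_train, y_train, x_test, y_test and std_y from the script above):

from sklearn.linear_model import RidgeCV
from sklearn.metrics import mean_squared_error

# Fit Ridge for each candidate alpha and keep the one with the best
# (generalized) cross-validation score.
rcv = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0, 100.0])
rcv.fit(x_train, y_train)
print("best alpha:", rcv.alpha_)

y_rcv_predict = std_y.inverse_transform(rcv.predict(x_test))
print("RidgeCV mean squared error:",
      mean_squared_error(std_y.inverse_transform(y_test), y_rcv_predict))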

