Following the ordinary linear regression from the first tutorial, let's build an ordinary linear regression model on the load_boston dataset and see what problems come up.
First, let's take a look at the load_boston dataset:
from sklearn.datasets import load_boston
boston = load_boston()
print(boston.data.shape)
You will see something similar to:
(506, 13)
This means the dataset has 506 samples, each with 13 features.
from sklearn.datasets import load_boston
boston = load_boston()
import numpy as np
from sklearn import linear_model
from sklearn.metrics import mean_squared_error,r2_score
X = boston.data
y = boston.target
num_training = int(0.7*len(X))
# Split the dataset: first 70% for training, the rest for testing
X_train = X[:num_training]
y_train = y[:num_training]
X_test = X[num_training:]
y_test = y[num_training:]
reg = linear_model.LinearRegression()
# Train the model
reg.fit(X_train,y_train)
# Make predictions on the test set
y_pred = reg.predict(X_test)
# Print the model parameters
print("Coefficients", reg.coef_)
print(reg.coef_.shape)
print("Intercept", reg.intercept_)
# Compute the mean squared error
print("Test-set MSE", mean_squared_error(y_test, y_pred))
# Compute the R^2 score
print("R^2 score", r2_score(y_test, y_pred))
You will see something similar to:
Coefficients [ 1.29693856 0.01469497 0.04050457 0.79060732 -9.12933243 9.24839787
-0.0451214 -0.91395374 0.14079658 -0.01477291 -0.63369567 0.01577172
-0.09514128]
(13,)
Intercept -13.6721465522
Test-set MSE 545.445002115
R^2 score -7.2211853282
This linear regression model has 13 coefficients plus one intercept term. Notice that the R^2 score on the test set is negative: the plain model generalizes poorly here.
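To make the "13 coefficients plus one intercept" concrete: a fitted LinearRegression predicts with nothing more than the linear form X @ coef_ + intercept_. The sketch below checks this on synthetic data from make_regression (used as a stand-in, since recent scikit-learn versions no longer ship load_boston), shaped like the Boston data: 506 samples, 13 features.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for load_boston: 506 samples, 13 features.
X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=0)

reg = LinearRegression()
reg.fit(X, y)

# The model's prediction is exactly the linear form X @ coef_ + intercept_.
manual_pred = X @ reg.coef_ + reg.intercept_
print(np.allclose(manual_pred, reg.predict(X)))  # True
```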
Ridge regression addresses some of the problems of Ordinary Least Squares (e.g., overfitting) by imposing a penalty on the size of the coefficients (regularization).
When a dataset has many features but relatively few samples, ordinary linear regression easily overfits; ridge regression reduces this risk by introducing an L2 regularization term.
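The L2 penalty also gives ridge regression a closed-form solution: w = (X^T X + alpha * I)^(-1) X^T y, with the intercept left unpenalized (computed on centered data). As a sketch, we can verify this against sklearn's Ridge on synthetic make_regression data (a stand-in, since load_boston is absent from recent scikit-learn versions):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=200, n_features=13, noise=5.0, random_state=0)
alpha = 0.5

# Center X and y so the intercept is not penalized (matching sklearn's behavior).
Xc = X - X.mean(axis=0)
yc = y - y.mean()

# Closed form: w = (X^T X + alpha * I)^{-1} X^T y on the centered data.
w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
b = y.mean() - X.mean(axis=0) @ w

reg = Ridge(alpha=alpha).fit(X, y)
print(np.allclose(w, reg.coef_, atol=1e-6))   # coefficients match
print(abs(b - reg.intercept_) < 1e-6)         # intercept matches
```

When alpha = 0 this reduces to ordinary least squares; larger alpha shrinks the coefficients harder.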
from sklearn.datasets import load_boston
boston = load_boston()
import numpy as np
from sklearn import linear_model
from sklearn.metrics import mean_squared_error,r2_score
X = boston.data
y = boston.target
num_training = int(0.7*len(X))
# Split the dataset: first 70% for training, the rest for testing
X_train = X[:num_training]
y_train = y[:num_training]
X_test = X[num_training:]
y_test = y[num_training:]
reg = linear_model.Ridge(alpha = .5)
# Train the model
reg.fit(X_train,y_train)
# Make predictions on the test set
y_pred = reg.predict(X_test)
# Print the model parameters
print("Coefficients", reg.coef_)
print(reg.coef_.shape)
print("Intercept", reg.intercept_)
# Compute the mean squared error
print("Test-set MSE", mean_squared_error(y_test, y_pred))
# Compute the R^2 score
print("R^2 score", r2_score(y_test, y_pred))
You will see something similar to:
Coefficients [ 1.06913232 0.01534766 0.03083921 0.81470562 -5.44619698 9.22075685
-0.04681829 -0.86607139 0.13700694 -0.01498462 -0.60960326 0.01610884
-0.10287555]
(13,)
Intercept -15.7171489971
Test-set MSE 398.766928585
R^2 score -5.01038933338
As we can see, ridge regression does reduce the risk of overfitting: on the same train/test split, its test-set error is lower than that of ordinary linear regression.
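One way to see why: the L2 penalty shrinks the coefficient vector toward zero, so the ridge solution always has a smaller norm than the OLS solution for alpha > 0. A minimal sketch on synthetic data (make_regression, with few samples relative to the number of features, i.e., the regime where OLS tends to overfit):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge

# Few samples relative to 13 features: the regime where OLS tends to overfit.
X, y = make_regression(n_samples=30, n_features=13, noise=20.0, random_state=0)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)

# The L2 penalty shrinks the coefficient vector toward zero.
print(np.linalg.norm(ridge.coef_) < np.linalg.norm(ols.coef_))  # True
```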
Using cross-validation to select the regularization parameter
from sklearn import linear_model
from sklearn.datasets import load_boston
from sklearn.metrics import mean_squared_error
boston = load_boston()
X = boston.data
y = boston.target
num_training = int(0.7*len(X))
X_train = X[:num_training]
y_train = y[:num_training]
X_test = X[num_training:]
y_test = y[num_training:]
reg = linear_model.RidgeCV(alphas=[0.1, 1.0, 10.0])
reg.fit(X_train,y_train)
print("Best alpha:", reg.alpha_)
y_pred = reg.predict(X_test)
print("Test-set MSE", mean_squared_error(y_test, y_pred))
You will see something similar to:
Best alpha: 0.1
Test-set MSE 499.704726051
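RidgeCV hides the selection loop (by default it uses an efficient leave-one-out scheme). The same idea can be written out by hand with k-fold cross-validation: fit a Ridge model per candidate alpha and keep the one with the best average validation score. A sketch on synthetic make_regression data (a stand-in for load_boston, which newer scikit-learn versions no longer include):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=13, noise=10.0, random_state=0)
alphas = [0.1, 1.0, 10.0]

# Score each candidate alpha with 5-fold cross-validation
# (neg_mean_squared_error: higher is better).
scores = [cross_val_score(Ridge(alpha=a), X, y, cv=5,
                          scoring="neg_mean_squared_error").mean()
          for a in alphas]
best = alphas[int(np.argmax(scores))]
print("Best alpha:", best)
```

Note that this 5-fold scheme is not identical to RidgeCV's default leave-one-out procedure, so the selected alpha can differ.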
References
http://scikit-learn.org/stable/supervised_learning.html#supervised-learning
http://scikit-learn.org/stable/modules/classes.html#module-sklearn.datasets