Machine Learning in Practice Series (2): Ridge Regression

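Following the ordinary linear regression from the first tutorial, let's build an ordinary linear regression model on the load_boston dataset and see what problems come up.
First, let's take a look at the load_boston dataset (note: load_boston was removed in scikit-learn 1.2, so the examples in this post need an older scikit-learn version):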

from sklearn.datasets import load_boston
boston = load_boston()
print(boston.data.shape)

You will see something similar to:

(506, 13)

This means the dataset has 13 features and 506 samples in total.

from sklearn import linear_model
from sklearn.datasets import load_boston
from sklearn.metrics import mean_squared_error, r2_score

boston = load_boston()
X = boston.data
y = boston.target
num_training = int(0.7 * len(X))

# Split the dataset: the first 70% of rows for training, the rest for testing (no shuffling)
X_train = X[:num_training]
y_train = y[:num_training]
X_test = X[num_training:]
y_test = y[num_training:]
reg = linear_model.LinearRegression()

# Train the model
reg.fit(X_train, y_train)

# Predict on the test set
y_pred = reg.predict(X_test)

# Print the model parameters
print("Coefficients", reg.coef_)
print(reg.coef_.shape)
print("Intercept", reg.intercept_)

# Compute the mean squared error on the test set
print("Test set MSE", mean_squared_error(y_test, y_pred))

# Compute the R^2 score
print("R2", r2_score(y_test, y_pred))

You will see something similar to:

Coefficients [ 1.29693856  0.01469497  0.04050457  0.79060732 -9.12933243  9.24839787
 -0.0451214  -0.91395374  0.14079658 -0.01477291 -0.63369567  0.01577172
 -0.09514128]
(13,)
Intercept -13.6721465522
Test set MSE 545.445002115
R2 -7.2211853282

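This linear regression model has 13 coefficients and one intercept. On the test set its MSE is about 545 and its R² is about -7.2; a negative R² means the model predicts worse than simply using the mean of the test targets.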
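To make the two metrics concrete, here is a minimal sketch that recomputes a linear model's predictions, MSE and R² by hand. It assumes randomly generated data with 13 features as a stand-in for the Boston data (the data, seed and 70/30 split are illustrative assumptions). It checks that `predict` is just `X @ coef_ + intercept_`, that `r2_score` equals 1 − SS_res / SS_tot, and that a constant prediction far from the test mean yields a negative R².

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical synthetic data standing in for the 13 Boston features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 13))
y = X @ rng.normal(size=13) + 3.0 + rng.normal(scale=0.5, size=100)

reg = LinearRegression().fit(X[:70], y[:70])
X_test, y_test = X[70:], y[70:]

# A linear model's prediction is the dot product with coef_ plus intercept_
y_manual = X_test @ reg.coef_ + reg.intercept_
y_pred = reg.predict(X_test)

# MSE: the mean of the squared residuals
mse_manual = np.mean((y_test - y_pred) ** 2)

# R^2 = 1 - SS_res / SS_tot, where SS_tot uses the test-set mean as the baseline
ss_res = np.sum((y_test - y_pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(np.allclose(y_manual, y_pred))
print(np.isclose(mse_manual, mean_squared_error(y_test, y_pred)))
print(np.isclose(r2_manual, r2_score(y_test, y_pred)))

# A constant prediction 10 units away from the test mean is worse than the
# mean baseline, so its R^2 is negative
y_bad = np.full_like(y_test, y_test.mean() + 10.0)
print(r2_score(y_test, y_bad) < 0)
```

All four lines print `True`. Any model whose squared error exceeds SS_tot, like the linear model above with R² ≈ -7.2, falls into the same negative range.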
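Ridge regression addresses some of the problems of ordinary least squares (such as overfitting) by imposing a penalty on the size of the coefficients (regularization).
When there are many features and relatively few samples, ordinary linear regression easily overfits. Ridge regression reduces this risk by adding an L2 regularization term.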
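As a sketch of what the L2 penalty does, scikit-learn's Ridge minimizes ‖y − Xw‖₂² + α‖w‖₂², where α is the `alpha` parameter, which has the closed-form solution w = (XᵀX + αI)⁻¹Xᵀy. The snippet below solves this directly with NumPy and compares the result with `Ridge`. It assumes, as scikit-learn does with `fit_intercept=True`, that the intercept is not penalized, which is handled by centering X and y first; the random data and α = 0.5 are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical synthetic data: 50 samples, 13 features
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 13))
y = X @ rng.normal(size=13) + 2.0 + rng.normal(scale=0.5, size=50)
alpha = 0.5

# Center X and y so the intercept is left out of the penalty
X_mean, y_mean = X.mean(axis=0), y.mean()
Xc, yc = X - X_mean, y - y_mean

# Closed-form ridge solution: (Xc^T Xc + alpha * I) w = Xc^T yc
n_features = X.shape[1]
w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
# Recover the intercept from the means
b = y_mean - X_mean @ w

reg = Ridge(alpha=alpha).fit(X, y)
print(np.allclose(w, reg.coef_), np.isclose(b, reg.intercept_))
```

Adding αI to XᵀX makes the system better conditioned; with α = 0 the same formula reduces to ordinary least squares.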

from sklearn import linear_model
from sklearn.datasets import load_boston
from sklearn.metrics import mean_squared_error, r2_score

boston = load_boston()
X = boston.data
y = boston.target
num_training = int(0.7 * len(X))

# Split the dataset: the first 70% of rows for training, the rest for testing (no shuffling)
X_train = X[:num_training]
y_train = y[:num_training]
X_test = X[num_training:]
y_test = y[num_training:]

# alpha controls the strength of the L2 penalty on the coefficients
reg = linear_model.Ridge(alpha=0.5)

# Train the model
reg.fit(X_train, y_train)

# Predict on the test set
y_pred = reg.predict(X_test)

# Print the model parameters
print("Coefficients", reg.coef_)
print(reg.coef_.shape)
print("Intercept", reg.intercept_)

# Compute the mean squared error on the test set
print("Test set MSE", mean_squared_error(y_test, y_pred))

# Compute the R^2 score
print("R2", r2_score(y_test, y_pred))

You will see something similar to:

Coefficients [ 1.06913232  0.01534766  0.03083921  0.81470562 -5.44619698  9.22075685
 -0.04681829 -0.86607139  0.13700694 -0.01498462 -0.60960326  0.01610884
 -0.10287555]
(13,)
Intercept -15.7171489971
Test set MSE 398.766928585
R2 -5.01038933338

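As the results show, on the same data split ridge regression lowers the test MSE from about 545 to about 399, so the L2 penalty does reduce overfitting here. The R² is still negative, however, so both models still generalize poorly on this split.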
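One way to see the penalty at work is to watch the size of the coefficient vector as alpha grows. The sketch below fits `Ridge` for several alpha values on randomly generated data and prints the L2 norm of `coef_`. The data, the seed, and the deliberately near-collinear pair of columns (a setting where unpenalized coefficients tend to become large) are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical synthetic data: 40 samples, 13 features
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 13))
# Make column 1 almost a copy of column 0 (near-collinear features)
X[:, 1] = X[:, 0] + rng.normal(scale=0.01, size=40)
y = X @ rng.normal(size=13) + rng.normal(scale=0.5, size=40)

alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
norms = []
for a in alphas:
    coef = Ridge(alpha=a).fit(X, y).coef_
    norms.append(np.linalg.norm(coef))
    print(f"alpha={a:>6}: ||coef||_2 = {norms[-1]:.3f}")
```

The printed norms shrink as alpha increases: a larger penalty pulls the coefficients toward zero, trading a little bias for lower variance.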
Choosing the parameter with cross-validation

from sklearn import linear_model
from sklearn.datasets import load_boston
from sklearn.metrics import mean_squared_error

boston = load_boston()
X = boston.data
y = boston.target
num_training = int(0.7 * len(X))
X_train = X[:num_training]
y_train = y[:num_training]
X_test = X[num_training:]
y_test = y[num_training:]

# With the default cv=None, RidgeCV chooses alpha from the candidates
# using efficient leave-one-out cross-validation on the training set
reg = linear_model.RidgeCV(alphas=[0.1, 1.0, 10.0])
reg.fit(X_train, y_train)
print("Best alpha:", reg.alpha_)
y_pred = reg.predict(X_test)
print("Test set MSE", mean_squared_error(y_test, y_pred))

You will see something similar to:

Best alpha: 0.1
Test set MSE 499.704726051
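To show what "choosing alpha by cross-validation" means step by step, here is a minimal sketch of an explicit 5-fold search. It uses randomly generated data as a stand-in for the Boston features (the data, seed and fold count are assumptions). For each candidate alpha it averages the validation MSE over the folds and keeps the alpha with the smallest average. It then compares the choice with `RidgeCV(cv=5, scoring="neg_mean_squared_error")`, which, to my understanding, runs the same grid search over the same unshuffled folds instead of the default leave-one-out scheme.

```python
import numpy as np
from sklearn.linear_model import Ridge, RidgeCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

# Hypothetical synthetic data: 120 samples, 13 features
rng = np.random.default_rng(2)
X = rng.normal(size=(120, 13))
y = X @ rng.normal(size=13) + rng.normal(scale=2.0, size=120)
alphas = [0.1, 1.0, 10.0]

# Manual 5-fold CV without shuffling
kf = KFold(n_splits=5)
mean_mse = []
for a in alphas:
    fold_mse = []
    for train_idx, val_idx in kf.split(X):
        # Fit on four folds, evaluate on the held-out fold
        model = Ridge(alpha=a).fit(X[train_idx], y[train_idx])
        fold_mse.append(mean_squared_error(y[val_idx], model.predict(X[val_idx])))
    # Average validation MSE for this alpha
    mean_mse.append(np.mean(fold_mse))

# Keep the alpha with the lowest average validation MSE
best_manual = alphas[int(np.argmin(mean_mse))]

# RidgeCV with an explicit 5-fold split and MSE-based scoring
reg = RidgeCV(alphas=alphas, cv=5, scoring="neg_mean_squared_error").fit(X, y)
print("manual:", best_manual, "RidgeCV:", reg.alpha_)
```

Both approaches should report the same alpha. Only the training set is used for the search, so the test set stays untouched until the final evaluation, as in the RidgeCV example above.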

Reference
http://scikit-learn.org/stable/supervised_learning.html#supervised-learning
http://scikit-learn.org/stable/modules/classes.html#module-sklearn.datasets
