Like ridge regression in the previous tutorial, LASSO regression is also meant to combat the overfitting problem of ordinary linear regression. Unlike ridge regression, LASSO uses the L1 norm as its penalty term; as before, there is a penalty coefficient alpha.
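Concretely, the objective can be written as follows (this sketch follows scikit-learn's parameterization of the Lasso loss, with the 1/(2n) factor on the squared-error term):

```latex
\min_{w}\; \frac{1}{2\,n_{\text{samples}}}\,\lVert X w - y \rVert_2^2 \;+\; \alpha\,\lVert w \rVert_1
```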
import numpy as np
from sklearn import linear_model
from sklearn.datasets import load_boston  # note: removed in scikit-learn >= 1.2
from sklearn.metrics import mean_squared_error, r2_score

boston = load_boston()
X = boston.data
y = boston.target
num_training = int(0.7 * len(X))
# Split the dataset (sequentially, without shuffling)
X_train = X[:num_training]
y_train = y[:num_training]
X_test = X[num_training:]
y_test = y[num_training:]
reg = linear_model.Lasso(alpha=0.1)
# Train the model
reg.fit(X_train, y_train)
# Predict on the test set
y_pred = reg.predict(X_test)
# Print the model parameters
print("Coefficients", reg.coef_)
print(reg.coef_.shape)
print("Intercept", reg.intercept_)
# Mean squared error
print("MSE on test set", mean_squared_error(y_test, y_pred))
# r2 score
print("r2 score", r2_score(y_test, y_pred))
You will see something similar to:
Coefficients [ 0.24788545 0.01585505 0.02622181 0. -0. 8.91066445
-0.04367892 -0.78166146 0.12121806 -0.01507666 -0.62126143 0.01410457
-0.12861912]
(13,)
Intercept -15.3209262151
MSE on test set 99.2877306308
r2 score -0.496508046032
Note that the r2 score is negative here, in part because the rows of the Boston dataset are ordered, so a sequential 70/30 split gives the test set a different distribution from the training set; shuffling the data before splitting helps. As before, you can also select the penalty parameter alpha by cross-validation.
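As a minimal sketch of that selection step, scikit-learn's LassoCV searches a list of candidate alphas by cross-validation. Synthetic data from make_regression stands in for the Boston set here:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

# Synthetic data standing in for the Boston set: 13 features, like Boston
X, y = make_regression(n_samples=200, n_features=13, n_informative=5,
                       noise=5.0, random_state=0)

# LassoCV fits the model for each candidate alpha with 5-fold
# cross-validation and keeps the alpha with the best average score.
reg = LassoCV(alphas=[0.01, 0.1, 1.0, 10.0], cv=5)
reg.fit(X, y)

print("best alpha:", reg.alpha_)
print("coefficients:", reg.coef_)
```

The chosen value is stored in `reg.alpha_`, and the fitted estimator can then be used for prediction directly, just like `linear_model.Lasso`.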
Both the L1 and L2 norms help reduce the risk of overfitting, but the L1 norm has an extra benefit: its solutions are sparser, i.e., they have more zero components (you can see the exact zeros in the coefficient vector printed above).
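A quick sketch on synthetic data (again via make_regression, not the Boston set) makes the contrast visible: Lasso drives coefficients of irrelevant features exactly to zero, while Ridge only shrinks them toward zero:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# 13 features, only 4 of which actually influence the target
X, y = make_regression(n_samples=200, n_features=13, n_informative=4,
                       noise=1.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# Lasso (L1) typically yields several exact zeros; Ridge (L2) does not
print("zero coefficients (Lasso):", np.sum(lasso.coef_ == 0.0))
print("zero coefficients (Ridge):", np.sum(ridge.coef_ == 0.0))
```

This sparsity is why LASSO is often used as a simple form of feature selection: the features with zero coefficients can be dropped from the model.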
Reference
http://scikit-learn.org/stable/modules/linear_model.html#ridge-regression
http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_boston.html#sklearn.datasets.load_boston
https://en.wikipedia.org/wiki/Lasso_(statistics)