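Linear regression is highly sensitive to noise in the data. When the sample matrix $X$ is multicollinear, or when the number of samples is not much larger than the number of features, $X^TX$ is singular or close to singular, and the parameters $\theta$ obtained from the closed-form linear regression solution overfit easily. If we can limit the model's complexity so that the entries of $\theta$ do not grow very large, the model becomes less sensitive to noise. This is the basic idea behind Ridge regression and Lasso regression. An effective way to reduce model complexity is L1 or L2 regularization: linear regression with an L2 penalty is Ridge regression, and linear regression with an L1 penalty is Lasso regression.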
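To make the near-singularity concrete, here is a minimal sketch on synthetic data. It compares the condition number of $X^TX$ before and after adding a small multiple of the identity, which is what an L2 penalty contributes to the normal equations. The data, the variable names, and the strength `beta = 1.0` are illustrative assumptions, not part of the original post.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 1e-6 * rng.normal(size=n)  # almost an exact copy of x1 (multicollinearity)
X = np.column_stack([x1, x2])

gram = X.T @ X
beta = 1.0  # illustrative regularization strength (assumption)
gram_reg = gram + beta * np.eye(X.shape[1])

# A large condition number means small noise in Y is amplified in theta.
cond_plain = np.linalg.cond(gram)
cond_reg = np.linalg.cond(gram_reg)
print(f"cond(X^T X)          = {cond_plain:.3e}")
print(f"cond(X^T X + beta I) = {cond_reg:.3e}")
```

The regularized matrix should come out far better conditioned, because the added term lifts the near-zero eigenvalue away from zero.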
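With the squared-error loss of linear regression scaled by $\frac{1}{2N}$ and an L2 penalty of strength $\lambda$, the Ridge regression loss is:

$$J(\theta) = \frac{1}{2N}\sum_{i=1}^{N}\left(\theta^T x_i - y_i\right)^2 + \frac{\lambda}{2}\lVert\theta\rVert_2^2$$

To simplify the calculation, multiply both sides by $N$ and let $\beta = N\lambda$. In matrix form the loss simplifies to:

$$N\,J(\theta) = \frac{1}{2}(X\theta - Y)^T(X\theta - Y) + \frac{\beta}{2}\theta^T\theta$$

Taking the partial derivative with respect to $\theta$ and setting it to 0 gives:

$$X^T(X\theta - Y) + \beta\theta = 0$$

Solving for $\theta$ yields $\theta = (X^TX + \beta I)^{-1}X^TY$, which is the closed-form solution of Ridge regression.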
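The closed-form solution can be checked numerically. The sketch below is a hedged illustration on synthetic data: it solves $(X^TX + \beta I)\theta = X^TY$ with `np.linalg.solve` (a substitution for the explicit inverse, chosen for numerical stability) and confirms that $X^T(X\theta - Y) + \beta\theta$ vanishes at the solution. The function name `ridge_closed_form` and the data are assumptions made for this example.

```python
import numpy as np

def ridge_closed_form(X, Y, beta):
    """Solve (X^T X + beta I) theta = X^T Y without forming an explicit inverse."""
    n_features = X.shape[1]
    A = X.T @ X + beta * np.eye(n_features)
    return np.linalg.solve(A, X.T @ Y)

# Synthetic regression problem (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_theta = np.array([1.5, -2.0, 0.5])
Y = X @ true_theta + 0.1 * rng.normal(size=100)

beta = 1.0
theta = ridge_closed_form(X, Y, beta)

# Gradient of the scaled Ridge loss at the solution; should be numerically zero.
grad = X.T @ (X @ theta - Y) + beta * theta
print("theta:", theta)
print("gradient norm:", np.linalg.norm(grad))
```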
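Because the L1 penalty in the Lasso loss is not differentiable at zero, Lasso regression has no general closed-form solution. It has to be optimized iteratively; here we use gradient descent, with a subgradient standing in for the derivative of the L1 term.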
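A single update of this kind can be sketched as follows. The sketch assumes the Lasso loss $\frac{1}{2N}\lVert X\theta - Y\rVert_2^2 + \lambda\lVert\theta\rVert_1$ and uses $\mathrm{sign}(\theta)$ as a subgradient of the L1 term (taking 0 at $\theta = 0$ is one valid choice among the values in $[-1, 1]$). The function names, the synthetic data, and the hyperparameters are assumptions made for illustration.

```python
import numpy as np

def lasso_loss(X, Y, theta, lam):
    """Mean squared error (scaled by 1/2) plus the L1 penalty."""
    n = X.shape[0]
    residual = X @ theta - Y
    return residual @ residual / (2 * n) + lam * np.abs(theta).sum()

def lasso_subgradient_step(X, Y, theta, lam, lr):
    """One subgradient-descent update for the Lasso loss above."""
    n = X.shape[0]
    grad_smooth = X.T @ (X @ theta - Y) / n
    subgrad_l1 = lam * np.sign(theta)  # np.sign(0) == 0, a valid subgradient at zero
    return theta - lr * (grad_smooth + subgrad_l1)

# Synthetic problem in which only two of five features matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Y = X @ np.array([2.0, 0.0, 0.0, -1.0, 0.0]) + 0.05 * rng.normal(size=200)

theta = np.zeros(5)
initial_loss = lasso_loss(X, Y, theta, lam=0.1)
for _ in range(500):
    theta = lasso_subgradient_step(X, Y, theta, lam=0.1, lr=0.05)
final_loss = lasso_loss(X, Y, theta, lam=0.1)
print("loss before:", initial_loss, "loss after:", final_loss)
print("theta:", np.round(theta, 3))
```

With a constant step size, plain subgradient descent tends to leave the irrelevant coefficients hovering near zero rather than setting them exactly to zero.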
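Code
The code is very similar to the linear regression code; a few small changes give the Ridge regression and Lasso regression implementations: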
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import scale


class LassoRegression(object):
    """Linear regression with an L1 penalty, trained by (sub)gradient descent."""

    def __init__(self):
        self.weight = np.array([])

    def dif(self, W, c):
        # Subgradient of the L1 penalty c * |W|: c * sign(W), with 0 where W == 0.
        return c * np.sign(W)

    def gradientDescent(self, X, Y, alpha, epoch, c):
        W = np.random.normal(0, 1, size=(X.shape[1],))
        for i in range(epoch):
            # Scale the data gradient and the penalty term by the same step size alpha.
            W -= alpha * ((X.T).dot(X.dot(W) - Y) / X.shape[0] + self.dif(W, c))
        return W

    def fit(self, train_data, train_target, alpha=0.1, epoch=300, c=0.05):
        # Append a column of ones so the last weight acts as the bias.
        X = np.ones((train_data.shape[0], train_data.shape[1] + 1))
        X[:, 0:-1] = train_data
        Y = train_target
        self.weight = self.gradientDescent(X, Y, alpha, epoch, c)

    def predict(self, test_data):
        X = np.ones((test_data.shape[0], test_data.shape[1] + 1))
        X[:, 0:-1] = test_data
        return X.dot(self.weight)

    def evaluate(self, predict_target, test_target):
        # Threshold the regression output at 0.5 to get class labels,
        # without modifying the caller's array.
        labels = (predict_target >= 0.5).astype(int)
        return np.mean(labels == test_target)


class RidgeRegression(object):
    """Linear regression with an L2 penalty, trained by gradient descent
    ('gd') or by the closed-form solution (any other value of training_way)."""

    def __init__(self, training_way='gd'):
        self.weight = np.array([])
        self.way = training_way

    def gradientDescent(self, X, Y, alpha, epoch, c):
        W = np.random.normal(0, 1, size=(X.shape[1],))
        for i in range(epoch):
            # Gradient of (1/2N)||XW - Y||^2 + (c/2)||W||^2, scaled by alpha.
            W -= alpha * ((X.T).dot(X.dot(W) - Y) / X.shape[0] + c * W)
        return W

    def fit(self, train_data, train_target, alpha=0.1, epoch=300, c=0.05):
        # Append a column of ones so the last weight acts as the bias.
        X = np.ones((train_data.shape[0], train_data.shape[1] + 1))
        X[:, 0:-1] = train_data
        Y = train_target
        if self.way == 'gd':
            self.weight = self.gradientDescent(X, Y, alpha, epoch, c)
        else:
            # Closed form: theta = (X^T X + beta I)^{-1} X^T Y with beta = N * c,
            # so both training paths minimize the same objective.
            beta = X.shape[0] * c
            I = np.eye(X.shape[1])
            self.weight = np.linalg.solve((X.T).dot(X) + beta * I, (X.T).dot(Y))

    def predict(self, test_data):
        X = np.ones((test_data.shape[0], test_data.shape[1] + 1))
        X[:, 0:-1] = test_data
        return X.dot(self.weight)

    def evaluate(self, predict_target, test_target):
        # Threshold the regression output at 0.5 to get class labels,
        # without modifying the caller's array.
        labels = (predict_target >= 0.5).astype(int)
        return np.mean(labels == test_target)
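

if __name__ == "__main__":
    # Assumption: standardize the breast cancer features and hold out a
    # shuffled 20% of the samples for testing (the split is not specified).
    data = load_breast_cancer()
    features = scale(data.data)
    order = np.random.default_rng(0).permutation(len(features))
    n_test = len(features) // 5
    test_idx, train_idx = order[:n_test], order[n_test:]
    train_data, train_target = features[train_idx], data.target[train_idx]
    test_data, test_target = features[test_idx], data.target[test_idx]

    lasso = LassoRegression()
    lasso.fit(train_data, train_target, 0.05, 1000, 0.01)
    lassoPredict = lasso.predict(test_data)
    print('lasso regression accuracy:', lasso.evaluate(lassoPredict, test_target))

    ridge = RidgeRegression(training_way='gd')
    ridge.fit(train_data, train_target, 0.05, 1000, 0.01)
    ridgePredict = ridge.predict(test_data)
    print('ridge regression accuracy:', ridge.evaluate(ridgePredict, test_target))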
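
As a sanity check on a hand-written implementation, the same experiment can be run with scikit-learn's `Ridge` and `Lasso` estimators. This is a sketch under assumptions: the 80/20 split, the `random_state`, the `alpha` values, and the helper `threshold_accuracy` are choices made for this example and do not come from the code above.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import scale

data = load_breast_cancer()
X = scale(data.data)
y = data.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

def threshold_accuracy(model):
    """Fit a regressor, threshold its output at 0.5, and return test accuracy."""
    scores = model.fit(X_train, y_train).predict(X_test)
    return np.mean((scores >= 0.5).astype(int) == y_test)

ridge_acc = threshold_accuracy(Ridge(alpha=1.0))
lasso_acc = threshold_accuracy(Lasso(alpha=0.01, max_iter=10000))
print('sklearn ridge accuracy:', ridge_acc)
print('sklearn lasso accuracy:', lasso_acc)
```

The two estimators scale their penalties differently: `Ridge` minimizes $\lVert y - Xw\rVert_2^2 + \alpha\lVert w\rVert_2^2$, while `Lasso` minimizes $\frac{1}{2N}\lVert y - Xw\rVert_2^2 + \alpha\lVert w\rVert_1$, so their `alpha` values are not directly comparable to each other or to the `c` parameter used above.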