Building on the earlier logistic-regression and gradient-descent code
- New features
  - Improved the logic that decides what is printed when a new model object is created
  - Added Lasso and Ridge regression, i.e. L1 and L2 regularization (the L1 problem is solved with coordinate descent)
  - Added plotting of the normal distribution's pdf and cdf
  - The corresponding function value can be reported alongside the distribution plot
  - Warnings are issued when the axis range is set unsuitably or when the variance differs greatly from the mean
  - Function plots are now easier to customize
  - The default solver for linear regression is now the normal-equation method, since it is more reliable than plain gradient descent in high dimensions
- Derivation of Ridge regression
Assume there are $N$ samples and $p$ predictors. The parameter estimate is

$$\hat{\beta}^{\mathrm{ridge}}=\argmin_{\beta}\left\{\sum_{i=1}^N\Big(y_i-\beta_0-\sum_{j=1}^p x_{ij}\beta_j\Big)^2+\lambda\sum_{j=1}^p\beta_j^2\right\}$$

which is equivalent to

$$\hat{\beta}^{\mathrm{ridge}}=\argmin_{\beta}\sum_{i=1}^N\Big(y_i-\beta_0-\sum_{j=1}^p x_{ij}\beta_j\Big)^2,\quad\text{subject to }\sum_{j=1}^p\beta_j^2\leq t$$

In matrix form the loss function is

$$\begin{aligned}\mathrm{RSS}(\lambda)&=(\mathbf{y}-\mathbf{X}\beta)^{\top}(\mathbf{y}-\mathbf{X}\beta)+\lambda\beta^{\top}\beta\\&=(\mathbf{y}^{\top}-\beta^{\top}\mathbf{X}^{\top})(\mathbf{y}-\mathbf{X}\beta)+\lambda\beta^{\top}\beta\\&=\mathbf{y}^{\top}\mathbf{y}-\mathbf{y}^{\top}\mathbf{X}\beta-\beta^{\top}\mathbf{X}^{\top}\mathbf{y}+\beta^{\top}\mathbf{X}^{\top}\mathbf{X}\beta+\lambda\beta^{\top}\beta\\&=\mathbf{y}^{\top}\mathbf{y}-2\beta^{\top}\mathbf{X}^{\top}\mathbf{y}+\beta^{\top}\mathbf{X}^{\top}\mathbf{X}\beta+\lambda\beta^{\top}\beta\end{aligned}$$
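The factored and expanded forms can be checked against each other numerically. A minimal sketch with random toy data (all names below are illustrative, not part of the class):

import numpy as np

# random toy problem, for verification only
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
y = rng.normal(size=6)
beta = rng.normal(size=3)
lam = 0.7

rss_factored = (y - X @ beta) @ (y - X @ beta) + lam * beta @ beta
rss_expanded = y @ y - 2 * beta @ (X.T @ y) + beta @ (X.T @ X) @ beta + lam * beta @ beta
print(np.isclose(rss_factored, rss_expanded))  # expected: True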
To minimize the loss function, differentiate with respect to $\beta$:

$$\frac{\partial\,\mathrm{RSS}(\lambda)}{\partial\beta}=-2\mathbf{X}^{\top}\mathbf{y}+2\mathbf{X}^{\top}\mathbf{X}\beta+2\lambda\beta$$

Setting the derivative to zero:

$$\begin{aligned}\mathbf{X}^{\top}\mathbf{X}\beta+\lambda\beta&=\mathbf{X}^{\top}\mathbf{y}\\\hat{\beta}^{\mathrm{ridge}}&=(\mathbf{X}^{\top}\mathbf{X}+\lambda\mathbf{I})^{-1}\mathbf{X}^{\top}\mathbf{y}\end{aligned}$$

where $\mathbf{I}$ is the $p\times p$ identity matrix.
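This closed form drops straight into numpy. A minimal sketch (the function name is illustrative; it assumes X already contains an intercept column if one is wanted):

import numpy as np

def ridge_closed_form(X, y, lam):
    # solve (X^T X + lam * I) beta = X^T y; solving the linear system is
    # numerically preferable to forming the inverse explicitly
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.identity(p), X.T @ y)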
- Derivation of Lasso regression

$$\hat{\beta}^{\mathrm{lasso}}=\argmin_{\beta}\sum_{i=1}^N\Big(y_i-\beta_0-\sum_{j=1}^p x_{ij}\beta_j\Big)^2,\quad\text{subject to }\sum_{j=1}^p|\beta_j|\leq t$$

which is equivalent to

$$\begin{aligned}\hat{\beta}^{\mathrm{lasso}}&=\argmin_{\beta}\left\{\frac{1}{2}\sum_{i=1}^N\Big(y_i-\beta_0-\sum_{j=1}^p x_{ij}\beta_j\Big)^2+\lambda\sum_{j=1}^p|\beta_j|\right\}\\&=\argmin_{\beta}\left\{\mathrm{RSS}(\beta)+\text{penalty}\right\}\end{aligned}$$
Now take partial derivatives term by term. Because coordinate descent updates one coordinate at a time, differentiate with respect to a single coordinate, say the $k$-th (with $\beta_0$ absorbed into the design matrix):

$$\begin{aligned}\frac{\partial\,\mathrm{RSS}(\beta)}{\partial\beta_k}&=\sum_{i=1}^N -x_{ik}\Big(y_i-\sum_{j=1}^p x_{ij}\beta_j\Big)\\&=\sum_{i=1}^N -x_{ik}\Big(y_i-\sum_{j\neq k}^p x_{ij}\beta_j-x_{ik}\beta_k\Big)\\&=\sum_{i=1}^N -x_{ik}\Big(y_i-\sum_{j\neq k}^p x_{ij}\beta_j\Big)+\beta_k\sum_{i=1}^N x_{ik}^2\end{aligned}$$

To simplify notation, write $\rho_k=\sum_{i=1}^N x_{ik}\big(y_i-\sum_{j\neq k}^p x_{ij}\beta_j\big)$ and $z_k=\sum_{i=1}^N x_{ik}^2$. Then the expression above becomes:

$$\frac{\partial\,\mathrm{RSS}(\beta)}{\partial\beta_k}=-\rho_k+\beta_k z_k$$
Differentiating the penalty term ($|\beta_k|$ is not differentiable at zero, so this is a subgradient):

$$\frac{\partial\,\text{penalty}}{\partial\beta_k}=\begin{cases}-\lambda, & \beta_k<0\\ [-\lambda,\lambda], & \beta_k=0\\ \lambda, & \beta_k>0\end{cases}$$

Combining the two parts:

$$\frac{\partial L(\beta)}{\partial\beta_k}=-\rho_k+\beta_k z_k+\begin{cases}-\lambda, & \beta_k<0\\ [-\lambda,\lambda], & \beta_k=0\\ \lambda, & \beta_k>0\end{cases}$$
Set the $k$-th (sub)derivative to zero and solve for $\beta_k$, case by case.

When $\beta_k<0$, the equation reads $\beta_k z_k-\rho_k-\lambda=0\to\beta_k z_k=\rho_k+\lambda$. The left side is negative, so the right side must be negative too, i.e. $\rho_k+\lambda<0\to\rho_k<-\lambda$.

When $\beta_k>0$, the left side is positive, so the right side requires $\rho_k-\lambda>0\to\rho_k>\lambda$.

When $\beta_k=0$, zero must lie in the subdifferential $-\rho_k+[-\lambda,\lambda]$, which holds exactly when $\rho_k\in[-\lambda,\lambda]$. Putting the cases together:

$$\hat{\beta}_k^{\mathrm{lasso}}=\begin{cases}\dfrac{\rho_k-\lambda}{z_k}, & \rho_k>\lambda\\ 0, & -\lambda\leq\rho_k\leq\lambda\\ \dfrac{\rho_k+\lambda}{z_k}, & \rho_k<-\lambda\end{cases}$$
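The three branches are exactly the soft-thresholding operator, which is the core of the coordinate-descent loop in the class below (note: the class code averages rho and z over the samples and therefore thresholds at lambda/2 rather than lambda). A minimal sketch of a single coordinate update, with an illustrative function name and rho_k, z_k, lam as defined above:

def soft_threshold_update(rho_k, z_k, lam):
    # closed-form minimizer of the k-th one-dimensional subproblem
    if rho_k > lam:
        return (rho_k - lam) / z_k
    elif rho_k < -lam:
        return (rho_k + lam) / z_k
    else:
        return 0.0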
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import random
import sys
from scipy.stats import norm
class LinearRegression:

    def __init__(self,
                 learning_rate=0.001,
                 max_iter=100,
                 solver="NE",
                 error=1e-5,
                 penal=None,
                 alpha=None):  # model hyperparameters
        print("------Linear regression code test------")
        self.learning_rate = learning_rate
        self.max_iter = max_iter
        self.solver = solver
        self.error = error
        self.penal = penal
        self.alpha = alpha
        solver_list = ['GD', 'NE']
        if solver not in solver_list:  # only gradient descent and normal equations are supported
            print("Class Error: unsupported solver: {0}".format(solver))
            sys.exit()
        print(
            "Linear Regression: Setting(learning_rate={0}, max_iter={1}, solver='{2}', eps={3}, penal={4}, alpha={5})"
            .format(learning_rate, max_iter, self.solver, self.error,
                    self.penal, self.alpha))
    def norm_pdf(self, x=None, mu=0, sigma=1, plot=True, cul=None):  # plot the probability density function
        if type(x) == np.ndarray:  # if the input is an np.ndarray, use it as given
            self.x = x
        elif x is None:  # fall back to the default grid when x is not supplied
            self.x = np.arange(-5, 5, 0.01)
        else:  # other input types are not supported
            print("Class Error: unsupported type for the x-axis values: {0}".format(x))
            sys.exit()  # abort on invalid input
        # normal pdf: f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)
        f = np.exp(-(self.x - mu)**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
        if plot == True:
            plt.ylabel(f'$f(x)$')
            plt.xlabel(f"$x$")
            if min(self.x) < mu < max(self.x):  # only mark the mean when it falls inside the plotting range
                plt.vlines(mu, 0, max(f), linestyles="dashed")
            else:
                print("Warning: the mean lies outside the plotting range!")  # tell the user when the check fails
            if mu != 0 and sigma >= 5 * abs(mu):
                print("Warning: sigma is very large relative to the mean!")
            plt.title("pdf")
            plt.plot(self.x, f)
            plt.show()
        if cul is not None:
            f = np.exp(-(cul - mu)**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
            print(f)  # the two results agree; the first needs no extra package
            print(norm.pdf(cul, mu, sigma))
    def norm_cdf(self, x=None, mu=0, sigma=1, plot=True, cul=None):  # plot the cumulative distribution function
        if type(x) == np.ndarray:  # if the input is an np.ndarray, use it as given
            self.x = x
        elif x is None:  # fall back to the default grid when x is not supplied
            self.x = np.linspace(-5, 5, 50)
        else:  # other input types are not supported
            print("Class Error: unsupported type for the x-axis values: {0}".format(x))
            sys.exit()  # abort on invalid input
        F = norm.cdf(self.x, mu, sigma)
        if plot == True:
            plt.ylabel(f'$F(x)$')
            plt.xlabel(f"$x$")
            plt.vlines(mu, 0, 1, linestyles="dashed")  # mark the mean, where F(mu) = 0.5
            plt.title("cdf")
            plt.plot(self.x, F)
            plt.show()
        if cul is not None:
            print(norm.cdf(cul, mu, sigma))
    def fit(self, X, Y, beta=None, plot=None):  # fit the model
        x = np.array(X)
        y = np.array(Y)
        if beta is None:
            beta = np.zeros(x.shape[1])
        self.coef_ = beta
        n = len(y)
        Loss = [0]
        if self.solver == "GD":  # gradient descent
            for i in range(self.max_iter):
                hypothesis = np.dot(x, self.coef_)
                error = hypothesis - y
                grad = np.dot(error, x) / n  # gradient of the mean squared error
                self.coef_ = self.coef_ - self.learning_rate * grad
                Loss.append((error**2).sum() / (2 * n))  # track the loss value
                loss = abs(Loss[-1] - Loss[-2])
                if loss <= self.error:  # stop when the loss change falls below the tolerance
                    break
            else:
                print('Warning: the loss did not converge within max_iter iterations')
            if plot == True:
                plt.xlabel("iter")
                plt.ylabel("loss")
                plt.plot(Loss[1:])
                plt.legend(["loss function"])
                plt.show()
        if self.solver == "NE":  # normal-equation method
            self.coef_ = np.dot(np.dot(np.linalg.inv(np.dot(x.T, x)), x.T), y)
        if self.penal == "L2":  # ridge regression
            print("Ridge regression")
            I = np.identity(np.shape(x)[1])  # identity matrix
            self.coef_ = np.dot(
                np.dot(np.linalg.inv(np.dot(x.T, x) + self.alpha * I), x.T),
                y)
        if self.penal == "L1":  # Lasso regression, solved by cyclic coordinate descent
            print("Lasso regression")
            m, n = x.shape
            w = np.zeros(n)
            Lambda = self.alpha
            # rho and z are averaged over the m samples, so the threshold
            # Lambda/2 corresponds to the objective
            # (1/m) * sum_i (y_i - x_i w)^2 + Lambda * sum_j |w_j|
            for j in range(self.max_iter):
                for k in range(n):
                    # residual with coordinate k's contribution removed,
                    # using the current values of all other coordinates
                    r = y - x.dot(w) + x[:, k] * w[k]
                    rho = np.dot(x[:, k], r) / m
                    z = np.dot(x[:, k], x[:, k]) / m
                    if rho > Lambda / 2:  # soft-thresholding update
                        w[k] = (rho - Lambda / 2) / z
                    elif rho < -Lambda / 2:
                        w[k] = (rho + Lambda / 2) / z
                    else:
                        w[k] = 0.0
            self.coef_ = w
    def predict(self, testX):  # predict on new inputs
        new_x = np.array(testX)
        if new_x.shape[1] != len(self.coef_):  # column count must match the fitted coefficients
            print("Error: the dimension of testX does not match the fitted model")
            sys.exit()
        new_y = np.dot(new_x, self.coef_)
        return new_y
Code practice
- Load a dataset from sklearn
from sklearn.datasets import fetch_california_housing
X, y = fetch_california_housing(return_X_y=True, as_frame=True)
- Create a linear-regression object, fit the model, and get the coefficients
LR = LinearRegression(solver='NE')
LR.fit(X,y)
LR.coef_
>>>
------Linear regression code test------
Linear Regression: Setting(learning_rate=1e-06, max_iter=100, solver='NE', eps=1e-05, penal=None, alpha=None)
array([ 0.51351516, 0.01565111, -0.18252827, 0.86509906, 0.00000779,
-0.00469929, -0.06394582, -0.01638272])
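As an optional sanity check (not part of the original run), the same no-intercept fit can be reproduced with sklearn; `fit_intercept=False` matches the fact that `fit` does not add an intercept column:

from sklearn.linear_model import LinearRegression as SkLinearRegression
print(SkLinearRegression(fit_intercept=False).fit(X, y).coef_)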
- Create an "L2"-penalized regression object, fit the model, and get the coefficients
LR = LinearRegression(penal="L2",alpha=100)
LR.fit(X,y)
LR.coef_
>>>
------Linear regression code test------
Linear Regression: Setting(learning_rate=0.001, max_iter=100, solver='NE', eps=0.0001, penal=L2, alpha=100)
Ridge regression
array([ 0.50074406, 0.01575597, -0.15913453, 0.74360452, 0.0000081 ,
-0.00466495, -0.06717368, -0.0177807 ])
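Again as an optional sanity check, sklearn's Ridge minimizes the same penalized objective, so its coefficients should closely match:

from sklearn.linear_model import Ridge
print(Ridge(alpha=100, fit_intercept=False).fit(X, y).coef_)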
- Create an "L1"-penalized regression object, fit the model, and get the coefficients
LR = LinearRegression(penal="L1",alpha=0.5)
LR.fit(X,y)
LR.coef_
>>>
------Linear regression code test------
Linear Regression: Setting(learning_rate=0.001, max_iter=10, solver='NE', eps=0.0001, penal=L1, alpha=0.5)
Lasso regression
matrix([[ 0.05869157],
[ 0.00053924],
[ 0.00721206],
[ 0. ],
[-0.00000613],
[-0.000159 ],
[-0.00008648],
[-0.01728511]])
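The distribution-plotting helpers from the feature list can be exercised the same way; a minimal usage sketch, where `cul` asks for the function value at that point:

LR = LinearRegression()
LR.norm_pdf(mu=1, sigma=2, cul=0.5)  # draws the pdf and prints f(0.5)
LR.norm_cdf(mu=1, sigma=2, cul=0.5)  # draws the cdf and prints F(0.5)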
LARS (least angle regression) will be added later as another way to solve the L1 problem.