Contents
1. Principles of linear regression
The linear regression model has the form:
\begin{aligned} f(x) &= \theta_0 + \theta_1x_1 + \theta_2x_2 + \dots + \theta_dx_d \\ &= \sum_{i=0}^{d}\theta_ix_i \end{aligned}
where by convention x_0 = 1, so that \theta_0 acts as the intercept.
How do we determine the values of \theta so that f(x) is as close as possible to y?
The mean squared error is a commonly used performance measure in regression:
J(\theta)=\frac{1}{n}\sum_{i=1}^{n}(f_{\theta}(x^{(i)})-y^{(i)})^2
This criterion can be motivated by maximum likelihood estimation: under a Gaussian noise assumption, maximizing the likelihood is equivalent to minimizing the squared error.
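As a brief sketch of that equivalence, assume i.i.d. Gaussian noise (an assumption made explicit here, not stated above):
y^{(i)}=\theta^Tx^{(i)}+\epsilon^{(i)},\quad \epsilon^{(i)}\sim\mathcal{N}(0,\sigma^2)
The log-likelihood is then
\ell(\theta)=\sum_{i=1}^{n}\log\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(y^{(i)}-\theta^Tx^{(i)})^2}{2\sigma^2}\right)=n\log\frac{1}{\sqrt{2\pi}\sigma}-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y^{(i)}-\theta^Tx^{(i)})^2
so maximizing \ell(\theta) is exactly minimizing the sum of squared errors.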
2. Loss function, cost function, and objective function for linear regression
- Loss function: measures the prediction error on a single sample; the smaller the loss, the better the model.
- Cost function: measures the average error over the entire sample set.
- Objective function: the cost function plus a regularization term; this is the function that is ultimately optimized.
Common loss functions include the 0-1 loss, squared loss, absolute loss, and log loss; common cost functions include the mean squared error (MSE), root mean squared error (RMSE), and mean absolute error (MAE).
The objective function can be viewed as a combination of the loss and the model complexity, seeking a balance between the two:
\underset{f\in F}{\min}\, \frac{1}{n}\sum^{n}_{i=1}L(y_i,f(x_i))+\lambda J(f)
where J(f) measures the complexity of the model and \lambda \geq 0 trades off fit against complexity.
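For concreteness, here is a minimal numpy sketch of evaluating this objective for a linear model, assuming squared loss and an L2 complexity penalty J(f) = \Vert\theta\Vert^2 (the function name and the choice of penalty are illustrative, not from the original):

import numpy as np

def ridge_objective(theta, X, y, lam):
    # Average squared loss over the n samples
    residuals = X.dot(theta) - y        # f(x_i) - y_i for each sample
    loss = np.mean(residuals ** 2)      # (1/n) * sum_i L(y_i, f(x_i))
    # Complexity penalty: lambda * J(f), here J(f) = ||theta||^2
    penalty = lam * np.sum(theta ** 2)
    return loss + penalty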
3. Optimization methods for linear regression
(1) Gradient descent
\theta_j:=\theta_j-\alpha\frac{\partial{J(\theta)}}{\partial\theta_j}
Taking J(\theta)=\frac{1}{2}\sum_{i=1}^{n}(f_\theta(x^{(i)})-y^{(i)})^2 for convenience (the factor \frac{1}{2} cancels the 2 from differentiation and does not change the minimizer), the partial derivative is:
\begin{aligned} \frac{\partial{J(\theta)}}{\partial\theta_j} &= \frac{\partial}{\partial\theta_j}\frac{1}{2}\sum_{i=1}^{n}(f_\theta(x^{(i)})-y^{(i)})^2 \\ &= 2\cdot\frac{1}{2}\sum_{i=1}^{n}(f_\theta(x^{(i)})-y^{(i)})\cdot\frac{\partial}{\partial\theta_j}(f_\theta(x^{(i)})-y^{(i)}) \\ &= \sum_{i=1}^{n}(f_\theta(x^{(i)})-y^{(i)})\cdot\frac{\partial}{\partial\theta_j}\left(\sum_{k=0}^{d}\theta_kx_k^{(i)}-y^{(i)}\right) \\ &= \sum_{i=1}^{n}(f_\theta(x^{(i)})-y^{(i)})x_j^{(i)} \end{aligned}
Substituting this gradient into the update rule and flipping the sign (writing y^{(i)}-f_\theta(x^{(i)}) instead) gives:
\theta_j = \theta_j + \alpha\sum_{i=1}^{n}(y^{(i)}-f_\theta(x^{(i)}))x_j^{(i)}
Batch gradient descent (the same update in vector form, over all coordinates at once):
\theta = \theta + \alpha\sum_{i=1}^{n}(y^{(i)}-f_\theta(x^{(i)}))x^{(i)}
Stochastic gradient descent (updating on a single sample i at a time):
\theta = \theta + \alpha(y^{(i)}-f_\theta(x^{(i)}))x^{(i)}
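A minimal numpy sketch of this single-sample update (the function name and its defaults are illustrative assumptions; the class-based implementation in section 4 uses the batch form instead):

import numpy as np

def sgd_linear_regression(X, y, alpha=0.01, epochs=50, seed=0):
    # Fit theta with one randomly chosen sample per update
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):          # visit samples in random order
            error = y[i] - X[i].dot(theta)    # y^(i) - f_theta(x^(i))
            theta += alpha * error * X[i]     # stochastic gradient step
    return theta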
(2) Least squares (closed-form matrix solution)
\theta = (X^TX)^{-1}X^TY
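This closed form follows from setting the gradient of the squared-error cost to zero, assuming X^TX is invertible (i.e., the columns of X are linearly independent):
J(\theta)=\frac{1}{2}\Vert X\theta-Y\Vert^2,\quad \nabla_\theta J=X^T(X\theta-Y)=0 \;\Rightarrow\; X^TX\theta=X^TY \;\Rightarrow\; \theta=(X^TX)^{-1}X^TY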
(3) Newton's method
From the tangent line at \theta_0 whose root is \theta_1, we have f'(\theta_0) = \frac{f(\theta_0)}{\Delta} with \Delta = \theta_0 - \theta_1, which gives the Newton step for finding a root of f:
\theta_1 = \theta_0 - \frac{f(\theta_0)}{f'(\theta_0)}
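The update above finds a root of f. For minimizing the cost J(\theta), the same step is applied to the stationarity condition \nabla J(\theta)=0, giving the multivariate Newton update (H denotes the Hessian of J):
\theta := \theta - H^{-1}\nabla_\theta J(\theta),\quad H_{jk}=\frac{\partial^2 J(\theta)}{\partial\theta_j\partial\theta_k}
For linear least squares, \nabla J = X^T(X\theta-Y) and H = X^TX is constant, so a single Newton step from any starting point reaches the closed-form solution above.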
(4) Quasi-Newton methods
Computing and inverting the Hessian at every iteration is expensive in high dimensions; quasi-Newton methods (e.g. DFP, BFGS) instead build an approximation of the Hessian (or its inverse) from successive gradient evaluations.
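As a sketch of the idea (standard background, not spelled out in the original): the Hessian approximation B_{k+1} is chosen to satisfy the secant condition
B_{k+1}(\theta_{k+1}-\theta_k)=\nabla J(\theta_{k+1})-\nabla J(\theta_k)
which DFP and BFGS enforce with different low-rank update formulas.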
4. Code implementation
Generate data
# Generate data
import numpy as np
import pandas as pd
np.random.seed(1)
x = np.random.rand(500, 3)
# The underlying mapping is y = 1.2*x1 + 3.1*x2 + 4*x3 (no noise, no intercept)
y = x.dot(np.array([1.2, 3.1, 4]))
Training a model with sklearn
# Train a model with sklearn
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
%matplotlib inline
# Instantiate the model
lr = LinearRegression(fit_intercept=True)
# Fit the model
lr.fit(x, y)
print("Estimated parameters: %s" % (lr.coef_))
print("Intercept: %s" % (lr.intercept_))
# Coefficient of determination
print("R2: %s" % (lr.score(x, y)))
# Predict the target for an arbitrary new input
x_test = np.array([2, 4, 5]).reshape(1, -1)
y_hat = lr.predict(x_test)
print("Predicted value: %s" % (y_hat))
Estimated parameters: [1.2 3.1 4. ]
Intercept: 8.881784197001252e-16
R2: 1.0
Predicted value: [34.8]
Solving with least squares
class LR_LS():
    def __init__(self):
        self.w = None
    def fit(self, X, y):
        # Closed-form least squares solution
        self.w = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)
    def predict(self, X):
        # Predict new inputs with the fitted parameters
        y_pred = X.dot(self.w)
        return y_pred

if __name__ == "__main__":
    lr_ls = LR_LS()
    lr_ls.fit(x, y)
    print("Estimated parameters: %s" % (lr_ls.w))
    x_test = np.array([2, 4, 5]).reshape(1, -1)
    print("Predicted value: %s" % (lr_ls.predict(x_test)))

Estimated parameters: [1.2 3.1 4. ]
Predicted value: [34.8]
Solving with gradient descent
class LR_GD():
    def __init__(self):
        self.w = None
    def fit(self, X, y, alpha=0.02, loss=1e-10):  # step size 0.02; convergence tolerance 1e-10
        y = y.reshape(-1, 1)    # reshape y for matrix arithmetic
        [m, d] = np.shape(X)    # number of samples and features
        self.w = np.zeros((d))  # initialize the parameters to zero
        tol = 1e5
        while tol > loss:
            h_f = X.dot(self.w).reshape(-1, 1)
            theta = self.w + alpha * np.mean(X * (y - h_f), axis=0)  # gradient step
            tol = np.sum(np.abs(theta - self.w))
            self.w = theta
    def predict(self, X):
        # Predict new inputs with the fitted parameters
        y_pred = X.dot(self.w)
        return y_pred

if __name__ == "__main__":
    lr_gd = LR_GD()
    lr_gd.fit(x, y)
    print("Estimated parameters: %s" % (lr_gd.w))
    x_test = np.array([2, 4, 5]).reshape(1, -1)
    print("Predicted value: %s" % (lr_gd.predict(x_test)))
Estimated parameters: [1.20000002 3.10000001 3.99999997]
Predicted value: [34.79999993]