Linear Regression --- Least Squares and Gradient Descent

The linear regression model

   Assume a training set $T=\{(x^{(1)},y^{(1)}),\cdots,(x^{(m)},y^{(m)})\}$, where $x^{(i)}=(x^{(i)}_{1},x^{(i)}_{2},\cdots,x^{(i)}_{n})^{T}$.

Model
$$\widehat{y}=h(x)=\omega_{0}+\omega_{1}x_{1}+\cdots+\omega_{n}x_{n}=\omega^{T}x=x^{T}\omega$$
   where
$$\omega=\begin{bmatrix}\omega_{0}\\ \omega_{1}\\ \vdots\\ \omega_{n}\end{bmatrix},\qquad x=\begin{bmatrix}1\\ x_{1}\\ \vdots\\ x_{n}\end{bmatrix}$$
Loss function
$$l(x^{(i)})=\frac{1}{2}(\widehat{y}^{(i)}-y^{(i)})^{2}$$
   This is the squared gap between a single prediction and its true value; summing these gaps over all samples gives the objective function, which we then minimize.
Objective function
$$J(\omega)=\frac{1}{2}\sum^{m}_{i=1}(\widehat{y}^{(i)}-y^{(i)})^{2}$$
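For concreteness, here is a minimal sketch (with made-up toy numbers, not from the original post) that evaluates the objective exactly as defined above:

import numpy as np

# Toy data: three samples, one feature; the first column of X is the constant 1 for the bias term
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])
w = np.array([1.0, 1.0])              # a candidate weight vector

y_hat = X @ w                         # predictions  y_hat^(i) = w^T x^(i)
J = 0.5 * np.sum((y_hat - y) ** 2)    # J(w) = 1/2 * sum of squared errors
print(J)                              # 0.0 here, because this w fits the toy data exactly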

Fitting the model therefore comes down to solving for $\omega$; two methods are described below.

Least squares

Working in matrix form, the model above can be written as:
$$\widehat{Y}=\begin{bmatrix}\widehat{y}^{(1)}\\ \widehat{y}^{(2)}\\ \vdots\\ \widehat{y}^{(m)}\end{bmatrix}=\begin{bmatrix}1 & x^{(1)}_{1} & x^{(1)}_{2} & \cdots & x^{(1)}_{n}\\ 1 & x^{(2)}_{1} & x^{(2)}_{2} & \cdots & x^{(2)}_{n}\\ & & \vdots & & \\ 1 & x^{(m)}_{1} & x^{(m)}_{2} & \cdots & x^{(m)}_{n}\end{bmatrix}\begin{bmatrix}\omega_{0}\\ \omega_{1}\\ \vdots\\ \omega_{n}\end{bmatrix}=X\omega$$

Objective in matrix form
$$\begin{aligned} J(\omega)&=\frac{1}{2}\|X\omega-Y\|^{2}\\ &=\frac{1}{2}(X\omega-Y)^{T}(X\omega-Y)\\ &=\frac{1}{2}(\omega^{T}X^{T}-Y^{T})(X\omega-Y)\\ &=\frac{1}{2}(\omega^{T}X^{T}X\omega-\omega^{T}X^{T}Y-Y^{T}X\omega+Y^{T}Y)\\ &=\frac{1}{2}(\omega^{T}X^{T}X\omega-2\omega^{T}X^{T}Y+Y^{T}Y)\end{aligned}$$

   The matrix differentiation below uses the following identities (the last one requires $A$ to be symmetric, which holds here since $A=X^{T}X$):
$$\frac{dX^{T}X}{dX}=2X,\qquad \frac{dAX}{dX}=A^{T},\qquad \frac{dX^{T}A}{dX}=A,\qquad \frac{\partial X^{T}AX}{\partial X}=2AX$$
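These identities are easy to sanity-check numerically. The sketch below (arbitrary small matrices, not part of the original post) compares the analytic gradient $2AX$ of $X^{T}AX$ with a finite-difference estimate:

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
A = (A + A.T) / 2                 # symmetrize: d(x^T A x)/dx = 2Ax requires symmetric A
x = rng.normal(size=3)

f = lambda v: v @ A @ v           # scalar function x^T A x
analytic = 2 * A @ x

# Central finite differences, one coordinate at a time
eps = 1e-6
numeric = np.zeros(3)
for j in range(3):
    e = np.zeros(3)
    e[j] = eps
    numeric[j] = (f(x + e) - f(x - e)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))   # True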

   Differentiating the objective with respect to $\omega$:
$$\frac{\partial J(\omega)}{\partial \omega}=\frac{1}{2}(2X^{T}X\omega-2X^{T}Y)=X^{T}X\omega-X^{T}Y$$

   Setting $\frac{\partial J(\omega)}{\partial \omega}=0$ gives $\omega=(X^{T}X)^{-1}X^{T}Y$; substituting this $\omega$ back into $\widehat{y}=h(x)$ yields the fitted model.

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

df = pd.read_csv('house_price.csv')
df['price'] = df['price'][:7].astype(int)
df = df.values
x = df[:7, 1]    # feature column
y = df[:7, 2]    # price (target)
m = y.size

'''Least squares'''
# Visualize the raw data
plt.scatter(x, y)

# Prepend a column of ones to x so that w_0 acts as the bias term
a = np.ones((m, 1))
x = np.column_stack((a, x))

# Solve the normal equation w = (X^T X)^{-1} X^T Y for the weights
w = np.linalg.inv(x.T @ x) @ x.T @ y

# Visualize the fitted line over the training points
b = x @ w
plt.plot(x[:, 1], b)

# Predict the target for a new input of 700
x_predict = 700
y_predict = np.array([1, x_predict]) @ w
print(y_predict)
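A small aside (not in the original script): forming the explicit inverse of $X^{T}X$ can be numerically fragile when the matrix is ill-conditioned, and NumPy's built-in least-squares solver yields the same $\omega$ more robustly:

# Equivalent but more numerically stable solution of the same least-squares problem
w_lstsq, *_ = np.linalg.lstsq(x, y, rcond=None)
print(w_lstsq)   # should match w from the explicit inverse above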

Gradient descent

Idea: solve the optimization problem iteratively.
Update rule: $X_{k+1}=X_{k}+\alpha d_{k}$, where $\alpha$ is the step size and $d_{k}$ is the search direction.
   The gradient is the direction in which the function value increases fastest, so for minimization we step along the negative gradient.
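For intuition, here is a minimal one-dimensional sketch (toy example, not from the post) of the iteration $x_{k+1}=x_{k}-\alpha f'(x_{k})$ applied to $f(x)=x^{2}$:

# Minimize f(x) = x^2 with gradient descent; the gradient is f'(x) = 2x
x = 5.0          # starting point
alpha = 0.1      # step size
for k in range(50):
    x = x - alpha * 2 * x    # step against the gradient
print(x)         # close to 0, the minimizer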

Model: differentiating the cost function with respect to $\omega_{j}$ (applying the chain rule to each squared residual) gives $d_{k}(\omega_{j})=\frac{\partial J(\omega)}{\partial \omega_{j}}=\sum^{m}_{i=1}(h(x^{(i)})-y^{(i)})\cdot x^{(i)}_{j}$
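A quick numerical check (toy data of my own, not from the post) that this coordinate-wise sum agrees with the matrix form $X^{T}(X\omega-Y)$ used in the batch algorithm later:

import numpy as np

rng = np.random.default_rng(1)
X = np.c_[np.ones(5), rng.normal(size=(5, 2))]   # 5 samples: bias column plus 2 features
y = rng.normal(size=5)
w = rng.normal(size=3)

# Coordinate-wise sum from the formula above, for each weight w_j
grad_sum = np.array([np.sum((X @ w - y) * X[:, j]) for j in range(3)])
# Matrix form of the same gradient
grad_matrix = X.T @ (X @ w - y)
print(np.allclose(grad_sum, grad_matrix))        # True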

Single-sample (stochastic) gradient descent

   Step 1: initialize $\alpha,\varepsilon,\omega_{j}\;(j=0,\cdots,n),k,MaxN$
   Step 2: pick a sample $(x^{(i)},y^{(i)})$ at random and compute $d_{k}(\omega_{j})=(h(x^{(i)})-y^{(i)})\cdot x^{(i)}_{j}$
   Step 3: update $\omega_{j}:=\omega_{j}-\alpha d_{k}(\omega_{j})$
           $k=k+1$
   Step 4: if $\|d_{k}\|<\varepsilon$ or $k>MaxN$, output $\omega_{j}$; otherwise go back to Step 2

Note that the gradient collects one component per weight:
$$g_{k}=\begin{bmatrix}g_{k}(\omega_{0})\\ g_{k}(\omega_{1})\\ \vdots\\ g_{k}(\omega_{n})\end{bmatrix}$$
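The code in the next section only implements the batch (matrix-form) variant; as a complement, here is a minimal sketch of the single-sample procedure above (the function name and default values are my own, not from the original post):

import numpy as np

def sgd_linear_regression(X, y, alpha=0.01, eps=1e-5, max_iter=10000, seed=0):
    """Single-sample (stochastic) gradient descent for linear regression.
    X is the design matrix with a leading column of ones; y is a 1-D target vector."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])                  # Step 1: initialize the weights
    for k in range(max_iter):
        i = rng.integers(len(y))              # Step 2: pick one sample at random
        g = (X[i] @ w - y[i]) * X[i]          # d_k(w_j) = (h(x^(i)) - y^(i)) * x_j^(i)
        w = w - alpha * g                     # Step 3: update the weights
        if np.linalg.norm(g) < eps:           # Step 4: stop when the gradient is small
            break
    return w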

Matrix-form gradient descent (batch gradient descent)

   From the least-squares derivation above, $\frac{\partial J(\omega)}{\partial \omega}=X^{T}(X\omega-Y)$.
   Step 1: initialize $\alpha,\varepsilon,\omega_{k}=1,k,MaxN$
   Step 2: compute $d_{k}=X^{T}(X\omega_{k}-Y)$
   Step 3: update $\omega_{k+1}:=\omega_{k}-\alpha d_{k}$
           $k=k+1$
   Step 4: if $\|d_{k}\|<\varepsilon$ or $k>MaxN$, output $\omega_{k}$; otherwise go back to Step 2

import numpy as np
import pandas as pd

df = pd.read_csv('house_price.csv')
df.drop(['No'], axis=1, inplace=True)
df['price'] = df['price'][:7].astype(int)
df = df.values
x = df[:7, 0]                     # feature column
y = df[:7, 1].reshape(-1, 1)      # price (target), as a column vector
m = y.size

'''Standardize the feature (zero mean, unit variance)'''
def norm(x):
    sigma = np.std(x, axis=0)
    mu = np.mean(x, axis=0)
    x = (x - mu) / sigma
    return x, mu, sigma

x, mu, sigma = norm(x)
x = np.c_[np.ones(m), x]          # prepend the bias column of ones

'''Gradient descent'''
alpha = 0.1                              # learning rate
w = np.random.randint(1, 1000, (2, 1))   # random starting weights
error = 10**(-5)                         # tolerance on the gradient norm
k = 0

def gradient_descent(x, y, w, alpha, error, k):
    for i in range(300):
        g = x.T @ (x @ w - y)     # batch gradient X^T (Xw - Y)
        w = w - alpha * g
        k += 1
        if np.linalg.norm(g) < error:
            break
    print('Finished after {} iterations, w = {}'.format(k, w))
    return w

w = gradient_descent(x, y, w, alpha, error, k)

# Prediction: standardize the new input with the training mu and sigma first
x_predict = 700
x_predict = (x_predict - mu) / sigma
y_predict = np.array([1, x_predict]) @ w
print('Prediction: {}'.format(y_predict))
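As a quick sanity check (not part of the original script), the gradient-descent result can be compared with the closed-form normal-equation solution on the same standardized design matrix:

# Closed-form solution on the same data; should be close to the w found by gradient descent
w_exact = np.linalg.inv(x.T @ x) @ x.T @ y
print('Normal equation w = {}'.format(w_exact))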

Comparison of least squares and gradient descent

   Gradient descent: requires choosing a learning rate $\alpha$ and running many iterations, but it still works well when the number of features $n$ is large and applies to many kinds of models.
   Least squares: becomes expensive when $n$ is large, since inverting $X^{T}X$ costs $O(n^{3})$; it is usually acceptable for $n<100000$, and it only applies to linear models, not to logistic regression or other models.
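In practice both approaches are available off the shelf. A rough scikit-learn sketch (assuming X is a standardized feature matrix without the bias column and y a 1-D target; parameter values are illustrative only):

from sklearn.linear_model import LinearRegression, SGDRegressor

# Direct (least-squares style) solver
lr = LinearRegression().fit(X, y)

# Iterative stochastic gradient descent; needs a learning rate and scaled features
sgd = SGDRegressor(max_iter=1000, tol=1e-5, eta0=0.01, learning_rate='constant').fit(X, y)

print(lr.intercept_, lr.coef_)
print(sgd.intercept_, sgd.coef_)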
