Derivation of Gradient Descent for Linear Regression


The input and output satisfy a linear relationship, and the output takes continuous values.
Hypothesis function:
$$h(\vec{x}) = \vec{\theta}^T\vec{x}$$
where:
$$\begin{aligned} \vec{x} &= [x_0, x_1, \dots, x_n]^T \in \mathbb{R}^{(n+1)\times 1} \\ \vec{\theta} &= [\theta_0, \theta_1, \dots, \theta_n]^T \in \mathbb{R}^{(n+1)\times 1} \end{aligned}$$
with $x_0 = 1$ and $n$ the number of features.
Cost function:
$$J(\vec{\theta}) = \frac{1}{2m}\sum_{i=1}^{m}\bigl(h(\vec{x}^{(i)}) - y^{(i)}\bigr)^2$$
where:
$$\vec{y} = [y^{(1)}, y^{(2)}, \dots, y^{(m)}]^T \in \mathbb{R}^{m\times 1}$$
with $m$ the number of training samples.
Update $\theta$ in the direction of gradient descent:
$$\theta_j := \theta_j - \alpha\frac{\partial J(\vec{\theta})}{\partial \theta_j}$$
Since $J(\theta)$ is convex, it has a minimum. When $\partial J(\theta)/\partial\theta_j > 0$, $\theta_j$ lies to the right of the optimum and the update decreases $\theta_j$; when $\partial J(\theta)/\partial\theta_j < 0$, $\theta_j$ lies to the left of the optimum and the update increases $\theta_j$.
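Expanding the partial derivative with the chain rule gives the per-component gradient used in the update below:
$$\frac{\partial J(\vec{\theta})}{\partial \theta_j} = \frac{1}{2m}\sum_{i=1}^{m} 2\bigl(h(\vec{x}^{(i)}) - y^{(i)}\bigr)\frac{\partial h(\vec{x}^{(i)})}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\bigl(\vec{\theta}^T\vec{x}^{(i)} - y^{(i)}\bigr)x_j^{(i)}$$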
$$\begin{aligned} \theta_j :&= \theta_j - \alpha\frac{1}{m}\sum_{i=1}^{m}\bigl(\vec{\theta}^T\vec{x}^{(i)} - y^{(i)}\bigr)x_j^{(i)} \\ &= \theta_j - \frac{\alpha}{m} \begin{pmatrix} \vec{\theta}^T\vec{x}^{(1)}-y^{(1)} & \vec{\theta}^T\vec{x}^{(2)}-y^{(2)} & \dots & \vec{\theta}^T\vec{x}^{(m)}-y^{(m)} \end{pmatrix} \begin{pmatrix} x_j^{(1)} \\ x_j^{(2)} \\ \vdots \\ x_j^{(m)} \end{pmatrix} \\ &= \theta_j - \frac{\alpha}{m} \begin{pmatrix} \vec{\theta}^T\vec{x}^{(1)}-y^{(1)} \\ \vec{\theta}^T\vec{x}^{(2)}-y^{(2)} \\ \vdots \\ \vec{\theta}^T\vec{x}^{(m)}-y^{(m)} \end{pmatrix}^T \begin{pmatrix} x_j^{(1)} \\ x_j^{(2)} \\ \vdots \\ x_j^{(m)} \end{pmatrix} \end{aligned}$$
Therefore:
$$\begin{aligned} \begin{pmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{pmatrix}^T &:= \begin{pmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{pmatrix}^T - \frac{\alpha}{m} \begin{pmatrix} \vec{\theta}^T\vec{x}^{(1)}-y^{(1)} \\ \vec{\theta}^T\vec{x}^{(2)}-y^{(2)} \\ \vdots \\ \vec{\theta}^T\vec{x}^{(m)}-y^{(m)} \end{pmatrix}^T \begin{pmatrix} x_0^{(1)} & x_1^{(1)} & \dots & x_n^{(1)} \\ x_0^{(2)} & x_1^{(2)} & \dots & x_n^{(2)} \\ \vdots & \vdots & \ddots & \vdots \\ x_0^{(m)} & x_1^{(m)} & \dots & x_n^{(m)} \end{pmatrix} \\ &:= \begin{pmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{pmatrix}^T - \frac{\alpha}{m} \begin{pmatrix} (\vec{x}^{(1)})^T\vec{\theta}-y^{(1)} \\ (\vec{x}^{(2)})^T\vec{\theta}-y^{(2)} \\ \vdots \\ (\vec{x}^{(m)})^T\vec{\theta}-y^{(m)} \end{pmatrix}^T \begin{pmatrix} x_{0,0} & x_{0,1} & \dots & x_{0,n} \\ x_{1,0} & x_{1,1} & \dots & x_{1,n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m-1,0} & x_{m-1,1} & \dots & x_{m-1,n} \end{pmatrix} \end{aligned}$$
Stacking the samples into the design matrix $X \in \mathbb{R}^{m\times(n+1)}$, whose $i$-th row is $(\vec{x}^{(i)})^T$, gives the vectorized update for $\vec{\theta}$:
$$\begin{aligned} \vec{\theta}^T &:= \vec{\theta}^T - \frac{\alpha}{m}(X\vec{\theta} - \vec{y})^T X \\ \vec{\theta} &:= \vec{\theta} - \frac{\alpha}{m}X^T(X\vec{\theta} - \vec{y}) \end{aligned}$$
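As a minimal sketch of how this maps to NumPy (assuming `X` is an $m\times(n+1)$ array whose first column is all ones and `y` is an $m\times 1$ array; the full implementation used in this post follows below):

import numpy as np

def gd_step(theta, X, y, alpha):
    # One vectorized step: theta := theta - (alpha/m) * X^T (X theta - y)
    m = y.shape[0]
    return theta - (alpha / m) * X.T @ (X @ theta - y)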
The code implementation is as follows:

import numpy as np
import matplotlib.pyplot as plt

def linear_regression(x_in, y_in, alpha=0.01, epsilon=1e-5):
    # Fit theta by batch gradient descent on the least-squares cost.
    sample_num = y_in.shape[0]
    return gradient_descent(sample_num, x_in, y_in, cost_function_lr, alpha, epsilon)

def cost_function_lr(sample_num, theta, x_in, y_in):
    # Least-squares cost J(theta) and its gradient at the current theta.
    diff = (x_in*theta) - y_in                      # X*theta - y, shape (m, 1)
    j_theta = diff.T*diff/(2*sample_num)            # J(theta) = (1/2m) * sum of squared errors
    partial_theta = (x_in.T*diff)/sample_num        # gradient: (1/m) * X^T (X*theta - y)
    return (j_theta, partial_theta)
    
def gradient_descent(sample_num, x_in, y_in, cost_function, alpha, epsilon):
    theta = np.mat(np.zeros((x_in.shape[1], 1)))    # initialize theta to zeros
    pre_cost = float('inf')                         # cost of the previous iteration
    count = 5000                                    # maximum number of iterations
    while count:
        (j_theta, partial_theta) = cost_function(sample_num, theta, x_in, y_in)
        # Stop when the cost or every component of the gradient is small enough.
        if j_theta < epsilon or (np.fabs(partial_theta) < epsilon).all():
            break
        theta -= alpha*partial_theta
        # If the cost went up, the step was too large: shrink the learning rate.
        if j_theta > pre_cost:
            alpha /= 10
        pre_cost = j_theta
        count -= 1
    if not count:
        print('reached max iteration count')
    return theta
    
if __name__ == '__main__':
    m = 30
    x0 = np.ones((m, 1))                             # bias column x_0 = 1
    x1 = np.arange(1, m+1).reshape(m, 1)             # single feature x_1 = 1..m
    x_in = np.mat(np.hstack((x0, x1)))               # design matrix X, shape (m, 2)
    theta = np.mat([5, 0.5]).reshape(2, 1)           # true parameters
    y_in = x_in*theta + np.random.randn(m).reshape(m, 1)   # targets with Gaussian noise

    res_theta_gd = linear_regression(x_in, y_in, 0.001, 1e-5)

    plt.scatter(x1, np.array(y_in))                  # sample scatter
    plt.plot(x1, np.array(x_in*res_theta_gd), color='r')   # fitted line
    diff = x_in*res_theta_gd - y_in
    plt.title('cost: %f' % ((diff.T*diff)[0, 0] / (2*m)))
    plt.show()

Result analysis:
(Figure: scatter of the generated samples with the regression line fitted by gradient descent; the plot title shows the final cost.)
Normalization (min-max scaling):
Scale the range of each feature of $\vec{x}$ to speed up convergence:
$$x_j := \frac{x_j - \min_i(x_j^{(i)})}{\max_i(x_j^{(i)}) - \min_i(x_j^{(i)})}$$
Standardization (z-score):
$$x_j := \frac{x_j - \mu_j}{\sigma_j}$$
where:
$$\begin{aligned} &i = 1, 2, \dots, m && (m \text{ is the number of samples}) \\ &j = 1, 2, \dots, n && (n \text{ is the number of features; the bias } x_0 = 1 \text{ is not scaled}) \\ &\mu_j \text{ is the mean of feature } j \\ &\sigma_j \text{ is the standard deviation of feature } j \end{aligned}$$
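A minimal sketch of both scalings applied to the feature columns (assuming `X` is a plain NumPy array of shape (m, n+1) whose first column is the bias $x_0 = 1$; that column is left untouched):

import numpy as np

def min_max_scale(X):
    # Min-max normalization of the feature columns (skip the bias column).
    X = X.astype(float).copy()
    cols = X[:, 1:]
    X[:, 1:] = (cols - cols.min(axis=0)) / (cols.max(axis=0) - cols.min(axis=0))
    return X

def standardize(X):
    # Z-score standardization: subtract each feature's mean, divide by its standard deviation.
    X = X.astype(float).copy()
    cols = X[:, 1:]
    X[:, 1:] = (cols - cols.mean(axis=0)) / cols.std(axis=0)
    return X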

