Gradient Descent Vectorization

  • We take $X$ to be the dataset: each row $x^{(i)}$ ($i = 1, \dots, m$) is one sample, and the columns are its features;
  • $Y$ holds the true values, with each row $y^{(i)}$ corresponding to one input sample;
  • $\Theta$ holds the weight of each feature; a NumPy sketch of these shapes follows the definitions below.
    $$X = \left[ \begin{matrix} x_0^{(1)} & x_1^{(1)} & \cdots & x_n^{(1)} \\ x_0^{(2)} & x_1^{(2)} & \cdots & x_n^{(2)} \\ \vdots & \vdots & \ddots & \vdots \\ x_0^{(m)} & x_1^{(m)} & \cdots & x_n^{(m)} \end{matrix} \right] = \left[ \begin{matrix} x^{(1)} \\ x^{(2)} \\ \vdots \\ x^{(m)} \end{matrix} \right]$$

$$Y = \left[ \begin{matrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{matrix} \right]$$

$$\Theta = \left[ \begin{matrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{matrix} \right]$$
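To make the shapes concrete, here is a minimal NumPy sketch. The array names (`X`, `Y`, `theta`) and toy sizes are illustrative assumptions, and setting $x_0 = 1$ follows the usual bias convention rather than anything stated above:

```python
import numpy as np

m, n = 4, 2                              # assumed toy sizes: m samples, n features
X = np.hstack([np.ones((m, 1)),          # x_0 = 1 column for the bias weight theta_0
               np.arange(m * n, dtype=float).reshape(m, n)])  # shape (m, n+1)
Y = np.arange(m, dtype=float).reshape(m, 1)   # one true value per sample, (m, 1)
theta = np.zeros((n + 1, 1))             # one weight per feature, (n+1, 1)

print(X.shape, Y.shape, theta.shape)     # (4, 3) (4, 1) (3, 1)
```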

Neural Network Structure
Consider a simple neural network with only an input layer and an output layer, and no hidden layers. For a single sample $x^{(i)}$, the predicted output $\hat y^{(i)}$ is:
$$\hat y^{(i)} = \theta_0 x_0^{(i)} + \theta_1 x_1^{(i)} + \cdots + \theta_n x_n^{(i)} = x^{(i)}\Theta \tag{1}$$
The predictions $\hat Y$ for all samples are then:
$$\hat Y = \left[ \begin{matrix} \hat y^{(1)} \\ \hat y^{(2)} \\ \vdots \\ \hat y^{(m)} \end{matrix} \right] = X\Theta$$
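A short sketch of Equation (1) and its stacked form, with assumed toy values:

```python
import numpy as np

X = np.array([[1.0,  2.0,  3.0],
              [1.0,  0.5, -1.0],
              [1.0, -2.0,  4.0]])        # (m, n+1), first column is x_0 = 1
theta = np.array([[0.1], [0.2], [0.3]])  # (n+1, 1)

y_hat_1 = X[0] @ theta                   # Eq. (1): one sample's row dotted with the weights
Y_hat = X @ theta                        # all predictions at once: X @ Theta, shape (m, 1)
```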
We take the squared error between the predictions and the true values as the loss function $L$:
$$L = \frac{1}{m}\sum_{i=1}^m \frac{1}{2}\left(\hat y^{(i)} - y^{(i)}\right)^2 = \frac{1}{2m}\sum_{i=1}^m \left(x^{(i)}\Theta - y^{(i)}\right)^2 \tag{2}$$
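Equation (2) as a one-liner, again over illustrative toy arrays:

```python
import numpy as np

X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])  # (m, n+1)
Y = np.array([[1.0], [0.0], [0.5]])                  # (m, 1)
theta = np.array([[0.1], [0.2]])                     # (n+1, 1)
m = X.shape[0]

# Eq. (2): mean of half the squared residuals over all m samples.
L = np.sum((X @ theta - Y) ** 2) / (2 * m)
print(L)
```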
Suppose we now want the gradient-descent update for $\theta_j$, where $\alpha$ is the learning rate:
$$\theta_j = \theta_j - \alpha\frac{\partial L}{\partial \theta_j} \tag{3}$$
Differentiating the loss function, Equation (2), with respect to $\theta_j$:
$$\frac{\partial L}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^m\left(x^{(i)}\Theta - y^{(i)}\right)\frac{\partial\left(x^{(i)}\Theta\right)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^m\left(x^{(i)}\Theta - y^{(i)}\right)x_j^{(i)} \tag{4}$$
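One way to sanity-check Equation (4) is to compare it against a numerical derivative; a minimal sketch with assumed toy data:

```python
import numpy as np

X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])
Y = np.array([[1.0], [0.0], [0.5]])
theta = np.array([[0.1], [0.2]])
m = X.shape[0]

def loss(t):
    return np.sum((X @ t - Y) ** 2) / (2 * m)        # Eq. (2)

j, eps = 1, 1e-6
# Eq. (4): explicit sum over samples for a single parameter theta_j.
analytic = sum((X[i] @ theta - Y[i]).item() * X[i, j] for i in range(m)) / m

t_plus = theta.copy()
t_plus[j] += eps
numeric = (loss(t_plus) - loss(theta)) / eps         # forward difference

assert abs(analytic - numeric) < 1e-5
```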
Let $e^{(i)} = x^{(i)}\Theta - y^{(i)}$, and write $E$ for the vector of all the $e^{(i)}$:
$$E=\left[ \begin{matrix} e^{(1)} \\ e^{(2)} \\ \vdots \\ e^{(m)} \end{matrix} \right]= \left[ \begin{matrix} x^{(1)}\Theta -y^{(1)} \\ x^{(2)}\Theta -y^{(2)} \\ \vdots \\ x^{(m)}\Theta -y^{(m)} \end{matrix} \right] = X\Theta-Y$$

Equation (4) can then be written as:
$$\frac{\partial L}{\partial \theta_j}=\frac{1}{m}\sum_{i=1}^m e^{(i)}x_j^{(i)} = \frac{1}{m}\left(x_j^{(1)},x_j^{(2)},\dots,x_j^{(m)}\right)E \tag{5}$$
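The residual vector $E$ and the row-times-$E$ form of Equation (5) in a short sketch (toy data assumed); the summation and vectorized versions agree:

```python
import numpy as np

X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])
Y = np.array([[1.0], [0.0], [0.5]])
theta = np.array([[0.1], [0.2]])
m = X.shape[0]

E = X @ theta - Y                                    # E = X @ Theta - Y, shape (m, 1)

j = 1
grad_loop = sum(E[i] * X[i, j] for i in range(m)) / m   # Eq. (4) as an explicit sum
grad_vec = (X[:, j] @ E) / m                            # Eq. (5): j-th row of X^T times E

assert np.allclose(grad_loop, grad_vec)
```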
Substituting Equation (5) into Equation (3) gives:
$$\theta_j = \theta_j - \frac{\alpha}{m}\left(x_j^{(1)},x_j^{(2)},\dots,x_j^{(m)}\right)E \tag{6}$$
Stacking Equation (6) over all $j$ then gives the update for the whole weight vector:
$$\Theta = \left[ \begin{matrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{matrix} \right] - \frac{\alpha}{m} \left[ \begin{matrix} x_0^{(1)} & x_0^{(2)} & \cdots & x_0^{(m)} \\ x_1^{(1)} & x_1^{(2)} & \cdots & x_1^{(m)} \\ \vdots & \vdots & \ddots & \vdots \\ x_n^{(1)} & x_n^{(2)} & \cdots & x_n^{(m)} \end{matrix} \right] E = \Theta - \frac{\alpha}{m}X^TE = \Theta - \frac{\alpha}{m}X^T\left(X\Theta - Y\right)$$
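Finally, a minimal batch gradient-descent loop built on the vectorized update $\Theta \leftarrow \Theta - \frac{\alpha}{m}X^T(X\Theta - Y)$; the learning rate, iteration count, and synthetic data are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 2
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])  # bias column + features
true_theta = np.array([[1.0], [-2.0], [0.5]])
Y = X @ true_theta + 0.01 * rng.normal(size=(m, 1))        # noisy linear targets

theta = np.zeros((n + 1, 1))
alpha = 0.1                                                # assumed learning rate

for _ in range(2000):
    E = X @ theta - Y                    # residuals E = X @ Theta - Y
    theta -= (alpha / m) * (X.T @ E)     # the vectorized update derived above

print(theta.ravel())                     # should be close to true_theta
```

Every parameter is updated in one matrix product `X.T @ E`, with no per-sample or per-feature loop; that is the whole point of the vectorized form.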
