Deep Learning Beginner's Notes (2023/07/07)

Deep Learning 2023/07/07

1. Function with Unknown Parameters

$y = b + wx$, where $w$ (weight) and $b$ (bias) are the unknown parameters.

2. Define Loss from Training Data

  • Loss is a function of parameters.

$L(b, w)$

  • Loss: how good a set of parameter values is.

  • Loss Function.

$L = \frac{1}{N}\sum_{n} e_n$

$e_n$ is the error on the $n$-th training example.

  • L is the mean absolute error (MAE) when

$e = \mid y - \hat{y} \mid$

  • L is the mean squared error (MSE) when

$e = (y - \hat{y})^2$

If both the ground truth and the prediction are probability distributions, cross entropy is used as the loss function to measure the difference between them.
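
A minimal sketch of these loss choices in Python, assuming NumPy; the arrays and their values are toy data made up for illustration:

```python
import numpy as np

y = np.array([2.0, 3.5, 1.0])      # model predictions (toy values)
y_hat = np.array([2.5, 3.0, 1.5])  # ground-truth labels, ŷ in the notes (toy values)

mae = np.mean(np.abs(y - y_hat))   # L with e = |y - ŷ|
mse = np.mean((y - y_hat) ** 2)    # L with e = (y - ŷ)^2

# When both the label and the prediction are probability distributions,
# cross entropy measures their difference instead:
p = np.array([0.0, 1.0, 0.0])      # true distribution (one-hot, toy)
q = np.array([0.1, 0.8, 0.1])      # predicted distribution (toy)
cross_entropy = -np.sum(p * np.log(q))

print(mae, mse, cross_entropy)
```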

3. Optimization

$w^*, b^* = \arg\min_{w,b} L$

  • Gradient Descent

Randomly pick an initial value $w^0$.

Compute the derivative of the loss $L$ with respect to $w$ at that initial point.

If the slope at that point is negative, increase $w$.

If the slope at that point is positive, decrease $w$.

  • Learning Rate

The step size is $\eta \left.\frac{\partial L}{\partial w}\right|_{w = w^0}$, so the update is $w^1 = w^0 - \eta \left.\frac{\partial L}{\partial w}\right|_{w = w^0}$ (a minimal numerical sketch of this loop appears after this list).

η: learning rate


  • Hyperparameters: values chosen by hand before training, such as the learning rate $\eta$, as opposed to the parameters $w$ and $b$ that are learned.
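
A minimal sketch of gradient descent on the linear model $y = b + wx$ with MSE loss, assuming NumPy; the toy data, initial point, learning rate, and iteration count are all arbitrary choices for illustration:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])      # toy inputs
y_hat = np.array([3.0, 5.0, 7.0, 9.0])  # toy labels (true relation y = 1 + 2x)

w, b = 0.0, 0.0   # chosen initial point (here simply zero)
eta = 0.01        # learning rate (a hyperparameter)

for step in range(1000):
    y = b + w * x                        # model prediction
    # partial derivatives of the MSE loss L = mean((y - ŷ)^2)
    dL_dw = np.mean(2 * (y - y_hat) * x)
    dL_db = np.mean(2 * (y - y_hat))
    # update: move against the slope, scaled by the learning rate
    w -= eta * dL_dw
    b -= eta * dL_db

print(w, b)   # should approach w* ≈ 2, b* ≈ 1
```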

4. Model Bias

  • Linear models have severe limitations, so we need a more flexible model!

$y = b + wx \;\Rightarrow\; y = b + \sum_{i} c_i \,\mathrm{sigmoid}(b_i + w_i x)$

$y = b + \sum_j w_j x_j \;\Rightarrow\; y = b + \sum_i c_i \,\mathrm{sigmoid}\bigl(b_i + \sum_j w_{ij} x_j\bigr)$
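
A minimal sketch of the forward pass of this more flexible model, assuming NumPy; the parameter values are randomly initialized and the shapes are illustrative only ($i$ indexes the sigmoid units, $j$ the input features):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

num_features = 3   # j runs over input features
num_units = 4      # i runs over sigmoid units

rng = np.random.default_rng(0)
W = rng.normal(size=(num_units, num_features))   # w_ij
b_i = rng.normal(size=num_units)                 # b_i
c = rng.normal(size=num_units)                   # c_i
b = 0.5                                          # overall bias b

x = np.array([1.0, 2.0, 3.0])                    # one toy input vector x_j

# y = b + sum_i c_i * sigmoid(b_i + sum_j w_ij * x_j)
y = b + c @ sigmoid(b_i + W @ x)
print(y)
```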

5. Backpropagation

  • Backpropagation: an efficient way to compute $\partial L/\partial w$ in neural networks.

    • Gradient Descent

    • Chain Rule

      Case 1: $y = g(x)$, $z = h(y)$; a change propagates as $\Delta x \rightarrow \Delta y \rightarrow \Delta z$, so $\frac{dz}{dx} = \frac{dz}{dy}\frac{dy}{dx}$
      Case 2: $x = g(s)$, $y = h(s)$, $z = k(x, y)$, so $\frac{dz}{ds} = \frac{\partial z}{\partial x}\frac{dx}{ds} + \frac{\partial z}{\partial y}\frac{dy}{ds}$

    • $L(\theta) = \sum_{n=1}^{N} C^n(\theta) \;\rightarrow\; \frac{\partial L(\theta)}{\partial w} = \sum_{n=1}^{N} \frac{\partial C^n(\theta)}{\partial w}$
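
A minimal sketch of Case 2 of the chain rule, verified with a finite-difference check; the functions $g$, $h$, $k$ and the point $s$ are arbitrary toy choices:

```python
# Case 2: x = g(s), y = h(s), z = k(x, y)
# dz/ds = (∂z/∂x)(dx/ds) + (∂z/∂y)(dy/ds)
import math

def g(s): return s ** 2        # x = s^2    -> dx/ds = 2s
def h(s): return math.sin(s)   # y = sin(s) -> dy/ds = cos(s)
def k(x, y): return x * y      # z = x*y    -> ∂z/∂x = y, ∂z/∂y = x

s = 0.7
x, y = g(s), h(s)

# chain rule
dz_ds = y * (2 * s) + x * math.cos(s)

# numerical check with a small finite difference
eps = 1e-6
dz_ds_num = (k(g(s + eps), h(s + eps)) - k(g(s - eps), h(s - eps))) / (2 * eps)

print(dz_ds, dz_ds_num)   # the two values should agree closely
```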


6. Ups and downs of Deep Learning

  • 1958: Perceptron (a linear model; the perceptron is a kind of artificial neural network)

  • 1969: Perceptron has limitations

  • 1980s: Multi-layer perceptron

    • No significant difference from today's DNNs.
  • 1986: Backpropagation

    • Usually more than 3 hidden layers is not helpful.
  • 1989: 1 hidden layer is “good enough”, why deep?

  • 2006: RBM initialization(breakthrough)

  • 2009: GPU

  • 2011: Start to be popular in speech recognition

  • 2012: Won the ILSVRC image competition

Gradient Descent for Deep Learning

  • In deep learning, the gradient is the collection of partial derivatives of the loss function with respect to every parameter; each parameter is then updated by subtracting the product of its partial derivative and the learning rate.
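
A minimal sketch of that update rule using PyTorch's autograd, assuming torch is installed; the tiny network, random toy data, learning rate, and step count are illustrative only:

```python
import torch

# toy data: 4 examples, 3 features each
x = torch.randn(4, 3)
y_hat = torch.randn(4, 1)

# a tiny network with one hidden layer of sigmoid units
model = torch.nn.Sequential(
    torch.nn.Linear(3, 8),
    torch.nn.Sigmoid(),
    torch.nn.Linear(8, 1),
)
eta = 0.1   # learning rate

for step in range(100):
    y = model(x)
    loss = torch.mean((y - y_hat) ** 2)   # MSE loss
    model.zero_grad()
    loss.backward()                        # backpropagation fills p.grad with ∂L/∂p
    with torch.no_grad():
        for p in model.parameters():
            p -= eta * p.grad              # parameter ← parameter − η · ∂L/∂p

print(loss.item())
```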

That's all for today.
