Deep Learning -- Class 2

  • basics
    • deep L-layer neural network
      • forward propagation (vectorized)
        • Z[l] = W[l] A[l-1] + B[l]
        • A[l] = g[l](Z[l])
      • get the matrix dimensions right
        • layer 1 example: Z[1] = W[1] X + B[1]
          • shapes: Z[1]: (n[1], 1); W[1]: (n[1], n[0]); X: (n[0], 1); B[1]: (n[1], 1)
        • single example vs. vectorized over m examples (shape check sketched below)
          • z[l], a[l] --> (n[l], 1)
          • Z[l], A[l] --> (n[l], m)
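A minimal numpy sketch of the vectorized forward pass with these shapes; the layer sizes, ReLU/sigmoid activations, and the 0.01 scaling are illustrative assumptions, not from the notes:

```python
import numpy as np

def relu(Z):
    return np.maximum(0, Z)

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

np.random.seed(0)
layer_dims = [4, 5, 3, 1]          # n[0]..n[3], chosen arbitrarily
m = 10                             # number of training examples

# parameters: W[l] is (n[l], n[l-1]), B[l] is (n[l], 1)
W = {l: np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
     for l in range(1, len(layer_dims))}
B = {l: np.zeros((layer_dims[l], 1)) for l in range(1, len(layer_dims))}

A = np.random.randn(layer_dims[0], m)   # X, examples stacked column-wise: (n[0], m)
L = len(layer_dims) - 1
for l in range(1, L + 1):
    Z = W[l] @ A + B[l]                 # B[l] broadcasts over the m columns
    assert Z.shape == (layer_dims[l], m)
    A = relu(Z) if l < L else sigmoid(Z)
print(A.shape)                          # (1, m)
```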
      • why deep representations?
        • increasing depth (more hidden layers) is often more effective than increasing width (more units per layer): a deep network can compute some functions compactly that a shallow one would need exponentially many units to match
      • building blocks
        • a forward and a backward function per layer

      • forward propagation for layer l
        • input a[l-1]; output a[l]; cache z[l]
      • backward propagation for layer l
        • input da[l]; output da[l-1], dW[l], dB[l] (per-layer sketch below)
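A sketch of the two per-layer building blocks, assuming g[l] = ReLU; the function names are mine, not from the course:

```python
import numpy as np

def layer_forward(A_prev, W, B):
    """Input a[l-1]; output a[l] plus a cache holding z[l] (and the inputs)."""
    Z = W @ A_prev + B
    A = np.maximum(0, Z)                # g[l] = ReLU here
    cache = (A_prev, W, Z)
    return A, cache

def layer_backward(dA, cache):
    """Input da[l]; output da[l-1], dW[l], dB[l], using the cached z[l]."""
    A_prev, W, Z = cache
    m = A_prev.shape[1]
    dZ = dA * (Z > 0)                   # elementwise g'[l] for ReLU
    dW = (dZ @ A_prev.T) / m
    dB = dZ.sum(axis=1, keepdims=True) / m
    dA_prev = W.T @ dZ
    return dA_prev, dW, dB
```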

      • summary

      • hyperparameters and parameters (see the sketch after this list)
        • hyperparameters -- set by hand, tuned on the dev set
          • learning rate alpha
          • number of hidden layers L
          • number of iterations
        • parameters -- learned by gradient descent
          • W1, B1, W2, B2, ...
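A small sketch of the split in code, with made-up values: hyperparameters are chosen before training; parameters are what gradient descent updates:

```python
import numpy as np

# hyperparameters: chosen by you, tuned on the dev set
alpha = 0.01                 # learning rate
layer_dims = [4, 5, 1]       # number and size of layers
num_iterations = 1000

# parameters: initialized once, then learned by gradient descent
params = {}
for l in range(1, len(layer_dims)):
    params[f"W{l}"] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
    params[f"B{l}"] = np.zeros((layer_dims[l], 1))
```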
    • machine learning
      • training / dev (hold-out cross-validation) / test sets
        • traditional split: 60/20/20
        • with large datasets: just make sure the dev and test sets are big enough in absolute terms -- not 20%, maybe 1% (10,000 of 1,000,000 examples) or even 0.1% (split sketched below)
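For instance, with m = 1,000,000 a 98/1/1 split still leaves 10,000 examples each for dev and test; a quick sketch (the feature count is arbitrary):

```python
import numpy as np

m = 1_000_000
X = np.random.randn(10, m)                 # toy dataset, examples column-wise
idx = np.random.permutation(m)             # shuffle before splitting
n_dev = n_test = m // 100                  # 1% each
dev, test, train = idx[:n_dev], idx[n_dev:n_dev + n_test], idx[n_dev + n_test:]
X_train, X_dev, X_test = X[:, train], X[:, dev], X[:, test]
print(X_train.shape[1], X_dev.shape[1], X_test.shape[1])  # 980000 10000 10000
```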
      • bias / variance
        • high bias ~ underfitting the training set; high variance ~ overfitting (large train-to-dev error gap)

      • basic recipe
        • high bias (look at training-set performance): try a bigger network or train longer
        • high variance, i.e. overfitting (look at dev-set performance): get more data or add regularization (decision logic sketched below)
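The recipe is a loop: fix bias first, then variance. A sketch of just the decision logic; the thresholds are made-up placeholders:

```python
def basic_recipe_step(train_err, dev_err, target_err=0.05, gap_tol=0.02):
    """Return the next action; target_err and gap_tol are illustrative."""
    if train_err > target_err:                 # poor training-set performance
        return "high bias: bigger network or train longer"
    if dev_err - train_err > gap_tol:          # poor dev-set performance
        return "high variance: more data or regularization"
    return "done"

print(basic_recipe_step(train_err=0.15, dev_err=0.16))  # tackle bias first
print(basic_recipe_step(train_err=0.02, dev_err=0.11))  # then variance
```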
      • regularization -- to avoid overfitting
        • L2 regularization: penalize large weights to keep W "appropriate"
          • regularize W but usually not B: W has far more entries than B, so the bias terms barely matter
        • in a neural network: add (lambda / 2m) * sum over l of ||W[l]||_F^2 (Frobenius norm) to the cost; backprop then adds (lambda / m) W[l] to each dW[l] ("weight decay") -- sketched below
        • why does it prevent overfitting? a large lambda drives W[l] toward 0, effectively simplifying the network toward a smaller, more nearly linear one
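A sketch of both pieces, assuming a cross-entropy cost and parameters stored in dicts keyed by layer:

```python
import numpy as np

def l2_cost(AL, Y, W, lambd):
    """Cross-entropy cost plus (lambda / 2m) * sum_l ||W[l]||_F^2."""
    m = Y.shape[1]
    cross_entropy = -np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL)) / m
    l2_penalty = (lambd / (2 * m)) * sum(np.sum(Wl ** 2) for Wl in W.values())
    return cross_entropy + l2_penalty

def add_weight_decay(dW, W, lambd, m):
    """Backward pass: each dW[l] gains an extra (lambda / m) * W[l] term."""
    return {l: dW[l] + (lambd / m) * W[l] for l in dW}
```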

        • dropout regularization

          • inverted dropout
            • keep_prob per layer: e.g. 0.8, 0.9, or 1 (1 = no dropout)
          • understanding
            • a unit can't rely on any single feature, so it has to spread out its weights
            • for a layer prone to overfitting, set keep_prob lower, maybe 0.5 (sketch below)
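A minimal sketch of inverted dropout on one layer's activations; dividing by keep_prob keeps the expected activation unchanged, which is why no extra scaling is needed at test time (where dropout is simply turned off):

```python
import numpy as np

def inverted_dropout(A, keep_prob=0.8):
    D = np.random.rand(*A.shape) < keep_prob   # keep mask, True w.p. keep_prob
    A = A * D                                  # shut off the dropped units
    A = A / keep_prob                          # scale up so E[A] is unchanged
    return A, D                                # cache D to reuse in backprop

A = np.random.randn(5, 10)
A_drop, D = inverted_dropout(A, keep_prob=0.8)
```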
      • other ways to avoid overfitting
        • flip the image horizontally
        • take random distortions and translations of the image (augmentation sketch below)
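These augmentations are cheap array operations; a minimal sketch assuming images stored as height x width x channels:

```python
import numpy as np

img = np.random.rand(32, 32, 3)      # toy image: height x width x channels
flipped = img[:, ::-1, :]            # reverse the width axis = horizontal flip

# a crude random translation: crop a random 28x28 window from the 32x32 image
top, left = np.random.randint(0, 5, size=2)
cropped = img[top:top + 28, left:left + 28, :]
```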

        • early stopping: stop training when dev-set error stops improving (sketch below)
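A sketch of the early-stopping loop; `train_one_step` and `dev_error` are hypothetical callables standing in for your own training and evaluation code:

```python
def early_stopping(params, train_one_step, dev_error, max_iters=1000, patience=10):
    """Stop when dev error hasn't improved for `patience` iterations."""
    best_err, best_params, since_best = float("inf"), params, 0
    for _ in range(max_iters):
        params = train_one_step(params)
        err = dev_error(params)
        if err < best_err:
            best_err, best_params, since_best = err, params, 0
        else:
            since_best += 1
            if since_best >= patience:
                break                    # dev error stopped improving
    return best_params                   # keep the best parameters seen
```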

      • normalizing training sets

        • subtract the mean: x := x - mu
        • normalize the variance: divide each feature by its standard deviation sigma
        • result: mean 0, variance 1 per feature; apply the same mu and sigma to dev/test (sketch below)
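A sketch, with the key point in the last comment: mu and sigma come from the training set and are reused unchanged on dev/test:

```python
import numpy as np

X = np.random.randn(3, 100) * 5 + 2      # features x examples, not normalized

mu = X.mean(axis=1, keepdims=True)       # per-feature mean
sigma = X.std(axis=1, keepdims=True)     # per-feature standard deviation
X_norm = (X - mu) / sigma                # now mean 0, variance 1 per feature

# reuse the *training* mu and sigma on new data:
# X_dev_norm = (X_dev - mu) / sigma
```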
      • exploding and vanishing gradients
        • weights slightly greater than 1 (or the identity): activations and gradients grow like w^L and explode in deep networks
        • weights slightly less than 1: they shrink like w^L and vanish (numeric demo below)
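A toy numeric demo of the effect: a 50-layer linear network where every layer uses the same matrix w*I, so activations scale like w^L:

```python
import numpy as np

L = 50
x = np.ones((2, 1))
for w in (1.5, 0.5):
    W = w * np.eye(2)                    # every layer uses the same W
    a = x
    for _ in range(L):
        a = W @ a                        # linear activations: a[l] = W a[l-1]
    print(w, a[0, 0])                    # ~6.4e8 for w=1.5, ~8.9e-16 for w=0.5
```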
      • weight initialization -- a partial fix for exploding/vanishing gradients (sketch below)
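A sketch of He initialization (weight variance 2/n[l-1], suited to ReLU layers); for tanh, Xavier initialization uses 1/n[l-1] instead:

```python
import numpy as np

def init_he(layer_dims):
    """He initialization: Var(w) = 2 / n[l-1], suited to ReLU layers."""
    params = {}
    for l in range(1, len(layer_dims)):
        params[f"W{l}"] = (np.random.randn(layer_dims[l], layer_dims[l - 1])
                           * np.sqrt(2.0 / layer_dims[l - 1]))
        params[f"B{l}"] = np.zeros((layer_dims[l], 1))
    return params

params = init_he([4, 5, 3, 1])
```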

      • gradient checking -- numerically verify backprop with a two-sided difference (sketch below)
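A sketch of the two-sided check on a scalar toy cost; in practice you flatten all parameters into one vector, use eps around 1e-7, and treat a relative difference below about 1e-7 as a pass:

```python
import numpy as np

def grad_check(J, dJ, theta, eps=1e-7):
    """Compare analytic gradient dJ(theta) with (J(t+eps) - J(t-eps)) / (2 eps)."""
    numeric = (J(theta + eps) - J(theta - eps)) / (2 * eps)
    analytic = dJ(theta)
    diff = abs(numeric - analytic) / max(abs(numeric) + abs(analytic), 1e-12)
    return numeric, analytic, diff

# toy check: J(theta) = theta^2, dJ/dtheta = 2 theta
print(grad_check(lambda t: t ** 2, lambda t: 2 * t, theta=3.0))
```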
