【Andrew Ng's Deep Learning】Incomplete notes, haha【work in progress...】

Parameters & hyperparameters:

hyperparameters: learning rate $\alpha$, # iterations, # hidden layers, # hidden units, choice of activation function, mini-batch size

Hyperparameters determine the learned parameters to some extent.

Bias and variance:

  • High variance: good on the training set, noticeably worse on the dev/test set → regularization, data augmentation, early stopping (not ideal), more data
  • High bias: similarly bad on both → bigger network, train longer
  • High variance and high bias: bad on the training set, even worse on the dev/test set

For example, with human error near 0%: train error 1% / dev error 11% signals high variance, while train 15% / dev 16% signals high bias.


  1. Improving Deep Neural Networks: Hyper-parameter tuning, Regularization and Optimization.
  2. Structuring your Machine Learning project.
  3. Convolutional Neural Networks.
  4. Natural Language Processing: Building Sequence Models.

2.11 Vectorization

L1 regularization: J += $\frac{\lambda}{2m}\,\lVert w\rVert_1$
L2 regularization (weight decay by the factor $1-\frac{\alpha\lambda}{m}$): J += $\frac{\lambda}{2m}\,\lVert w\rVert_2^2$

$\lambda$: regularization parameter

The larger $\lambda$ is, the smaller $W$ becomes, and hence the smaller $Z$ (its range narrows); the nonlinear activation then operates mostly in its near-linear region, which lowers the network's expressive power and keeps it from overfitting.
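A minimal NumPy sketch (not course code; the variable names are made up) of how the L2 term enters the gradient, and why that step is the same thing as decaying the weights by $1-\frac{\alpha\lambda}{m}$:

```python
import numpy as np

np.random.seed(0)
m, lam, alpha = 64, 0.7, 0.01           # batch size, lambda, learning rate
W = np.random.randn(4, 3)

# L2 penalty added to the cost: (lambda / 2m) * ||W||_2^2
l2_penalty = (lam / (2 * m)) * np.sum(W ** 2)

# its gradient is (lambda / m) * W, so the update becomes:
dW_data = np.random.randn(*W.shape)     # stand-in for the data gradient
W_new = W - alpha * (dW_data + (lam / m) * W)

# equivalently: decay the weights by (1 - alpha*lambda/m), then take the usual step
W_decay = (1 - alpha * lam / m) * W - alpha * dW_data
assert np.allclose(W_new, W_decay)
```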

Dropout regularization:
Intuition for dropout: a unit cannot rely on any one feature, so it has to spread out its weights.
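A sketch of inverted dropout on one layer's activations, in the spirit of the course's a3/d3/keep_prob example; the shapes here are made up:

```python
import numpy as np

keep_prob = 0.8
a3 = np.random.randn(50, 64)                 # activations of layer 3, shape (units, m)

# forward pass with inverted dropout
d3 = np.random.rand(*a3.shape) < keep_prob   # random keep/drop mask
a3 = a3 * d3                                 # zero out the dropped units
a3 = a3 / keep_prob                          # scale up so E[a3] is unchanged

# at test time: no dropout and no scaling -- the inverted scaling above
# keeps expected activations the same, so test code needs no change
```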


Exploding & vanishing gradients: in very deep networks, activations and gradients can grow or shrink exponentially with depth, making training unstable or extremely slow.

Weight initialization: scaling the initial weights by the fan-in partially mitigates this; for ReLU, He initialization uses variance $2/n^{[l-1]}$ (Xavier-style uses $1/n^{[l-1]}$ for tanh).
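A sketch of He initialization (the $\sqrt{2/n^{[l-1]}}$ scaling above); the layer sizes are illustrative:

```python
import numpy as np

def initialize_he(layer_dims):
    """layer_dims: e.g. [n_x, n_h1, n_h2, n_y]."""
    params = {}
    for l in range(1, len(layer_dims)):
        # scale by sqrt(2 / fan_in) so ReLU activations keep roughly unit variance
        params[f"W{l}"] = (np.random.randn(layer_dims[l], layer_dims[l - 1])
                           * np.sqrt(2.0 / layer_dims[l - 1]))
        params[f"b{l}"] = np.zeros((layer_dims[l], 1))
    return params

params = initialize_he([784, 128, 64, 10])
```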

Gradient check: verify backprop by comparing the analytic gradient against the two-sided numerical estimate $\frac{J(\theta+\varepsilon) - J(\theta-\varepsilon)}{2\varepsilon}$; use it only for debugging (not during training), and turn dropout off while checking.
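A minimal sketch of the two-sided check on a toy scalar cost (the function here is made up for illustration):

```python
import numpy as np

def J(theta):                  # toy cost: J(theta) = theta^3
    return theta ** 3

def dJ(theta):                 # analytic gradient: 3 * theta^2
    return 3 * theta ** 2

theta, eps = 1.5, 1e-7
numeric = (J(theta + eps) - J(theta - eps)) / (2 * eps)   # two-sided estimate
analytic = dJ(theta)

# relative difference; roughly < 1e-7 suggests the gradient code is correct
diff = abs(numeric - analytic) / (abs(numeric) + abs(analytic))
print(diff)
```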

mini-batch gradient descent: a compromise between batch gradient descent and stochastic gradient descent.

epoch: a single pass through the training set.
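A sketch of shuffling and partitioning the training set into mini-batches; one pass over all of them is one epoch (random_mini_batches and the shapes are illustrative, not course code):

```python
import numpy as np

def random_mini_batches(X, Y, batch_size=64, seed=0):
    """X: (n_x, m), Y: (1, m). Returns a list of (X_batch, Y_batch)."""
    np.random.seed(seed)
    m = X.shape[1]
    perm = np.random.permutation(m)            # shuffle the examples
    X, Y = X[:, perm], Y[:, perm]
    # partition; the last batch may be smaller than batch_size
    return [(X[:, k:k + batch_size], Y[:, k:k + batch_size])
            for k in range(0, m, batch_size)]

X, Y = np.random.randn(5, 1000), np.random.randn(1, 1000)
batches = random_mini_batches(X, Y)            # 16 batches: 15 of 64, 1 of 40
```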

Momentum: take an exponentially weighted average of the gradients, typically with $\beta = 0.9$.
RMSprop: keep an exponentially weighted average $S_{dW}$ of the squared gradients, and update $W := W - \alpha \frac{dW}{\sqrt{S_{dW}}}$: the update shrinks in directions with large gradients and grows in directions with small ones, damping the oscillations.

Adam: combines Momentum and RMSprop.
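A sketch of the Adam update for a single parameter, combining the momentum average $v$ and the RMSprop average $s$ with bias correction (default hyperparameters as in the course: $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\varepsilon = 10^{-8}$):

```python
import numpy as np

def adam_step(W, dW, v, s, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    v = beta1 * v + (1 - beta1) * dW          # momentum: average of gradients
    s = beta2 * s + (1 - beta2) * dW ** 2     # RMSprop: average of squared gradients
    v_hat = v / (1 - beta1 ** t)              # bias correction (t = step count)
    s_hat = s / (1 - beta2 ** t)
    W = W - alpha * v_hat / (np.sqrt(s_hat) + eps)
    return W, v, s

W = np.random.randn(3, 3)
v, s = np.zeros_like(W), np.zeros_like(W)
for t in range(1, 101):
    dW = 2 * W                                # toy gradient of ||W||^2
    W, v, s = adam_step(W, dW, v, s, t)
```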

Learning rate decay: shrink $\alpha$ over time, e.g. $\alpha = \frac{1}{1 + \text{decay rate} \times \text{epoch number}}\,\alpha_0$, so the steps get smaller as training converges.

The problem of local optima:

The "local optima" you run into are more likely saddle points: in an $n$-dimensional space, the probability that the cost curves the same way (convex or concave) in every dimension at once is tiny.

The TensorFlow (1.x) example from the course: minimizing the cost $J(w) = w^2 - 10w + 25 = (w-5)^2$ by gradient descent, so the optimum is $w = 5$.

```python
import numpy as np
import tensorflow as tf

w = tf.Variable(0, dtype=tf.float32)
cost = tf.add(tf.add(w**2, tf.multiply(-10., w)), 25)   # (w - 5)^2
train = tf.train.GradientDescentOptimizer(0.01).minimize(cost)
init = tf.global_variables_initializer()

session = tf.Session()
session.run(init)
print(session.run(w))   # 0.0 -- initial value

session.run(train)      # one gradient step
print(session.run(w))   # 0.1

for i in range(1000):
    session.run(train)
print(session.run(w))   # 4.99999 -- close to the optimum w = 5
```

Only with a single real-number evaluation metric can you quickly screen models and pick the better one.

The residual blocks ResNet is built from make it easy for the network to effectively skip a useless convolutional layer (one whose parameters are near 0): the skip connection lets the block fall back to the identity, so extra layers don't hurt.
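A minimal sketch of an identity residual block in the same TF1 style as the snippet above, assuming the input and output channel counts match so the shortcut needs no projection:

```python
import tensorflow as tf

def residual_block(x, filters):
    shortcut = x                                          # identity skip path
    y = tf.layers.conv2d(x, filters, 3, padding='same',
                         activation=tf.nn.relu)
    y = tf.layers.conv2d(y, filters, 3, padding='same')   # no activation yet
    # adding the shortcut means the block only has to learn a residual;
    # if its weights are ~0, the block reduces to the identity
    return tf.nn.relu(y + shortcut)

x = tf.placeholder(tf.float32, [None, 32, 32, 16])
out = residual_block(x, 16)
```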

Max pooling beats mean pooling on almost all tasks (averaging the features together dilutes them).
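A tiny NumPy illustration of the point: averaging washes out a strong activation that max pooling preserves:

```python
import numpy as np

# one 2x2 window where a feature detector fired strongly in a single cell
window = np.array([[9.0, 0.1],
                   [0.2, 0.0]])

print(window.max())    # 9.0   -- max pooling keeps the strong activation
print(window.mean())   # 2.325 -- mean pooling dilutes it
```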


References:

https://www.coursera.org/deeplearning-ai
