Parameters & hyperparameters:
hyperparameters: learning rate $\alpha$, # iterations, # hidden layers, # hidden units, choice of activation function, mini-batch size, …
hyperparameters determine parameters to some extent.
bias and variance
- High variance: good on the training set, poor on the test set → regularization, data augmentation, early stopping (early stopping is not ideal: it mixes the job of optimizing J with the job of reducing overfitting)
- High bias: equally poor on both
- High variance and high bias: poor on the training set, even worse on the test set
- Improving Deep Neural Networks: Hyper-parameter tuning, Regularization and Optimization.
- Structuring your Machine Learning project.
- Convolutional Neural Networks.
- Natural Language Processing: Building sequence models.
2.11 Vectorization
L1 regularization: J += $\frac{\lambda}{2m} ||w||_1$
L2 regularization (weight decay by factor $1-\frac{\alpha \lambda}{m}$): J += $\frac{\lambda}{2m} ||w||^2_2$
$\lambda$: regularization parameter
The larger $\lambda$ is, the smaller $W$ becomes, and hence the smaller $Z$ (its range narrows), so the nonlinear activation functions operate mostly in their near-linear region; this lowers the network's expressive power and prevents overfitting.
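A minimal numpy sketch of how the L2 penalty above is typically wired into the cost and the gradient for one layer's weight matrix W; the function name and argument order are my own:

```python
import numpy as np

def add_l2_regularization(cost, dW, W, lambd, m):
    cost = cost + (lambd / (2 * m)) * np.sum(np.square(W))  # J += lambda/(2m) * ||W||_2^2
    dW = dW + (lambd / m) * W                                # extra gradient term from the penalty
    return cost, dW
```

With this extra gradient term, the plain update $w := w - \alpha\,dw$ becomes $w := (1-\frac{\alpha\lambda}{m})w - \alpha\,dw_{\text{orig}}$, which is exactly the weight-decay factor above.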
Dropout regularization:
Intuition for drop-out: cannot rely on any one feature, so have to spread out weights.
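A minimal sketch of inverted dropout for one layer's activations `a`; `keep_prob` and the rescaling step follow the usual convention, the function name is my own:

```python
import numpy as np

def inverted_dropout(a, keep_prob=0.8):
    mask = np.random.rand(*a.shape) < keep_prob  # keep each unit with probability keep_prob
    a = a * mask                                 # zero out the dropped units
    a = a / keep_prob                            # scale up so the expected activation is unchanged
    return a, mask                               # reuse mask to zero the same units in backprop
```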
exploding & vanishing gradients: in very deep networks, activations and gradients can grow or shrink exponentially with depth.
weight initialization:
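One common remedy for exploding/vanishing gradients is to scale each layer's initial weights by its fan-in; a minimal sketch of He initialization (variance $2/n^{[l-1]}$, suited to ReLU), with my own helper name:

```python
import numpy as np

def initialize_parameters_he(layer_dims):
    params = {}
    for l in range(1, len(layer_dims)):
        # variance 2 / n_prev keeps activations on a similar scale layer after layer
        params["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * np.sqrt(2.0 / layer_dims[l - 1])
        params["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return params
```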
Gradient check:
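A minimal sketch of gradient checking on a flattened parameter vector: compare the backprop gradient with a two-sided numerical estimate (function and argument names are my own):

```python
import numpy as np

def gradient_check(cost_fn, analytic_grad, theta, eps=1e-7):
    num_grad = np.zeros_like(theta)
    for i in range(theta.size):
        plus, minus = theta.copy(), theta.copy()
        plus[i] += eps
        minus[i] -= eps
        num_grad[i] = (cost_fn(plus) - cost_fn(minus)) / (2 * eps)  # two-sided difference
    diff = np.linalg.norm(analytic_grad - num_grad) / (np.linalg.norm(analytic_grad) + np.linalg.norm(num_grad))
    return diff  # roughly < 1e-7 suggests backprop is correct
```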
mini-batch gradient descent: a compromise between batch gradient descent and stochastic gradient descent.
epoch: a single pass through the training set.
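A minimal sketch of cutting a shuffled training set into mini-batches; running once over all of them is one epoch (examples stored as columns, as in the course notation; the helper name is my own):

```python
import numpy as np

def random_mini_batches(X, Y, batch_size=64, seed=0):
    rng = np.random.default_rng(seed)
    m = X.shape[1]                      # number of examples (stored as columns)
    perm = rng.permutation(m)           # shuffle so batches are not biased by data order
    X, Y = X[:, perm], Y[:, perm]
    return [(X[:, k:k + batch_size], Y[:, k:k + batch_size]) for k in range(0, m, batch_size)]
```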
Momentum: exponentially weighted averages of the gradient, with $\beta = 0.9$.
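A minimal sketch of the momentum update for a single parameter, using the $\beta = 0.9$ default above (names are my own):

```python
def momentum_step(w, dw, v_dw, beta=0.9, alpha=0.01):
    v_dw = beta * v_dw + (1 - beta) * dw  # exponentially weighted average of the gradients
    w = w - alpha * v_dw                  # step along the smoothed gradient
    return w, v_dw
```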
RMSprop: keep an exponentially weighted average $S_{dw}$ of the squared gradients, and update $w := w - \alpha \frac{dw}{\sqrt{S_{dw}}}$: this shrinks the step along directions with large gradients and enlarges it along directions with small gradients, which damps the oscillations.
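The same update written out as code, with the usual small epsilon added to the denominator for numerical stability (names are my own):

```python
import numpy as np

def rmsprop_step(w, dw, s_dw, beta2=0.999, alpha=0.01, eps=1e-8):
    s_dw = beta2 * s_dw + (1 - beta2) * dw ** 2   # exponentially weighted average of dw^2
    w = w - alpha * dw / (np.sqrt(s_dw) + eps)    # large S_dw -> smaller step, small S_dw -> larger step
    return w, s_dw
```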
Adam: combine Momentum and RMSprop.
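A minimal sketch of one Adam step for a single parameter: the momentum-style first moment and the RMSprop-style second moment from above, each bias-corrected by the iteration count t (names are my own):

```python
import numpy as np

def adam_step(w, dw, v_dw, s_dw, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    v_dw = beta1 * v_dw + (1 - beta1) * dw        # first moment (momentum term)
    s_dw = beta2 * s_dw + (1 - beta2) * dw ** 2   # second moment (RMSprop term)
    v_hat = v_dw / (1 - beta1 ** t)               # bias correction for early iterations
    s_hat = s_dw / (1 - beta2 ** t)
    w = w - alpha * v_hat / (np.sqrt(s_hat) + eps)
    return w, v_dw, s_dw
```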
Learning rate decay
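One common schedule from the course, where the rate shrinks with the epoch number (the helper name is my own):

```python
def decayed_learning_rate(alpha0, decay_rate, epoch_num):
    # alpha = alpha_0 / (1 + decay_rate * epoch_num)
    return alpha0 / (1 + decay_rate * epoch_num)
```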
the problem of local optima …
The local optima encountered in high-dimensional spaces are more likely to be saddle points: for an n-dimensional space, the probability that the surface curves the same way (all concave or all convex) in every one of the n dimensions is very small.
import numpy as np
import tensorflow as tf

w = tf.Variable(0, dtype=tf.float32)
cost = tf.add(tf.add(w**2, tf.multiply(-10., w)), 25)  # cost = w^2 - 10w + 25 = (w - 5)^2
train = tf.train.GradientDescentOptimizer(0.01).minimize(cost)
init = tf.global_variables_initializer()

session = tf.Session()
session.run(init)
print(session.run(w))  # 0.0

session.run(train)
print(session.run(w))  # 0.1

for i in range(1000):
    session.run(train)
print(session.run(w))  # 4.99999
Having a single real-number evaluation metric makes it much faster to screen for the better model.
The residual structure designed in ResNet makes it easy for the network to effectively ignore useless convolutional layers (layers whose parameters are nearly 0).
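A minimal, simplified sketch (one linear layer instead of the real two-conv block) of why the skip connection lets the network ignore a useless layer: if W and b are near zero, the block reduces to the identity mapping:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def residual_block(a_skip, W, b):
    z = W @ a_skip + b       # the block's own transformation (≈ 0 if W, b ≈ 0)
    return relu(z + a_skip)  # skip connection: add the input back before the activation
```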
Max pooling performs better than mean pooling in almost all tasks (averaging the features away is rarely useful).
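A small numerical illustration of the difference on a 4×4 feature map with 2×2 windows: max pooling keeps the strongest response in each window, while mean pooling averages it away:

```python
import numpy as np

x = np.arange(16, dtype=float).reshape(4, 4)
windows = x.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3).reshape(2, 2, 4)  # the four 2x2 windows
max_pool = windows.max(axis=-1)    # [[ 5.  7.] [13. 15.]]
mean_pool = windows.mean(axis=-1)  # [[ 2.5  4.5] [10.5 12.5]]
```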