Deep Learning Notes (Andrew Ng)

DatCat

Article link: https://blog.csdn.net/qq_33095515/article/details/100549612

Framework of each chapter

COURSE1

WEEK TWO

Preprocessing the dataset is important.
You implemented each function separately: initialize(), propagate(), optimize(). Then you built a model().
Tuning the learning rate (which is an example of a “hyperparameter”) can make a big difference to the algorithm.
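The functions named above can be sketched as follows. This is a minimal pure-Python version of logistic regression. It assumes each example is a list of features that has already been preprocessed (scaled), and that each label is 0 or 1. The course assignment uses NumPy, so the exact signatures, the extra predict() helper, the toy data and the default learning_rate here are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def initialize(n_features):
    """Zero weights and bias; fine for logistic regression (no hidden units to break symmetry)."""
    return [0.0] * n_features, 0.0

def propagate(w, b, X, Y):
    """Forward pass (average cross-entropy cost) and backward pass (gradients)."""
    m = len(X)
    dw, db, cost = [0.0] * len(w), 0.0, 0.0
    for x, y in zip(X, Y):
        a = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
        cost -= y * math.log(a) + (1 - y) * math.log(1 - a)
        dz = a - y                      # dL/dz for sigmoid + cross-entropy
        for j, xj in enumerate(x):
            dw[j] += dz * xj
        db += dz
    return [g / m for g in dw], db / m, cost / m

def optimize(w, b, X, Y, num_iterations, learning_rate):
    """Plain gradient descent; learning_rate is the hyperparameter worth tuning."""
    costs = []
    for _ in range(num_iterations):
        dw, db, cost = propagate(w, b, X, Y)
        w = [wj - learning_rate * g for wj, g in zip(w, dw)]
        b -= learning_rate * db
        costs.append(cost)
    return w, b, costs

def predict(w, b, X):
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) > 0.5 else 0 for x in X]

def model(X_train, Y_train, num_iterations=1000, learning_rate=0.5):
    w, b = initialize(len(X_train[0]))
    w, b, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate)
    return w, b, costs

# Toy, already-scaled data: label 1 when the single feature is positive.
X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
Y = [0, 0, 0, 1, 1, 1]
w, b, costs = model(X, Y)
print(predict(w, b, X))  # prints [0, 0, 0, 1, 1, 1]
```

Re-running model() with a few different learning_rate values and comparing the resulting costs lists is a quick way to see how much this one hyperparameter matters.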

WEEK THREE

Define the neural network structure (number of input units, number of hidden units, etc.).
Initialize the model’s parameters
Loop:
- Implement forward propagation
- Compute loss
- Implement backward propagation to get the gradients
- Update parameters (gradient descent)
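Below is a minimal pure-Python sketch of this loop for a network with one hidden layer. It assumes tanh hidden units, a single sigmoid output unit, Gaussian weights scaled by 0.01, and a tiny toy dataset. The function names, n_h=4, learning_rate=0.5 and the data are illustrative choices, not the assignment's exact code.

```python
import math
import random

def layer_sizes(X, n_h=4):
    """Define the structure: n_x input units, n_h hidden units, one output unit."""
    return len(X[0]), n_h

def initialize_parameters(n_x, n_h, seed=0):
    """Small random weights break symmetry between hidden units; biases start at zero."""
    rng = random.Random(seed)
    return {"W1": [[rng.gauss(0, 1) * 0.01 for _ in range(n_x)] for _ in range(n_h)],
            "b1": [0.0] * n_h,
            "W2": [rng.gauss(0, 1) * 0.01 for _ in range(n_h)],
            "b2": 0.0}

def forward_propagation(p, x):
    """tanh hidden layer followed by a sigmoid output unit."""
    a1 = [math.tanh(sum(w * xk for w, xk in zip(row, x)) + b)
          for row, b in zip(p["W1"], p["b1"])]
    z2 = sum(w * a for w, a in zip(p["W2"], a1)) + p["b2"]
    return a1, 1.0 / (1.0 + math.exp(-z2))

def nn_model(X, Y, n_h=4, num_iterations=2000, learning_rate=0.5):
    n_x, n_h = layer_sizes(X, n_h)
    p = initialize_parameters(n_x, n_h)
    m, costs = len(X), []
    for _ in range(num_iterations):
        dW1 = [[0.0] * n_x for _ in range(n_h)]
        db1, dW2, db2, cost = [0.0] * n_h, [0.0] * n_h, 0.0, 0.0
        for x, y in zip(X, Y):
            a1, a2 = forward_propagation(p, x)                       # forward propagation
            cost -= y * math.log(a2) + (1 - y) * math.log(1 - a2)    # cross-entropy loss
            dz2 = a2 - y                                             # backward propagation
            db2 += dz2
            for j in range(n_h):
                dW2[j] += dz2 * a1[j]
                dz1 = dz2 * p["W2"][j] * (1 - a1[j] ** 2)            # tanh'(z) = 1 - tanh(z)^2
                db1[j] += dz1
                for k in range(n_x):
                    dW1[j][k] += dz1 * x[k]
        costs.append(cost / m)
        for j in range(n_h):                                         # gradient-descent update
            p["W2"][j] -= learning_rate * dW2[j] / m
            p["b1"][j] -= learning_rate * db1[j] / m
            for k in range(n_x):
                p["W1"][j][k] -= learning_rate * dW1[j][k] / m
        p["b2"] -= learning_rate * db2 / m
    return p, costs

X = [[-1.0, -2.0], [-2.0, -1.0], [1.0, 2.0], [2.0, 1.0]]
Y = [0, 0, 1, 1]
params, costs = nn_model(X, Y)
predictions = [1 if forward_propagation(params, x)[1] > 0.5 else 0 for x in X]
```

The gradients are accumulated over all examples and averaged before the update, so each iteration is one step of batch gradient descent.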

WEEK FOUR
As usual, you will follow the Deep Learning methodology to build the model:

Initialize parameters / Define hyperparameters
Loop for num_iterations:
a. Forward propagation
b. Compute cost function
c. Backward propagation
d. Update parameters (using the parameters and the grads from backprop)
Use trained parameters to predict labels
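The methodology above can be sketched for an arbitrary list of layer sizes, layer_dims. This sketch follows the [LINEAR -> RELU] x (L-1) -> LINEAR -> SIGMOID layout used in the course. Several choices here are assumptions: He-scaled initialization, per-example loops in pure Python instead of NumPy matrices, the function names, and the toy data. The comments mark steps a to d.

```python
import math
import random

def initialize_parameters_deep(layer_dims, seed=1):
    """One (W, b) pair per layer; He scaling sqrt(2 / n_prev) suits ReLU layers."""
    rng = random.Random(seed)
    params = []
    for n_prev, n in zip(layer_dims[:-1], layer_dims[1:]):
        scale = math.sqrt(2.0 / n_prev)
        W = [[rng.gauss(0, 1) * scale for _ in range(n_prev)] for _ in range(n)]
        params.append((W, [0.0] * n))
    return params

def L_model_forward(params, x):
    """[LINEAR -> RELU] x (L-1) -> LINEAR -> SIGMOID; caches (A_prev, Z) for backprop."""
    caches, a = [], x
    for l, (W, b) in enumerate(params):
        z = [sum(w * ak for w, ak in zip(row, a)) + bi for row, bi in zip(W, b)]
        caches.append((a, z))
        if l == len(params) - 1:
            a = [1.0 / (1.0 + math.exp(-zi)) for zi in z]
        else:
            a = [max(0.0, zi) for zi in z]
    return a[0], caches

def L_model_backward(params, caches, y_hat, y):
    """Per-example gradients (dW, db) for every layer, from the output back to the input."""
    grads = [None] * len(params)
    dz = [y_hat - y]                                    # sigmoid + cross-entropy: dZ = A - Y
    for l in reversed(range(len(params))):
        W, _ = params[l]
        a_prev, _ = caches[l]
        grads[l] = ([[d * ak for ak in a_prev] for d in dz], list(dz))
        if l > 0:
            da_prev = [sum(W[i][k] * dz[i] for i in range(len(dz))) for k in range(len(a_prev))]
            z_prev = caches[l - 1][1]
            dz = [d if zp > 0 else 0.0 for d, zp in zip(da_prev, z_prev)]   # ReLU derivative
    return grads

def L_layer_model(X, Y, layer_dims, num_iterations=1000, learning_rate=0.1):
    params = initialize_parameters_deep(layer_dims)                # initialize parameters
    m, costs = len(X), []
    for _ in range(num_iterations):
        sums = [([[0.0] * len(W[0]) for _ in W], [0.0] * len(b)) for W, b in params]
        cost = 0.0
        for x, y in zip(X, Y):
            y_hat, caches = L_model_forward(params, x)             # a. forward propagation
            cost -= y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat)  # b. cost
            grads = L_model_backward(params, caches, y_hat, y)     # c. backward propagation
            for (sW, sb), (dW, db) in zip(sums, grads):
                for i in range(len(sb)):
                    sb[i] += db[i]
                    for k in range(len(sW[i])):
                        sW[i][k] += dW[i][k]
        costs.append(cost / m)
        for (W, b), (sW, sb) in zip(params, sums):                 # d. update parameters
            for i in range(len(b)):
                b[i] -= learning_rate * sb[i] / m
                for k in range(len(W[i])):
                    W[i][k] -= learning_rate * sW[i][k] / m
    return params, costs

def predict(params, X):
    """Use the trained parameters to predict 0/1 labels."""
    return [1 if L_model_forward(params, x)[0] > 0.5 else 0 for x in X]

X = [[-1.0, -2.0], [-2.0, -1.0], [1.0, 2.0], [2.0, 1.0]]
Y = [0, 0, 1, 1]
params, costs = L_layer_model(X, Y, layer_dims=[2, 4, 1])
print(predict(params, X))
```

Passing a longer layer_dims, such as [2, 5, 3, 1], adds hidden layers without changing any of the functions.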

COURSE2

WEEK FIVE
1. Initialization

Model	Problem/Comment
3-layer NN with zeros initialization	fails to break symmetry
3-layer NN with large random initialization	too large weights
3-layer NN with He initialization	recommended method
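The sketch below makes the three rows concrete with three pure-Python initializers. It assumes Gaussian draws, as in the course exercise: all zeros, "large random" (the Gaussian scaled by 10), and He initialization (scaled by sqrt(2 / n_prev), where n_prev is the size of the previous layer). The function names and the ReLU hidden-layer check are illustrative.

```python
import math
import random

def initialize_parameters(layer_dims, method, seed=3):
    """Return a list of (W, b) pairs, one per layer, for the given method."""
    rng = random.Random(seed)
    params = []
    for n_prev, n in zip(layer_dims[:-1], layer_dims[1:]):
        if method == "zeros":
            scale = 0.0                       # every unit starts identical
        elif method == "random":
            scale = 10.0                      # deliberately too large
        elif method == "he":
            scale = math.sqrt(2.0 / n_prev)   # He scaling for ReLU layers
        else:
            raise ValueError(f"unknown method: {method}")
        W = [[rng.gauss(0, 1) * scale for _ in range(n_prev)] for _ in range(n)]
        params.append((W, [0.0] * n))
    return params

def hidden_activations(params, x):
    """ReLU outputs of the first layer for one input x."""
    W, b = params[0]
    return [max(0.0, sum(w * xk for w, xk in zip(row, x)) + bi) for row, bi in zip(W, b)]

x = [0.5, -1.0, 2.0]
for method in ("zeros", "random", "he"):
    print(method, hidden_activations(initialize_parameters([3, 4, 1], method), x))
```

With zeros, every hidden unit computes the same value and therefore receives the same gradient, so the units can never become different. The large random weights produce very large pre-activations. He scaling keeps the weight variance near 2 / n_prev.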

2. Regularization
(1) L2 regularization
The value of λ (lambda) is a hyperparameter that you can tune using a dev set.
L2 regularization makes your decision boundary smoother. If λ is too large, it is also possible to “oversmooth”, resulting in a model with high bias.
What is L2 regularization actually doing?
L2 regularization relies on the assumption that a model with small weights is simpler than a model with large weights. Thus, by penalizing the squared values of the weights in the cost function, you drive all the weights to smaller values: large weights make the cost too high. This leads to a smoother model in which the output changes more slowly as the input changes.
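In formula form, the regularized cost adds (λ / 2m) times the sum of the squared entries of every weight matrix to the usual cross-entropy cost. The sketch below assumes the cross-entropy part has already been computed and that weight matrices are given as nested lists. The argument name lambd (which avoids Python's reserved word lambda) matches the course notebook. The weight values are made up for illustration.

```python
def compute_cost_with_regularization(cross_entropy_cost, weights, lambd, m):
    """J = cross-entropy + (lambd / (2 m)) * sum of squared weights over all layers.
    Biases are not penalized."""
    l2 = sum(w * w for W in weights for row in W for w in row)
    return cross_entropy_cost + lambd / (2 * m) * l2

# Two hypothetical weight matrices; squared entries sum to 1 + 4 + 9 + 16 + 4 + 0 = 34.
W1 = [[1.0, -2.0], [3.0, 4.0]]
W2 = [[2.0, 0.0]]
print(compute_cost_with_regularization(0.5, [W1, W2], lambd=0.1, m=17))  # about 0.6
```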
What you should remember

The implications of L2 regularization for:
- The cost computation:
  - A regularization term is added to the cost.
- The backpropagation function:
  - There are extra terms in the gradients with respect to the weight matrices.
- The weights themselves, which end up smaller (“weight decay”):
  - Weights are pushed to smaller values.
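The extra gradient term is (λ / m) · W for each weight matrix. This is why L2 regularization is also called weight decay: one gradient-descent step is the same as first multiplying W by (1 − αλ / m) and then applying the usual update. The small pure-Python check below assumes the unregularized gradient dW_plain comes from ordinary backprop. The names and numbers are illustrative.

```python
def l2_gradient(dW_plain, W, lambd, m):
    """Add the L2 term (lambd / m) * W to an unregularized weight gradient."""
    return [[g + lambd / m * w for g, w in zip(grow, wrow)] for grow, wrow in zip(dW_plain, W)]

def update(W, dW, learning_rate):
    return [[w - learning_rate * g for w, g in zip(wrow, grow)] for wrow, grow in zip(W, dW)]

W = [[1.0, -2.0], [0.5, 3.0]]
dW_plain = [[0.1, 0.0], [-0.2, 0.3]]
lambd, m, learning_rate = 0.7, 10, 0.5

# Regularized gradient step.
step = update(W, l2_gradient(dW_plain, W, lambd, m), learning_rate)

# Same step written as "decay the weights, then take the plain gradient step".
decay = 1 - learning_rate * lambd / m
decayed = update([[decay * w for w in row] for row in W], dW_plain, learning_rate)
print(step)
print(decayed)
```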
(2) Dropout
A common mistake when using dropout is to use it both in training and testing. You should use dropout (randomly eliminate nodes) only in training.
Deep learning frameworks like TensorFlow, PaddlePaddle, Keras or Caffe come with a dropout layer implementation. Don’t stress: you will soon learn some of these frameworks.
What you should remember about dropout:
Dropout is a regularization technique.
You only use dropout during training. Don’t use dropout (randomly eliminate nodes) during test time.
Apply dropout during both forward and backward propagation.
During training, divide the activations of each dropout layer by keep_prob so that their expected value stays the same. For example, if keep_prob is 0.5, then on average half of the nodes are shut down, so the output is scaled by 0.5 because only the remaining half contribute to the solution. Dividing by 0.5 is equivalent to multiplying by 2, so the output again has the same expected value. The same argument holds for any value of keep_prob, not just 0.5.
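The rules above describe what is usually called inverted dropout. Here is a pure-Python sketch for one layer's activations. It assumes a Bernoulli mask drawn with Python's random module. The function names and the training flag are illustrative; real frameworks provide their own dropout layers.

```python
import random

def dropout_forward(a, keep_prob, rng, training=True):
    """Inverted dropout on one layer's activations; returns (output, mask)."""
    if not training:                       # test time: keep every unit, no rescaling
        return list(a), None
    mask = [1.0 if rng.random() < keep_prob else 0.0 for _ in a]
    out = [ak * dk / keep_prob for ak, dk in zip(a, mask)]   # keeps E[out] equal to a
    return out, mask

def dropout_backward(da, mask, keep_prob):
    """Reuse the forward mask so the same units are shut down, with the same scaling."""
    return [g * dk / keep_prob for g, dk in zip(da, mask)]

rng = random.Random(0)
a = [1.0] * 10000
out, mask = dropout_forward(a, keep_prob=0.8, rng=rng)
print(sum(mask) / len(mask), sum(out) / len(out))   # close to 0.8 and 1.0 respectively
grad = dropout_backward([1.0] * len(a), mask, keep_prob=0.8)
```

The mask must be stored during the forward pass. If a fresh mask were drawn in the backward pass, the gradients would flow through different units than the ones that produced the output.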

What we want you to remember from this notebook:
- Regularization will help you reduce overfitting.
- Regularization will drive your weights to lower values.
- L2 regularization and dropout are two very effective regularization techniques.

3. Gradient Checking
What you should remember from this notebook:

Gradient checking verifies closeness between the gradients from backpropagation and the numerical approximation of the gradient (computed using forward propagation).
Gradient checking is slow, so we don’t run it in every iteration of training. You would usually run it only to make sure your code is correct, then turn it off and use backprop for the actual learning process.
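A minimal sketch of the check follows. It estimates each partial derivative with the two-sided difference (J(θ + ε) − J(θ − ε)) / (2ε), then compares the result with the analytic gradient using the relative difference ||grad − gradapprox|| / (||grad|| + ||gradapprox||). The cost J here is a made-up toy with a known gradient, standing in for forward propagation and backprop. ε = 1e-7 and a threshold around 1e-7 are common choices; treat both as adjustable assumptions.

```python
import math

def J(theta):
    """Toy cost standing in for forward propagation (hypothetical)."""
    t0, t1, t2 = theta
    return t0 ** 3 + t1 ** 3 + t2 ** 3 + t0 * t1

def grad_J(theta):
    """Analytic gradient, standing in for backpropagation."""
    t0, t1, t2 = theta
    return [3 * t0 ** 2 + t1, 3 * t1 ** 2 + t0, 3 * t2 ** 2]

def gradient_check(J, grad, theta, epsilon=1e-7):
    """Relative difference between the analytic and the two-sided numerical gradient."""
    gradapprox = []
    for i in range(len(theta)):
        plus, minus = list(theta), list(theta)
        plus[i] += epsilon
        minus[i] -= epsilon
        gradapprox.append((J(plus) - J(minus)) / (2 * epsilon))
    numerator = math.sqrt(sum((g - a) ** 2 for g, a in zip(grad, gradapprox)))
    denominator = math.sqrt(sum(g * g for g in grad)) + math.sqrt(sum(a * a for a in gradapprox))
    return numerator / denominator

theta = [1.0, -2.0, 0.5]
good = gradient_check(J, grad_J(theta), theta)
bad = gradient_check(J, [1.0, 13.0, 0.5], theta)     # last entry deliberately wrong
print(good < 1e-7, bad < 1e-7)  # prints True False
```

Each parameter needs two extra cost evaluations. This is why the check is run once to validate backprop and then turned off.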

Some keywords and key points

Softmax function

Cross entropy
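For reference, here is a pure-Python sketch of both keywords. Softmax turns a vector of scores z into probabilities exp(z_i) / Σ_j exp(z_j); subtracting max(z) first avoids overflow and does not change the result. Cross entropy for a one-hot label y is −Σ_i y_i log p_i. When the two are combined, the gradient with respect to z simplifies to p − y, the multi-class counterpart of the a − y term in binary logistic regression. The function names and example values are illustrative.

```python
import math

def softmax(z):
    """Numerically stable softmax: shifting by max(z) leaves the result unchanged."""
    shift = max(z)
    exps = [math.exp(zi - shift) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(y, p):
    """-sum(y_i * log p_i); terms with y_i == 0 contribute nothing and are skipped."""
    return -sum(yi * math.log(pi) for yi, pi in zip(y, p) if yi > 0)

def softmax_cross_entropy_grad(z, y):
    """Gradient of cross_entropy(y, softmax(z)) with respect to z: p - y."""
    return [pi - yi for pi, yi in zip(softmax(z), y)]

z = [2.0, 1.0, 0.1]
y = [1, 0, 0]
p = softmax(z)
print(p, cross_entropy(y, p), softmax_cross_entropy_grad(z, y))
```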
