Notes for Deep Learning Lessons of Prof. Hung-yi Lee (4)

1. Tips for DNN

In this lesson, Prof. Lee taught us some tips for training deep neural networks, which include:

  1. Adaptive Learning Rate
  2. New Activation Function
  3. Dropout
  4. Regularization
  5. Early Stopping

[Figure]

1.1 Adaptive Learning Rate

Adaptive learning rate has already been introduced in my previous blog, Notes for Deep Learning Lessons of Prof. Hung-yi Lee (2).

1.2 New Activation Function

The reason we need a new activation function, rather than the previous sigmoid function, can be explained by the following figure. Because the sigmoid function maps a large input range into a small output range, the influence of the input layer becomes smaller and smaller as it propagates forward. From the perspective of back-propagation, the gradient reaching the layers near the input becomes so small that we cannot train the network effectively; this is the vanishing gradient problem.
[Figure]
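As a rough numerical illustration (my own sketch, not from the slides): the derivative of the sigmoid is at most 0.25, so each additional sigmoid layer can only shrink the gradient that reaches the input layer.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)      # maximum value is 0.25, reached at x = 0

# Even in the best case the gradient is multiplied by <= 0.25 per layer,
# so a 10-layer chain of sigmoids attenuates it by roughly 0.25**10.
print(sigmoid_grad(0.0))       # 0.25
print(0.25 ** 10)              # ~9.5e-07
```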
To solve this problem, the new activation function introduced in this lesson is ReLU. In the first quadrant, the activation function does not change the value of the input, which avoids the vanishing gradient problem. The second quadrant resembles the working process of our brain: most neurons in the brain are not excited, and they fire only when the stimulation exceeds a certain threshold.

[Figure]
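A minimal NumPy sketch of ReLU and its gradient (my own illustration): in the positive region the input passes through unchanged and the gradient is exactly 1, so it is not attenuated; in the negative region the neuron outputs 0.

```python
import numpy as np

def relu(x):
    # Identity for x > 0, zero otherwise
    return np.maximum(0.0, x)

def relu_grad(x):
    # Gradient is 1 where the neuron is active, 0 where it is inactive
    return (x > 0).astype(float)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))       # [0.  0.  0.  0.5 2. ]
print(relu_grad(z))  # [0. 0. 0. 1. 1.]
```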
The states of different neurons are shown in the following figure. Active neurons do not change the value of their input, while inactive neurons can be regarded as if they did not exist.
[Figure]

So, the resulting structure of the neural network, a thinner linear network for a given input, can be shown as:
[Figure]
ReLU has some other versions. Some people think the gradient should not be exactly zero but a very small number when the input is less than zero, so the variant on the left (Leaky ReLU) was proposed. Others think the slope in the negative region should be a learnable parameter, so the variant on the right (Parametric ReLU) was developed.
[Figure]
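A short sketch of the two variants, assuming the usual definitions (Leaky ReLU uses a small fixed slope such as 0.01, Parametric ReLU learns the slope alpha as a parameter):

```python
import numpy as np

def leaky_relu(x, slope=0.01):
    # Fixed small slope for x < 0, so the gradient is never exactly zero
    return np.where(x > 0, x, slope * x)

def prelu(x, alpha):
    # Same shape as Leaky ReLU, but alpha is a trainable parameter
    return np.where(x > 0, x, alpha * x)

z = np.array([-3.0, -1.0, 2.0])
print(leaky_relu(z))        # [-0.03 -0.01  2.  ]
print(prelu(z, alpha=0.2))  # [-0.6 -0.2  2. ]
```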
ReLU actually gives us a linear activation function, but only one linear piece. Can we find an activation function that provides different linear pieces for different input values? The answer is yes: maxout can do this.
The following figure shows that maxout can do the same thing as ReLU.
[Figure]
The following figures show how maxout provides different linear structures within a single activation function; a small code sketch follows them.
[Figure]
[Figure]
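A minimal sketch of a maxout unit (my own illustration): each unit computes several linear functions of the input and outputs the maximum, so the resulting activation is piecewise linear, and with one of two weight vectors fixed at zero it reduces to ReLU.

```python
import numpy as np

def maxout(x, W, b):
    """Maxout unit: take the max over k linear pieces.
    W has shape (k, d), b has shape (k,), x has shape (d,)."""
    return np.max(W @ x + b)

x = np.array([1.5])

# Two pieces: (w=1, b=0) and (w=0, b=0) -> max(x, 0), i.e. ReLU
W_relu = np.array([[1.0], [0.0]])
b_relu = np.array([0.0, 0.0])
print(maxout(x, W_relu, b_relu))   # 1.5, same as ReLU(1.5)

# With freely learned pieces, maxout gives a different piecewise-linear shape
W = np.array([[1.0], [-1.0]])
b = np.array([0.0, 1.0])
print(maxout(x, W, b))             # max(1.5, -0.5) = 1.5
```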

1.3 Early Stopping

Early stopping is an effective way to deal with overfitting. We stop training the neural network before it reaches the minimum of the loss on the training set, typically at the point where the loss on a validation set stops decreasing.
[Figure]
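A hedged sketch of an early-stopping loop (`train_one_epoch` and `evaluate` are hypothetical helpers, not from the lecture): stop once the validation loss has not improved for `patience` epochs.

```python
def train_with_early_stopping(model, train_data, val_data,
                              max_epochs=100, patience=5):
    """Stop training when the validation loss stops improving.
    train_one_epoch() and evaluate() are assumed helper functions."""
    best_val_loss = float("inf")
    epochs_without_improvement = 0

    for epoch in range(max_epochs):
        train_one_epoch(model, train_data)      # hypothetical helper
        val_loss = evaluate(model, val_data)    # hypothetical helper

        if val_loss < best_val_loss:
            best_val_loss = val_loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                           # stop early
    return model
```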

1.4 Regularization

Regularization means adding a penalty term on the parameters to the loss function in order to avoid overfitting, as shown in the formula below.
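A common form is L2 regularization, which adds the squared norm of the weights to the original loss (the notation below is my own, not from the slides):

$$L'(\theta) = L(\theta) + \lambda \sum_i \theta_i^2$$

where $\lambda$ controls how strongly large weights are penalized.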

1.5 Dropout

Dropout is also a method to avoid overfitting. Dropout means that each time we train the network, we randomly drop some of the neurons.
[Figure]
When we use the network on the testing set, we should not drop any part of the network, and we should multiply all weights by (1 - p)% if the dropout rate during training is p%.
[Figure]
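A minimal NumPy sketch of this train/test asymmetry (my own illustration, following the lecture's convention of multiplying the weights by (1 - p) at test time):

```python
import numpy as np

def dropout_train(a, p=0.5):
    # During training: drop each neuron's output with probability p
    mask = (np.random.rand(*a.shape) >= p).astype(float)
    return a * mask

def scale_weights_for_test(W, p=0.5):
    # During testing: keep every neuron but multiply the weights by (1 - p)
    return W * (1.0 - p)

a = np.ones(4)
W = np.ones((4, 4))
print(dropout_train(a, p=0.5))           # roughly half of the activations become 0
print(scale_weights_for_test(W, p=0.5))  # all weights scaled to 0.5
```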
Dropout can be seen as a kind of ensemble method, just like Random Forest or XGBoost. The reason why dropout can be regarded as an ensemble method is illustrated by the following figures.
[Figure]
[Figure]
