Notes for Deep Learning Lessons of Prof. Hung-yi Lee (4)

1. Tips for DNN

In this lesson, Prof. Lee taught us some tips for training deep neural networks, which include:

  1. Adaptive Learning Rate
  2. New Activation Function
  3. Dropout
  4. Regularization
  5. Early Stopping

[Figure]

1.1 Adaptive Learning Rate

Adaptive learning rate has already been introduced in my previous blog, Notes for Deep Learning Lessons of Prof. Hung-yi Lee (2).

1.2 New Activation Function

The reason we need a new activation function, rather than the previous sigmoid function, can be explained by the following figure. Because the sigmoid function maps a large input range into a small output range, the influence of the input layer becomes smaller and smaller as it propagates forward. From the perspective of back-propagation, the gradient reaching the layers near the input becomes so small that we cannot train the network effectively; this is the vanishing gradient problem.
[Figure]
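As a rough numerical illustration (my own sketch, not from the slides): the derivative of the sigmoid is at most 0.25, so each additional sigmoid layer can only shrink the gradient that reaches the input layer.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)      # maximum value is 0.25, reached at x = 0

# Even in the best case the gradient is multiplied by <= 0.25 per layer,
# so a 10-layer chain of sigmoids attenuates it by roughly 0.25**10.
print(sigmoid_grad(0.0))       # 0.25
print(0.25 ** 10)              # ~9.5e-07
```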
To solve this problem, the new activation function introduced in this lesson is ReLU. In the first quadrant, the activation function does not change the value of the input, which avoids the vanishing gradient problem. The second quadrant resembles the working process of our brain: most neurons in the brain are not excited, and they fire only when the stimulation exceeds a certain threshold.

[Figure]
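A minimal NumPy sketch of ReLU and its gradient (my own illustration): in the positive region the input passes through unchanged and the gradient is exactly 1, so it is not attenuated; in the negative region the neuron outputs 0.

```python
import numpy as np

def relu(x):
    # Identity for x > 0, zero otherwise
    return np.maximum(0.0, x)

def relu_grad(x):
    # Gradient is 1 where the neuron is active, 0 where it is inactive
    return (x > 0).astype(float)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))       # [0.  0.  0.  0.5 2. ]
print(relu_grad(z))  # [0. 0. 0. 1. 1.]
```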
The states of different neurons are shown in the following figure. Active neurons do not change the value of their input, while inactive neurons can be regarded as if they did not exist.
[Figure]

So, the resulting structure of the neural network, a thinner linear network for a given input, can be shown as:
[Figure]
ReLU has some other versions. Some people think the gradient should not be exactly zero but a very small number when the input is less than zero, so the variant on the left (Leaky ReLU) was proposed. Others think the slope in the negative region should be a learnable parameter, so the variant on the right (Parametric ReLU) was developed.
[Figure]
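A short sketch of the two variants, assuming the usual definitions (Leaky ReLU uses a small fixed slope such as 0.01, Parametric ReLU learns the slope alpha as a parameter):

```python
import numpy as np

def leaky_relu(x, slope=0.01):
    # Fixed small slope for x < 0, so the gradient is never exactly zero
    return np.where(x > 0, x, slope * x)

def prelu(x, alpha):
    # Same shape as Leaky ReLU, but alpha is a trainable parameter
    return np.where(x > 0, x, alpha * x)

z = np.array([-3.0, -1.0, 2.0])
print(leaky_relu(z))        # [-0.03 -0.01  2.  ]
print(prelu(z, alpha=0.2))  # [-0.6 -0.2  2. ]
```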
ReLU actually gives us a linear activation function, but only one linear piece. Can we find an activation function that provides different linear pieces for different input values? The answer is yes: maxout can do this.
The following figure shows that maxout can do the same thing as ReLU.
[Figure]
The following figures show how maxout provides different linear structures within a single activation function; a small code sketch follows them.
[Figure]
[Figure]
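A minimal sketch of a maxout unit (my own illustration): each unit computes several linear functions of the input and outputs the maximum, so the resulting activation is piecewise linear, and with one of two weight vectors fixed at zero it reduces to ReLU.

```python
import numpy as np

def maxout(x, W, b):
    """Maxout unit: take the max over k linear pieces.
    W has shape (k, d), b has shape (k,), x has shape (d,)."""
    return np.max(W @ x + b)

x = np.array([1.5])

# Two pieces: (w=1, b=0) and (w=0, b=0) -> max(x, 0), i.e. ReLU
W_relu = np.array([[1.0], [0.0]])
b_relu = np.array([0.0, 0.0])
print(maxout(x, W_relu, b_relu))   # 1.5, same as ReLU(1.5)

# With freely learned pieces, maxout gives a different piecewise-linear shape
W = np.array([[1.0], [-1.0]])
b = np.array([0.0, 1.0])
print(maxout(x, W, b))             # max(1.5, -0.5) = 1.5
```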

1.3 Early Stopping

Early stopping is an effective way to deal with overfitting. We stop training the neural network before it reaches the minimum of the loss on the training set, typically at the point where the loss on a validation set stops decreasing.
[Figure]
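A hedged sketch of an early-stopping loop (`train_one_epoch` and `evaluate` are hypothetical helpers, not from the lecture): stop once the validation loss has not improved for `patience` epochs.

```python
def train_with_early_stopping(model, train_data, val_data,
                              max_epochs=100, patience=5):
    """Stop training when the validation loss stops improving.
    train_one_epoch() and evaluate() are assumed helper functions."""
    best_val_loss = float("inf")
    epochs_without_improvement = 0

    for epoch in range(max_epochs):
        train_one_epoch(model, train_data)      # hypothetical helper
        val_loss = evaluate(model, val_data)    # hypothetical helper

        if val_loss < best_val_loss:
            best_val_loss = val_loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                           # stop early
    return model
```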

1.4 Regularization

Regularization means adding a penalty term on the parameters to the loss function in order to avoid overfitting, as shown in the formula below.
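A common form is L2 regularization, which adds the squared norm of the weights to the original loss (the notation below is my own, not from the slides):

$$L'(\theta) = L(\theta) + \lambda \sum_i \theta_i^2$$

where $\lambda$ controls how strongly large weights are penalized.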

1.5 Dropout

Dropout is also a method to avoid overfitting. Dropout means that each time we train the network, we randomly drop some of the neurons.
[Figure]
When we use the network on the testing set, we should not drop any part of the network, and we should multiply all weights by (1 - p)% if the dropout rate during training is p%.
[Figure]
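A minimal NumPy sketch of this train/test asymmetry (my own illustration, following the lecture's convention of multiplying the weights by (1 - p) at test time):

```python
import numpy as np

def dropout_train(a, p=0.5):
    # During training: drop each neuron's output with probability p
    mask = (np.random.rand(*a.shape) >= p).astype(float)
    return a * mask

def scale_weights_for_test(W, p=0.5):
    # During testing: keep every neuron but multiply the weights by (1 - p)
    return W * (1.0 - p)

a = np.ones(4)
W = np.ones((4, 4))
print(dropout_train(a, p=0.5))           # roughly half of the activations become 0
print(scale_weights_for_test(W, p=0.5))  # all weights scaled to 0.5
```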
Dropout can be seen as a kind of ensemble method, just like Random Forest or XGBoost. The reason why dropout can be regarded as an ensemble method is illustrated by the following figures.
[Figure]
[Figure]
