Deep Neural Networks - Parameters vs Hyperparameters

These are my notes from studying section 4.7, "Parameters vs Hyperparameters", of the Coursera class "Neural Networks & Deep Learning" by Andrew Ng. They cover the usual hyperparameters in a deep NN and how to select hyperparameters in general. I'm sharing them here and hope they help!
————————————————

Being effective in developing your deep NN requires that you organize not only your parameters but also your hyperparameters well. So, what are hyperparameters?

Parameters

W^{[1]}, b^{[1]}, W^{[2]}, b^{[2]}, W^{[3]}, b^{[3]}...

Hyperparameters

  • Learning rate \alpha: it determines how the parameters evolve during gradient descent
  • Number of iterations of gradient descent
  • Number of hidden layers
  • Number of hidden units n^{[1]}, n^{[2]}, ...
  • Choice of activation functions: Sigmoid, tanh, ReLU, Leaky ReLU...
  • We'll see more later, such as the momentum term, the mini-batch size, regularization parameters, ...

Hyperparameters are settings that you need to tell your learning algorithm; they control the ultimate values of the parameters W^{[l]} and b^{[l]}.
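To make that control concrete, here is a minimal sketch of one gradient descent update in Python. The helper name update_parameters and the convention of passing precomputed gradients dW, db are illustrative assumptions, not course code:

    # One gradient descent step for a single layer: the learning rate
    # hyperparameter alpha scales how far the parameters W, b move
    # against their gradients dW, db (assumed computed by backprop).
    def update_parameters(W, b, dW, db, alpha):
        W = W - alpha * dW   # larger alpha -> bigger change to W per iteration
        b = b - alpha * db   # likewise for b
        return W, b

With \alpha = 0.01 versus \alpha = 0.05, the same gradients produce steps five times larger; that is exactly the knob the empirical process below keeps turning.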

Figure-1: the iterative Idea → Code → Experiment cycle

When you're training a deep NN for your own application, you may find that there are a lot of possible settings for the hyperparameters that you simply need to try out. Applying deep learning today is a very empirical process. You often start with an idea, for example a value for the learning rate \alpha = 0.01, then you implement it (code) and try it out (experiment) and see how it works. Based on that outcome, you might want to change the value, say increase the learning rate to 0.05. This process is shown in figure-1.

Figure-2: cost J versus number of iterations for different learning rates \alpha

If you're not sure what the best value for the learning rate is, you might try several different values of \alpha, then plot and compare the curves of how the cost function J decreases with the number of iterations. The bottom line in figure-2, the curve that drives J down to the lowest value, indicates the learning rate \alpha you want to select.
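Here is a minimal, self-contained sketch of that experiment. The toy linear regression data, the candidate values of \alpha, and the iteration count are all illustrative assumptions, not course material:

    import numpy as np

    # Toy regression problem standing in for a real model.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    true_w = np.array([1.5, -2.0, 0.5])
    y = X @ true_w + 0.1 * rng.normal(size=100)

    def cost(w):                       # J(w): mean squared error on the toy data
        return np.mean((X @ w - y) ** 2)

    def grad(w):                       # gradient of J with respect to w
        return 2 * X.T @ (X @ w - y) / len(y)

    alphas = [0.3, 0.1, 0.03, 0.01]    # candidate learning rates to compare
    histories = {}
    for alpha in alphas:
        w = np.zeros(3)                # same starting point for a fair comparison
        costs = []
        for _ in range(100):
            w -= alpha * grad(w)       # one gradient descent update
            costs.append(cost(w))
        histories[alpha] = costs       # one J-vs-iterations curve per alpha

    # The curve that ends lowest is the bottom line in figure-2.
    best_alpha = min(histories, key=lambda a: histories[a][-1])
    print("best alpha:", best_alpha)

Plotting each histories[alpha] against the iteration number reproduces the comparison in figure-2; here the lowest final cost stands in for reading the curves by eye.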

When you're starting on a new application, it's very difficult to know in advance what the best values for the hyperparameters are. You need to try out many different values and go around the cycle in figure-1. "Empirical process" is just a fancy way of saying that you have to try a lot of things and see what works.

Deep learning today is applied to so many problems: computer vision, speech recognition, natural language processing, online advertising, web search, product recommendations and so on.

  • When researchers move from one discipline to another, sometimes the hyperparameter intuitions carry over and sometimes they don't. So I often advise people, especially when starting on a new problem, to try out a range of values and see what works (a small random-search sketch follows this list). We'll see a more systematic way to do this later.
  • Even if you've been working on one application for a long time, for example online advertising, it's quite possible that the best values for the learning rate, the number of hidden units and so on will change, maybe a year from now, perhaps because the computing infrastructure (the CPUs, the GPUs, or something else) has changed. So the rule of thumb is: every now and then, maybe every few months, try a few values for the hyperparameters and double-check whether a better setting exists. As you do so, you slowly gain intuitions about the hyperparameters that work best for your problem. This is one area where deep learning research is still advancing; maybe over time we'll be able to give better guidance on the best hyperparameters to use.
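As a minimal sketch of the "try a range of values every now and then" advice, here is a tiny random search. train_and_evaluate is a hypothetical placeholder for a real training run, stubbed out so the loop runs as written, and the search ranges are illustrative assumptions:

    import random

    def train_and_evaluate(alpha, n_hidden, n_layers):
        # Hypothetical placeholder: replace with a real training run that
        # returns a validation metric such as accuracy.
        return random.random()

    random.seed(0)
    best_score, best_config = -1.0, None
    for trial in range(20):                            # budget of settings to try
        config = {
            "alpha": 10 ** random.uniform(-4, -1),     # learning rate on a log scale
            "n_hidden": random.choice([16, 32, 64, 128]),
            "n_layers": random.randint(1, 4),
        }
        score = train_and_evaluate(**config)
        if score > best_score:
            best_score, best_config = score, config

    print("best config so far:", best_config)

Re-running a small search like this every few months is a cheap way to double-check whether the best settings for your problem have drifted.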

<end>
