Deep Neural Networks - Parameters vs Hyperparameters

These are my notes from studying section 4.7, "Parameters vs Hyperparameters", of the Coursera class "Neural Networks & Deep Learning" by Andrew Ng. They cover the usual hyperparameters in a deep NN and how to select hyperparameters in general. I'm sharing them here and hope they help!
————————————————

Being effective in developing your deep NN requires that you organize not only your parameters but also your hyperparameters well. So, what are hyperparameters?

Parameters

W^{[1]}, b^{[1]}, W^{[2]}, b^{[2]}, W^{[3]}, b^{[3]}...

Hyperparameters

  • Learning rate \alpha: it determines how the parameters evolve during gradient descent
  • Number of iterations of gradient descent
  • Number of hidden layers
  • Number of hidden units n^{[1]}, n^{[2]}, ...
  • Choice of activation functions: Sigmoid, tanh, ReLU, Leaky ReLU...
  • We'll see more later, such as the momentum term, the mini-batch size, regularization parameters, ...

Hyperparameters are settings that you need to tell your learning algorithm; they control the ultimate values of the parameters W^{[l]} and b^{[l]}.
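To make that control concrete, here is a minimal sketch of one gradient descent update in Python. The helper name update_parameters and the convention of passing precomputed gradients dW, db are illustrative assumptions, not course code:

    # One gradient descent step for a single layer: the learning rate
    # hyperparameter alpha scales how far the parameters W, b move
    # against their gradients dW, db (assumed computed by backprop).
    def update_parameters(W, b, dW, db, alpha):
        W = W - alpha * dW   # larger alpha -> bigger change to W per iteration
        b = b - alpha * db   # likewise for b
        return W, b

With \alpha = 0.01 versus \alpha = 0.05, the same gradients produce steps five times larger; that is exactly the knob the empirical process below keeps turning.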

Figure-1: the iterative Idea → Code → Experiment cycle

When you're training a deep NN for your own application, you may find that there are a lot of possible settings for the hyperparameters that you simply need to try out. Applying deep learning today is a very empirical process. You often start with an idea, for example a value for the learning rate \alpha = 0.01, then you implement it (code) and try it out (experiment) and see how it works. Based on that outcome, you might want to change the value, say increase the learning rate to 0.05. This process is shown in figure-1.

Figure-2: cost J versus number of iterations for different learning rates \alpha

If you're not sure what the best value for the learning rate is, you might try several different values of \alpha, then plot and compare the curves of how the cost function J decreases with the number of iterations. The bottom line in figure-2, the curve that drives J down to the lowest value, indicates the learning rate \alpha you want to select.
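Here is a minimal, self-contained sketch of that experiment. The toy linear regression data, the candidate values of \alpha, and the iteration count are all illustrative assumptions, not course material:

    import numpy as np

    # Toy regression problem standing in for a real model.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    true_w = np.array([1.5, -2.0, 0.5])
    y = X @ true_w + 0.1 * rng.normal(size=100)

    def cost(w):                       # J(w): mean squared error on the toy data
        return np.mean((X @ w - y) ** 2)

    def grad(w):                       # gradient of J with respect to w
        return 2 * X.T @ (X @ w - y) / len(y)

    alphas = [0.3, 0.1, 0.03, 0.01]    # candidate learning rates to compare
    histories = {}
    for alpha in alphas:
        w = np.zeros(3)                # same starting point for a fair comparison
        costs = []
        for _ in range(100):
            w -= alpha * grad(w)       # one gradient descent update
            costs.append(cost(w))
        histories[alpha] = costs       # one J-vs-iterations curve per alpha

    # The curve that ends lowest is the bottom line in figure-2.
    best_alpha = min(histories, key=lambda a: histories[a][-1])
    print("best alpha:", best_alpha)

Plotting each histories[alpha] against the iteration number reproduces the comparison in figure-2; here the lowest final cost stands in for reading the curves by eye.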

When you're starting on a new application, it's very difficult to know in advance what the best values for the hyperparameters are. You need to try out many different values and go around the cycle in figure-1. "Empirical process" is just a fancy way of saying that you have to try a lot of things and see what works.

Deep learning today is applied to so many problems: computer vision, speech recognition, natural language processing, online advertising, web search, product recommendations and so on.

  • When researchers move from one discipline to another, sometimes the hyperparameter intuitions carry over and sometimes they don't. So I often advise people, especially when starting on a new problem, to try out a range of values and see what works (a small random-search sketch follows this list). We'll see a more systematic way to do this later.
  • Even if you've been working on one application for a long time, for example online advertising, it's quite possible that the best values for the learning rate, the number of hidden units and so on will change, maybe a year from now, perhaps because the computing infrastructure (the CPUs, the GPUs, or something else) has changed. So the rule of thumb is: every now and then, maybe every few months, try a few values for the hyperparameters and double-check whether a better setting exists. As you do so, you slowly gain intuitions about the hyperparameters that work best for your problem. This is one area where deep learning research is still advancing; maybe over time we'll be able to give better guidance on the best hyperparameters to use.
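As a minimal sketch of the "try a range of values every now and then" advice, here is a tiny random search. train_and_evaluate is a hypothetical placeholder for a real training run, stubbed out so the loop runs as written, and the search ranges are illustrative assumptions:

    import random

    def train_and_evaluate(alpha, n_hidden, n_layers):
        # Hypothetical placeholder: replace with a real training run that
        # returns a validation metric such as accuracy.
        return random.random()

    random.seed(0)
    best_score, best_config = -1.0, None
    for trial in range(20):                            # budget of settings to try
        config = {
            "alpha": 10 ** random.uniform(-4, -1),     # learning rate on a log scale
            "n_hidden": random.choice([16, 32, 64, 128]),
            "n_layers": random.randint(1, 4),
        }
        score = train_and_evaluate(**config)
        if score > best_score:
            best_score, best_config = score, config

    print("best config so far:", best_config)

Re-running a small search like this every few months is a cheap way to double-check whether the best settings for your problem have drifted.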

<end>
