Neural Networks and Deep Learning notes (2)

Last time I got through the second chapter of this book. The third chapter covers a lot of material, and I made some extensions of my own, so it gets a separate post.

#

“In fact, with the change in cost function it’s not possible to say precisely what it means to use the “same” learning rate.”

The cross-entropy cost function is one way to solve the neuron saturation (learning slowdown) problem. Are there other ways?

Sigmoid + cross-entropy vs. softmax + log-likelihood
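To make the comparison concrete, here is a small numpy sketch (my own illustration, not the book's code; the toy values of z and y are assumptions) showing that both pairings produce the same output-layer error delta = a - y, which is why neither suffers from the saturation slowdown:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))        # shift for numerical stability
    return e / e.sum()

# Toy output-layer weighted inputs z and a one-hot target y.
z = np.array([2.0, -1.0, 0.5])
y = np.array([1.0, 0.0, 0.0])

# Sigmoid output layer with cross-entropy cost:
# delta = dC/dz = a - y (the sigmoid' factor cancels, so no saturation slowdown).
a_sig = sigmoid(z)
delta_sigmoid_ce = a_sig - y

# Softmax output layer with log-likelihood cost:
# delta = dC/dz = a - y as well, so the two pairings learn at a similar rate.
a_soft = softmax(z)
delta_softmax_ll = a_soft - y

print(delta_sigmoid_ce)
print(delta_softmax_ll)
```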


#

Indeed, researchers continue to write papers where they try different approaches to regularization, compare them to see which works better, and attempt to understand why different approaches work better or worse. And so you can view regularization as something of a kludge. While it often helps, we don’t have an entirely satisfactory systematic understanding of what’s going on, merely incomplete heuristics and rules of thumb.
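For reference, the most common of these heuristics is L2 regularization (weight decay). The sketch below shows the modified gradient-descent weight update; the function name and the values of eta, lmbda and n are illustrative assumptions, not the book's code:

```python
import numpy as np

def l2_regularized_update(w, grad_w, eta=0.5, lmbda=0.1, n=50000):
    """One gradient-descent step with L2 regularization (weight decay).

    Regularized cost: C = C0 + (lmbda / 2n) * sum(w**2), so the update
    first shrinks each weight by the factor (1 - eta*lmbda/n) and then
    applies the usual gradient step.
    """
    return (1 - eta * lmbda / n) * w - eta * grad_w

# Illustrative values (hypothetical, not from the book).
w = np.array([0.8, -1.2])
grad_w = np.array([0.05, -0.02])
print(l2_regularized_update(w, grad_w))
```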

#

It’s like trying to fit an 80,000th degree polynomial to 50,000 data points. By all rights, our network should overfit terribly. And yet, as we saw earlier, such a network actually does a pretty good job generalizing. Why is that the case? It’s not well understood. It has been conjectured that “the dynamics of gradient descent learning in multilayer nets has a ‘self-regularization’ effect”. This is exceptionally fortunate, but it’s also somewhat disquieting that we don’t understand why it’s the case.
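A toy version of the polynomial intuition, at a much smaller scale than the 80,000-degree / 50,000-point figures in the text (all numbers here are illustrative assumptions):

```python
import numpy as np

# A high-degree polynomial can drive training error to ~0 on noisy data,
# yet behave erratically on points it has not seen.
rng = np.random.default_rng(0)
x_train = np.linspace(-1, 1, 10)
y_train = np.sin(3 * x_train) + 0.1 * rng.standard_normal(10)

coeffs_low = np.polyfit(x_train, y_train, deg=3)
coeffs_high = np.polyfit(x_train, y_train, deg=9)   # nearly interpolates the data

x_test = np.linspace(-1.2, 1.2, 5)                  # slightly outside the training range
print("degree-3 predictions:", np.polyval(coeffs_low, x_test))
print("degree-9 predictions:", np.polyval(coeffs_high, x_test))  # typically far more erratic
```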


#
there’s a pressing need to develop powerful regularization techniques to reduce overfitting, and this is an extremely active area of current work.
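Dropout is one such technique. A minimal inverted-dropout sketch (my own illustration; the function name and keep probability are assumptions, not the book's code):

```python
import numpy as np

def dropout_forward(activations, p_keep=0.5, training=True, rng=None):
    """Inverted dropout on a layer's activations.

    During training, each unit is kept with probability p_keep and the
    surviving activations are scaled by 1/p_keep, so no rescaling is
    needed at test time.
    """
    if not training:
        return activations
    rng = rng or np.random.default_rng()
    mask = rng.random(activations.shape) < p_keep
    return activations * mask / p_keep

# Illustrative hidden-layer activations (hypothetical values).
a = np.array([0.2, 0.9, 0.4, 0.7])
print(dropout_forward(a, p_keep=0.5, rng=np.random.default_rng(0)))
```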

5. How to choose a neural network’s hyper-parameters?
① Strip the problem down: simplify the task so that it gives you rapid insight into how to build the network.
② Strip your network down to the simplest network likely to do meaningful learning.
③ Increase the frequency of monitoring so that you get quick feedback (see the sketch after this list).
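A sketch of that stripped-down workflow on a deliberately tiny, synthetic two-class problem, with frequent validation checks for quick feedback (the data, the logistic-regression "network" and all hyper-parameter values are illustrative assumptions):

```python
import numpy as np

# Tiny logistic-regression "network" on a small synthetic task, with
# validation accuracy checked every few mini-batches for quick feedback.
rng = np.random.default_rng(0)
X = rng.standard_normal((600, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)          # a linearly separable stand-in task
X_train, y_train, X_val, y_val = X[:500], y[:500], X[500:], y[500:]

w, b, eta = np.zeros(2), 0.0, 0.5

def predict(X):
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

for step in range(1, 201):
    idx = rng.integers(0, len(X_train), size=10)    # small mini-batch
    xb, yb = X_train[idx], y_train[idx]
    err = predict(xb) - yb                          # cross-entropy output error
    w -= eta * xb.T @ err / len(xb)
    b -= eta * err.mean()
    if step % 20 == 0:                              # monitor frequently
        acc = ((predict(X_val) > 0.5) == y_val).mean()
        print(f"step {step}: validation accuracy {acc:.2f}")
```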


#
carefully monitoring your network’s behaviour
#
Your goal should be to develop a workflow that enables you to quickly do a pretty good job on the optimization, while leaving you the flexibility to try more detailed optimizations, if that’s important.
#
While it would be nice if machine learning were always easy, there is no a priori reason it should be trivially simple.


Some remaining challenges:
1) A proper learning rate is difficult to choose, and learning-rate schedules are pre-defined, so they cannot adapt to the characteristics of the dataset.
2) In practice our data is sparse and the features may have very different frequencies, yet we apply the same learning rate to all parameter updates. Updating different parameters to different extents may be more suitable (see the sketch after this list).
3) The difficulty of minimizing highly non-convex error functions in fact comes not from local minima but from saddle points, i.e. points where one dimension slopes up and another slopes down. These saddle points are usually surrounded by a plateau of the same error, which makes it notoriously hard for SGD to escape.
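Adaptive methods such as Adagrad are one response to point 2): each parameter gets its own effective learning rate. A minimal sketch (illustrative values, not code from reference [2]):

```python
import numpy as np

def adagrad_update(params, grads, cache, eta=0.01, eps=1e-8):
    """One Adagrad step: each parameter's step is scaled by
    eta / sqrt(sum of its squared past gradients), so frequently-updated
    parameters take smaller steps and rarely-updated ones take larger steps."""
    cache += grads ** 2
    params -= eta * grads / (np.sqrt(cache) + eps)
    return params, cache

# Illustrative values (hypothetical).
params = np.array([0.5, -0.3])
cache = np.zeros_like(params)
for grad in (np.array([0.2, 0.0]), np.array([0.1, 0.05])):
    params, cache = adagrad_update(params, grad, cache)
print(params)
```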


Trick:
Some of the weights may need to increase while others need to decrease, but that can only happen if some of the input activations have different signs. Sigmoid activations are always non-negative, whereas tanh activations can be of either sign, so there is some empirical evidence suggesting that tanh sometimes performs better than the sigmoid.
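A quick check of the sign argument (illustrative inputs):

```python
import numpy as np

# Sigmoid activations always lie in (0, 1), so every input to the next
# layer has the same sign; tanh activations lie in (-1, 1) and can take
# either sign.
z = np.array([-2.0, -0.5, 0.5, 2.0])
sigmoid = 1.0 / (1.0 + np.exp(-z))
tanh = np.tanh(z)
print("sigmoid:", sigmoid)   # all positive
print("tanh:   ", tanh)      # mixed signs
```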

REFERENCE:
[1] Yoshua Bengio. Practical Recommendations for Gradient-Based Training of Deep Architectures.
[2] Sebastian Ruder. An overview of gradient descent optimization algorithms. http://sebastianruder.com/optimizing-gradient-descent/
