Chapter 4 Learning of Nonlinear Patterns by Neural Networks
The complete article: http://note.youdao.com/noteshare?id=12d07d687a422d41b17375c614eda362
4.1 Introduction and Overview
First-order error minimization methods: backpropagation, delta-bar-delta (adaptive learning rate), steepest descent.
Second-order error minimization methods: QuickProp, Gauss-Newton, Levenberg-Marquardt (LM) learning.
4.4 Backpropagation Learning
4.4.1.1 Error Gradient with Respect to Output Neuron Weights
4.4.1.2 The Error Gradient with Respect to the Hidden-Neuron Weights
4.4.1.4 Batch Learning
Example-by-example learning (online learning, OGD): the weights change after every new pattern is presented. This method can yield effective results and may be the most suitable approach for online learning, where updating is required as data arrives in real time. Consequently, the concept of an epoch does not exist here.
Batch learning (offline learning): all patterns of the training set are used in each complete pass. This family contains batch gradient descent, stochastic gradient descent, and so on.
Batch gradient descent (BGD): whether a learning algorithm is BGD depends on whether the predicted labels stay fixed or are recalculated within an epoch.
Stochastic gradient descent (SGD): it resembles online learning, but the setting is very different (the training set is fixed rather than streaming). We can also use several patterns rather than one to calculate the gradient for each weight change within an epoch (mini-batches); see the sketch after this list.
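A minimal sketch contrasting the three update schedules, assuming a generic gradient function for a linear model standing in for the backpropagated gradient (all names here are illustrative, not from the book):

```python
import numpy as np

def grad(w, X, y):
    # Illustrative MSE gradient for a linear model y_hat = X @ w;
    # in the chapter this would be the backpropagated error gradient.
    return 2 * X.T @ (X @ w - y) / len(y)

def train(w, X, y, lr=0.01, epochs=100, mode="batch", batch_size=4):
    n = len(y)
    for _ in range(epochs):
        if mode == "batch":
            # BGD: one update per epoch; predictions are effectively
            # fixed while the whole-set gradient is accumulated.
            w = w - lr * grad(w, X, y)
        else:
            # online: update after every single pattern (step = 1);
            # mini-batch SGD: update after every small group of patterns.
            step = 1 if mode == "online" else batch_size
            idx = np.random.permutation(n)
            for i in range(0, n, step):
                b = idx[i:i + step]
                w = w - lr * grad(w, X[b], y[b])
    return w
```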
4.4.1.5 Learning Rate and Weight Update
Even if the MSE has stabilized, it may have settled at a suboptimal (local) minimum rather than the optimum.
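For reference, the basic update this section discusses is the standard gradient-descent rule (the symbol ε for the learning rate is an assumed notation, not quoted from the book):

$$\Delta w = -\epsilon\,\frac{\partial E}{\partial w}, \qquad w_{t+1} = w_t + \Delta w$$

A small ε descends slowly but stably; a large ε descends fast but may overshoot. Neither choice guarantees escaping such a suboptimal minimum once the MSE has flattened.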
4.4.1.7 Momentum
Why do we need momentum?
We need small weight changes around the optimum points. In a ravine (a steep, narrow error valley), momentum reduces the effective weight change and subdues the oscillations across the valley walls. In a flat bowl, however, the weight change without momentum is already appropriate for the learning process; adding momentum there can instead introduce oscillations.
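A minimal sketch of the momentum update, assuming the standard form with momentum coefficient μ (the symbol and default values are illustrative):

```python
def momentum_step(w, velocity, gradient, lr=0.01, mu=0.9):
    # The velocity accumulates past weight changes: in a ravine the
    # oscillating gradient components cancel out inside `velocity`,
    # while the consistent downhill component is reinforced.
    velocity = mu * velocity - lr * gradient
    return w + velocity, velocity
```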
4.4.6 Example: Backpropagation Learning Case Study--Solving a Complex Classification Problem
How to choose appropriate parameters?
4.5 Delta-Bar-Delta Learning (Adaptive Learning Rate) Method
f_m is the (exponentially averaged) recent gradient direction, and the sign of the product d_m·f_m determines whether the learning rate increases or decreases.
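A sketch of the adaptive rule, assuming the standard delta-bar-delta form (Jacobs, 1988), where f is an exponentially weighted average of past gradients d; the constants κ, φ, θ are illustrative defaults:

```python
def delta_bar_delta(lr, f_prev, d, kappa=0.01, phi=0.5, theta=0.7):
    # Same sign of current gradient d and averaged direction f_prev:
    # descent is consistent, so increase the learning rate additively.
    if d * f_prev > 0:
        lr += kappa
    # Opposite signs: the weight is oscillating, so decrease the
    # learning rate multiplicatively.
    elif d * f_prev < 0:
        lr *= phi
    # Update the exponentially weighted direction average.
    f = (1 - theta) * d + theta * f_prev
    return lr, f
```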
4.6 Steepest Descent Method
In each epoch, ε (the learning rate) is first reset to its initial value.
So a large value can be set as the initial learning rate, resulting in faster descent at the beginning of each epoch.
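One way to realize this, sketched under the assumption of a simple halving line search (the book's exact within-epoch rule may differ); error and grad are placeholder functions:

```python
def steepest_descent_epoch(w, error, grad, lr_init=1.0, max_halvings=20):
    # The learning rate is reset to its large initial value each epoch,
    # then halved until a step along the negative gradient actually
    # reduces the error.
    lr, g, e0 = lr_init, grad(w), error(w)
    for _ in range(max_halvings):
        w_new = w - lr * g
        if error(w_new) < e0:   # accept the first improving step
            return w_new, lr
        lr *= 0.5               # otherwise shrink the step and retry
    return w, lr                # no improvement found this epoch
```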
4.7 Second-Order Methods of Error Minimization and Weight Optimization
4.7.1 QuickProp
4.7.1.3 Comparison of QuickProp with Steepest Descent, Delta-Bar-Delta, and Backpropagation
QuickProp does not have a learning-rate parameter.
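Instead of a learning rate, QuickProp derives each step from the two most recent gradients. A sketch of the secant update, assuming Fahlman's standard form: the error along each weight is modeled as a parabola through the current and previous gradients, and the weight jumps toward the parabola's minimum (the degenerate-case handling here is a simplification):

```python
def quickprop_step(dw_prev, g, g_prev, max_growth=1.75):
    # Secant estimate of the parabola's minimum; no learning-rate
    # parameter appears anywhere in the step.
    denom = g_prev - g
    if denom == 0 or dw_prev == 0:
        return 0.0              # degenerate case: take no step
    dw = dw_prev * g / denom
    # Cap the growth so the step cannot explode when g_prev is
    # nearly equal to g.
    limit = abs(dw_prev) * max_growth
    return max(-limit, min(limit, dw))
```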
4.7.2 General Concept of Second-Order Methods of Error Minimization
Why? Because they follow from a first-order Taylor expansion of the derivative (gradient) function, set to zero.
If R = 1 (the identity), they reduce to the first-order gradient methods.
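Sketching the derivation (standard Newton reasoning, with E the error and w_0 the current weights):

$$E'(w) \approx E'(w_0) + E''(w_0)\,(w - w_0) = 0 \;\;\Rightarrow\;\; \Delta w = -\left[E''(w_0)\right]^{-1} E'(w_0)$$

So the general update is Δw = −R·E'(w): with R the inverse Hessian this is Newton's step, and with R = 1 it collapses to plain gradient descent, matching the remark above.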
4.7.3 Gauss-Newton Method
The basic difference: Gauss-Newton is used only for solving nonlinear least-squares problems, whereas Newton's method can be used to optimize any (twice-differentiable) continuous function. A small trick of the Gauss-Newton method is to omit the second-order partial derivatives, so no second derivatives need to be computed.
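A sketch of one Gauss-Newton step, assuming e is the residual vector and J the Jacobian of the residuals with respect to the weights; approximating the Hessian by JᵀJ is exactly the omitted-second-derivative trick. Setting damping > 0 adds the λI term of the Levenberg-Marquardt method (Section 4.7.4):

```python
import numpy as np

def gauss_newton_step(w, e, J, damping=0.0):
    # Normal equations with the J^T J Hessian approximation;
    # damping > 0 blends the step toward steepest descent
    # (Levenberg-Marquardt) when the pure GN step misbehaves.
    A = J.T @ J + damping * np.eye(len(w))
    return w - np.linalg.solve(A, J.T @ e)
```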
4.7.3.2 Example: Network Training with Gauss-Newton Method--A Computer Experiment
4.7.4 The Levenberg-Marquardt Method
4.7.5 Comparison of the Efficiency of the First-Order and Second-Order Methods in Minimizing Error
First-order methods are slow and require finding appropriate parameters by trial and error. Second-order methods are fast but carry a substantial computational cost and a risk of numerical instability in handling the second derivatives.