Chapter 4 Learning of Nonlinear Patterns by Neural Networks
The complete article: http://note.youdao.com/noteshare?id=12d07d687a422d41b17375c614eda362
4.1 Introduction and Overview
First-order error minimization methods: backpropagation, delta-bar-delta (adaptive learning rate), steepest descent.
Second-order error minimization methods: QuickProp, Gauss-Newton, Levenberg-Marquardt (LM) learning.
4.4 Backpropagation Learning
4.4.1.1 Error Gradient with Respect to Output Neuron Weights
4.4.1.2 The Error Gradient with Respect to the Hidden-Neuron Weights
4.4.1.4 Batch Learning
Example-by-example learning (online learning, OGD): the weights change after every new pattern is presented. This method can yield effective results and may be the most suitable approach for online learning, where updating is required as data arrives in real time. Consequently, the concept of an epoch does not exist here.
Batch learning (offline learning): all patterns of the training set are used in each complete pass. This family contains batch gradient descent, stochastic gradient descent, and so on.
Batch gradient descent (BGD): whether a learning algorithm is BGD depends on whether the predicted labels stay fixed or are recalculated within an epoch.
Stochastic gradient descent (SGD): it resembles online learning, but the setting is very different (the training set is fixed rather than streaming). We can also use several patterns rather than one to calculate the gradient for each weight change within an epoch (mini-batches); see the sketch after this list.
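A minimal sketch contrasting the three update schedules, assuming a generic gradient function for a linear model standing in for the backpropagated gradient (all names here are illustrative, not from the book):

```python
import numpy as np

def grad(w, X, y):
    # Illustrative MSE gradient for a linear model y_hat = X @ w;
    # in the chapter this would be the backpropagated error gradient.
    return 2 * X.T @ (X @ w - y) / len(y)

def train(w, X, y, lr=0.01, epochs=100, mode="batch", batch_size=4):
    n = len(y)
    for _ in range(epochs):
        if mode == "batch":
            # BGD: one update per epoch; predictions are effectively
            # fixed while the whole-set gradient is accumulated.
            w = w - lr * grad(w, X, y)
        else:
            # online: update after every single pattern (step = 1);
            # mini-batch SGD: update after every small group of patterns.
            step = 1 if mode == "online" else batch_size
            idx = np.random.permutation(n)
            for i in range(0, n, step):
                b = idx[i:i + step]
                w = w - lr * grad(w, X[b], y[b])
    return w
```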
4.4.1.5 Learning Rate and Weight Update
Even if the MSE has stabilized, it may have settled at a suboptimal (local) minimum rather than the optimum.
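For reference, the basic update this section discusses is the standard gradient-descent rule (the symbol ε for the learning rate is an assumed notation, not quoted from the book):

$$\Delta w = -\epsilon\,\frac{\partial E}{\partial w}, \qquad w_{t+1} = w_t + \Delta w$$

A small ε descends slowly but stably; a large ε descends fast but may overshoot. Neither choice guarantees escaping such a suboptimal minimum once the MSE has flattened.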
4.4.1.7 Momentum
Why do we need momentum?
We need small weight changes around the optimum points. In a ravine (a steep, narrow error valley), momentum reduces the effective weight change and subdues the oscillations across the valley walls. In a flat bowl, however, the weight change without momentum is already appropriate for the learning process; adding momentum there can instead introduce oscillations.
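A minimal sketch of the momentum update, assuming the standard form with momentum coefficient μ (the symbol and default values are illustrative):

```python
def momentum_step(w, velocity, gradient, lr=0.01, mu=0.9):
    # The velocity accumulates past weight changes: in a ravine the
    # oscillating gradient components cancel out inside `velocity`,
    # while the consistent downhill component is reinforced.
    velocity = mu * velocity - lr * gradient
    return w + velocity, velocity
```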
4.4.6 Example: Backpropagation Learning Case Study--Solving a Complex Classification Problem
How to choose appropriate parameters?
4.5 Delta-Bar-Delta Learning (Adaptive Learning Rate) Method
f_m is the (exponentially averaged) recent gradient direction, and the sign of the product d_m·f_m determines whether the learning rate increases or decreases.
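A sketch of the adaptive rule, assuming the standard delta-bar-delta form (Jacobs, 1988), where f is an exponentially weighted average of past gradients d; the constants κ, φ, θ are illustrative defaults:

```python
def delta_bar_delta(lr, f_prev, d, kappa=0.01, phi=0.5, theta=0.7):
    # Same sign of current gradient d and averaged direction f_prev:
    # descent is consistent, so increase the learning rate additively.
    if d * f_prev > 0:
        lr += kappa
    # Opposite signs: the weight is oscillating, so decrease the
    # learning rate multiplicatively.
    elif d * f_prev < 0:
        lr *= phi
    # Update the exponentially weighted direction average.
    f = (1 - theta) * d + theta * f_prev
    return lr, f
```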
4.6 Steepest Descent Method
In each epoch, ε (the learning rate) is first reset to its initial value.
So a large value can be set as the initial learning rate, resulting in faster descent at the beginning of each epoch.
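One way to realize this, sketched under the assumption of a simple halving line search (the book's exact within-epoch rule may differ); error and grad are placeholder functions:

```python
def steepest_descent_epoch(w, error, grad, lr_init=1.0, max_halvings=20):
    # The learning rate is reset to its large initial value each epoch,
    # then halved until a step along the negative gradient actually
    # reduces the error.
    lr, g, e0 = lr_init, grad(w), error(w)
    for _ in range(max_halvings):
        w_new = w - lr * g
        if error(w_new) < e0:   # accept the first improving step
            return w_new, lr
        lr *= 0.5               # otherwise shrink the step and retry
    return w, lr                # no improvement found this epoch
```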
4.7 Second-Order Methods of Error Minimization and Weight Optimization
4.7.1 QuickProp
4.7.1.3 Comparison of QuickProp with Steepest Descent, Delta-Bar-Delta, and Backpropagation
QuickProp does not have a learning-rate parameter.
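Instead of a learning rate, QuickProp derives each step from the two most recent gradients. A sketch of the secant update, assuming Fahlman's standard form: the error along each weight is modeled as a parabola through the current and previous gradients, and the weight jumps toward the parabola's minimum (the degenerate-case handling here is a simplification):

```python
def quickprop_step(dw_prev, g, g_prev, max_growth=1.75):
    # Secant estimate of the parabola's minimum; no learning-rate
    # parameter appears anywhere in the step.
    denom = g_prev - g
    if denom == 0 or dw_prev == 0:
        return 0.0              # degenerate case: take no step
    dw = dw_prev * g / denom
    # Cap the growth so the step cannot explode when g_prev is
    # nearly equal to g.
    limit = abs(dw_prev) * max_growth
    return max(-limit, min(limit, dw))
```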
4.7.2 General Concept of Second-Order Methods of Error Minimization
Why? Because they follow from a first-order Taylor expansion of the derivative (gradient) function, set to zero.
If R = 1 (the identity), they reduce to the first-order gradient methods.
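Sketching the derivation (standard Newton reasoning, with E the error and w_0 the current weights):

$$E'(w) \approx E'(w_0) + E''(w_0)\,(w - w_0) = 0 \;\;\Rightarrow\;\; \Delta w = -\left[E''(w_0)\right]^{-1} E'(w_0)$$

So the general update is Δw = −R·E'(w): with R the inverse Hessian this is Newton's step, and with R = 1 it collapses to plain gradient descent, matching the remark above.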
4.7.3 Gauss-Newton Method
The basic difference: Gauss-Newton is used only for solving nonlinear least-squares problems, whereas Newton's method can be used to optimize any (twice-differentiable) continuous function. A small trick of the Gauss-Newton method is to omit the second-order partial derivatives, so no second derivatives need to be computed.
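A sketch of one Gauss-Newton step, assuming e is the residual vector and J the Jacobian of the residuals with respect to the weights; approximating the Hessian by JᵀJ is exactly the omitted-second-derivative trick. Setting damping > 0 adds the λI term of the Levenberg-Marquardt method (Section 4.7.4):

```python
import numpy as np

def gauss_newton_step(w, e, J, damping=0.0):
    # Normal equations with the J^T J Hessian approximation;
    # damping > 0 blends the step toward steepest descent
    # (Levenberg-Marquardt) when the pure GN step misbehaves.
    A = J.T @ J + damping * np.eye(len(w))
    return w - np.linalg.solve(A, J.T @ e)
```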
4.7.3.2 Example: Network Training with Gauss-Newton Method--A Computer Experiment
4.7.4 The Levenberg-Marquardt Method
4.7.5 Comparison of the Efficiency of the First-Order and Second-Order Methods in Minimizing Error
First-order methods are slow and require finding appropriate parameters by trial and error. Second-order methods are fast but carry a substantial computational cost and a risk of numerical instability in handling the second derivatives.