•Adaptive Linear Element (ADALINE) vs Perceptron
–When the problem is not linearly separable, the perceptron will fail to converge
–ADALINE can overcome this difficulty by finding a best-fit approximation to the target.
•We have training pairs (X(k), d(k)), k = 1, 2, …, K, where K is the number of training samples. The training error specifies the difference between the output of the ADALINE and the desired target:
E(W) = (1/2) Σk (d(k) − y(k))², where y(k) = W·X(k) is the ADALINE output
•The smaller E(W) is, the closer the approximation
•We need to find the W, based on the given training set, that minimizes the error E(W)
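As a concrete illustration, here is a minimal NumPy sketch of this error computation (the names adaline_error, X, d, and W are illustrative, not from the original slides):

```python
import numpy as np

def adaline_error(W, X, d):
    """Sum-of-squared-errors E(W) over all K training samples.

    X: (K, n) matrix of input vectors, d: (K,) desired targets,
    W: (n,) weight vector. The ADALINE output is the linear sum y = X @ W.
    """
    y = X @ W                          # linear outputs y(k) = W . X(k)
    return 0.5 * np.sum((d - y) ** 2)  # E(W) = 1/2 sum_k (d(k) - y(k))^2
```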
The Gradient Descent Rule
–The gradient of E is a vector whose components are the partial derivatives of E with respect to each of the weights wi: ∇E(W) = [∂E/∂w0, ∂E/∂w1, …, ∂E/∂wn]
–The gradient specifies the direction of steepest increase in E; the negative of the gradient gives the direction of steepest decrease.
•The gradient descent training rule is
W ← W − η∇E(W), i.e., Δwi = −η ∂E/∂wi
where η is the learning rate
•ADALINE weight updating using the gradient descent rule: differentiating E(W) gives ∂E/∂wi = −Σk (d(k) − y(k)) xi(k), so the update is
Δwi = η Σk (d(k) − y(k)) xi(k)
•Gradient descent training procedure
–Initialise each wi to a small value, e.g., in the range (−1, 1), and choose a learning rate, e.g., η = 0.2
–Until the termination condition is met, Do
•Compute the output y(k) for every training sample, accumulate the gradient over all K samples, then update each weight: wi ← wi + η Σk (d(k) − y(k)) xi(k) (see the sketch below)
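A minimal batch-mode training sketch in NumPy, under the definitions above (the function name, seed, and fixed epoch count are illustrative assumptions, not from the original slides):

```python
import numpy as np

def train_adaline_batch(X, d, eta=0.2, epochs=100):
    """Batch gradient descent for ADALINE.

    The weight vector is updated once per epoch, using the gradient
    accumulated over all K training samples.
    """
    rng = np.random.default_rng(0)
    W = rng.uniform(-1, 1, size=X.shape[1])   # small initial weights in (-1, 1)
    for _ in range(epochs):                   # termination here: fixed epoch count
        y = X @ W                             # outputs for all K samples at once
        W += eta * X.T @ (d - y)              # Δw_i = η Σ_k (d(k) − y(k)) x_i(k)
    return W
```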
Stochastic (Incremental) Gradient Descent
•Also called online mode, Least Mean Square (LMS), Widrow-Hoff, and Delta Rule
–Initialise each wi to a small value, e.g., in the range (−1, 1), and choose a learning rate, e.g., η = 0.01 (should be smaller than in batch mode)
–Until the termination condition is met, Do
•For each training sample (X(k), d(k)), compute y(k) and immediately update each weight: wi ← wi + η (d(k) − y(k)) xi(k)
(Batch mode vs online mode: batch mode presents (computes over) the whole training set before updating W once; online mode updates W after every single training sample.)
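A corresponding online-mode (LMS) sketch, again with illustrative names and a fixed epoch count; note the smaller learning rate and the per-sample update:

```python
import numpy as np

def train_adaline_lms(X, d, eta=0.01, epochs=100):
    """Stochastic (incremental) gradient descent: the Widrow-Hoff / LMS rule.

    Weights are updated after every individual training sample, which is why
    the learning rate is kept smaller than in batch mode.
    """
    rng = np.random.default_rng(0)
    W = rng.uniform(-1, 1, size=X.shape[1])
    for _ in range(epochs):
        for x_k, d_k in zip(X, d):            # one weight update per sample
            y_k = x_k @ W
            W += eta * (d_k - y_k) * x_k      # w_i ← w_i + η (d(k) − y(k)) x_i(k)
    return W
```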
•Training is an iterative process; training samples have to be used repeatedly
•Assuming we have K training samples [(X(k), d(k)), k = 1, 2, …, K], an epoch is the presentation of all K samples for training once
–First epoch: present training samples (X(1), d(1)), (X(2), d(2)), …, (X(K), d(K))
–Second epoch: present training samples (X(K), d(K)), (X(K−1), d(K−1)), …, (X(1), d(1))
–Note that the order in which the training samples are presented can (and normally should) differ between epochs.
•Normally, training will take many epochs to complete
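One common way to vary the presentation order between epochs is a random shuffle; a minimal sketch of the epoch loop (the permutation-based shuffle is an illustrative choice, not prescribed by the slides):

```python
import numpy as np

def lms_epochs_shuffled(X, d, eta=0.01, epochs=100):
    """LMS training where each epoch presents all K samples in a fresh random order."""
    rng = np.random.default_rng(0)
    W = rng.uniform(-1, 1, size=X.shape[1])
    for _ in range(epochs):
        for k in rng.permutation(len(X)):     # new presentation order each epoch
            y_k = X[k] @ W
            W += eta * (d[k] - y_k) * X[k]
    return W
```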
Termination of Training
•To terminate training, there are normally two ways:
–When a pre-set number of training epochs is reached
–When the error is smaller than a pre-set value
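A sketch combining both stopping criteria with the batch rule above (max_epochs and error_threshold are illustrative parameter names):

```python
import numpy as np

def train_until_converged(X, d, eta=0.2, max_epochs=1000, error_threshold=0.01):
    """Batch ADALINE training that stops on either termination criterion:
    a pre-set epoch limit, or the error E(W) falling below a pre-set value."""
    rng = np.random.default_rng(0)
    W = rng.uniform(-1, 1, size=X.shape[1])
    for epoch in range(max_epochs):           # criterion 1: epoch limit reached
        y = X @ W
        error = 0.5 * np.sum((d - y) ** 2)    # current E(W)
        if error < error_threshold:           # criterion 2: error small enough
            break
        W += eta * X.T @ (d - y)
    return W
```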