Neural Network
Three typical weight-update schemes; the two implemented and compared below are stochastic gradient descent (SGD) and batch gradient descent.
SGD-Stochastic Gradient Descent
- Function DeltaSGD & Sigmoid
function W = DeltaSGD(W, X, D)
  alpha = 0.9;             % learning rate
  N = 4;                   % number of training samples
  for k = 1:N
    x = X(k, :)';          % k-th input, as a column vector
    d = D(k);              % k-th correct output
    v = W*x;               % weighted sum
    y = Sigmoid(v);        % network output
    e = d - y;             % output error
    delta = y*(1-y)*e;     % error scaled by the sigmoid derivative
    dW = alpha*delta*x;    % delta rule
    W(1) = W(1) + dW(1);   % adjust the weights immediately, per sample
    W(2) = W(2) + dW(2);
    W(3) = W(3) + dW(3);
  end
end
function y = Sigmoid(x)
  y = 1 / (1 + exp(-x));   % logistic sigmoid activation
end
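The line delta = y*(1-y)*e is the delta rule for a sigmoid output node: the derivative of the logistic sigmoid can be written in terms of its own output, so the code never needs a separate derivative function.

$$\varphi(v) = \frac{1}{1 + e^{-v}}, \qquad \varphi'(v) = \varphi(v)\bigl(1 - \varphi(v)\bigr) = y(1 - y)$$

$$\delta = \varphi'(v)\,e = y(1 - y)(d - y), \qquad \Delta w_j = \alpha\,\delta\,x_j$$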
BGD-Batch Gradient Descent
- Function DeltaBatch
function W = DeltaBatch(W, X, D)
  alpha = 0.9;             % learning rate
  dWsum = zeros(3, 1);     % accumulator for the weight updates
  N = 4;                   % number of training samples
  for k = 1:N
    x = X(k, :)';          % k-th input, as a column vector
    d = D(k);              % k-th correct output
    v = W*x;               % weighted sum
    y = Sigmoid(v);        % network output
    e = d - y;             % output error
    delta = y*(1-y)*e;     % delta rule
    dW = alpha*delta*x;
    dWsum = dWsum + dW;    % accumulate; do not update the weights yet
  end
  dWavg = dWsum / N;       % average update over the whole batch
  W(1) = W(1) + dWavg(1);  % single weight update per epoch
  W(2) = W(2) + dWavg(2);
  W(3) = W(3) + dWavg(3);
end
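Unlike DeltaSGD, which adjusts the weights immediately after every training sample, DeltaBatch collects the delta-rule updates for all N samples and applies only their average, so the weights change once per epoch:

$$\Delta w_j = \frac{\alpha}{N} \sum_{k=1}^{N} \delta_k\, x_{k,j}, \qquad \delta_k = y_k(1 - y_k)(d_k - y_k)$$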
Training data
- In supervised learning, each training example consists of an input paired with its correct output:
{ input, correct output }
clear all
X = [ 0 0 1;               % inputs; the third column is a bias input fixed at 1
      0 1 1;
      1 0 1;
      1 1 1;
    ];
D = [ 0                    % correct output for each row of X
      0
      1
      1
    ];
E1 = zeros(1000, 1);       % per-epoch mean squared error, SGD
E2 = zeros(1000, 1);       % per-epoch mean squared error, batch
W1 = 2*rand(1, 3) - 1;     % random initial weights in [-1, 1]
W2 = W1;                   % both methods start from identical weights
for epoch = 1:1000         % train
  W1 = DeltaSGD(W1, X, D);
  W2 = DeltaBatch(W2, X, D);
  es1 = 0;
  es2 = 0;
  N = 4;
  for k = 1:N
    x = X(k, :)';
    d = D(k);
    v1 = W1*x;
    y1 = Sigmoid(v1);
    es1 = es1 + (d - y1)^2;
    v2 = W2*x;
    y2 = Sigmoid(v2);
    es2 = es2 + (d - y2)^2;
  end
  E1(epoch) = es1 / N;     % mean squared error of this epoch
  E2(epoch) = es2 / N;
end
plot(E1, 'r')
hold on
plot(E2, 'b:')
xlabel('Epoch')
ylabel('Average of Training error')
legend('SGD', 'Batch')
This program trains the neural network for 1,000 epochs with each function, DeltaSGD and DeltaBatch, starting from the same initial weights. At each epoch, it feeds the training data into the neural network and records the mean squared error of the output (E1, E2). Once the 1,000 epochs are complete, it plots the mean error at each epoch. As the resulting plot shows, SGD reduces the learning error faster than the batch method: SGD performs a weight update for every training sample (four per epoch here), while the batch method applies only one averaged update per epoch.
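In equation form, each point on the error curves is the mean squared error over the N = 4 training samples at that epoch:

$$E = \frac{1}{N}\sum_{k=1}^{N} (d_k - y_k)^2$$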
Test data
clear all
X = [ 0 0 1;
      0 1 1;
      1 0 1;
      1 1 1;
    ];
D = [ 0
      0
      1
      1
    ];
W = 2*rand(1, 3) - 1;      % random initial weights in [-1, 1]
for epoch = 1:10000        % train
  W = DeltaSGD(W, X, D);
end
N = 4;                     % inference
for k = 1:N
  x = X(k, :)';
  v = W*x;
  y = Sigmoid(v)           % no semicolon: print the output for each input
end
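If training succeeded, the printed outputs should be close to the correct outputs D = [0; 0; 1; 1]: values near 0 for the first two inputs and near 1 for the last two. As a quick sanity check (a minimal sketch, meant to be appended to the script above; the printed format is illustrative), the outputs can be rounded and compared against the targets:

% Sanity check (illustrative addition): round each output and compare to D.
for k = 1:N
  x = X(k, :)';
  y = Sigmoid(W*x);
  fprintf('input [%d %d %d] -> %.4f (rounded %d, target %d)\n', ...
          X(k,1), X(k,2), X(k,3), y, round(y), D(k));
end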