Cost Function
From the previous lecture we already have a basic picture of what a neural network is. To evaluate how well a network performs, we usually use a cost function.
In supervised learning, the target output for each training sample is known. The cost function is typically some distance between the network's actual output and the target output: the smaller the gap, the better the network is considered to perform; a large gap indicates poor performance.
Let the target output be $y^L$ and the network output be $a^L$ (both $n_L \times 1$ column vectors), and let $e = y^L - a^L$. Then

cost function: $J = \frac{1}{2} \sum_{j=1}^{n_L} e_j^2$
The cost function is not unique. Another common choice is the cross-entropy cost

$J = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}(y^L, a^L)$, where $\mathcal{L}(y^L, a^L) = -\left[ y^L \log a^L + (1 - y^L) \log(1 - a^L) \right]$

(here $m$ is the number of training samples).
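As a quick numerical illustration, here is a minimal MATLAB sketch of both costs for a single sample; the vectors y and a below are made-up placeholder values, not data from the lab.

% made-up target and output for one sample (n_L = 3)
y = [1; 0; 0];         % target output y^L
a = [0.8; 0.1; 0.2];   % network output a^L
% quadratic cost: half the sum of squared errors
J_quad = 0.5 * sum((y - a).^2);
% cross-entropy cost for the same sample
J_xent = -sum(y .* log(a) + (1 - y) .* log(1 - a));
fprintf('quadratic: %.4f, cross-entropy: %.4f\n', J_quad, J_xent);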
A network with good performance is one for which we have found the weights ($w^1, w^2, \dots, w^{L-1}$) that minimize the cost function $J$. This search is exactly what we mean by the network learning, and it can be carried out with the steepest gradient method.
Steepest Gradient Method
In the earlier assignments the connection weights $w$ were given directly, and in practice they are sometimes set from experience, but far more often the network learns them itself until the optimal $w$ is found.
Gradient descent keeps this search moving along the derivative $\frac{\partial J}{\partial w}$, i.e., the direction in which $J$ changes fastest, and a learning rate $\alpha$ controls how large each step is.

Update rule: $w_{ji}^l \leftarrow w_{ji}^l - \alpha \frac{\partial J}{\partial w_{ji}^l}$
A simple example follows (keep in mind that in practice the relationship between $J$ and $w$ is usually far more complicated):
(Figure: a U-shaped curve of $J$ versus $w$, with the minimum at point C and the initial points A and B on the left and right slopes.)
Point C in the figure is the optimal solution we are looking for. To apply the steepest gradient method, set $\alpha = 1$ for illustration.

When the starting point is A, $\frac{\partial J}{\partial w}$ is negative (the slope at A). Subtracting a negative value adds a positive one, so by the update rule $w_{ji}^l$ increases: A moves to the right until it reaches C, or a neighborhood of C whose size depends on the choice of $\alpha$.

When the starting point is B, $\frac{\partial J}{\partial w}$ is positive (the slope at B), so $w_{ji}^l$ decreases: B moves to the left, again until it reaches C or a neighborhood of C. A runnable sketch of this process is given below.
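Here is a minimal MATLAB sketch of the idea on a made-up one-dimensional cost $J(w) = (w - 3)^2$; the cost function, starting point, and iteration count are all illustrative assumptions, not part of the lab.

% toy cost J(w) = (w - 3)^2, minimum ("point C") at w = 3
dJdw  = @(w) 2 * (w - 3);  % derivative dJ/dw of the toy cost
alpha = 0.1;               % learning rate
w     = -2;                % starting point on the negative-slope side ("point A")
for k = 1:100
    w = w - alpha * dJdw(w);  % steepest gradient update
end
fprintf('w converged to %.4f (the optimum is 3)\n', w);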
Back Propagation
- Forward computation

$z^{l+1} = w^l a^l$ (1)
$a^{l+1} = f(z^{l+1})$ (2)

See the previous post for the details.

- Compute the cost

$J = \frac{1}{2} \left\| a^L - y^L \right\|^2$ (3)

Apply step 1 layer by layer until the last layer of the network is reached and $a^L$ is obtained, then use this formula to measure the gap between the network output and the target output. As before, this choice of cost is not unique.

- Backward computation

We need $\frac{\partial J}{\partial w}$, updating the parameters by $w_{ji}^l \leftarrow w_{ji}^l - \alpha \frac{\partial J}{\partial w_{ji}^l}$ so that the adjustment always follows the direction in which $J$ changes fastest.

By the chain rule, going (3) → (2) → (1):

$\frac{\partial J}{\partial w^{L-1}} = \frac{\partial J}{\partial a^L} \cdot \frac{\partial a^L}{\partial z^L} \cdot \frac{\partial z^L}{\partial w^{L-1}} = (a^L - y^L) \odot f'(z^L) \cdot (a^{L-1})^T$ (4)

The careful reader will notice that (4) only gives the gradient for the last layer of weights. To obtain $\frac{\partial J}{\partial w^l}$ for every layer, we introduce a new variable:

let $\delta^L = (a^L - y^L) \odot f'(z^L)$ (i.e., the right-hand side of (4) without the factor $(a^{L-1})^T$).

$\delta^{l+1}$ and $\delta^l$ are related by the recursion

$\delta^l = \left( (w^l)^T \delta^{l+1} \right) \odot f'(z^l)$

where $\odot$ denotes the elementwise product. Putting everything together,

$\frac{\partial J}{\partial w^l} = \delta^{l+1} (a^l)^T$ (5)

If the derivation above feels involved, you can skip it: remembering formulas (1), (2), (3), and (5) is enough to implement the algorithm in code. A numerical check of the gradient formula follows.
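Before wiring these formulas into the lab files, it can help to sanity-check (4)/(5) numerically on a tiny network. The sketch below is my own illustration, with made-up weights, input, and target; it compares the analytic gradient against a central finite-difference estimate.

f  = @(s) 1 ./ (1 + exp(-s));   % sigmoid activation
df = @(s) f(s) .* (1 - f(s));   % its derivative
w  = randn(1, 2);               % one-layer toy network, no bias
a1 = [0.5; -0.3]; y = 1;        % made-up input and target
% analytic gradient via (4)/(5): dJ/dw = delta * a1'
z2 = w * a1; a2 = f(z2);
delta2 = (a2 - y) * df(z2);
dw_analytic = delta2 * a1';
% central finite-difference estimate of dJ/dw(1,1)
h = 1e-6;
wp = w; wp(1) = wp(1) + h;
wm = w; wm(1) = wm(1) - h;
Jp = 0.5 * (f(wp * a1) - y)^2;
Jm = 0.5 * (f(wm * a1) - y)^2;
dw_numeric = (Jp - Jm) / (2 * h);
fprintf('analytic %.6f vs numeric %.6f\n', dw_analytic(1), dw_numeric);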
Assignment: download link
Instructions
Task 0: implement feedforward and backward computation
- in fc.m, implement the forward computing (in either component or vector form), and return both the activation and the net input
- in bc.m, implement the backward computing (in either component or vector form)
Task 1: implement online BP algorithm
in bp_online.m:
1. calculate activations a1, a2, a3, and net inputs z2, z3
2. calculate cost function J
3. calculate sensitivities delta3, delta2
4. calculate gradient with respect to weights dw1, dw2
5. update weights w1, w2
Task 2: implement batch BP algorithm
in bp_batch.m:
1. calculate activations a1, a2, a3, and net inputs z2, z3
2. calculate cost function J
3. calculate sensitivities delta3, delta2
4. cumulate gradient with respect to weights dw1, dw2
5. update weights w1, w2
fc.m
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Course: Understanding Deep Neural Networks
%
% Lab 3 - BP algorithms
%
% Task 0: implement feedforward and backward computation
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [a_next, z_next] = fc(w, a)
% define the activation function
f = @(s) 1 ./ (1 + exp(-s));
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Your code BELOW
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% forward computing (in either component or vector form)
% append the constant bias unit to the input activation
a = [a; 1];
% net input and activation of the next layer
z_next = w * a;
a_next = f(z_next);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Your code ABOVE
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
end
bc.m
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Course: Understanding Deep Neural Networks
%
% Lab 3 - BP algorithms
%
% Task 0: implement feedforward and backward computation
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function delta = bc(w, z, delta_next)
% define the activation function
f = @(s) 1 ./ (1 + exp(-s));
% define the derivative of activation function
df = @(s) f(s) .* (1 - f(s));
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Your code BELOW
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% backward computing (in either component or vector form)
% propagate the sensitivity back through w', drop the bias row
% (fc appended a bias unit, so w has one extra column), then
% multiply elementwise by the derivative of the activation
s = w' * delta_next;
delta = df(z) .* s(1:end-1);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Your code ABOVE
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
end
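A quick way to exercise fc and bc together is a dimension check on the lab's 2-2-1 network; the weight values and the output sensitivity below are made up purely for illustration.

w1 = randn(2, 3); w2 = randn(1, 3);  % 2-2-1 network, bias column included
a1 = [1; 0];                         % one input sample
[a2, z2] = fc(w1, a1);               % a2, z2 are 2x1
[a3, z3] = fc(w2, a2);               % a3, z3 are 1x1
delta3 = a3 - 0.5;                   % made-up output sensitivity, 1x1
delta2 = bc(w2, z2, delta3)          % should come back as 2x1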
bp_online.m
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Course: Understanding Deep Neural Networks
%
% Lab 3 - BP algorithms
%
% Task 1: implement online BP algorithm
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% clear the workspace
clear
% define the activation function
f = @(s) 1 ./ (1 + exp(-s));
% define the derivative of activation function
df = @(s) f(s) .* (1 - f(s));
% prepare the training data set
data = [1 0 0 1
0 1 0 1]; % samples
labels = [1 1 0 0]; % labels
m = size(data, 2);
% choose parameters, initialize the weights
alpha = 0.15;
epochs = 50000;
w1 = randn(2,3);
w2 = randn(1,3);
J = zeros(1,epochs);
% loop until weights converge
for t = 1:epochs
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Your code BELOW
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% for each sample
for i = 1:m
% forward calculation (invoke fc)
a1 = data(:, i);
[a2, z2] = fc(w1, a1);
[a3, z3] = fc(w2, a2);
% accumulate the cost function over the epoch
J(t) = J(t) + 0.5 * (a3 - labels(i))^2;
% backward calculation (invoke bc)
delta3 = (a3 - labels(i)) * df(z3);
delta2 = bc(w2, z2, delta3);
% calculate the gradients
dw1 = delta2 * ([a1;1])';
dw2 = delta3 * ([a2;1])';
% update weights
w1 = w1 - alpha * dw1;
w2 = w2 - alpha * dw2;
% end for each sample
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Your code ABOVE
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% end loop
if mod(t,100) == 0
fprintf('%i/%i epochs: J=%.4f\n', t, epochs, J(t));
end
end
% display the result
for i = 1:4
a1 = data(:,i);
[a2, z2] = fc(w1, a1);
[a3, z3] = fc(w2, a2);
fprintf('Sample [%i %i] (%i) is classified as %i.\n', data(1,i), data(2,i), labels(i), a3>0.5);
end
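One optional refinement to the online version, an assumption on my part rather than part of the lab: visiting the samples in the same fixed order every epoch can make the updates cyclic, so a common stochastic variant shuffles the order each epoch by replacing the inner loop header with

for i = randperm(m) % visit the samples in a random order each epoch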
bp_batch.m
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Course: Understanding Deep Neural Networks
%
% Lab 3 - BP algorithms
%
% Task 2: implement batch BP algorithm
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% clear the workspace
clear
% define the activation function
f = @(s) 1 ./ (1 + exp(-s));
% define the derivative of activation function
df = @(s) f(s) .* (1 - f(s));
% prepare the training data set
data = [1 0 0 1
0 1 0 1]; % samples
labels = [1 1 0 0]; % labels
m = size(data, 2);
% choose parameters, initialize the weights
alpha = 0.15;
epochs = 50000;
w1 = randn(2,3);
w2 = randn(1,3);
J = zeros(1,epochs);
% loop until weights converge
for t = 1:epochs
% reset the total gradients
dw1 = 0;
dw2 = 0;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Your code BELOW
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% for all samples
for i = 1:m
% forward calculation (invoke fc)
a1 = data(:, i);
[a2, z2] = fc(w1, a1);
[a3, z3] = fc(w2, a2);
% accumulate the cost function, averaged over the m samples
J(t) = J(t) + 0.5 / m * (a3 - labels(i))^2;
% backward calculation (invoke bc)
delta3 = (a3 - labels(i)) * df(z3);
delta2 = bc(w2, z2, delta3);
% cumulate the total gradients
dw1 = dw1 + delta2 * ([a1;1])';
dw2 = dw2 + delta3 * ([a2;1])';
% end for all samples
end
% update weights
w1 = w1 - alpha * dw1;
w2 = w2 - alpha * dw2;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Your code ABOVE
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% end loop
if mod(t,100) == 0
fprintf('%i/%i epochs: J=%.4f\n', t, epochs, J(t));
end
end
% display the result
for i = 1:4
a1 = data(:,i);
[a2, z2] = fc(w1, a1);
[a3, z3] = fc(w2, a2);
fprintf('Sample [%i %i] (%i) is classified as %i.\n', data(1,i), data(2,i), labels(i), a3>0.5);
end
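To compare how the online and batch versions converge, one simple option (my own suggestion, not part of the lab hand-out) is to plot the cost curve recorded during training:

% plot the cost per epoch recorded in J
plot(1:epochs, J);
xlabel('epoch'); ylabel('J');
title('cost vs. epoch');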