Single Sample
Symbols
$$
X = \begin{pmatrix} x_1 \\ \vdots \\ x_{n_x} \end{pmatrix}, \quad
Y = \begin{pmatrix} y_1 \\ \vdots \\ y_{n_y} \end{pmatrix},
$$
$$
Z^{[l]} = \begin{pmatrix} z^{[l]}_1 \\ \vdots \\ z^{[l]}_{n_l} \end{pmatrix}, \quad 1 \le l \le L
$$
$$
A^{[l]} = \begin{pmatrix} a^{[l]}_1 \\ \vdots \\ a^{[l]}_{n_l} \end{pmatrix}, \quad
\tilde{A}^{[l]} = \begin{pmatrix} a^{[l]}_0 \\ a^{[l]}_1 \\ \vdots \\ a^{[l]}_{n_l} \end{pmatrix}
= \begin{pmatrix} 1 \\ A^{[l]} \end{pmatrix}, \quad 0 \le l \le L
$$
$$
W^{[l]} = \left( w^{[l]}_{ij} \right)_{n_l \times n_{l-1}}, \quad
{w'}^{[l]} = \begin{pmatrix} w^{[l]}_{1,0} \\ \vdots \\ w^{[l]}_{n_l,0} \end{pmatrix}, \quad
\tilde{W}^{[l]} = \begin{pmatrix} {w'}^{[l]} & W^{[l]} \end{pmatrix}, \quad 1 \le l \le L
$$
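The augmented notation can be illustrated with a minimal NumPy sketch. The layer sizes, the numeric values, and the names `augment`, `w_bias`, and `W_tilde` are assumptions made for illustration only; they are not part of the notation above.

```python
import numpy as np

def augment(A):
    """Prepend the constant unit a_0 = 1 to every column of A, giving A-tilde."""
    return np.vstack([np.ones((1, A.shape[1])), A])

n_prev, n_cur = 3, 2                                    # hypothetical n_{l-1} and n_l
A_prev = np.arange(1.0, n_prev + 1).reshape(n_prev, 1)  # A^{[l-1]}, shape (3, 1)
W = np.full((n_cur, n_prev), 0.5)                       # W^{[l]}, shape (n_l, n_{l-1})
w_bias = np.array([[0.1], [0.2]])                       # w'^{[l]}, the column of w_{i,0}
W_tilde = np.hstack([w_bias, W])                        # W-tilde^{[l]}, shape (n_l, n_{l-1} + 1)

A_tilde = augment(A_prev)                               # shape (n_{l-1} + 1, 1)
Z = W_tilde @ A_tilde                                   # equals W @ A_prev + w_bias
```

Folding the bias column into the weight matrix this way lets a single matrix product cover both the weights and the biases.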
Neural Network Architecture
$$
X = A^{[0]} \to Z^{[1]} \to A^{[1]} \to \cdots \to Z^{[L]} \to A^{[L]} = \hat{Y}
$$
Loss Function
$$
z^{[l]}_i = \sum_{j=0}^{n_{l-1}} w^{[l]}_{ij} \, \tilde{a}^{[l-1]}_j, \quad 1 \le i \le n_l, \quad 1 \le l \le L
$$
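That is, in matrix form (the sum starts at $j = 0$, so the bias column is included and the matrix is $\tilde{W}^{[l]}$):
$$
Z^{[l]} = \tilde{W}^{[l]} \tilde{A}^{[l-1]}, \quad 1 \le l \le L
$$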
$$
a^{[l]}_i = g\left(z^{[l]}_i\right), \quad 1 \le i \le n_l, \quad 1 \le l \le L
$$
That is,
$$
A^{[l]} = g\left(Z^{[l]}\right), \quad 1 \le l \le L
$$
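The forward pass described by these two equations can be sketched as follows. This is a minimal illustration that assumes $g$ is the sigmoid function; the function names, the layer sizes, and the random weights are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W_tildes, g=sigmoid):
    """Return (Zs, As) with As[0] = X; W_tildes[l - 1] holds W-tilde^{[l]}."""
    As, Zs = [X], []
    for W_tilde in W_tildes:
        # Build A-tilde^{[l-1]} by stacking a row of ones on top of A^{[l-1]}.
        A_tilde = np.vstack([np.ones((1, As[-1].shape[1])), As[-1]])
        Z = W_tilde @ A_tilde          # Z^{[l]} = W-tilde^{[l]} A-tilde^{[l-1]}
        Zs.append(Z)
        As.append(g(Z))                # A^{[l]} = g(Z^{[l]})
    return Zs, As

rng = np.random.default_rng(0)
sizes = [3, 4, 2]                      # hypothetical n_0, n_1, n_2 (so L = 2)
W_tildes = [rng.normal(size=(sizes[l], sizes[l - 1] + 1))
            for l in range(1, len(sizes))]
X = rng.normal(size=(3, 1))            # a single sample as a column vector
Zs, As = forward(X, W_tildes)
Y_hat = As[-1]                         # A^{[L]}
```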
$$
\operatorname{loss}(X, Y) = -\sum_{i=1}^{n_y} \left[ y_i \ln \hat{y}_i + (1 - y_i) \ln (1 - \hat{y}_i) \right]
$$
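As a small sketch, the loss can be evaluated directly from the prediction. Note that this hypothetical helper takes $\hat{Y}$ rather than $X$ as its first argument, and it assumes every $\hat{y}_i$ lies strictly between 0 and 1.

```python
import numpy as np

def loss(Y_hat, Y):
    """Cross-entropy summed over the n_y output units of one sample."""
    return float(-np.sum(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat)))

Y = np.array([[1.0], [0.0]])
Y_hat = np.array([[0.5], [0.5]])
value = loss(Y_hat, Y)   # each unit contributes ln 2, so the total is 2 ln 2
```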
Formulas
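$$
\begin{aligned}
\frac{\partial}{\partial z^{[L]}_i} \operatorname{loss}(X, Y)
&= \frac{d \hat{y}_i}{d z^{[L]}_i} \cdot \frac{\partial}{\partial \hat{y}_i} \operatorname{loss}(X, Y) \\
&= -g'\left(z^{[L]}_i\right) \left[ y_i \cdot \frac{1}{\hat{y}_i} - (1 - y_i) \cdot \frac{1}{1 - \hat{y}_i} \right] \\
&= -\hat{y}_i (1 - \hat{y}_i) \left[ y_i \cdot \frac{1}{\hat{y}_i} - (1 - y_i) \cdot \frac{1}{1 - \hat{y}_i} \right] \\
&= (1 - y_i) \hat{y}_i - y_i (1 - \hat{y}_i) \\
&= \hat{y}_i - y_i, \quad 1 \le i \le n_L
\end{aligned}
$$
The third line uses $g'(z) = g(z)\,(1 - g(z))$, which holds when $g$ is the sigmoid function, so that $g'\left(z^{[L]}_i\right) = \hat{y}_i (1 - \hat{y}_i)$.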
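The result $\hat{y}_i - y_i$ can be checked numerically with a central finite difference. This is a sketch that assumes a sigmoid output layer; the test values of `z` and `y` and the helper `loss_from_z` are made up for the check.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_from_z(z, y):
    """Cross-entropy loss written as a function of the output pre-activations z^{[L]}."""
    y_hat = sigmoid(z)
    return -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

z = np.array([0.3, -1.2, 2.0])     # hypothetical z^{[L]}
y = np.array([1.0, 0.0, 1.0])      # hypothetical labels
analytic = sigmoid(z) - y          # the closed form y_hat - y

eps = 1e-6
numeric = np.zeros_like(z)
for i in range(z.size):
    step = np.zeros_like(z)
    step[i] = eps
    # Central difference approximation of d loss / d z_i.
    numeric[i] = (loss_from_z(z + step, y) - loss_from_z(z - step, y)) / (2 * eps)
```

The two vectors `analytic` and `numeric` should agree up to finite-difference error.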
$$
\begin{aligned}
\frac{\partial}{\partial z^{[l]}_j} \operatorname{loss}(X, Y)
&= \sum_{i=1}^{n_{l+1}} \frac{\partial z^{[l+1]}_i}{\partial z^{[l]}_j} \cdot \frac{\partial}{\partial z^{[l+1]}_i} \operatorname{loss}(X, Y) \\
&= \sum_{i=1}^{n_{l+1}} g'\left(z^{[l]}_j\right) w^{[l+1]}_{ij} \cdot \frac{\partial}{\partial z^{[l+1]}_i} \operatorname{loss}(X, Y) \\
&= g'\left(z^{[l]}_j\right) \sum_{i=1}^{n_{l+1}} w^{[l+1]}_{ij} \cdot \frac{\partial}{\partial z^{[l+1]}_i} \operatorname{loss}(X, Y), \quad 1 \le j \le n_l, \quad 1 \le l < L
\end{aligned}
$$
Therefore,
$$
\frac{\partial}{\partial Z^{[l]}} \operatorname{loss}(X, Y) =
\begin{cases}
A^{[L]} - Y, & l = L \\
g'\left(Z^{[l]}\right) \mathbin{.*} \left( \left(W^{[l+1]}\right)^{\top} \dfrac{\partial}{\partial Z^{[l+1]}} \operatorname{loss}(X, Y) \right), & 1 \le l < L
\end{cases}
$$
where .* denotes the element-wise (Hadamard) product.
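One step of this backward recursion might look like the sketch below. It assumes a sigmoid activation, so that $g'(Z^{[l]})$ can be computed from $A^{[l]}$, and the function name and numeric values are illustrative. The matrix `W_next` is $W^{[l+1]}$ without the bias column; if the weights are stored as $\tilde{W}^{[l+1]}$, the first column would have to be dropped first.

```python
import numpy as np

def backward_delta(dZ_next, W_next, A):
    """Compute dloss/dZ^{[l]} from dloss/dZ^{[l+1]} (sigmoid activation assumed)."""
    g_prime = A * (1.0 - A)                  # g'(Z^{[l]}) for the sigmoid
    return g_prime * (W_next.T @ dZ_next)    # element-wise product with W^T dZ_next

A = np.array([[0.5], [0.5], [0.5]])          # A^{[l]}, n_l = 3
W_next = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])         # W^{[l+1]}, shape (n_{l+1}, n_l) = (2, 3)
dZ_next = np.array([[1.0], [-1.0]])          # dloss/dZ^{[l+1]}
dZ = backward_delta(dZ_next, W_next, A)      # shape (n_l, 1)
```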
$$
\frac{\partial}{\partial w^{[l]}_{ij}} \operatorname{loss}(X, Y)
= \frac{\partial}{\partial z^{[l]}_i} \operatorname{loss}(X, Y) \cdot \tilde{a}^{[l-1]}_j,
\quad 1 \le i \le n_l, \quad 0 \le j \le n_{l-1}, \quad 1 \le l \le L
$$
Therefore,
$$
\frac{\partial}{\partial \tilde{W}^{[l]}} \operatorname{loss}(X, Y)
= \frac{\partial}{\partial Z^{[l]}} \operatorname{loss}(X, Y) \cdot \left(\tilde{A}^{[l-1]}\right)^{\top}, \quad 1 \le l \le L
$$
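For a single sample this gradient is an outer product of two vectors. A minimal sketch with made-up values (the variable names are illustrative):

```python
import numpy as np

dZ = np.array([[0.2], [-0.1]])                        # dloss/dZ^{[l]}, shape (n_l, 1)
A_prev = np.array([[0.5], [0.3], [0.9]])              # A^{[l-1]}, shape (n_{l-1}, 1)
A_tilde_prev = np.vstack([np.ones((1, 1)), A_prev])   # A-tilde^{[l-1]} with a_0 = 1

# Gradient with respect to W-tilde^{[l]}, shape (n_l, n_{l-1} + 1).
dW_tilde = dZ @ A_tilde_prev.T
```

Because $\tilde{a}^{[l-1]}_0 = 1$, the first column of `dW_tilde` equals `dZ`, which is the gradient with respect to the bias weights.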
Multiple Samples
Symbols
$$
X = \left( X^{(1)}, \cdots, X^{(m)} \right), \quad
Y = \left( Y^{(1)}, \cdots, Y^{(m)} \right),
$$
$$
Z^{[l]} = \left( Z^{[l](1)}, \cdots, Z^{[l](m)} \right), \quad 1 \le l \le L
$$
$$
A^{[l]} = \left( A^{[l](1)}, \cdots, A^{[l](m)} \right), \quad 0 \le l \le L
$$
$$
\tilde{A}^{[l]} = \left( \tilde{A}^{[l](1)}, \cdots, \tilde{A}^{[l](m)} \right), \quad 0 \le l \le L
$$
$$
\partial Z^{[l]} = \left( \frac{\partial}{\partial Z^{[l]}} \operatorname{loss}\left(X^{(1)}, Y^{(1)}\right), \cdots, \frac{\partial}{\partial Z^{[l]}} \operatorname{loss}\left(X^{(m)}, Y^{(m)}\right) \right)_{n_l \times m}, \quad 1 \le l \le L
$$
Cost Function
$$
\operatorname{cost}(X, Y) = \frac{1}{m} \sum_{i=1}^{m} \operatorname{loss}\left(X^{(i)}, Y^{(i)}\right)
$$
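With the samples stored as columns, the cost is the mean of the per-sample losses. A brief sketch, again taking the prediction $\hat{Y}$ as input instead of $X$ (an assumption of this hypothetical helper):

```python
import numpy as np

def cost(Y_hat, Y):
    """Average cross-entropy loss over the m columns (samples) of Y_hat and Y."""
    m = Y.shape[1]
    # Sum over output units (rows) to get one loss value per sample.
    per_sample = -np.sum(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat), axis=0)
    return float(per_sample.sum() / m)

Y = np.array([[1.0, 0.0, 1.0]])        # n_y = 1, m = 3
Y_hat = np.array([[0.5, 0.5, 0.5]])
value = cost(Y_hat, Y)                 # every sample has loss ln 2
```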
Formulas
$$
Z^{[l]} = \tilde{W}^{[l]} \tilde{A}^{[l-1]}, \quad 1 \le l \le L
$$
$$
A^{[l]} = g\left(Z^{[l]}\right), \quad 1 \le l \le L
$$
$$
g'\left(Z^{[l]}\right) = A^{[l]} \mathbin{.*} \left( \mathbf{1}_{n_l \times m} - A^{[l]} \right), \quad 1 \le l \le L
$$
$$
\partial Z^{[l]} =
\begin{cases}
A^{[L]} - Y, & l = L \\
g'\left(Z^{[l]}\right) \mathbin{.*} \left( \left(W^{[l+1]}\right)^{\top} \cdot \partial Z^{[l+1]} \right), & 1 \le l < L
\end{cases}
$$
$$
\frac{\partial}{\partial \tilde{W}^{[l]}} \operatorname{cost}(X, Y) = \frac{1}{m} \, \partial Z^{[l]} \cdot \left(\tilde{A}^{[l-1]}\right)^{\top}, \quad 1 \le l \le L
$$
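The batch formulas can be combined into a short backpropagation routine and compared against finite differences. The following is a sketch under stated assumptions: sigmoid activations in every layer, weights stored as $\tilde{W}^{[l]}$ with the bias in column 0, and hypothetical layer sizes and random data. All function names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def augment(A):
    """Stack a row of ones on top of A to form A-tilde."""
    return np.vstack([np.ones((1, A.shape[1])), A])

def forward(X, W_tildes):
    As = [X]                                            # As[l] holds A^{[l]}
    for W_tilde in W_tildes:
        As.append(sigmoid(W_tilde @ augment(As[-1])))
    return As

def cost(X, Y, W_tildes):
    Y_hat = forward(X, W_tildes)[-1]
    m = X.shape[1]
    return float(-np.sum(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat)) / m)

def gradients(X, Y, W_tildes):
    """Return d cost / d W-tilde^{[l]} for l = 1..L (list index l - 1)."""
    m = X.shape[1]
    As = forward(X, W_tildes)
    L = len(W_tildes)
    grads = [None] * L
    dZ = As[L] - Y                                      # dZ^{[L]} = A^{[L]} - Y
    for l in range(L, 0, -1):
        grads[l - 1] = dZ @ augment(As[l - 1]).T / m    # (1/m) dZ^{[l]} A-tilde^{[l-1]T}
        if l > 1:
            W = W_tildes[l - 1][:, 1:]                  # W^{[l]} without the bias column
            dZ = As[l - 1] * (1 - As[l - 1]) * (W.T @ dZ)   # dZ^{[l-1]}
    return grads

rng = np.random.default_rng(0)
sizes = [3, 4, 2]                                       # hypothetical n_0, n_1, n_2
W_tildes = [rng.normal(size=(sizes[l], sizes[l - 1] + 1))
            for l in range(1, len(sizes))]
X = rng.normal(size=(3, 5))                             # m = 5 samples as columns
Y = rng.integers(0, 2, size=(2, 5)).astype(float)

grads = gradients(X, Y, W_tildes)

# Central finite-difference check of every weight.
eps = 1e-6
max_err = 0.0
for l, W_tilde in enumerate(W_tildes):
    for idx in np.ndindex(W_tilde.shape):
        old = W_tilde[idx]
        W_tilde[idx] = old + eps
        c_plus = cost(X, Y, W_tildes)
        W_tilde[idx] = old - eps
        c_minus = cost(X, Y, W_tildes)
        W_tilde[idx] = old                              # restore the weight
        numeric = (c_plus - c_minus) / (2 * eps)
        max_err = max(max_err, abs(numeric - grads[l][idx]))
```

If the formulas are implemented consistently, `max_err` should be close to zero, limited only by finite-difference and rounding error.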