Chapter 6: Deep Feedforward Networks
6.1 Example: Learning XOR
We treat XOR as a regression problem, minimizing the MSE loss over the four input points $x \in \mathbb{X}$:

$$J(\Theta) = \frac{1}{4} \sum_{x \in \mathbb{X}} \big(f^*(x) - f(x; \Theta)\big)^2$$
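As a sketch, this loss can be computed directly over the four XOR points (the function and variable names here are illustrative, not from the text):

```python
import numpy as np

# The four XOR inputs and their target outputs f*(x).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_true = np.array([0., 1., 1., 0.])

def mse_loss(f, X, y_true):
    """J(theta) = (1/4) * sum over the four points of (f*(x) - f(x))^2."""
    y_pred = np.array([f(x) for x in X])
    return np.mean((y_true - y_pred) ** 2)

# Example: a model that always predicts 0.5 has loss 0.25.
print(mse_loss(lambda x: 0.5, X, y_true))  # -> 0.25
```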
Now we must choose our model $f(x; \Theta)$. A linear model

$$f(x; w, b) = x^\top w + b$$

cannot describe the XOR logic: no setting of $w$ and $b$ fits all four points.
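We can verify this numerically: a least-squares fit of the linear model to the four XOR points collapses to predicting 0.5 everywhere. A minimal sketch in NumPy (variable names are mine):

```python
import numpy as np

# All four XOR inputs and targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0., 1., 1., 0.])

# Augment with a column of ones so the bias b is fit along with w.
A = np.hstack([X, np.ones((4, 1))])

# Least-squares fit of f(x) = x^T w + b.
theta, *_ = np.linalg.lstsq(A, y, rcond=None)
w, b = theta[:2], theta[2]

print(w, b)       # w -> [0, 0], b -> 0.5
print(A @ theta)  # the best a linear model can do: 0.5 on every input
```

The optimal linear model ignores the inputs entirely, which is exactly the failure the text describes.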
Suppose we add a hidden linear layer:

$$h = f^{(1)}(x; W, c), \qquad y = f^{(2)}(h; w, b), \qquad f(x; W, c, w, b) = f^{(2)}(f^{(1)}(x))$$

With $f^{(1)}(x) = W^\top x$ and $f^{(2)}(h) = h^\top w$, we get

$$f(x) = w^\top W^\top x,$$

which is still a linear function of $x$.
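A quick numerical check that stacking two linear layers collapses to a single linear map (random weights, names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((2, 2))   # hidden-layer weights, no nonlinearity
w = rng.standard_normal(2)        # output weights
x = rng.standard_normal(2)        # an arbitrary input

# Two stacked linear layers: h = W^T x, then y = h^T w ...
h = W.T @ x
y_two_layers = h @ w

# ... equal a single linear layer with combined weights W w.
y_single = x @ (W @ w)

print(np.isclose(y_two_layers, y_single))  # -> True
```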
Clearly, we must use a nonlinear layer to represent the features.
We use the rectified linear unit (ReLU) as the activation, giving the network

$$f(x; W, c, w, b) = w^\top \max\{0, W^\top x + c\} + b$$
One solution is

$$W = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \qquad c = \begin{pmatrix} 0 \\ -1 \end{pmatrix}, \qquad w = \begin{pmatrix} 1 \\ -2 \end{pmatrix}, \qquad b = 0$$
Calculate the forward pass on the design matrix $X$ containing all four inputs, one per row:

$$X = \begin{pmatrix} 0 & 0 \\ 0 & 1 \\ 1 & 0 \\ 1 & 1 \end{pmatrix}, \qquad XW = \begin{pmatrix} 0 & 0 \\ 1 & 1 \\ 1 & 1 \\ 2 & 2 \end{pmatrix}$$

Adding $c$ to each row:

$$XW + c = \begin{pmatrix} 0 & -1 \\ 1 & 0 \\ 1 & 0 \\ 2 & 1 \end{pmatrix}$$

Applying the ReLU elementwise:

$$H = \max\{0, XW + c\} = \begin{pmatrix} 0 & 0 \\ 1 & 0 \\ 1 & 0 \\ 2 & 1 \end{pmatrix}$$

Multiplying by $w$:

$$Hw = \begin{pmatrix} 0 \\ 1 \\ 1 \\ 0 \end{pmatrix}$$
We get exactly the XOR outputs for the four inputs.
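The whole forward pass above can be checked in a few lines of NumPy; broadcasting adds $c$ to every row just as in the derivation (variable names are mine):

```python
import numpy as np

# Parameters from the text (with b = 0).
W = np.array([[1., 1.], [1., 1.]])
c = np.array([0., -1.])
w = np.array([1., -2.])
b = 0.

# Design matrix: all four XOR inputs, one per row.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# f(x) = w^T max{0, W^T x + c} + b, applied to all rows at once.
H = np.maximum(0, X @ W + c)   # ReLU hidden layer
y = H @ w + b

print(y)  # -> [0. 1. 1. 0.], the XOR truth table
```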