Principle
Procedure
- Generate data
- Generate weights
- layer1:
- layer2:
- $x$: input data, shape (20, 5)
- $w_1$: first-layer weights, shape (5, 3)
- $w_2$: second-layer weights, shape (3, 2)
- $a_1$: product of $x$ and $w_1$, shape (20, 3)
- $h_1$: $a_1$ passed through the activation function, shape (20, 3)
- $a_2$: product of $h_1$ and $w_2$, shape (20, 2)
- $h_2$: $a_2$ passed through the activation function, shape (20, 2)
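As a quick sanity check of these shapes, here is a minimal numpy sketch (the random data is only for illustration; the array names follow the list above):

import numpy as np

x = np.random.rand(20, 5)     # input data
w1 = np.random.rand(5, 3)     # first-layer weights
w2 = np.random.rand(3, 2)     # second-layer weights

a1 = np.dot(x, w1)            # (20, 5) x (5, 3) -> (20, 3)
a2 = np.dot(a1, w2)           # (20, 3) x (3, 2) -> (20, 2); the element-wise activation does not change shapes
print(a1.shape, a2.shape)     # (20, 3) (20, 2)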
Forward propagation
Input: $x$
$a_1 = x * w_1$
$h_1 = \mathrm{sigmoid}(a_1)$
$a_2 = h_1 * w_2$
$h_2 = \mathrm{sigmoid}(a_2)$
Derivation
Loss function (logloss): $\displaystyle J=-\frac{1}{m}\sum\left(y\log{\hat{y}}+(1-y)\log(1-\hat{y})\right)$
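As a quick numpy check of this loss, here is a minimal sketch (the function name logloss and the argument names y_true / y_pred are my own; in this network the prediction $\hat{y}$ corresponds to $h_2$):

import numpy as np

def logloss(y_true, y_pred, eps=1e-12):
    # J = -1/m * sum( y*log(y_hat) + (1-y)*log(1-y_hat) ), averaged over samples
    y_pred = np.clip(y_pred, eps, 1 - eps)   # clip to avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))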
$\displaystyle\frac{\partial{J}}{\partial{w_2}}=\frac{\partial{J}}{\partial{h_2}}*\frac{\partial{h_{2}}}{\partial{a_2}}*\frac{\partial{a_{2}}}{\partial{w_{2}}}$
$\displaystyle\frac{\partial{J}}{\partial{w_1}}=\frac{\partial{J}}{\partial{h_2}}*\frac{\partial{h_{2}}}{\partial{a_2}}*\frac{\partial{a_{2}}}{\partial{h_{1}}}*\frac{\partial{h_{1}}}{\partial{a_{1}}}*\frac{\partial{a_{1}}}{\partial{w_{1}}}$
The shared part (the first two partial derivatives) is: $\displaystyle\frac{\partial{J}}{\partial{h_2}}*\frac{\partial{h_{2}}}{\partial{a_2}}$
$\displaystyle\frac{\partial{J}}{\partial{h_2}}=-\frac{1}{m}*\frac{y-h_{2}}{h_{2}(1-h_{2})}$
$\displaystyle\frac{\partial{h_{2}}}{\partial{a_2}}=h_{2}(1-h_{2})$
$\displaystyle\frac{\partial{a_{2}}}{\partial{w_{2}}}=h_{1}$
$\displaystyle\frac{\partial{a_{2}}}{\partial{h_{1}}}=w_{2}$
$\displaystyle\frac{\partial{h_{1}}}{\partial{a_{1}}}=h_{1}*(1-h_{1})$
$\displaystyle\frac{\partial{a_{1}}}{\partial{w_{1}}}=x$
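Multiplying the first two factors, the $h_{2}(1-h_{2})$ terms cancel, so the shared part reduces to:

$\displaystyle\frac{\partial{J}}{\partial{h_2}}*\frac{\partial{h_{2}}}{\partial{a_2}}=-\frac{1}{m}*\frac{y-h_{2}}{h_{2}(1-h_{2})}*h_{2}(1-h_{2})=\frac{h_{2}-y}{m}$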
- Find $x_{1}$
Code
Implementation with numpy
import numpy as np

# dimensions: 10 positive + 10 negative samples, 5 input features,
# 3 hidden units in layer 1, 2 output units in layer 2
train_x_dim = 5
sample_1_num = 10
sample_0_num = 10
weight1_dim = 3
weight2_dim = 2

# generate data: positive samples in [0, 1), negative samples in [0, 10)
train_x_1 = np.random.rand(sample_1_num, train_x_dim)
train_x_0 = np.random.rand(sample_0_num, train_x_dim) * 10
train_y_1 = np.ones(sample_1_num)
train_y_0 = np.zeros(sample_0_num)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derv(x):
    # derivative of sigmoid: sigmoid(x) * (1 - sigmoid(x))
    return sigmoid(x) * (1 - sigmoid(x))

# generate weights
weight1 = np.random.rand(train_x_dim, weight1_dim)    # (5, 3)
weight2 = np.random.rand(weight1_dim, weight2_dim)    # (3, 2)

# forward propagation
a1 = np.dot(train_x_1, weight1)    # (10, 5) x (5, 3) -> (10, 3)
h1 = sigmoid(a1)
a2 = np.dot(h1, weight2)           # (10, 3) x (3, 2) -> (10, 2)
h2 = sigmoid(a2)
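The code above only runs the forward pass. Below is a minimal sketch of a training loop that applies the gradients derived earlier; it assumes the two sample groups are stacked into one training set and the labels are one-hot encoded to match h2's two output columns. The names train_x, y, lr, delta1, delta2, grad_w1, and grad_w2 are my own, not from the original.

# stack the two groups into one training set (assumption), matching the (20, 5) layout above
train_x = np.vstack([train_x_1, train_x_0])                    # (20, 5)
train_y = np.concatenate([train_y_1, train_y_0]).astype(int)   # (20,)
y = np.eye(weight2_dim)[train_y]                               # one-hot labels, (20, 2)

lr = 0.1                       # learning rate (hypothetical choice)
m = train_x.shape[0]

for step in range(1000):
    # forward pass (same equations as above)
    a1 = np.dot(train_x, weight1); h1 = sigmoid(a1)
    a2 = np.dot(h1, weight2);      h2 = sigmoid(a2)

    # backward pass, following the chain rule above:
    # dJ/dh2 * dh2/da2 simplifies to (h2 - y) / m
    delta2 = (h2 - y) / m
    grad_w2 = np.dot(h1.T, delta2)                      # dJ/dw2, shape (3, 2)
    # propagate back through layer 1: multiply by w2 and by h1*(1-h1)
    delta1 = np.dot(delta2, weight2.T) * h1 * (1 - h1)
    grad_w1 = np.dot(train_x.T, delta1)                 # dJ/dw1, shape (5, 3)

    # gradient-descent update
    weight1 -= lr * grad_w1
    weight2 -= lr * grad_w2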
Implementation with TensorFlow
import tensorflow as tf
from tensorflow import keras

# load data
fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

# build model: flatten 28x28 images, one hidden layer, softmax output over 10 classes
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation=tf.nn.relu),
    keras.layers.Dense(10, activation=tf.nn.softmax)
])

# compile model
model.compile(optimizer=keras.optimizers.Adam(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# train model
model.fit(train_images, train_labels, epochs=5)

# evaluate
test_loss, test_acc = model.evaluate(test_images, test_labels)
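After evaluation, the trained model can be used for inference; a short usage sketch (predictions is my own variable name):

print('test accuracy:', test_acc)
predictions = model.predict(test_images)    # (10000, 10): one softmax distribution per test image
print(predictions[0].argmax())              # predicted class index for the first test image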