Preface
These are my notes from the August 20 computer-vision theory session on neural networks and the backpropagation (BP) algorithm, in three parts:
- the Delta learning rule;
- gradient descent;
- a NumPy implementation of backpropagation.
I. The Delta Learning Rule
The delta rule is a supervised learning algorithm that adjusts connection weights according to the difference between a neuron's actual output and its desired output:
$$\Delta w_{ij} = a \cdot (d_i - y_i)\, x_j(t)$$
where $\Delta w_{ij}$ is the weight increment, $d_i$ is the desired output of neuron $i$, $y_i$ is its actual output, and $a$ is the learning rate.
- Objective function ($\mathbf{t}$ is the target vector, $\mathbf{z}$ the network output):
$$J(w) = \frac{1}{2}\,\lVert \mathbf{t} - \mathbf{z} \rVert^2 = \frac{1}{2}\sum_{k=1}^{c}(t_k - z_k)^2$$
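To make the rule concrete, here is a minimal NumPy sketch of one delta-rule update for a single linear unit; the inputs, weights, and learning rate below are made-up illustrative values, not from the original notes.

```python
import numpy as np

# One delta-rule update: dw_j = a * (d - y) * x_j
w = np.array([0.1, -0.2])   # current weights (illustrative)
x = np.array([1.0, 0.5])    # input x_j(t)
d = 1.0                     # desired output d_i
a = 0.1                     # learning rate

y = w @ x                   # actual output y_i of a linear unit
w += a * (d - y) * x        # delta rule: w_ij += a * (d_i - y_i) * x_j
print(w)
```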
II. Gradient Descent
$$w(m+1) = w(m) + \Delta w(m) = w(m) - \eta\,\frac{\partial J}{\partial w}$$
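As a sanity check of the update rule, a tiny sketch of gradient descent on the toy objective $J(w) = (w-3)^2$ (chosen here purely for illustration):

```python
# Gradient descent on J(w) = (w - 3)^2, whose gradient is 2*(w - 3).
w, eta = 0.0, 0.1
for m in range(50):
    grad = 2 * (w - 3)    # dJ/dw
    w = w - eta * grad    # w(m+1) = w(m) - eta * dJ/dw
print(w)                  # converges toward the minimum at w = 3
```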
1. Output-layer weight increment
$$J(w) = \frac{1}{2}\sum_{k=1}^{c}(t_k - z_k)^2, \qquad \frac{\partial J}{\partial w_{kj}} = \frac{\partial J}{\partial net_k}\,\frac{\partial net_k}{\partial w_{kj}}$$
where the total input to output unit $k$ is $net_k = \sum_{i=1}^{n_H} w_{ki}\, y_i$, so $\frac{\partial net_k}{\partial w_{kj}} = y_j$.
$$\frac{\partial J}{\partial net_k} = \frac{\partial J}{\partial z_k}\,\frac{\partial z_k}{\partial net_k} = -(t_k - z_k)\,f'(net_k)$$
Let $\delta_k = (t_k - z_k)\,f'(net_k)$; then:
$$\frac{\partial J}{\partial w_{kj}} = -(t_k - z_k)\,f'(net_k)\,y_j = -\delta_k\, y_j$$
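The output-layer result vectorizes directly. A minimal NumPy sketch, assuming a sigmoid $f$ (so $f'(net_k) = z_k(1 - z_k)$); all sizes and values below are illustrative only:

```python
import numpy as np

def f(u):
    return 1 / (1 + np.exp(-u))    # sigmoid activation

y = np.array([0.3, 0.7, 0.1])      # hidden-layer outputs y_i (n_H = 3)
W = np.random.randn(2, 3) * 0.1    # output weights w_ki (c = 2 outputs)
t = np.array([1.0, 0.0])           # targets t_k

net = W @ y                        # net_k = sum_i w_ki * y_i
z = f(net)                         # outputs z_k
delta_k = (t - z) * z * (1 - z)    # delta_k = (t_k - z_k) f'(net_k)
grad_W = -np.outer(delta_k, y)     # dJ/dw_kj = -delta_k * y_j
W -= 0.1 * grad_W                  # one gradient-descent step (eta = 0.1)
```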
2. Hidden-layer weight increment
$$\frac{\partial J}{\partial w_{ji}} = \frac{\partial J}{\partial y_j}\,\frac{\partial y_j}{\partial net_j}\,\frac{\partial net_j}{\partial w_{ji}}$$
Moreover, the total input to hidden unit $j$ is $net_j = \sum_{m=1}^{d} w_{jm}\, x_m$, so:
$$\frac{\partial y_j}{\partial net_j} = f'(net_j), \qquad \frac{\partial net_j}{\partial w_{ji}} = x_i$$
$$\frac{\partial J}{\partial y_j} = \frac{\partial}{\partial y_j}\left[\frac{1}{2}\sum_{k=1}^{c}(t_k - z_k)^2\right] = -\sum_{k=1}^{c}(t_k - z_k)\,f'(net_k)\,w_{kj}$$
$$\frac{\partial J}{\partial w_{ji}} = -\left[\sum_{k=1}^{c}(t_k - z_k)\,f'(net_k)\,w_{kj}\right] f'(net_j)\,x_i$$
Let $\delta_j = f'(net_j)\sum_{k=1}^{c}\delta_k\, w_{kj}$; then:
$$\frac{\partial J}{\partial w_{ji}} = -\delta_j\, x_i$$
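Likewise for the hidden layer. A self-contained sketch, assuming tanh for $f$ (so $f'(net_j) = 1 - y_j^2$) and an illustrative residual vector standing in for the $\delta_k$ from the output-layer step:

```python
import numpy as np

x = np.array([1.0, -1.0])             # network inputs x_i (d = 2)
V = np.random.randn(3, 2) * 0.1       # hidden weights w_jm (n_H = 3)
W = np.random.randn(2, 3) * 0.1       # output weights w_kj (c = 2)
delta_k = np.array([0.05, -0.02])     # output residuals (illustrative values)

y = np.tanh(V @ x)                    # hidden outputs y_j = f(net_j)
delta_j = (1 - y**2) * (W.T @ delta_k)  # delta_j = f'(net_j) * sum_k delta_k w_kj
grad_V = -np.outer(delta_j, x)          # dJ/dw_ji = -delta_j * x_i
V -= 0.1 * grad_V                       # one gradient-descent step
```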
In summary:
- weight increment = −1 × learning rate × partial derivative of the objective with respect to the weight;
- partial derivative of the objective with respect to the weight = −1 × residual × input to the current layer;
- residual = derivative of the current layer's activation × error propagated back from the layer above;
- error propagated back from the layer above = weighted sum of the residuals of the layer above.
The following TensorFlow 1.x example trains a single-layer logistic model on XOR by gradient descent; as the output shows, a single layer cannot separate this data:
```python
import numpy as np
import tensorflow as tf

tf.set_random_seed(777)
learning_rate = 0.1

x_data = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
y_data = np.array([[0], [1], [1], [0]], dtype=np.float32)

X = tf.placeholder(tf.float32, [None, 2])
Y = tf.placeholder(tf.float32, [None, 1])

W = tf.Variable(tf.random_normal([2, 1]), name='weight')
b = tf.Variable(tf.random_normal([1]), name='bias')

# Predicted probability
hypothesis = tf.sigmoid(tf.matmul(X, W) + b)
# Cross-entropy loss
loss = -tf.reduce_mean(Y * tf.log(hypothesis) + (1 - Y) *
                       tf.log(1 - hypothesis))
# Training op
train = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(loss)

# Accuracy computation
# True if hypothesis > 0.5 else False
pred = tf.cast(hypothesis > 0.5, dtype=tf.float32)
acc = tf.reduce_mean(tf.cast(tf.equal(pred, Y), dtype=tf.float32))

# Launch graph
with tf.Session() as sess:
    # Initialize variables
    sess.run(tf.global_variables_initializer())
    for step in range(10001):
        sess.run(train, feed_dict={X: x_data, Y: y_data})
        if step % 100 == 0:
            print(step, sess.run(loss, feed_dict={
                X: x_data, Y: y_data}), sess.run(W))

    # Report accuracy
    h, c, a = sess.run([hypothesis, pred, acc],
                       feed_dict={X: x_data, Y: y_data})
    print("\nHypothesis: ", h, "\nCorrect: ", c, "\nAccuracy: ", a)
```
```
>>> Hypothesis:  [[0.5]
 [0.5]
 [0.5]
 [0.5]]
Correct:  [[0.]
 [0.]
 [0.]
 [0.]]
Accuracy:  0.5
```
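For reference, the standard remedy is to add a hidden layer so the model can represent XOR. A minimal sketch of the changed part (the layer sizes and variable names below are my choice, not from the original notes); reusing the training loop above, accuracy then typically reaches 1.0:

```python
# Two-layer variant: replace the single-layer hypothesis with
W1 = tf.Variable(tf.random_normal([2, 2]), name='weight1')
b1 = tf.Variable(tf.random_normal([2]), name='bias1')
layer1 = tf.sigmoid(tf.matmul(X, W1) + b1)

W2 = tf.Variable(tf.random_normal([2, 1]), name='weight2')
b2 = tf.Variable(tf.random_normal([1]), name='bias2')
hypothesis = tf.sigmoid(tf.matmul(layer1, W2) + b2)
```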
3. Stochastic gradient descent (SGD)
Instead of computing the gradient over the full training set, SGD updates the weights from one randomly chosen sample (or a small mini-batch) per iteration, as in the sketch below.
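A minimal NumPy sketch of the idea on a toy linear-regression problem (all names and sizes are illustrative): each step samples a small batch and descends the gradient computed on that batch alone.

```python
import numpy as np

# Toy linear regression trained by mini-batch SGD (illustrative only).
X = np.random.randn(100, 2)
t = X @ np.array([2.0, -1.0])    # targets from a known linear model
w = np.zeros(2)
eta, batch = 0.1, 8

for step in range(500):
    idx = np.random.randint(0, len(X), size=batch)  # draw a random mini-batch
    xb, tb = X[idx], t[idx]
    grad = -(tb - xb @ w) @ xb / batch              # dJ/dw on the batch only
    w -= eta * grad
print(w)    # approaches [2, -1]
```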
III. Implementing Backpropagation in NumPy
The network below follows the derivation above: each epoch picks one sample at random (SGD), stores the layer activations in `a`, and collects the residuals in `deltas`.
```python
import numpy as np

# Activation functions and their derivatives. The derivatives are written
# in terms of the activation OUTPUT, because fit() applies them to the
# stored activations a[l].
def tanh(x):
    return np.tanh(x)

def tanh_deriv(a):
    return 1.0 - a ** 2          # d/dx tanh(x) = 1 - tanh(x)^2

def logistic(x):
    return 1 / (1 + np.exp(-x))

def logistic_derivative(a):
    return a * (1 - a)           # d/dx logistic(x) = logistic(x)(1 - logistic(x))

# Define the neural network
class NeuralNetwork:
    def __init__(self, layers, activation='tanh'):
        '''
        layers: list of at least two layer sizes, e.g. [10, 10, 3] means
                10 neurons in the first layer, 10 in the second, 3 in the third;
        activation: 'tanh' or 'logistic'
        '''
        if activation == 'logistic':
            self.activation = logistic
            self.activation_deriv = logistic_derivative
        elif activation == 'tanh':
            self.activation = tanh
            self.activation_deriv = tanh_deriv
        self.weights = []
        # Loop from 1, i.e. initialize weights taking each hidden layer as reference.
        for i in range(1, len(layers) - 1):
            # Weights into the current layer (the +1 folds the bias in)
            self.weights.append((2 * np.random.random((layers[i - 1] + 1, layers[i] + 1)) - 1) * 0.25)
            # Weights out of the current layer
            self.weights.append((2 * np.random.random((layers[i] + 1, layers[i + 1])) - 1) * 0.25)

    def fit(self, X, y, learning_rate=0.1, epochs=10000):
        '''X: one sample per row; y: the target for each sample.'''
        X = np.atleast_2d(X)                          # make sure X is 2-D
        temp = np.ones([X.shape[0], X.shape[1] + 1])  # append a bias column of ones
        temp[:, 0:-1] = X
        X = temp
        y = np.array(y)
        for k in range(epochs):
            # Pick one sample at random and update the network with it (SGD)
            i = np.random.randint(X.shape[0])
            a = [X[i]]
            # Forward pass: store every layer's activations
            for l in range(len(self.weights)):
                a.append(self.activation(np.dot(a[l], self.weights[l])))
            error = y[i] - a[-1]
            deltas = [error * self.activation_deriv(a[-1])]  # output-layer residual
            if k % 1000 == 0:
                print(k, '...', error * error * 100)
            # Backward pass: propagate residuals and update the weights
            for l in range(len(a) - 2, 0, -1):  # from the second-to-last layer
                deltas.append(deltas[-1].dot(self.weights[l].T) * self.activation_deriv(a[l]))
            deltas.reverse()
            # Gradient-descent step: w += eta * residual * layer input
            for i in range(len(self.weights)):
                layer = np.atleast_2d(a[i])
                delta = np.atleast_2d(deltas[i])
                self.weights[i] += learning_rate * layer.T.dot(delta)

    # Prediction
    def predict(self, x):
        x = np.array(x)
        temp = np.ones(x.shape[0] + 1)  # append the bias input
        temp[0:-1] = x
        a = temp
        for l in range(0, len(self.weights)):
            a = self.activation(np.dot(a, self.weights[l]))
        return a
```
A quick check on XOR:

```python
nn = NeuralNetwork([2, 2, 1], 'tanh')
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])
nn.fit(X, y)
for i in [[0, 0], [0, 1], [1, 0], [1, 1]]:
    print(i, nn.predict(i))
```

After training, the predictions should land close to the XOR targets 0, 1, 1, 0.