I am working through Professor Phil Kim's *MATLAB Deep Learning* (Chinese edition: 《深度学习:基于MATLAB的设计实践》). The book is well written: it covers the fundamentals and provides MATLAB code to practice with. My approach is to study and understand each chapter, then reimplement the examples in Python, which is what this post records.
0. Preparation
# Sigmoid.py
import numpy as np

def sigmoid(x):
    # Logistic sigmoid activation: maps any real input into (0, 1)
    y = 1 / (1 + np.exp(-x))
    return y
The sigmoid function is needed everywhere, so I prefer to define it once in a file called Sigmoid.py and import it wherever it is used, instead of rewriting it in every script.
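As a quick sanity check (assuming Sigmoid.py sits in the same directory as the scripts below), the function should return 0.5 at zero and approach 0 and 1 at the extremes:

import numpy as np
from Sigmoid import sigmoid

print(sigmoid(0))                        # 0.5
print(sigmoid(np.array([-10, 0, 10])))   # approximately [0, 0.5, 1]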
1. Stochastic gradient descent (SGD)
One epoch of SGD works as follows: take a single training point and compute the output y; compute the error between y and the correct output d; compute the weight update dW from the delta rule and apply it to the network weights W; repeat these steps N times, where N is the number of training samples.
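For reference, the delta rule that the code below applies can be written out explicitly (this is my own summary of the update used throughout this post, with learning rate $\alpha$):

$$v = Wx, \qquad y = \varphi(v) = \frac{1}{1 + e^{-v}}, \qquad e = d - y$$

$$\delta = \varphi'(v)\,e = y(1 - y)\,e, \qquad \Delta w_i = \alpha\,\delta\,x_i, \qquad w_i \leftarrow w_i + \Delta w_i$$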
The full code:
import numpy as np
from Sigmoid import sigmoid

def delta_sgd(W, X, D):
    # W is assumed here to be a 1-D NumPy array with three weights
    alpha = 0.9                      # learning rate
    N = 4                            # number of training samples
    for k in range(N):
        x = X[k, :].reshape(-1, 1)   # turn the row vector into a column vector
        d = D[k]
        v = np.dot(W, x)
        y = sigmoid(v)
        e = d - y
        delta = y * (1 - y) * e      # delta rule: sigmoid derivative times error
        dW = alpha * delta * x
        W[0] += dW[0]
        W[1] += dW[1]
        W[2] += dW[2]
    return W
The following script checks that the algorithm works:
import numpy as np
from Sigmoid import sigmoid

# Define the delta_sgd function for stochastic gradient descent
def delta_sgd(W, X, D, alpha=0.01):
    N = X.shape[0]
    for k in range(N):
        x = X[k, :]
        d = D[k]
        v = np.dot(W, x)
        y = sigmoid(v)
        e = d - y
        delta = y * (1 - y) * e
        dW = alpha * delta * x
        W += dW
    return W

# Training data
X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])
D = np.array([0, 0, 1, 1])

# Initialize the weights with random values in [-1, 1)
W = 2 * np.random.rand(1, X.shape[1]) - 1

# Training
for epoch in range(10000):
    W = delta_sgd(W, X, D)

# Inference
N = X.shape[0]
for k in range(N):
    x = X[k, :]
    v = np.dot(W, x)
    y = sigmoid(v)
    print(f"Input: {x}, Output: {y}")
Run result:
2. The batch algorithm (DeltaBatch)
The main difference from SGD is that the weight update dW computed from a single training point is not applied directly. Instead, the updates from all training points are accumulated into dWsum, and the weights are adjusted once per epoch using the average dWavg.
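In equation form (my own restatement of what the code below computes), one epoch of the batch algorithm applies a single averaged update:

$$\Delta w_i = \frac{\alpha}{N} \sum_{k=1}^{N} \delta_k\, x_{k,i}, \qquad \delta_k = y_k (1 - y_k)(d_k - y_k)$$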
Code:
import numpy as np
from Sigmoid import sigmoid

def delta_batch(W, X, D):
    # W is assumed here to be a 1-D NumPy array with three weights
    alpha = 0.9                      # learning rate
    dWsum = np.zeros((3, 1))         # accumulator for the per-sample updates
    N = 4                            # number of training samples
    for k in range(N):
        x = X[k, :].reshape(-1, 1)   # turn the row vector into a column vector
        d = D[k]
        v = np.dot(W, x)
        y = sigmoid(v)
        e = d - y
        delta = y * (1 - y) * e
        dW = alpha * delta * x
        dWsum += dW
    dWavg = dWsum / N                # average update over the whole batch
    W[0] += dWavg[0]
    W[1] += dWavg[1]
    W[2] += dWavg[2]
    return W
Test script:
import numpy as np
from Sigmoid import sigmoid

# Define the delta_batch function for batch gradient descent
def delta_batch(W, X, D):
    alpha = 0.01                 # learning rate; may need tuning for best results
    N = X.shape[0]
    dW_sum = np.zeros(W.shape)
    for k in range(N):
        x = X[k, :]
        d = D[k]
        v = np.dot(W, x)
        y = sigmoid(v)
        e = d - y
        delta = y * (1 - y) * e
        dW = alpha * delta * x
        dW_sum += dW
    dW_avg = dW_sum / N
    W += dW_avg
    return W

# Training data
X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])
D = np.array([0, 0, 1, 1])

# Initialize the weights with random values in [-1, 1)
W = 2 * np.random.rand(1, X.shape[1]) - 1

# Training
for epoch in range(40000):
    W = delta_batch(W, X, D)

# Inference
N = X.shape[0]
for k in range(N):
    x = X[k, :]
    v = np.dot(W, x)
    y = sigmoid(v)
    print(f"Input: {x}, Output: {y}")
Run result:
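Since the batch update only needs the sum of the per-sample corrections, the inner loop can also be replaced by matrix operations. The following is just an optional vectorized sketch of the same computation (not from the book; the name delta_batch_vec is mine):

import numpy as np
from Sigmoid import sigmoid

def delta_batch_vec(W, X, D, alpha=0.01):
    # Forward pass for all N samples at once: V and Y have shape (N,)
    V = X @ W.ravel()
    Y = sigmoid(V)
    delta = Y * (1 - Y) * (D - Y)              # per-sample delta, shape (N,)
    dW_avg = alpha * (delta @ X) / X.shape[0]  # averaged update, shape (3,)
    return W + dW_avg                          # same shape as W

It can be used as a drop-in replacement in the training loop above: W = delta_batch_vec(W, X, D).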
3. Comparing SGD with the batch algorithm
Both algorithms start from identical initial weights so that the comparison is fair.
Implementation:
import numpy as np
import matplotlib.pyplot as plt
from Sigmoid import sigmoid

# Define the delta_sgd function
def delta_sgd(W, X, D):
    alpha = 0.9
    N = X.shape[0]
    for k in range(N):
        x = X[k, :]
        d = D[k]
        v = np.dot(W, x)
        y = sigmoid(v)
        e = d - y
        delta = y * (1 - y) * e
        dW = alpha * delta * x
        W += dW
    return W

# Define the delta_batch function
def delta_batch(W, X, D):
    alpha = 0.9
    N = X.shape[0]
    dW_sum = np.zeros(W.shape)
    for k in range(N):
        x = X[k, :]
        d = D[k]
        v = np.dot(W, x)
        y = sigmoid(v)
        e = d - y
        delta = y * (1 - y) * e
        dW = alpha * delta * x
        dW_sum += dW
    dW_avg = dW_sum / N
    W += dW_avg
    return W

# Training data and initial weights
X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])
D = np.array([0, 0, 1, 1])
E1 = np.zeros(1000)    # mean squared training error per epoch, SGD
E2 = np.zeros(1000)    # mean squared training error per epoch, batch
W1 = 2 * np.random.rand(1, X.shape[1]) - 1
W2 = W1.copy()         # both algorithms start from the same weights

# Training: after each epoch, record the training error of both networks
for epoch in range(1000):
    W1 = delta_sgd(W1, X, D)
    W2 = delta_batch(W2, X, D)
    es1 = 0
    es2 = 0
    N = X.shape[0]
    for k in range(N):
        x = X[k, :]
        d = D[k]
        v1 = np.dot(W1, x)
        y1 = sigmoid(v1)
        es1 += (d - y1) ** 2
        v2 = np.dot(W2, x)
        y2 = sigmoid(v2)
        es2 += (d - y2) ** 2
    E1[epoch] = es1 / N
    E2[epoch] = es2 / N

# Plot the two error curves
plt.plot(E1, 'r', label='SGD')
plt.plot(E2, 'b:', label='Batch')
plt.xlabel('Epoch')
plt.ylabel('Average of Training Error')
plt.legend()
plt.show()
Comparison plot:
As the plot shows, SGD drives the training error down faster than the batch algorithm, i.e. SGD learns faster: it updates the weights once per training sample (four times per epoch here), while the batch algorithm updates them only once per epoch.
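One optional way to put a number on "faster" (a small sketch reusing the E1 and E2 arrays from the script above, with an error threshold of my own choosing):

import numpy as np

threshold = 0.01
# First epoch at which each error curve drops below the threshold
# (np.argmax would return 0 if the curve never gets there, so guard that case)
sgd_epoch = np.argmax(E1 < threshold) if np.any(E1 < threshold) else None
batch_epoch = np.argmax(E2 < threshold) if np.any(E2 < threshold) else None
print(f"SGD reaches error < {threshold} at epoch {sgd_epoch}")
print(f"Batch reaches error < {threshold} at epoch {batch_epoch}")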