Backpropagation
Two-layer neural network
The formula is:
\begin{aligned}\hat{y}&=W_2(W_1\cdot X+b_1)+b_2\end{aligned}
The stacked case
\begin{aligned}\hat{y}&=W_2(W_1\cdot X+b_1)+b_2\\&=W_2\cdot W_1\cdot X+(W_2b_1+b_2)\\&=W\cdot X+b\end{aligned}
- This means a stack of linear layers collapses to something identical to the original single linear layer: composing linear maps repeatedly adds no expressive power. A nonlinear function must therefore be applied in each layer.
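The collapse can be verified numerically; a minimal sketch with NumPy (the layer shapes and random values here are chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((3, 4)), rng.standard_normal(3)
W2, b2 = rng.standard_normal((2, 3)), rng.standard_normal(2)
X = rng.standard_normal(4)

# two stacked linear layers
y_stacked = W2 @ (W1 @ X + b1) + b2

# collapse to a single linear layer with W = W2 W1, b = W2 b1 + b2
W, b = W2 @ W1, W2 @ b1 + b2
y_single = W @ X + b

print(np.allclose(y_stacked, y_single))  # True
```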
Schematic of the full propagation process
Derivation diagram of the propagation process
Homework: derive the backward pass by hand
Introduction to Tensor in PyTorch
- A Tensor holds both data (the weight value) and grad (the gradient)
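A minimal example of the data/grad pair (the values here are chosen only for illustration):

```python
import torch

w = torch.tensor([3.0], requires_grad=True)
y = w ** 2      # builds a computational graph
y.backward()    # fills w.grad with dy/dw = 2w

print(w.data)   # the stored value: tensor([3.])
print(w.grad)   # the gradient:     tensor([6.])
```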
Version using the torch library
    import matplotlib.pyplot as plt
    import torch

    x_data = [1.0, 2.0, 3.0]
    y_data = [2.0, 4.0, 6.0]
    l_list = []
    Epoch = []
    w = torch.tensor([1.0])  # w is a Tensor
    w.requires_grad = True   # True means gradients must be computed; the default is False

    def forward(x):
        return x * w  # w is a Tensor, so x is automatically promoted to a Tensor

    def loss(x, y):  # loss function; this step builds the computational graph on Tensors
        y_pred = forward(x)
        return (y_pred - y) ** 2

    print("predict (before training)", 4, forward(4).item())
    for epoch in range(100):
        for x, y in zip(x_data, y_data):
            l = loss(x, y)  # forward pass: builds the graph and computes the loss
            l.backward()    # backward pass: l is a Tensor; backward() computes every
                            # required gradient along the graph and stores it in w.grad
            print('\tgrad:', x, y, w.grad.item())  # item() extracts the value as a plain scalar
            w.data = w.data - 0.01 * w.grad.data   # w.grad is itself a Tensor; operate on .data
                                                   # so the update does not build a new graph
                                                   # (no graph is needed here)
            w.grad.data.zero_()  # zero the gradient; otherwise the next backward()
                                 # accumulates into the stale value
        l_list.append(l.item())
        Epoch.append(epoch)
        print("Epoch:", epoch, l.item())
    print("predict (after training)", 4, forward(4).item())
    print(w.grad.item())
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.title('y=w*x\nw=1.0 a=0.01')
    plt.grid(alpha=0.4)
    plt.plot(Epoch, l_list)
    plt.show()
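The `.data` trick above is one way to update a weight without extending the graph; the same graph-free update can also be written with `torch.no_grad()`, the more common modern idiom. A minimal sketch with a toy one-step loss (the constants are chosen only for illustration):

```python
import torch

w = torch.tensor([1.0], requires_grad=True)
loss = (w * 2.0 - 4.0) ** 2    # toy loss; d(loss)/dw = 4*(2w - 4) = -8 at w = 1
loss.backward()

with torch.no_grad():          # operations inside are excluded from the graph
    w -= 0.01 * w.grad         # in-place update, equivalent to the w.data update
w.grad.zero_()                 # reset the accumulated gradient
```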
Hand-rolled version, without the torch library
    import random as rd
    import matplotlib.pyplot as plt

    x_data = [1.0, 2.0, 3.0]
    y_data = [2.0, 4.0, 6.0]
    w1 = 0.5
    w2 = 2.0
    b = 0.5

    def forward(x):
        return w1 * x ** 2 + w2 * x + b

    def loss(x, y):
        return (forward(x) - y) ** 2

    # analytic gradients of the loss w.r.t. each parameter
    def gradient_w1(x, y):
        return 2 * (x ** 2) * (forward(x) - y)

    def gradient_w2(x, y):
        return 2 * x * (forward(x) - y)

    def gradient_b(x, y):
        return 2 * (forward(x) - y)

    a = 0.01  # learning rate
    w1_list = [w1]
    w2_list = [w2]
    b_list = [b]
    print('Predict(Before training):', 4, forward(4))
    for epoch in range(15000):
        rc = rd.randrange(0, 3)  # pick one random sample: stochastic gradient descent
        x = x_data[rc]
        y = y_data[rc]
        grad_val_w1 = gradient_w1(x, y)
        grad_val_w2 = gradient_w2(x, y)
        grad_val_b = gradient_b(x, y)
        print("\tGradient(w1, w2, b):", x, y, grad_val_w1, grad_val_w2, grad_val_b)
        w1 -= a * grad_val_w1
        w2 -= a * grad_val_w2
        b -= a * grad_val_b
        w1_list.append(w1)
        w2_list.append(w2)
        b_list.append(b)
        loss_val = loss(x, y)
        print("Progress:", epoch, "loss =", loss_val)
    print('Predict(After training):', 4, forward(4))

    fig = plt.figure(num=1, figsize=(8, 24))
    ax1 = fig.add_subplot(311)
    ax1.set_title('w1')
    ax1.plot(w1_list)
    ax2 = fig.add_subplot(312)
    ax2.set_title('w2')
    ax2.plot(w2_list)
    ax3 = fig.add_subplot(313)
    ax3.set_title('b')
    ax3.plot(b_list)
    plt.show()
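The hand-derived gradient formulas can be sanity-checked against a numerical central difference; a small sketch (the sample point and step size here are chosen arbitrarily):

```python
def forward(x, w1, w2, b):
    return w1 * x ** 2 + w2 * x + b

def loss(x, y, w1, w2, b):
    return (forward(x, w1, w2, b) - y) ** 2

def grad_w1(x, y, w1, w2, b):
    # same analytic formula as gradient_w1 above
    return 2 * (x ** 2) * (forward(x, w1, w2, b) - y)

# numerical gradient via central difference
eps = 1e-6
x, y, w1, w2, b = 2.0, 4.0, 0.5, 2.0, 0.5
num = (loss(x, y, w1 + eps, w2, b) - loss(x, y, w1 - eps, w2, b)) / (2 * eps)
print(abs(num - grad_w1(x, y, w1, w2, b)) < 1e-4)  # True
```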