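A computation in TensorFlow can be represented as a computational graph, in which every operation is a node. (P87-3.1, 3.2)
Consider the following problem: an apple costs 100 yuan and an orange costs 150 yuan, and the consumption tax is 10%. If you buy 2 apples and 3 oranges, how much do you pay in total?
Build the computational graph for this problem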
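As a rough sketch of what such a graph computes, the forward pass can be written as a chain of small node functions. The helpers `mul` and `add` below are illustrative names introduced here, not TensorFlow APIs.

```python
# Hypothetical helper nodes, used only for illustration.
def mul(a, b):
    return a * b

def add(a, b):
    return a + b

apple_total = mul(100, 2)                   # 2 apples at 100 each
orange_total = mul(150, 3)                  # 3 oranges at 150 each
subtotal = add(apple_total, orange_total)   # price before tax
total = mul(subtotal, 1.1)                  # apply the 10% consumption tax

# Round for display, since 1.1 is not exactly representable in floating point
print(round(total))
```

The subtotal is 650, and the total comes to 715 once the tax is applied.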
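Local computation
In the graph above, the apple and orange subtotals are computed separately and then merged. "Local" here means that each node produces its output using only the information directly related to it.
A computational graph lets us concentrate on local computation: however complex the whole computation is, each step only has to perform the local computation at its own node. Passing these local results along the graph yields the result of the complex global computation.
Order of computation in backpropagation
1. Multiply the signal arriving at the node (the derivative from upstream) by the node's local derivative (a partial derivative), and pass the result on to the next node.
2. Use that output as the input to the next node, and again multiply by that node's local derivative.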
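The two steps above can be sketched on a tiny graph. The function z = (x + y)^2 and the input values are assumptions chosen only for illustration; the finite-difference check at the end is an extra sanity test, not part of backpropagation itself.

```python
# z = t**2 with t = x + y, evaluated at an arbitrary illustrative point.
x, y = 3.0, 4.0

# Forward pass
t = x + y
z = t ** 2

# Backward pass, from the output back toward the inputs
dz = 1.0              # derivative of z with respect to itself
dt = dz * (2 * t)     # local derivative of the square node: dz/dt = 2t
dx = dt * 1.0         # local derivative of the add node: dt/dx = 1
dy = dt * 1.0         # local derivative of the add node: dt/dy = 1

# Sanity check against a central finite difference in x
eps = 1e-6
numeric_dx = (((x + eps) + y) ** 2 - ((x - eps) + y) ** 2) / (2 * eps)
print(dx, dy, numeric_dx)
```

Each node only multiplies the incoming derivative by its own local derivative, so the backward pass never needs to know the shape of the rest of the graph.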
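1. Backpropagation through an addition node
Take z = x + y as the example.
At an addition node, backpropagation multiplies the derivative coming from upstream by 1 and passes it downstream. In other words, because the addition node only multiplies by 1, the incoming value flows to the next node unchanged.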
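This follows from the chain rule. Writing L for the final output of the graph (a symbol assumed here for illustration), both local derivatives of z = x + y equal 1:

```latex
\frac{\partial z}{\partial x} = 1, \qquad
\frac{\partial z}{\partial y} = 1
\quad\Longrightarrow\quad
\frac{\partial L}{\partial x} = \frac{\partial L}{\partial z}\cdot 1, \qquad
\frac{\partial L}{\partial y} = \frac{\partial L}{\partial z}\cdot 1
```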
# Addition layer in Python
class AddLayer:
    def __init__(self):
        # The addition node keeps no state
        pass

    # Forward pass
    def forward(self, x, y):
        out = x + y
        return out

    # Backward pass: the upstream derivative passes through unchanged
    def backward(self, dout):
        dx = dout * 1
        dy = dout * 1
        return dx, dy

# Test
apple_price = 200
orange_price = 450
# Create the addition layer
out_layer = AddLayer()
# forward
price = out_layer.forward(apple_price, orange_price)
# backward
dprice = 1
d_apple, d_orange = out_layer.backward(dprice)
print("d_apple_price:", d_apple)
print("d_orange_price:", d_orange)
print("price:", price)

Output:
d_apple_price: 1
d_orange_price: 1
price: 650
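2. Backpropagation through a multiplication node
Take z = xy as the example.
Backpropagation through a multiplication node needs the input values from the forward pass, so the implementation must save the forward-pass inputs.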
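The reason the inputs must be saved is visible in the local derivatives. With L again denoting the final output of the graph (an assumed symbol), each input's derivative is the upstream derivative times the other input:

```latex
\frac{\partial z}{\partial x} = y, \qquad
\frac{\partial z}{\partial y} = x
\quad\Longrightarrow\quad
\frac{\partial L}{\partial x} = \frac{\partial L}{\partial z}\cdot y, \qquad
\frac{\partial L}{\partial y} = \frac{\partial L}{\partial z}\cdot x
```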
# Multiplication layer in Python: the backward pass multiplies the upstream
# derivative by the "swapped" forward-pass inputs
class MulLayer:
    def __init__(self):
        # Forward-pass inputs, saved for use in backward()
        self.x = None
        self.y = None

    # Forward pass
    def forward(self, x, y):
        self.x = x
        self.y = y
        out = x * y
        return out

    # Backward pass
    def backward(self, dout):
        dx = dout * self.y  # derivative for x uses the saved y
        dy = dout * self.x  # derivative for y uses the saved x
        return dx, dy

# Test
apple = 100      # price of one apple
apple_num = 2    # number of apples
tax = 1.1        # consumption tax factor
mul_apple_layer = MulLayer()  # multiplication layer: price x quantity
mul_tax_layer = MulLayer()    # multiplication layer: subtotal x tax
# forward
apple_price = mul_apple_layer.forward(apple, apple_num)  # price of 2 apples
price = mul_tax_layer.forward(apple_price, tax)          # amount to pay
# backward
dprice = 1
dapple_price, dtax = mul_tax_layer.backward(dprice)
dapple, dapple_num = mul_apple_layer.backward(dapple_price)
print("price:", int(price))
print("dapple_price:", dapple_price)
print("dTax:", dtax)
print("dApple:", dapple)
print("dApple_num:", int(dapple_num))

Output:
price: 220
dapple_price: 1.1
dTax: 200
dApple: 2.2
dApple_num: 110
Implementing the apple-and-orange problem
apple = 100
apple_num = 2
orange = 150
orange_num = 3
tax = 1.1
# layer
mul_apple_layer = MulLayer()
mul_orange_layer = MulLayer()
add_apple_orange_layer = AddLayer()
mul_tax_layer = MulLayer()
# forward
apple_price = mul_apple_layer.forward(apple, apple_num) # (1)
orange_price = mul_orange_layer.forward(orange, orange_num) # (2)
all_price = add_apple_orange_layer.forward(apple_price, orange_price) # (3)
price = mul_tax_layer.forward(all_price, tax) # (4)
# backward
dprice = 1
dall_price, dtax = mul_tax_layer.backward(dprice) # (4)
dapple_price, dorange_price = add_apple_orange_layer.backward(dall_price) # (3)
dorange, dorange_num = mul_orange_layer.backward(dorange_price) # (2)
dapple, dapple_num = mul_apple_layer.backward(dapple_price) # (1)
print("price:", int(price))
print("dApple:", dapple)
print("dApple_num:", int(dapple_num))
print("dOrange:", dorange)
print("dOrange_num:", int(dorange_num))
print("dTax:", dtax)
Output:
price: 715
dApple: 2.2
dApple_num: 110
dOrange: 3.3000000000000003
dOrange_num: 165
dTax: 650
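Computational graph and backpropagation for the weighted sum
In the forward pass of a neural network, the weighted sum of the signals is computed with a matrix product.
Backpropagation with matrices
Derivation:
In the forward pass, the bias B is added to every row of X·W, that is, to every sample. In the backward pass, the derivatives of all samples must therefore be summed into the elements of the bias gradient.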
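For reference, the standard results for Y = X·W + B can be stated compactly. Here L denotes the scalar loss at the end of the graph and N the number of rows (samples) in X; both symbols are assumptions introduced for this summary.

```latex
\frac{\partial L}{\partial X} = \frac{\partial L}{\partial Y}\, W^{\top}, \qquad
\frac{\partial L}{\partial W} = X^{\top}\, \frac{\partial L}{\partial Y}, \qquad
\frac{\partial L}{\partial B_j} = \sum_{i=1}^{N} \frac{\partial L}{\partial Y_{ij}}
```

Each gradient has the same shape as the quantity it is taken with respect to, which is a convenient check when arranging the transposes.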
import numpy as np

X = np.array([[0, 0],
              [1, 1]])
W = np.array([[2, 2, 2],
              [8, 8, 8]])
X_dot_W = np.dot(X, W)
X_dot_W
# array([[ 0,  0,  0], [10, 10, 10]])

B = np.array([1, 2, 3])
X_dot_W + B   # B is broadcast onto every row
# array([[ 1,  2,  3], [11, 12, 13]])

dY = np.array([[1, 2, 3],
               [4, 5, 6]])
dY
# array([[1, 2, 3], [4, 5, 6]])

dB = np.sum(dY, axis=0)   # sum over the samples (rows)
dB
# array([5, 7, 9])

dX = np.dot(dY, W.T)
dX
# array([[ 12,  48], [ 30, 120]])
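The example above computes dB and dX. For completeness, the weight gradient can be sketched the same way, assuming the usual rule dW = X^T · dY and reusing the same X, W, and dY values:

```python
import numpy as np

X = np.array([[0, 0],
              [1, 1]])
W = np.array([[2, 2, 2],
              [8, 8, 8]])
dY = np.array([[1, 2, 3],
               [4, 5, 6]])

# Gradient with respect to the weights: X^T . dY
dW = np.dot(X.T, dY)
print(dW)

# Each gradient has the same shape as the quantity it belongs to
assert dW.shape == W.shape
assert np.dot(dY, W.T).shape == X.shape
```

Because the first row of X is all zeros, only the second sample contributes, so both rows of dW equal [4, 5, 6].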
A two-layer fully connected neural network with ReLU activation, including the network itself, the backpropagation of gradients, and the weight updates:
import numpy as np

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)

# Randomly initialize the weights
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y
    h = np.dot(x, w1)
    h_relu = np.maximum(h, 0)
    y_pred = np.dot(h_relu, w2)

    # Compute and print loss
    loss = np.square(y_pred - y).sum()
    print(t, loss)

    # Backward pass: gradients of the loss with respect to w1 and w2
    grad_y_pred = 2.0 * (y_pred - y)  # derivative of the scalar loss with respect to y_pred
    grad_w2 = np.dot(h_relu.T, grad_y_pred)
    grad_h_relu = np.dot(grad_y_pred, w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h < 0] = 0  # ReLU passes no gradient where its input was negative
    grad_w1 = np.dot(x.T, grad_h)

    # Update weights
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
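One way to gain confidence in a hand-written backward pass like this is a numerical gradient check. The sketch below is an addition to the original notes: it repeats the same forward and backward computations on a much smaller network (the sizes, the seed, and the helper names `loss_fn`, `analytic_grads`, and `numeric_grad` are all assumptions made for illustration) and compares the result with central finite differences.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D_in, H, D_out = 4, 5, 3, 2   # small sizes chosen only for the check
x = rng.standard_normal((N, D_in))
y = rng.standard_normal((N, D_out))
w1 = rng.standard_normal((D_in, H))
w2 = rng.standard_normal((H, D_out))

def loss_fn(w1, w2):
    """Sum-of-squares loss of the two-layer ReLU network."""
    h_relu = np.maximum(np.dot(x, w1), 0)
    return np.square(np.dot(h_relu, w2) - y).sum()

def analytic_grads(w1, w2):
    """Gradients computed with the same backward pass as the training loop."""
    h = np.dot(x, w1)
    h_relu = np.maximum(h, 0)
    grad_y_pred = 2.0 * (np.dot(h_relu, w2) - y)
    grad_w2 = np.dot(h_relu.T, grad_y_pred)
    grad_h = np.dot(grad_y_pred, w2.T)
    grad_h[h < 0] = 0
    grad_w1 = np.dot(x.T, grad_h)
    return grad_w1, grad_w2

def numeric_grad(f, w, eps=1e-6):
    """Central finite-difference gradient of f() with respect to array w.

    f must read w when called; w is perturbed in place and then restored.
    """
    grad = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        old = w[idx]
        w[idx] = old + eps
        f_plus = f()
        w[idx] = old - eps
        f_minus = f()
        w[idx] = old
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

grad_w1, grad_w2 = analytic_grads(w1, w2)
num_w1 = numeric_grad(lambda: loss_fn(w1, w2), w1)
num_w2 = numeric_grad(lambda: loss_fn(w1, w2), w2)

# Largest absolute difference between analytic and numerical gradients
print(np.abs(grad_w1 - num_w1).max(), np.abs(grad_w2 - num_w2).max())
```

Both printed differences should be very small. A large mismatch usually points to a misplaced transpose or a missing ReLU mask in the backward pass.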