Implementing a two-layer neural network in NumPy
A fully connected ReLU network with one hidden layer and no biases, predicting y from x with an L2 loss:
- h = x W1
- a = max(0, h)
- y_hat = a W2
This implementation computes everything with NumPy alone:
- forward pass
- loss
- backward pass
A NumPy ndarray is just a generic n-dimensional array: it knows nothing about deep learning, gradients, or computation graphs; it is simply a data structure for numerical computation.
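Since the ndarray will not differentiate anything for us, the backward pass has to be derived by hand. Writing the loss as $L = \sum (\hat{y} - y)^2$, the chain rule gives the gradients that the code below computes as grad_y_pred, grad_w2, grad_h_relu, grad_h, and grad_w1:

$$
\begin{aligned}
\frac{\partial L}{\partial \hat{y}} &= 2(\hat{y}-y) \\
\frac{\partial L}{\partial W_2} &= a^\top \frac{\partial L}{\partial \hat{y}} \\
\frac{\partial L}{\partial a} &= \frac{\partial L}{\partial \hat{y}} \, W_2^\top \\
\frac{\partial L}{\partial h} &= \frac{\partial L}{\partial a} \odot \mathbf{1}[h>0] \\
\frac{\partial L}{\partial W_1} &= x^\top \frac{\partial L}{\partial h}
\end{aligned}
$$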
import numpy as np

N, D_in, H, D_out = 64, 1000, 100, 10

# Create some random training data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)

# Randomly initialize the weights
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

learning_rate = 1e-6
for t in range(500):
    # Forward pass
    h = x.dot(w1)              # N x H
    h_relu = np.maximum(h, 0)  # N x H
    y_pred = h_relu.dot(w2)    # N x D_out

    # Compute loss
    loss = np.square(y_pred - y).sum()
    print(t, loss)

    # Backward pass: compute the gradients by hand
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h < 0] = 0          # ReLU passes gradient only where h > 0
    grad_w1 = x.T.dot(grad_h)

    # Update the weights of w1 and w2
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
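A common way to sanity-check hand-derived gradients like these is to compare one entry against a centered finite-difference estimate. A minimal sketch (the small sizes and the helper loss_fn are our own illustration, not part of the original code):

import numpy as np

np.random.seed(0)
N, D_in, H, D_out = 4, 5, 3, 2
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

def loss_fn(w2):
    h_relu = np.maximum(x.dot(w1), 0)
    return np.square(h_relu.dot(w2) - y).sum()

# Analytic gradient for w2, exactly as in the training loop above
h_relu = np.maximum(x.dot(w1), 0)
grad_w2 = h_relu.T.dot(2.0 * (h_relu.dot(w2) - y))

# Finite-difference estimate for one entry of w2
eps = 1e-6
w2_plus, w2_minus = w2.copy(), w2.copy()
w2_plus[0, 0] += eps
w2_minus[0, 0] -= eps
numeric = (loss_fn(w2_plus) - loss_fn(w2_minus)) / (2 * eps)

print(grad_w2[0, 0], numeric)  # the two values should agree closely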
Next, we implement the same network with PyTorch tensors:
import torch

N, D_in, H, D_out = 64, 1000, 100, 10

# Create some random training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Randomly initialize the weights
w1 = torch.randn(D_in, H)
w2 = torch.randn(H, D_out)

learning_rate = 1e-6
for t in range(500):
    # Forward pass
    h = x.mm(w1)             # N x H
    h_relu = h.clamp(min=0)  # N x H
    y_pred = h_relu.mm(w2)   # N x D_out

    # Compute loss
    loss = (y_pred - y).pow(2).sum().item()
    print(t, loss)

    # Backward pass: compute the gradients by hand
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0        # ReLU passes gradient only where h > 0
    grad_w1 = x.t().mm(grad_h)

    # Update the weights of w1 and w2
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
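So far this is a line-for-line translation of the NumPy version; one practical difference is that torch tensors can also live on a GPU. A minimal sketch, assuming a CUDA-enabled build (it falls back to CPU otherwise):

import torch

# Pick a GPU if one is available, otherwise fall back to CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(64, 1000, device=device)
w1 = torch.randn(1000, 100, device=device)

# The same operations now run on the chosen device
h = x.mm(w1).clamp(min=0)
print(h.device)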
A simplified example of autograd
import torch

x = torch.tensor(1., requires_grad=True)
w = torch.tensor(2., requires_grad=True)
b = torch.tensor(3., requires_grad=True)

y = w * x + b

# Autograd computes dy/dw = x, dy/dx = w, dy/db = 1
y.backward()

print(w.grad)  # tensor(1.)
print(x.grad)  # tensor(2.)
print(b.grad)  # tensor(1.)
Notes:
- Gradients must be zeroed after each update; otherwise every call to backward() keeps accumulating new gradients on top of the old ones.
- Every tensor operation in PyTorch is recorded in a computation graph, and the graph's memory needs to be released (by default PyTorch frees it after backward()).
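To see the accumulation behavior concretely, here is a small illustration (our own example, not from the original notes):

import torch

w = torch.tensor(2., requires_grad=True)

# First backward pass: dy/dw = 3
y = 3 * w
y.backward()
print(w.grad)  # tensor(3.)

# Second backward pass without zeroing first
y = 3 * w
y.backward()
print(w.grad)  # tensor(6.): the new gradient was added to the old one

# Clearing before the next pass restores the expected behavior
w.grad.zero_()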
Implementing the neural network in PyTorch: simplified version (summary)
import torch

N, D_in, H, D_out = 64, 1000, 100, 10

# Create some random training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# The weights now require gradients so autograd can track them
w1 = torch.randn(D_in, H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)

learning_rate = 1e-6
for it in range(500):
    # Forward pass
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # Compute loss
    loss = (y_pred - y).pow(2).sum()
    print(it, loss.item())

    # Backward pass: autograd computes the gradients of all parameters
    loss.backward()

    # Update the weights of w1 and w2; no_grad() keeps the updates
    # out of the computation graph
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        # Zero the gradients so they do not accumulate
        w1.grad.zero_()
        w2.grad.zero_()
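As a further simplification, not shown in the notes above, the manual update and zeroing can be handed to torch.optim. A sketch using SGD, which applies the same p -= lr * p.grad update:

import torch

N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
w1 = torch.randn(D_in, H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)

# SGD applies exactly the update we wrote by hand: p -= lr * p.grad
optimizer = torch.optim.SGD([w1, w2], lr=1e-6)

for it in range(500):
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    loss = (y_pred - y).pow(2).sum()

    optimizer.zero_grad()  # replaces w1.grad.zero_() / w2.grad.zero_()
    loss.backward()
    optimizer.step()       # replaces the manual no_grad() update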