PyTorch
What is PyTorch
- A replacement for NumPy that can use the power of GPUs for computation
- A deep learning platform that offers high flexibility and speed
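A quick sketch of both points: PyTorch tensors mirror the NumPy API and can be moved to a GPU. This is a minimal example, with the GPU transfer guarded by a cuda.is_available() check since it assumes a CUDA device:
import numpy as np
import torch

a = np.ones((2, 3))
t = torch.from_numpy(a)   # tensor sharing memory with the numpy array
t = t * 2                 # NumPy-style elementwise operations

# move to the GPU only if one is present
if torch.cuda.is_available():
    t = t.to('cuda')
print(t)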
Implementing a two-layer neural network with numpy
A fully-connected ReLU network with one hidden layer and no bias,
used to predict y from x with an L2 loss.
$\text{Model} = \text{Architecture} + \text{Parameters}$
Architecture
$h = W_1 x$
$a = \max(0, h)$
$\hat{y} = W_2 a$
Steps
- forward pass
- loss
- backward pass (gradients sketched after this list)
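For reference when reading the backward pass in the code, the gradients of the loss $L = \lVert \hat{y} - y \rVert^2$ follow from the chain rule. They are written here in the row-vector layout the code actually uses ($h = x W_1$, $\hat{y} = a W_2$):
$\frac{\partial L}{\partial \hat{y}} = 2(\hat{y} - y)$
$\frac{\partial L}{\partial W_2} = a^{\top} \frac{\partial L}{\partial \hat{y}}$
$\frac{\partial L}{\partial a} = \frac{\partial L}{\partial \hat{y}} W_2^{\top}$
$\frac{\partial L}{\partial h} = \frac{\partial L}{\partial a} \odot \mathbf{1}[h > 0]$
$\frac{\partial L}{\partial W_1} = x^{\top} \frac{\partial L}{\partial h}$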
Code
import numpy as np
import matplotlib.pyplot as plt

# number of samples
n = 64
# input dimension: 1000
d_in = 1000
# hidden dimension: 100
h = 100
# output dimension: 10
d_out = 10
# learning rate
learning_rate = 1e-6

# randomly initialize training data
x = np.random.randn(n, d_in)
y = np.random.randn(n, d_out)

# initialize weights
w1 = np.random.randn(d_in, h)
w2 = np.random.randn(h, d_out)

loss_arr = []
for t in range(500):
    # forward pass
    hidden = x.dot(w1)
    h_relu = np.maximum(hidden, 0)
    y_pred = h_relu.dot(w2)

    # compute loss
    loss = np.square(y_pred - y).sum()
    if t % 10 == 0:
        loss_arr.append(loss)

    # backward pass: compute the gradients by hand
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_hidden = grad_h_relu.copy()
    grad_hidden[hidden < 0] = 0
    grad_w1 = x.T.dot(grad_hidden)

    # update the weights w1 and w2
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

plt.figure()
plt.plot(loss_arr)
plt.show()
Result: the loss curve plotted by the code above (figure omitted).
A simple autograd example
import torch

x = torch.tensor(1., requires_grad=True)
w = torch.tensor(2., requires_grad=True)
b = torch.tensor(3., requires_grad=True)

y = w * x + b

# compute gradients of y with respect to every leaf tensor
y.backward()

print(w.grad)  # dy/dw = x = 1
print(x.grad)  # dy/dx = w = 2
print(b.grad)  # dy/db = 1
print(y.grad)  # None: y is not a leaf, so its grad is not retained
Output:
tensor(1.)
tensor(2.)
tensor(1.)
None
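y.grad is None because y is an intermediate (non-leaf) tensor: by default autograd only populates .grad on leaf tensors. A minimal sketch showing retain_grad(), which asks autograd to keep a non-leaf gradient as well:
import torch

x = torch.tensor(1., requires_grad=True)
w = torch.tensor(2., requires_grad=True)
b = torch.tensor(3., requires_grad=True)

y = w * x + b
y.retain_grad()  # keep the gradient of this non-leaf tensor
y.backward()

print(y.grad)  # tensor(1.), since dy/dy = 1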
Implementing it with PyTorch
- Replace the numpy arrays with PyTorch tensors
- Use autograd for the backward pass
import torch
import matplotlib.pyplot as plt

n = 64
d_in = 1000
h = 100
d_out = 10
learning_rate = 1e-6

# randomly initialize training data
x = torch.randn(n, d_in)
y = torch.randn(n, d_out)

# initialize weights; requires_grad=True tells autograd to track them
w1 = torch.randn(d_in, h, requires_grad=True)
w2 = torch.randn(h, d_out, requires_grad=True)

loss_arr = []
for t in range(500):
    # forward pass
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # compute loss
    loss = (y_pred - y).pow(2).sum()
    if t % 10 == 0:
        loss_arr.append(loss.item())

    # backward pass: autograd fills w1.grad and w2.grad
    loss.backward()

    # update the weights w1 and w2 outside the autograd graph
    with torch.no_grad():
        w1.sub_(learning_rate * w1.grad)
        w2.sub_(learning_rate * w2.grad)
        # zero the gradients so they do not accumulate across iterations
        w1.grad.zero_()
        w2.grad.zero_()

plt.figure()
plt.plot(loss_arr)
plt.show()
In PyTorch there are two situations where an inplace operation must not be used (a sketch of both follows this list):
- Inplace operations cannot be applied to leaf tensors with requires_grad=True
- Inplace operations cannot be applied to tensors whose values are needed when gradients are computed in the backward pass
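A minimal sketch that triggers both failures; the exact RuntimeError messages are version-dependent:
import torch

# Case 1: inplace op on a leaf tensor with requires_grad=True
w = torch.ones(3, requires_grad=True)
try:
    w.add_(1.0)  # modifying a leaf that requires grad
except RuntimeError as e:
    print(e)

# Case 2: inplace op on a value the backward pass needs
a = torch.ones(3, requires_grad=True)
b = a.exp()    # the backward of exp reuses its output b
b.add_(1.0)    # invalidates the saved value
try:
    b.sum().backward()
except RuntimeError as e:
    print(e)

This is also why the training loop above wraps w1.sub_() in torch.no_grad(): inside that context autograd does not record the update, so the leaf restriction does not apply.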
torch.nn
- Add a model
- Add a loss function
- Add an optimizer
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

n = 64
d_in = 1000
h = 100
d_out = 10
learning_rate = 1e-4

# randomly initialize training data
x = torch.randn(n, d_in)
y = torch.randn(n, d_out)

# build the model
model = torch.nn.Sequential(
    torch.nn.Linear(d_in, h),
    torch.nn.ReLU(),
    torch.nn.Linear(h, d_out),
)

loss_arr = []

# squared-error loss, summed over all elements rather than averaged
loss_fn = nn.MSELoss(reduction='sum')

# optimizer; with Adam a learning rate around 1e-4 to 1e-3 works well here
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for t in range(500):
    # forward pass: calls model.forward() under the hood
    y_pred = model(x)

    # compute loss
    loss = loss_fn(y_pred, y)
    if t % 5 == 0:
        loss_arr.append(loss.item())

    # backward pass
    loss.backward()

    # take an optimization step, then reset the gradients
    optimizer.step()
    optimizer.zero_grad()

plt.figure()
plt.plot(loss_arr)
plt.show()
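The same network can also be written as a subclass of nn.Module, which is the idiomatic form once a model outgrows a plain Sequential stack. A minimal sketch (the class name TwoLayerNet is my own, not from the text above):
import torch
import torch.nn as nn

class TwoLayerNet(nn.Module):
    def __init__(self, d_in, h, d_out):
        super().__init__()
        self.linear1 = nn.Linear(d_in, h)
        self.linear2 = nn.Linear(h, d_out)

    def forward(self, x):
        # same architecture as the Sequential model: Linear -> ReLU -> Linear
        return self.linear2(torch.relu(self.linear1(x)))

model = TwoLayerNet(1000, 100, 10)
y_pred = model(torch.randn(64, 1000))
print(y_pred.shape)  # torch.Size([64, 10])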