The mechanics of learning
No.3 Optimizers a la carte
Every optimizer constructor takes a list of parameters (aka PyTorch tensors, typically with requires_grad set to True) as the first input. All parameters passed to the optimizer are retained inside the optimizer object so that the optimizer can update their values and access their grad attribute.
Each optimizer exposes two methods: zero_grad and step. The former zeros the grad attribute of all the parameters passed to the optimizer upon construction. The latter updates the value of those parameters according to the optimization strategy implemented by the specific optimizer.
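As a minimal sketch of what these two methods do to a single tensor, the snippet below runs one SGD step by hand. The toy loss, the learning rate and the variable names are illustrative assumptions, not part of the original example. In recent PyTorch releases zero_grad resets grad to None by default rather than to a tensor of zeros, so the final check accepts both outcomes.

```python
import torch
import torch.optim as optim

# Toy parameter tensor and learning rate (illustrative values).
params = torch.tensor([1.0, 0.0], requires_grad=True)
optimizer = optim.SGD([params], lr=0.1)

# Toy loss: sum of squares of (params - 3); its gradient is 2 * (params - 3).
loss = ((params - 3.0) ** 2).sum()
loss.backward()
grad_after_backward = params.grad.clone()

# step() applies params <- params - lr * grad in place.
optimizer.step()
params_after_step = params.detach().clone()

# zero_grad() resets the gradient so the next backward() starts fresh.
optimizer.zero_grad()
grad_is_reset = params.grad is None or bool((params.grad == 0).all())
```

The optimizer never receives the loss itself; it only reads the grad attribute that backward() filled in on the tensors it was constructed with.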
import torch
import torch.optim as optim
t_c = torch.tensor([0.5, 14.0, 15.0, 28.0, 11.0, 8.0, 3.0, -4.0, 6.0, 13.0, 21.0])
t_u = torch.tensor([35.7, 55.9, 58.2, 81.9, 56.3, 48.9, 33.9, 21.8, 48.4, 60.4, 68.4])
t_un = 0.1 * t_u
def model(t_u, w, b):
    return w * t_u + b
def loss_fn(t_p, t_c):
    squared_diffs = (t_p - t_c)**2
    return squared_diffs.mean()
def training_loop(n_epochs, optimizer, params, t_u, t_c):
    for epoch in range(1, n_epochs + 1):
        t_p = model(t_u, *params)
        loss = loss_fn(t_p, t_c)
        optimizer.zero_grad()  # clear gradients left over from the previous iteration
        loss.backward()
        optimizer.step()  # update the parameters using the freshly computed gradients
        if epoch % 500 == 0:
            print('Epoch %d, Loss %f' % (epoch, float(loss)))
    return params
params = torch.tensor([1.0, 0.0], requires_grad=True)
learning_rate = 1e-2
optimizer = optim.SGD([params], lr=learning_rate)
training_loop(
    n_epochs=5000,
    optimizer=optimizer,
    params=params,
    t_u=t_un,
    t_c=t_c)
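Because the training code only talks to the optimizer through zero_grad and step, trying a different optimizer is a one-line change. The sketch below swaps SGD for optim.Adam and, as an assumption for illustration, feeds it the unnormalized t_u with a learning rate of 1e-1 for 2000 epochs. These settings are not from the notes above and may need tuning on other data.

```python
import torch
import torch.optim as optim

t_c = torch.tensor([0.5, 14.0, 15.0, 28.0, 11.0, 8.0, 3.0, -4.0, 6.0, 13.0, 21.0])
t_u = torch.tensor([35.7, 55.9, 58.2, 81.9, 56.3, 48.9, 33.9, 21.8, 48.4, 60.4, 68.4])

def model(t_u, w, b):
    return w * t_u + b

def loss_fn(t_p, t_c):
    return ((t_p - t_c) ** 2).mean()

params = torch.tensor([1.0, 0.0], requires_grad=True)
# Only this line differs from the SGD version; lr=1e-1 is an assumed value.
optimizer = optim.Adam([params], lr=1e-1)

for epoch in range(1, 2001):
    loss = loss_fn(model(t_u, *params), t_c)  # unnormalized input on purpose
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

final_loss = float(loss)
```

Adam adapts the step size per parameter, which is why it tends to be less sensitive to input scaling than plain SGD with a single fixed learning rate.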
No.4 Nits in autograd and switching it off
[Figure: computation graph for the code above]
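If val_loss.backward() were added by mistake, then, because both losses are computed from the same params tensor, calling backward on val_loss would lead to gradients accumulating in the params tensor, on top of those generated during the train_loss.backward() call.
The parameter update would then depend on both the training set and the validation set.
If backward is never called on the validation loss, there is no need to build a computation graph for the validation pass at all.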
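The accumulation can be checked directly. The following is a small sketch with made-up training and validation values (hypothetical data, chosen only to keep the numbers small): after both backward calls, params.grad equals the sum of the two individual gradients.

```python
import torch

def model(t_u, w, b):
    return w * t_u + b

def loss_fn(t_p, t_c):
    return ((t_p - t_c) ** 2).mean()

# Hypothetical data, not taken from the example above.
train_t_u = torch.tensor([3.0, 5.0, 6.0])
train_t_c = torch.tensor([1.0, 12.0, 15.0])
val_t_u = torch.tensor([2.0, 4.0])
val_t_c = torch.tensor([0.0, 1.0])

params = torch.tensor([1.0, 0.0], requires_grad=True)

# Intended call: gradient of the training loss.
train_loss = loss_fn(model(train_t_u, *params), train_t_c)
train_loss.backward()
grad_train = params.grad.clone()

# Mistaken extra call: the validation gradient is added on top.
val_loss = loss_fn(model(val_t_u, *params), val_t_c)
val_loss.backward()
grad_both = params.grad.clone()

# Gradient of the validation loss alone, for comparison.
params.grad.zero_()
loss_fn(model(val_t_u, *params), val_t_c).backward()
grad_val = params.grad.clone()

accumulated = torch.allclose(grad_both, grad_train + grad_val)
```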
- Tracking history comes with additional costs that you could forgo during the validation pass, especially when the model has millions of parameters.
- You won’t see any meaningful advantage in terms of speed or memory consumption on your small problem. But for larger models, the differences can add up.
def training_loop(n_epochs, optimizer, params, train_t_u, val_t_u, train_t_c, val_t_c):
    for epoch in range(1, n_epochs + 1):  # start at 1 so exactly n_epochs iterations run
        train_t_p = model(train_t_u, *params)
        train_loss = loss_fn(train_t_p, train_t_c)
        with torch.no_grad():  # no graph is built for the validation pass
            val_t_p = model(val_t_u, *params)
            val_loss = loss_fn(val_t_p, val_t_c)
            assert not val_loss.requires_grad
        optimizer.zero_grad()
        train_loss.backward()  # only the training loss drives the update
        optimizer.step()
        if epoch <= 3 or epoch % 500 == 0:
            print('Epoch {}, Training loss {}, Validation loss {}'.format(epoch, float(train_loss), float(val_loss)))
    return params
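The loop above expects separate training and validation tensors, which these notes do not construct. One possible way to obtain them, shown here as a sketch, is to shuffle the sample indices with torch.randperm and hold out a fraction for validation. The 20% fraction and the names n_val, train_indices and val_indices are assumptions for illustration.

```python
import torch

t_c = torch.tensor([0.5, 14.0, 15.0, 28.0, 11.0, 8.0, 3.0, -4.0, 6.0, 13.0, 21.0])
t_u = torch.tensor([35.7, 55.9, 58.2, 81.9, 56.3, 48.9, 33.9, 21.8, 48.4, 60.4, 68.4])
t_un = 0.1 * t_u

n_samples = t_u.shape[0]
n_val = int(0.2 * n_samples)  # assumed validation fraction of 20%

# Random permutation of 0..n_samples-1; the last n_val indices are held out.
shuffled_indices = torch.randperm(n_samples)
train_indices = shuffled_indices[:-n_val]
val_indices = shuffled_indices[-n_val:]

# Index inputs and targets with the same indices so pairs stay aligned.
train_t_un = t_un[train_indices]
train_t_c = t_c[train_indices]
val_t_un = t_un[val_indices]
val_t_c = t_c[val_indices]
```

These tensors can then be passed as train_t_u=train_t_un, val_t_u=val_t_un, train_t_c=train_t_c and val_t_c=val_t_c when calling training_loop.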
To switch autograd on or off with a flag, you could define a calc_forward function that takes data as input and runs model and loss_fn with or without autograd, according to a Boolean is_train argument:
def calc_forward(t_u, t_c, is_train):
    with torch.set_grad_enabled(is_train):
        t_p = model(t_u, *params)
        loss = loss_fn(t_p, t_c)
    return loss
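A short usage sketch follows, with a few made-up data points (hypothetical values) and params defined at module level, as calc_forward assumes. The loss value is the same in both modes; only whether a graph is recorded differs.

```python
import torch

def model(t_u, w, b):
    return w * t_u + b

def loss_fn(t_p, t_c):
    return ((t_p - t_c) ** 2).mean()

# calc_forward reads params from the enclosing scope.
params = torch.tensor([1.0, 0.0], requires_grad=True)

def calc_forward(t_u, t_c, is_train):
    with torch.set_grad_enabled(is_train):
        t_p = model(t_u, *params)
        loss = loss_fn(t_p, t_c)
    return loss

# Hypothetical inputs and targets.
t_u = torch.tensor([3.5, 5.6, 5.8])
t_c = torch.tensor([0.5, 14.0, 15.0])

train_loss = calc_forward(t_u, t_c, is_train=True)   # graph recorded
val_loss = calc_forward(t_u, t_c, is_train=False)    # no graph recorded
```

Calling backward() on val_loss here would raise an error, since it has no history to differentiate through, which makes the accidental accumulation described earlier impossible.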