(3) The mechanics of learning: Optimizers a la carte, Nits in autograd and switching it off

Article link: https://blog.csdn.net/weixin_43333260/article/details/108917725

The mechanics of learning

No.3 Optimizers a la carte

Every optimizer constructor takes a list of parameters (aka PyTorch tensors, typically with requires_grad set to True) as the first input. All parameters passed to the optimizer are retained inside the optimizer object so that the optimizer can update their values and access their grad attribute.
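As a quick check of the point that the optimizer retains the tensors it is given, the sketch below inspects optimizer.param_groups. The variable names stored and same_object are illustrative only.

```python
import torch
import torch.optim as optim

params = torch.tensor([1.0, 0.0], requires_grad=True)
optimizer = optim.SGD([params], lr=1e-2)

# The optimizer keeps its parameters in param_groups, a list of dicts.
stored = optimizer.param_groups[0]['params'][0]

# It holds a reference to the very same tensor object, not a copy.
same_object = stored is params
print(same_object)
```

Because the optimizer refers to the same object, any in-place update it performs is visible through params as well.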

Each optimizer exposes two methods: zero_grad and step. The former clears the grad attribute of all the parameters passed to the optimizer upon construction (recent PyTorch versions reset it to None by default instead of filling it with zeros). The latter updates the value of those parameters according to the optimization strategy implemented by the specific optimizer.
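To make the two methods concrete, here is a minimal sketch of a single update with plain SGD (no momentum) on a toy loss. The toy loss and the helper names before, grad, expected and grad_cleared are assumptions made for illustration.

```python
import torch
import torch.optim as optim

params = torch.tensor([1.0, 0.0], requires_grad=True)
lr = 1e-2
optimizer = optim.SGD([params], lr=lr)

# Toy loss whose gradient with respect to params is 2 * params.
loss = (params ** 2).sum()
loss.backward()

before = params.detach().clone()
grad = params.grad.detach().clone()

# step() applies the update rule; for plain SGD that is p <- p - lr * grad.
optimizer.step()
after = params.detach().clone()
expected = before - lr * grad

# zero_grad() clears the stored gradient (to None or to zeros,
# depending on the PyTorch version and the set_to_none argument).
optimizer.zero_grad()
grad_cleared = params.grad is None or bool((params.grad == 0).all())

print(after, expected, grad_cleared)
```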

import torch
import torch.optim as optim

t_c = torch.tensor([0.5, 14.0, 15.0, 28.0, 11.0, 8.0, 3.0, -4.0, 6.0, 13.0, 21.0])
t_u = torch.tensor([35.7, 55.9, 58.2, 81.9, 56.3, 48.9, 33.9, 21.8, 48.4, 60.4, 68.4])
t_un = 0.1 * t_u

def model(t_u, w, b):
    return w * t_u + b

def loss_fn(t_p, t_c):
    squared_diffs = (t_p - t_c)**2
    return squared_diffs.mean()

def training_loop(n_epochs, optimizer, params, t_u, t_c):
    for epoch in range(1, n_epochs + 1):
        t_p = model(t_u, *params) 
        loss = loss_fn(t_p, t_c)
        
        optimizer.zero_grad()  # clear the gradients from the previous iteration
        loss.backward()
        optimizer.step()  # update the parameters using the new gradients

        if epoch % 500 == 0:
            print('Epoch %d, Loss %f' % (epoch, float(loss)))
            
    return params

params = torch.tensor([1.0, 0.0], requires_grad=True)
learning_rate = 1e-2
optimizer = optim.SGD([params], lr=learning_rate)  # the optimizer keeps a reference to params

training_loop(
    n_epochs = 5000, 
    optimizer = optimizer,
    params = params,  # must be the same tensor object that was given to the optimizer
    t_u = t_un,
    t_c = t_c)
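Since every optimizer shares the same zero_grad/step interface, trying a different one only changes the constructor line. The sketch below swaps in optim.Adam; the choice of Adam, its learning rate of 1e-1 and the 2000 epochs are assumptions for illustration, not values taken from the run above.

```python
import torch
import torch.optim as optim

t_c = torch.tensor([0.5, 14.0, 15.0, 28.0, 11.0, 8.0, 3.0, -4.0, 6.0, 13.0, 21.0])
t_u = torch.tensor([35.7, 55.9, 58.2, 81.9, 56.3, 48.9, 33.9, 21.8, 48.4, 60.4, 68.4])
t_un = 0.1 * t_u

def model(t_u, w, b):
    return w * t_u + b

def loss_fn(t_p, t_c):
    return ((t_p - t_c) ** 2).mean()

params = torch.tensor([1.0, 0.0], requires_grad=True)
# Only this line differs from the SGD version (assumed learning rate).
optimizer = optim.Adam([params], lr=1e-1)

initial_loss = float(loss_fn(model(t_un, *params), t_c))

for epoch in range(1, 2001):
    loss = loss_fn(model(t_un, *params), t_c)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

final_loss = float(loss_fn(model(t_un, *params), t_c))
print('Initial loss %f, final loss %f' % (initial_loss, final_loss))
```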

No.4 Nits in autograd and switching it off

[Figure: computation graph for the code above]

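If val_loss.backward() were added by mistake, then, because both losses are computed from the same params tensor, calling backward on val_loss would lead to gradients accumulating in the params tensor, on top of those generated during the train_loss.backward() call. The parameter update would then depend on both the training set and the validation set.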
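The accumulation can be observed directly. The following sketch uses small made-up tensors (train_x, train_y, val_x, val_y are hypothetical data) and compares params.grad before and after the mistaken second backward call.

```python
import torch

def model(x, w, b):
    return w * x + b

def loss_fn(pred, target):
    return ((pred - target) ** 2).mean()

params = torch.tensor([1.0, 0.0], requires_grad=True)

# Hypothetical toy data, for illustration only.
train_x, train_y = torch.tensor([1.0, 2.0]), torch.tensor([2.0, 4.0])
val_x, val_y = torch.tensor([3.0]), torch.tensor([5.0])

# Gradient of the validation loss alone, computed without touching params.grad.
val_only, = torch.autograd.grad(loss_fn(model(val_x, *params), val_y), params)

train_loss = loss_fn(model(train_x, *params), train_y)
val_loss = loss_fn(model(val_x, *params), val_y)

train_loss.backward()
train_only = params.grad.clone()

val_loss.backward()  # the mistake: this adds to params.grad instead of replacing it
accumulated = params.grad.clone()

print(train_only, val_only, accumulated)
```

After the second call, params.grad holds the sum of both gradients, so an optimizer step would be driven partly by validation data.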

If backward is never called on the validation loss, there is no need to build a computation graph for it.

Tracking history comes with additional costs that you can forgo during the validation pass, especially when the model has millions of parameters. You won't see any meaningful advantage in terms of speed or memory consumption on a problem this small, but for larger models the differences can add up.
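A small sketch of what "not tracking history" means in practice: the same expression evaluated inside and outside torch.no_grad. The names tracked and untracked are illustrative.

```python
import torch

params = torch.tensor([1.0, 0.0], requires_grad=True)
x = torch.tensor([1.0, 2.0, 3.0])
w, b = params

# Outside no_grad, autograd records the operations that produced the result.
tracked = w * x + b

# Inside no_grad, the same computation records nothing.
with torch.no_grad():
    untracked = w * x + b

print(tracked.requires_grad, tracked.grad_fn is not None)
print(untracked.requires_grad, untracked.grad_fn is None)
```

Both tensors hold the same values; only the bookkeeping needed for backward differs.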

def training_loop(n_epochs, optimizer, params, train_t_u, val_t_u, train_t_c, val_t_c):
    for epoch in range(1, n_epochs + 1):  # run exactly n_epochs iterations, counted from 1
        train_t_p = model(train_t_u, *params)
        train_loss = loss_fn(train_t_p, train_t_c)
        
        with torch.no_grad():
            val_t_p = model(val_t_u, *params)
            val_loss = loss_fn(val_t_p, val_t_c)
            assert not val_loss.requires_grad  # no graph is built inside no_grad
        
        optimizer.zero_grad()
        train_loss.backward()
        optimizer.step()
        
        if epoch <= 3 or epoch % 500 == 0:
            print('Epoch {}, Training loss {}, Validation loss {}'.format(
                epoch, float(train_loss), float(val_loss)))
    return params

To switch autograd on or off with a flag, you could define a calc_forward function that takes data as input and runs model and loss_fn with or without autograd, according to a Boolean is_train argument:

def calc_forward(t_u, t_c, is_train):
    # params is read from the enclosing (global) scope
    with torch.set_grad_enabled(is_train):
        t_p = model(t_u, *params)
        loss = loss_fn(t_p, t_c)
    return loss
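A possible way to use calc_forward is sketched below. The function and its helpers are repeated so the snippet runs on its own, and the train/validation tensors are hypothetical toy data.

```python
import torch

def model(t_u, w, b):
    return w * t_u + b

def loss_fn(t_p, t_c):
    return ((t_p - t_c) ** 2).mean()

params = torch.tensor([1.0, 0.0], requires_grad=True)

def calc_forward(t_u, t_c, is_train):
    # Autograd is enabled only when is_train is True.
    with torch.set_grad_enabled(is_train):
        t_p = model(t_u, *params)
        loss = loss_fn(t_p, t_c)
    return loss

# Hypothetical toy data, for illustration only.
train_t_u, train_t_c = torch.tensor([1.0, 2.0]), torch.tensor([2.0, 4.0])
val_t_u, val_t_c = torch.tensor([3.0]), torch.tensor([5.0])

train_loss = calc_forward(train_t_u, train_t_c, is_train=True)
val_loss = calc_forward(val_t_u, val_t_c, is_train=False)

train_loss.backward()  # works: a graph was recorded for the training pass
print(train_loss.requires_grad, val_loss.requires_grad)
```

Calling val_loss.backward() here would raise an error, because no graph was recorded for the validation pass.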