(3) The mechanics of learning --- Optimizers a la carte, Nits in autograd and switching it off

The mechanics of learning
No.3 Optimizers a la carte

Every optimizer constructor takes a list of parameters (aka PyTorch tensors, typically with requires_grad set to True) as the first input. All parameters passed to the optimizer are retained inside the optimizer object so that the optimizer can update their values and access their grad attribute.
Each optimizer exposes two methods: zero_grad and step. The former zeros the grad attribute of all the parameters passed to the optimizer upon construction. The latter updates the value of those parameters according to the optimization strategy implemented by the specific optimizer.
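To get a feel for what is available, you can inspect the torch.optim module directly; the printout below is only illustrative, since the exact contents depend on your PyTorch version:

import torch.optim as optim

# Inspect the public contents of torch.optim (mostly optimizer classes; exact list varies by version)
print([name for name in dir(optim) if not name.startswith('_')])
# e.g. ['ASGD', 'Adadelta', 'Adagrad', 'Adam', ..., 'RMSprop', 'Rprop', 'SGD', ...]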

import torch
import torch.optim as optim

t_c = torch.tensor([0.5, 14.0, 15.0, 28.0, 11.0, 8.0, 3.0, -4.0, 6.0, 13.0, 21.0])
t_u = torch.tensor([35.7, 55.9, 58.2, 81.9, 56.3, 48.9, 33.9, 21.8, 48.4, 60.4, 68.4])
t_un = 0.1 * t_u

def model(t_u, w, b):
    return w * t_u + b

def loss_fn(t_p, t_c):
    squared_diffs = (t_p - t_c)**2
    return squared_diffs.mean()

def training_loop(n_epochs, optimizer, params, t_u, t_c):
    for epoch in range(1, n_epochs + 1):
        t_p = model(t_u, *params) 
        loss = loss_fn(t_p, t_c)
        
        optimizer.zero_grad() # zero the grad attribute of all params held by the optimizer
        loss.backward()
        optimizer.step()      # update the parameter values using the computed gradients

        if epoch % 500 == 0:
            print('Epoch %d, Loss %f' % (epoch, float(loss)))
            
    return params

params = torch.tensor([1.0, 0.0], requires_grad=True)
learning_rate = 1e-2
optimizer = optim.SGD([params], lr=learning_rate) # the optimizer keeps a reference to params

training_loop(
    n_epochs = 5000, 
    optimizer = optimizer,
    params = params, # the same params object the optimizer holds, so step() updates it in place
    t_u = t_un,
    t_c = t_c)
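Swapping in a different optimizer only requires changing the constructor call. As a sketch (the larger learning rate and the use of the un-normalized t_u are choices that tend to work because Adam adapts the step size per parameter, not a prescription):

params = torch.tensor([1.0, 0.0], requires_grad=True)
learning_rate = 1e-1
optimizer = optim.Adam([params], lr=learning_rate) # same params list, different update rule

training_loop(
    n_epochs = 2000,
    optimizer = optimizer,
    params = params,
    t_u = t_u, # Adam is far less sensitive to the scale of the inputs
    t_c = t_c)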
No.4 Nits in autograd and switching it off

The computation graph built by this code: [figure: forward graphs for the training and validation passes, both rooted in the same params tensor]
If, by mistake, we also called val_loss.backward(), then because both passes use the same params, calling backward on val_loss would lead to gradients accumulating in the params tensor, on top of those generated during the train_loss.backward() call.
The parameter update would then depend on both the training and the validation data.
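A minimal sketch of this accumulation effect (the tensors here are illustrative, not taken from the listing above): calling backward twice without zeroing in between sums the gradients in .grad.

w = torch.tensor(1.0, requires_grad=True)

loss_a = (w * 2.0) ** 2 # d(loss_a)/dw = 8*w = 8 at w=1
loss_b = (w * 3.0) ** 2 # d(loss_b)/dw = 18*w = 18 at w=1

loss_a.backward()
print(w.grad) # tensor(8.)

loss_b.backward() # no zero_grad in between
print(w.grad) # tensor(26.), i.e. 8 + 18 accumulated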

Since we never call backward on the validation loss, there is no need to build a computation graph for the validation pass at all. This matters for two reasons:

  1. Tracking history comes with additional costs that you could forgo during the validation pass, especially when the model has millions of parameters.
  2. You won't see any meaningful advantage in terms of speed or memory consumption on this small problem, but for larger models the differences can add up.
def training_loop(n_epochs, optimizer, params, train_t_u, val_t_u, train_t_c, val_t_c):
    for epoch in range(1, n_epochs + 1):
        train_t_p = model(train_t_u, *params)
        train_loss = loss_fn(train_t_p, train_t_c)
        
        with torch.no_grad(): # no autograd graph is built for the validation forward pass
            val_t_p = model(val_t_u, *params)
            val_loss = loss_fn(val_t_p, val_t_c)
            assert val_loss.requires_grad == False # check that autograd is indeed switched off here
        
        optimizer.zero_grad()
        train_loss.backward()
        optimizer.step()
        
        if epoch <= 3 or epoch % 500 == 0:
            print('Epoch {}, Training loss {}, Validation loss {}'.format(epoch, float(train_loss), float(val_loss)))
    return params 
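This loop expects the data already split into training and validation sets. A common way to produce such a split is to shuffle the indices with torch.randperm; the 20% validation fraction below is an assumption, not something fixed by the loop:

# Sketch: random train/validation split before calling the loop
n_samples = t_u.shape[0]
n_val = int(0.2 * n_samples)

shuffled_indices = torch.randperm(n_samples)
train_indices = shuffled_indices[:-n_val]
val_indices = shuffled_indices[-n_val:]

train_t_un = 0.1 * t_u[train_indices]
val_t_un = 0.1 * t_u[val_indices]

params = torch.tensor([1.0, 0.0], requires_grad=True)
optimizer = optim.SGD([params], lr=1e-2)

training_loop(
    n_epochs = 3000,
    optimizer = optimizer,
    params = params,
    train_t_u = train_t_un,
    val_t_u = val_t_un,
    train_t_c = t_c[train_indices],
    val_t_c = t_c[val_indices])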

If you want an explicit on/off switch for gradient tracking: you could define a calc_forward function that takes data as input and runs model and loss_fn with or without autograd, according to a Boolean is_train argument:

def calc_forward(t_u, t_c, is_train):
    # Build the autograd graph only when training; skip it for validation
    with torch.set_grad_enabled(is_train):
        t_p = model(t_u, *params)
        loss = loss_fn(t_p, t_c)
    return loss
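A possible way to use it in the body of the loop (a sketch, reusing the split tensors from the listing above):

# Sketch: calc_forward inside the loop body
train_loss = calc_forward(train_t_un, t_c[train_indices], True)  # graph is built
val_loss = calc_forward(val_t_un, t_c[val_indices], False)       # no graph, no history kept

optimizer.zero_grad()
train_loss.backward()
optimizer.step()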