The mechanics of learning
No.4 Training, validation, and overfitting
On one hand, you need to model to have enough capacity for it to fit the training set. On the other hand, you need the model to avoid overfitting. Therefore, the process for choosing the right size of a neural network model, in terms of parameters, is based on two steps: increase the size until it fits and then scale it down until it stops overfitting.
import torch
import torch.optim as optim
t_c = torch.tensor([0.5, 14.0, 15.0, 28.0, 11.0, 8.0, 3.0, -4.0, 6.0, 13.0, 21.0])
t_u = torch.tensor([35.7, 55.9, 58.2, 81.9, 56.3, 48.9, 33.9, 21.8, 48.4, 60.4, 68.4])
t_un = 0.1 * t_u
def model(t_u, w, b):
return w * t_u + b
def loss_fn(t_p, t_c):
squared_diffs = (t_p - t_c)**2
return squared_diffs.mean()
n_samples = t_u.shape[0]
n_val = int(0.2*n_samples)
shuffled_indices = torch.randperm(n_samples)
train_indices = shuffled_indices[:-n_val]
val_indices = shuffled_indices[-n_val:]
train_t_u = t_u[train_indices]
train_t_c = t_c[train_indices]
val_t_u = t_u[val_indices]
val_t_c = t_c[val_indices]
train_t_un = 0.1 * train_t_u
val_t_un = 0.1 * val_t_u
def training_loop(n_epochs, optimizer, params, train_t_u, val_t_u, train_t_c, val_t_c):
for epoch in range(n_epochs+1):
train_t_p = model(train_t_u, *params)
train_loss = loss_fn(train_t_p, train_t_c)
val_t_p = model(val_t_u, *params)
val_loss = loss_fn(val_t_p, val_t_c)
optimizer.zero_grad()
train_loss.backward()
optimizer.step()
if epoch <=3 or epoch % 500 == 0:
print('Epoch {}, Training loss {}, Validation loss {}'.format(epoch, float(train_loss), float(val_loss)))
return params
params = torch.tensor([1.0, 0.0], requires_grad=True)
learning_rate = 1e-2
optimizer = optim.SGD([params], lr=learning_rate)
training_loop(n_epochs=3000,
optimizer=optimizer,
params=params,
train_t_u=train_t_un,
val_t_u=val_t_un,
train_t_c=train_t_c,
val_t_c=val_t_c)
Note that you have no val_loss.backward() here because you don’t want to train the model on the validation data.
Here, we’re not being entirely fair to the model. The validation set is small, so the validation loss will be meaningful only up to a point. In any case, note that the validation loss is higher than your training loss, although not by an order of magnitude. The fact that a model performs better on the training set is expected since the model parameters are being shaped by the training set. Your main goal is to also see both the training loss and the validation loss decreasing. Although ideally, both losses would be roughly the same value, as long as validation loss stays reasonably close to the training loss, you know that your model is continuing to learn generalized things about your data. In figure 4.13, case C is ideal, and D is acceptable. In case A, the model isn’t learning at all, and in case B, you see overfitting.