RuntimeError: CUDA out of memory. Tried to allocate 1.50 GiB (GPU 0; 10.92 GiB total capacity; 9.79 GiB already allocated; 539.44 MiB free; 10.28 MiB cached)
My PyTorch version is 1.1.0. After updating PyTorch, training with batch_size=64 ran out of GPU memory, and even after shrinking it to batch_size=2 the out-of-memory error persisted. The model used here is the relatively complex GoogLeNet (the simpler VGG ran without error). My source code is:
# check whether a GPU is available
if torch.cuda.is_available():
    device = 'cuda'
else:
    device = 'cpu'
# device = 'cpu'
device = torch.device(device)
cnn = GoogLeNet().to(device)
optimizer = Adam(cnn.parameters(), lr=0.001, betas=(0.9, 0.999))  # use the Adam optimizer
loss_fn = nn.SmoothL1Loss()  # define the loss function
# train and evaluate the model
data = Dataset(epochs=args.EPOCHS, batch=args.BATCH, val_batch=args.BATCH)
model = Model(data)
lowest_loss = 1e5
for i in range(data.get_step()):
    cnn.train()
    x_train, y_train = data.next_train_batch()
    x_test, y_test = data.next_validation_batch()
    x_train = torch.from_numpy(x_train)
    y_train = torch.from_numpy(y_train)
    x_train = x_train.float().to(device)
    y_train = y_train.float().to(device)
    outputs = cnn(x_train)  # the line that raises the error
    optimizer.zero_grad()
    #print(x_train.shape, outputs.shape, y_train.shape)
    loss = loss_fn(outputs, y_train)
    loss.backward()
    optimizer.step()
    print(loss)
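One common cause of an OOM that persists even at batch_size=2 (not confirmed to be the cause here, but worth checking first) is holding a reference to a tensor that still carries its autograd graph, e.g. accumulating `loss` itself instead of `loss.item()`. Each retained graph pins all activations of that step in GPU memory. A minimal sketch of the difference:

```python
import torch

w = torch.ones(3, requires_grad=True)
bad_total = torch.zeros(1)
good_total = 0.0
for _ in range(3):
    loss = (w * 2).sum()              # 6.0 each step, attached to a graph
    bad_total = bad_total + loss      # bad: keeps every step's graph alive
    good_total += loss.item()         # good: detaches to a plain Python float
print(bad_total.requires_grad, good_total)  # True 18.0
```

Here only `good_total` is safe to keep across many iterations; `bad_total` still requires grad, which is the sign it is dragging the graphs along.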
Solution:
For the out-of-memory error I followed this reference (link).
Change outputs = cnn(x_train) to:
with torch.no_grad():
    outputs = cnn(x_train)
After this change, however, a new error appeared:
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
Then, following another blog post (link), setting requires_grad=True on x_train and y_train makes the error go away.
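Why this combination silences the error can be reproduced with a tiny stand-in model (an `nn.Linear` here instead of GoogLeNet): under `torch.no_grad()` the output has no grad_fn, and `backward()` only runs because the *target* now requires grad, so no gradient ever reaches the model parameters:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)            # stand-in for GoogLeNet
x = torch.randn(2, 4, requires_grad=True)
y = torch.randn(2, 1, requires_grad=True)

with torch.no_grad():
    out = model(x)                 # out.grad_fn is None inside no_grad

loss = nn.SmoothL1Loss()(out, y)   # loss requires grad only through y
loss.backward()                    # runs, but no gradient reaches the model
print(out.grad_fn, model.weight.grad)  # None None
```

In other words, the error message disappears, but `optimizer.step()` has nothing to apply to the network weights.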
The code after the fix is:
# check whether a GPU is available
if torch.cuda.is_available():
    device = 'cuda'
else:
    device = 'cpu'
# device = 'cpu'
device = torch.device(device)
# cnn = Net().to(device)
cnn = GoogLeNet().to(device)
optimizer = Adam(cnn.parameters(), lr=0.001, betas=(0.9, 0.999))  # use the Adam optimizer
loss_fn = nn.SmoothL1Loss()  # define the loss function
# train and evaluate the model
data = Dataset(epochs=args.EPOCHS, batch=args.BATCH, val_batch=args.BATCH)
model = Model(data)
lowest_loss = 1e5
for i in range(data.get_step()):
    cnn.train()
    x_train, y_train = data.next_train_batch()
    x_test, y_test = data.next_validation_batch()
    x_train = torch.from_numpy(x_train)
    y_train = torch.from_numpy(y_train)
    x_train = x_train.float().to(device)
    y_train = y_train.float().to(device)
    x_train = Variable(x_train, requires_grad=True)
    y_train = Variable(y_train, requires_grad=True)
    with torch.no_grad():
        outputs = cnn(x_train)
    optimizer.zero_grad()
    #print(x_train.shape, outputs.shape, y_train.shape)
    loss = loss_fn(outputs, y_train)
    loss.backward()
    optimizer.step()
    print(loss)
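A caution worth adding: with the training forward pass inside `torch.no_grad()`, the parameter gradients are never computed, so `optimizer.step()` leaves the network unchanged even though the loop runs without errors. The memory-saving pattern that still trains is to keep gradients on for the training step and reserve `no_grad` for validation. A minimal runnable sketch, with an `nn.Linear` and random tensors standing in for the GoogLeNet pipeline and `Dataset` from the post:

```python
import torch
import torch.nn as nn
from torch.optim import Adam

# stand-ins for the model and data loaders used in the post
model = nn.Linear(8, 1)
optimizer = Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999))
loss_fn = nn.SmoothL1Loss()
x_train, y_train = torch.randn(4, 8), torch.randn(4, 1)
x_test, y_test = torch.randn(4, 8), torch.randn(4, 1)

# training step: gradients ON, so optimizer.step() actually updates weights
model.train()
outputs = model(x_train)          # no torch.no_grad() here
loss = loss_fn(outputs, y_train)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# validation step: no_grad skips graph construction and saves memory
model.eval()
with torch.no_grad():
    val_loss = loss_fn(model(x_test), y_test)
print(loss.item(), val_loss.item())
```

If the training forward pass itself does not fit in memory, the usual levers are a smaller batch size, gradient accumulation over sub-batches, or `torch.utils.checkpoint` to trade compute for activation memory, rather than disabling gradients.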