Preface
In PyTorch we often use NumPy to process data and then convert the result to a Tensor. When that data is later modified, we have to check whether the method involved shares the underlying memory, because this affects how the whole network is updated. This post summarizes the points to watch out for in in-place operations and copy operations.
In-place operations
In PyTorch, in-place operations carry the suffix _, such as .add_() or .scatter_(). An in-place operation modifies the content of a given Tensor directly without making a copy, i.e., it does not allocate new memory for the variable. Python operators such as += or *= are also in-place operations. ("I add to myself", so to speak~)
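As a minimal sketch of the difference (the tensor values here are arbitrary), data_ptr() shows that an in-place op keeps writing into the same storage, while the out-of-place version allocates a new one:

import torch

x = torch.ones(3)
print(x.data_ptr())   # address of the underlying storage

x.add_(1)             # in-place: same storage, contents become [2., 2., 2.]
print(x.data_ptr())   # same address as before

y = x + 1             # out-of-place: a new tensor with its own storage
print(y.data_ptr())   # different address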
Why do in-place operations help reduce memory usage when working with high-dimensional data? The example below illustrates this by defining a simple function that measures the memory allocated by PyTorch's out-of-place ReLU versus the in-place ReLU:
import torch # import main library
import torch.nn as nn # import modules like nn.ReLU()
import torch.nn.functional as F # import torch functions like F.relu() and F.relu_()
def get_memory_allocated(device, inplace=False):
    '''
    Function measures allocated memory before and after the ReLU function call.
    INPUT:
      - device: gpu device to run the operation
      - inplace: True - to run ReLU in-place, False - for normal ReLU call
    '''
    # Create a large tensor
    t = torch.randn(10000, 10000, device=device)
    # Measure allocated memory before the call
    torch.cuda.synchronize()
    start_max_memory = torch.cuda.max_memory_allocated() / 1024**2
    start_memory = torch.cuda.memory_allocated() / 1024**2
    # Call in-place or normal ReLU
    if inplace:
        F.relu_(t)
    else:
        output = F.relu(t)
    # Measure allocated memory after the call
    torch.cuda.synchronize()
    end_max_memory = torch.cuda.max_memory_allocated() / 1024**2
    end_memory = torch.cuda.memory_allocated() / 1024**2
    # Return the amount of memory (and peak memory) allocated by the ReLU call, in MB
    return end_memory - start_memory, end_max_memory - start_max_memory
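A minimal usage sketch, assuming a CUDA-capable GPU is available (the device string cuda:0 is an assumption):

device = torch.device('cuda:0')  # assumption: a CUDA GPU is available

# Out-of-place ReLU allocates a new output tensor of the same size
# (roughly 381 MB for a 10000x10000 float32 tensor)
memory_allocated, max_memory_allocated = get_memory_allocated(device, inplace=False)
print('Allocated memory: {}'.format(memory_allocated))
print('Allocated max memory: {}'.format(max_memory_allocated))

# In-place ReLU overwrites the input tensor, so no extra memory is allocated
memory_allocated_inplace, max_memory_allocated_inplace = get_memory_allocated(device, inplace=True)
print('Allocated memory: {}'.format(memory_allocated_inplace))
print('Allocated max memory: {}'.format(max_memory_allocated_inplace))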