PyTorch tricks

Original post by @DeepGlint

Tags: deep learning, PyTorch, neural networks, machine learning

Article link: https://blog.csdn.net/weixin_44019216/article/details/106012762

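This article covers tricks for improving performance in PyTorch, such as speeding up convolutions with torch.backends.cudnn.benchmark=True, and discusses how the different ways of creating a tensor differ. The functionality part covers nn.ModuleList, the difference between tensor.detach() and tensor.data, moving tensors between devices, and strategies for saving and loading models, including the details of saving and loading across devices.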

Performance

1. torch.from_numpy() vs. torch.tensor() vs. torch.as_tensor()

torch.from_numpy() vs. torch.tensor()

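from_numpy() automatically inherits the dtype of the input array and shares memory with it. On the other hand, torch.Tensor (capital T) is an alias for torch.FloatTensor.
Therefore, if you pass an int64 array to torch.Tensor, the output is a float tensor and it does not share storage with the array. torch.from_numpy gives you a torch.LongTensor as expected. The lowercase factory function torch.tensor() also keeps the int64 dtype, but it copies the data.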

torch.tensor() vs. torch.as_tensor()

torch.tensor() always copies the data. For example, when x is a tensor, torch.tensor(x) is equivalent to x.clone().detach().
torch.as_tensor() avoids copying the data whenever it can. One such case is when the original data is a NumPy array whose dtype and device already match the requested result.
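
The differences above can be checked with a small experiment. This is a minimal sketch, assuming NumPy and a CPU build of PyTorch are installed; the variable names are made up for illustration.

```python
import numpy as np
import torch

arr = np.array([1, 2, 3], dtype=np.int64)

shared = torch.from_numpy(arr)       # keeps int64, shares memory with arr
also_shared = torch.as_tensor(arr)   # no copy when dtype and device already match
copied = torch.tensor(arr)           # keeps int64, but always copies
legacy = torch.Tensor(arr)           # legacy constructor: converts to the default float type

arr[0] = 100  # mutate the NumPy array in place

print(shared[0].item(), also_shared[0].item())  # both reflect the change
print(copied[0].item(), legacy[0].item())       # the copies still hold the old value
print(shared.dtype, copied.dtype, legacy.dtype)
```

If the array should stay independent of the tensor, torch.tensor() is the safer choice; if avoiding a copy matters, torch.from_numpy() or torch.as_tensor() is usually preferable.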

2. torch.backends.cudnn.benchmark=True

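When the network structure is fixed (not changing dynamically) and the input shape (batch size, image size, number of input channels) stays the same, setting torch.backends.cudnn.benchmark=True at the start of a PyTorch program can noticeably speed up a convolutional neural network. How it works: for each convolution layer, cuDNN tries the convolution algorithms it provides and picks the fastest one. If the input shapes keep changing, this search is repeated for every new shape and can slow the program down instead.
It is usually set at the top of the script, for example:
if args.use_gpu and torch.cuda.is_available():
    device = torch.device('cuda')
    torch.backends.cudnn.benchmark = True
else:
    device = torch.device('cpu')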

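Functionality

1. nn.ModuleList

First, the nn.ModuleList class. You can add any subclass of nn.Module (such as nn.Conv2d or nn.Linear) to this list, using the same methods as a built-in Python list, namely extend, append, and so on. Unlike an ordinary list, however, modules added to an nn.ModuleList are automatically registered on the enclosing network, and their parameters are automatically added to the network's parameters as well: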

import torch.nn as nn

class net1(nn.Module):
    def __init__(self):
        super(net1, self).__init__()
        # Layers held in a ModuleList are registered as submodules
        self.linears = nn.ModuleList([nn.Linear(10, 10) for i in range(2)])

    def forward(self, x):
        for m in self.linears:
            x = m(x)
        return x

net = net1()
print(net)
# (newer PyTorch versions may print repeated layers in a condensed form)
# net1(
#   (linears): ModuleList(
#     (0): Linear(in_features=10, out_features=10, bias=True)
#     (1): Linear(in_features=10, out_features=10, bias=True)
#   )
# )

for param in net.parameters():
    print(type(param.data), param.size())
# <class 'torch.Tensor'> torch.Size([10, 10])
# <class 'torch.Tensor'> torch.Size([10])
# <class 'torch.Tensor'> torch.Size([10, 10])
# <class 'torch.Tensor'> torch.Size([10])
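
To see why registration matters, it may help to compare against a plain Python list. The sketch below is an illustration only; the class names WithPlainList and WithModuleList are hypothetical and do not come from the original post.

```python
import torch.nn as nn

class WithPlainList(nn.Module):  # hypothetical name, for illustration
    def __init__(self):
        super().__init__()
        # A plain Python list: the layers are NOT registered as submodules
        self.linears = [nn.Linear(10, 10) for _ in range(2)]

class WithModuleList(nn.Module):  # hypothetical name, for illustration
    def __init__(self):
        super().__init__()
        # nn.ModuleList: each layer is registered, so its parameters are visible
        self.linears = nn.ModuleList([nn.Linear(10, 10) for _ in range(2)])

n_plain = len(list(WithPlainList().parameters()))
n_registered = len(list(WithModuleList().parameters()))
print(n_plain, n_registered)  # 0 4
```

With the plain list, an optimizer built from parameters() would receive nothing to update, and calls such as .to(device) or state_dict() would skip those layers.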

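2. The difference between tensor.detach() and tensor.data in PyTorch

Since PyTorch 0.4, .data is still available but .detach() is recommended. x.data returns a tensor that shares its data with x, is not part of x's computation history, and has requires_grad=False. This is sometimes unsafe, because in-place changes made through x.data are not tracked by autograd, so a later backward pass can silently compute wrong gradients. x.detach() also returns a tensor that shares the same data and has requires_grad=False, but in-place changes to it are reported to autograd, which raises an error during backward if x was needed for the gradient computation.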
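
The following sketch illustrates the safety difference. It follows the pattern commonly used to demonstrate this behaviour (a sigmoid whose output is zeroed in place); the variable names are illustrative assumptions.

```python
import torch

# Using .detach(): autograd notices the in-place change and refuses to continue
a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
out = a.sigmoid()
out.detach().zero_()  # modifies out's data in place
try:
    out.sum().backward()
    detach_raised = False
except RuntimeError:
    detach_raised = True

# Using .data: the same change goes unnoticed and the gradient is silently wrong
b = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
out = b.sigmoid()
out.data.zero_()
out.sum().backward()

print(detach_raised)  # whether backward() raised after the .detach() edit
print(b.grad)         # gradient computed from the zeroed output
```

The sigmoid backward pass reuses its own output, so zeroing that output through .data yields an all-zero gradient without any warning, whereas the .detach() path fails loudly.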

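Two attributes of a Variable (Tensor) are relevant to its gradient:

grad_fn: when a tensor's grad_fn is None, gradients flowing back from later nodes stop at that tensor and are not propagated to the nodes before it.
requires_grad: when a tensor's requires_grad is False, no gradient is computed for that tensor when back-propagating from later nodes.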

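Example: the variables are related as x -> m -> y, and the leaf variable is x. Calling m.detach_() at this point does two things:

It sets m's grad_fn to None, so m is no longer connected to the previous node x. The relationship becomes x, m -> y, and m is now a leaf node.
It then sets m's requires_grad to False, so calling backward() on y does not compute a gradient for m.

detach() and detach_() are very similar. The difference is that detach_() modifies the tensor itself, while detach() creates a new variable.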

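Between them, clone() and detach() cover the usual needs for copying a tensor.

clone
clone() returns an identical tensor in newly allocated memory; the new tensor remains in the computation graph.

detach
detach() returns an identical tensor that shares memory with the original; the new tensor is detached from the computation graph and takes no part in gradient computation. In-place modifications on either tensor are visible in the other and may trigger errors in autograd's correctness checks. The PyTorch documentation has also noted that in-place size/stride/storage changes (such as resize_ / resize_as_ / set_ / transpose_) on the detached tensor raise an error.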

Operation	New/Shared memory	Still in computation graph
tensor.clone()	New	Yes
tensor.detach()	Shared	No
tensor.clone().detach()	New	No
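
The table can be verified directly by comparing data pointers and requires_grad flags. This is a small sketch with made-up variable names.

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x * 2  # non-leaf tensor that is part of the graph

c = y.clone()            # new memory, still in the graph
d = y.detach()           # shared memory, out of the graph
cd = y.clone().detach()  # new memory, out of the graph

for name, t in [('clone', c), ('detach', d), ('clone().detach()', cd)]:
    shares_memory = t.data_ptr() == y.data_ptr()
    print(name, 'shared' if shares_memory else 'new', t.requires_grad)

# Gradients still flow back to x through the clone
c.sum().backward()
print(x.grad)
```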

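3. Converting between CPU tensors, GPU tensors, and ndarrays

CPU tensor to GPU tensor:
cpu_imgs.cuda()
GPU tensor to CPU tensor:
gpu_imgs.cpu()
NumPy array to CPU tensor:
torch.from_numpy(imgs)
CPU tensor to NumPy array:
cpu_imgs.numpy()
Note: a GPU tensor cannot be converted to a NumPy array directly; it must first be moved to the CPU.
If the tensor holds a single element, item() extracts the value as a Python number (it only works for single-element tensors):
print(loss_output.item())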
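
A device-agnostic version of the same round trip might look like the sketch below. It uses .to(device) instead of .cuda() so that it also runs on machines without a GPU; the array contents are a hypothetical stand-in for image data.

```python
import numpy as np
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

imgs = np.zeros((2, 3), dtype=np.float32)  # hypothetical stand-in for image data
cpu_imgs = torch.from_numpy(imgs)          # NumPy array -> CPU tensor
dev_imgs = cpu_imgs.to(device)             # same effect as .cuda() when a GPU is present
back = dev_imgs.cpu().numpy()              # go through the CPU before calling .numpy()

loss_output = dev_imgs.sum()               # zero-dimensional tensor
print(type(back).__name__, back.shape, loss_output.item())
```

For a tensor that requires gradients, .detach() is also needed before .numpy().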

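4. PyTorch .contiguous().view()

In torch, view() plays roughly the role of NumPy's reshape.
contiguous: view() generally requires a contiguous tensor. To check whether a tensor is contiguous, call torch.Tensor.is_contiguous().
If transpose, permute, or a similar operation was applied before view(), call contiguous() first to obtain a contiguous copy.
After such operations the elements are no longer laid out in memory in the order the tensor's shape implies. view() relies on that layout, so calling contiguous() first rearranges the data into one contiguous block.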
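
A minimal sketch of the failure and the fix, using a small transposed tensor (variable names are illustrative):

```python
import torch

x = torch.arange(6).reshape(2, 3)
t = x.t()  # transposed view with shape (3, 2); no data is moved
print(t.is_contiguous())

try:
    t.view(6)  # view() cannot flatten this memory layout
    view_failed = False
except RuntimeError:
    view_failed = True

flat = t.contiguous().view(6)  # copy into contiguous memory, then view
print(view_failed, flat.tolist())
```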

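Similarities and differences between transpose and permute

Tensor.permute(a, b, c, d, ...): permute can reorder all dimensions of a tensor of any rank in one call. Older PyTorch versions only offered the method form Tensor.permute(); recent releases also provide torch.permute(input, dims).
torch.transpose(Tensor, a, b): transpose swaps exactly two dimensions per call (the tensor itself may have any number of dimensions) and can be called either as torch.transpose(x, a, b) or as x.transpose(a, b).
Also, chaining several transpose calls achieves the same effect as a single permute.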
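
As an illustration of the last point, the sketch below reproduces one permute with two transposes. The shapes are arbitrary choices for the example.

```python
import torch

x = torch.arange(24).reshape(2, 3, 4)

p = x.permute(2, 0, 1)                 # new dimension order: (2, 0, 1)
q = x.transpose(0, 2).transpose(1, 2)  # (0,1,2) -> (2,1,0) -> (2,0,1)

print(tuple(p.shape), tuple(q.shape), torch.equal(p, q))
```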

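Difference between view and reshape (added in PyTorch 0.4)

reshape does not depend on whether the tensor is contiguous in memory: it returns a view when possible and a copy otherwise.
In other words, reshape ≈ tensor.contiguous().view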
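
Whether reshape returned a view or a copy can be checked by comparing data pointers, as in this small sketch (variable names are illustrative):

```python
import torch

x = torch.arange(6).reshape(2, 3)

r_view = x.reshape(3, 2)   # x is contiguous, so reshape can return a view
r_copy = x.t().reshape(6)  # the transposed tensor is not contiguous, so reshape copies

is_view = r_view.data_ptr() == x.data_ptr()
is_copy = r_copy.data_ptr() != x.data_ptr()
print(is_view, is_copy, r_copy.tolist())
```

Because the result may or may not share memory with the input, code that relies on in-place writes propagating back is safer with an explicit view().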

5. Sorting strings in Python

# Sort strings by the numbers embedded in them, e.g. f10 should come after f2
import re

re_digits = re.compile(r'(\d+)')

def emb_numbers(s):
    # Split into alternating text and digit pieces, then convert the digit pieces to int
    pieces = re_digits.split(s)
    pieces[1::2] = [int(p) for p in pieces[1::2]]
    return pieces

def sort_strings_with_emb_numbers(alist):
    # Manual decorate-sort-undecorate (DSU)
    aux = [(emb_numbers(s), s) for s in alist]
    aux.sort()
    return [s for __, s in aux]

def sort_strings_with_emb_numbers2(alist):
    # Same result using the built-in key argument
    return sorted(alist, key=emb_numbers)

filelist = 'file10.txt file2.txt file1.txt'.split()

print(filelist)

print('--DSU sort')
print(sort_strings_with_emb_numbers(filelist))

print('--built-in DSU sort')
print(sort_strings_with_emb_numbers2(filelist))

The printed result is:
['file10.txt', 'file2.txt', 'file1.txt']
--DSU sort
['file1.txt', 'file2.txt', 'file10.txt']
--built-in DSU sort
['file1.txt', 'file2.txt', 'file10.txt']

6. Saving PyTorch models

Saving & Loading Model for Inference:

1. Save/Load state_dict (Recommended):
Save:

torch.save(model.state_dict(), PATH)

Load:

model = TheModelClass(*args, **kwargs)
model.load_state_dict(torch.load(PATH))
model.eval()

NOTE:
Notice that the load_state_dict() function takes a dictionary object, NOT a path to a saved object. This means that you must deserialize the saved state_dict before you pass it to the load_state_dict() function. For example, you CANNOT load using model.load_state_dict(PATH).
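
A runnable round trip of the recommended approach could look like the sketch below. nn.Linear stands in for TheModelClass and the temporary file stands in for PATH; both are assumptions made for the example.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # hypothetical stand-in for TheModelClass

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'model.pt')  # hypothetical PATH
    torch.save(model.state_dict(), path)

    restored = nn.Linear(4, 2)       # rebuild the same architecture first
    state = torch.load(path)         # deserialize into a dict of tensors
    restored.load_state_dict(state)  # pass the dict, not the path
    restored.eval()

same = all(torch.equal(p, q)
           for p, q in zip(model.parameters(), restored.parameters()))
print(sorted(state.keys()), same)
```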

2. Save/Load Entire Model:
Save:

torch.save(model, PATH)

Load:

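# Model class must be defined somewhere
# Note: PyTorch 2.6 and later default torch.load to weights_only=True, so loading a
# whole pickled model may need weights_only=False (only for files you trust)
model = torch.load(PATH)
model.eval()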

NOTE:
This save/load process uses the most intuitive syntax and involves the least amount of code. Saving a model in this way will save the entire module using Python’s pickle module. The disadvantage of this approach is that the serialized data is bound to the specific classes and the exact directory structure used when the model is saved. The reason for this is because pickle does not save the model class itself. Rather, it saves a path to the file containing the class, which is used during load time. Because of this, your code can break in various ways when used in other projects or after refactors.

Saving & Loading a General Checkpoint for Inference and/or Resuming Training

Save:

torch.save({
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'loss': loss,
            ...
            }, PATH)

Load:

model = TheModelClass(*args, **kwargs)
optimizer = TheOptimizerClass(*args, **kwargs)

checkpoint = torch.load(PATH)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']

model.eval()
# - or -
model.train()
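
The checkpoint pattern above can be exercised end to end with a toy model. In this sketch the model, optimizer settings, epoch number, and file name are all illustrative assumptions; the loss is stored as a plain float via item().

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# One training step so the optimizer has momentum state worth saving
loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()
optimizer.step()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'checkpoint.tar')  # hypothetical PATH
    torch.save({
        'epoch': 3,
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'loss': loss.item(),
    }, path)

    # Re-create the objects first, then restore their state
    new_model = nn.Linear(4, 1)
    new_optimizer = torch.optim.SGD(new_model.parameters(), lr=0.1, momentum=0.9)
    checkpoint = torch.load(path)
    new_model.load_state_dict(checkpoint['model_state_dict'])
    new_optimizer.load_state_dict(checkpoint['optimizer_state_dict'])

start_epoch = checkpoint['epoch'] + 1  # resume from the next epoch
new_model.train()
print(start_epoch, len(new_optimizer.state_dict()['state']))
```

Restoring the optimizer state matters for optimizers that keep per-parameter buffers (momentum here); without it, training would resume with those buffers reset.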

Saving Multiple Models in One File

Save:

torch.save({
            'modelA_state_dict': modelA.state_dict(),
            'modelB_state_dict': modelB.state_dict(),
            'optimizerA_state_dict': optimizerA.state_dict(),
            'optimizerB_state_dict': optimizerB.state_dict(),
            ...
            }, PATH)

Load:

modelA = TheModelAClass(*args, **kwargs)
modelB = TheModelBClass(*args, **kwargs)
optimizerA = TheOptimizerAClass(*args, **kwargs)
optimizerB = TheOptimizerBClass(*args, **kwargs)

checkpoint = torch.load(PATH)
modelA.load_state_dict(checkpoint['modelA_state_dict'])
modelB.load_state_dict(checkpoint['modelB_state_dict'])
optimizerA.load_state_dict(checkpoint['optimizerA_state_dict'])
optimizerB.load_state_dict(checkpoint['optimizerB_state_dict'])

modelA.eval()
modelB.eval()
# - or -
modelA.train()
modelB.train()

NOTE:
When saving a model comprised of multiple torch.nn.Modules, such as a GAN, a sequence-to-sequence model, or an ensemble of models, you follow the same approach as when you are saving a general checkpoint. In other words, save a dictionary of each model’s state_dict and corresponding optimizer. As mentioned before, you can save any other items that may aid you in resuming training by simply appending them to the dictionary.

A common PyTorch convention is to save these checkpoints using the .tar file extension.

To load the models, first initialize the models and optimizers, then load the dictionary locally using torch.load(). From here, you can easily access the saved items by simply querying the dictionary as you would expect.

Remember that you must call model.eval() to set dropout and batch normalization layers to evaluation mode before running inference. Failing to do this will yield inconsistent inference results. If you wish to resume training, call model.train() to set these layers to training mode.

Warmstarting Model Using Parameters from a Different Model

Save:

torch.save(modelA.state_dict(), PATH)

Load:

modelB = TheModelBClass(*args, **kwargs)
modelB.load_state_dict(torch.load(PATH), strict=False)

NOTE:
Partially loading a model or loading a partial model are common scenarios when transfer learning or training a new complex model. Leveraging trained parameters, even if only a few are usable, will help to warmstart the training process and hopefully help your model converge much faster than training from scratch.

Whether you are loading from a partial state_dict, which is missing some keys, or loading a state_dict with more keys than the model that you are loading into, you can set the strict argument to False in the load_state_dict() function to ignore non-matching keys.

If you want to load parameters from one layer to another, but some keys do not match, simply change the name of the parameter keys in the state_dict that you are loading to match the keys in the model that you are loading into.
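
The sketch below shows both ideas on toy models: loading with strict=False and renaming keys. The layer names fc1, fc2, fc3, and head are hypothetical and chosen only for illustration.

```python
from collections import OrderedDict

import torch
import torch.nn as nn

# Hypothetical models that share only the 'fc1' layer
modelA = nn.Sequential(OrderedDict([('fc1', nn.Linear(4, 4)), ('fc2', nn.Linear(4, 2))]))
modelB = nn.Sequential(OrderedDict([('fc1', nn.Linear(4, 4)), ('fc3', nn.Linear(4, 3))]))

# strict=False skips non-matching keys and reports them
result = modelB.load_state_dict(modelA.state_dict(), strict=False)
print(result.missing_keys)     # keys modelB expected but did not receive
print(result.unexpected_keys)  # keys in the state_dict that modelB cannot use

# Renaming keys lets a differently named layer reuse the saved weights
modelC = nn.Sequential(OrderedDict([('fc1', nn.Linear(4, 4)), ('head', nn.Linear(4, 2))]))
renamed = {k.replace('fc2.', 'head.'): v for k, v in modelA.state_dict().items()}
modelC.load_state_dict(renamed)
print(torch.equal(modelC.head.weight, modelA.fc2.weight))
```

Note that strict=False only tolerates missing or extra keys; a key that matches by name but differs in shape still raises an error.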

Saving & Loading Model Across Devices

Save on GPU, Load on CPU

Save:

torch.save(model.state_dict(), PATH)

Load:

device = torch.device('cpu')
model = TheModelClass(*args, **kwargs)
model.load_state_dict(torch.load(PATH, map_location=device))

NOTE:
When loading a model on a CPU that was trained with a GPU, pass torch.device('cpu') to the map_location argument in the torch.load() function. In this case, the storages underlying the tensors are dynamically remapped to the CPU device using the map_location argument.
You can also map a model trained on "cuda:0" to "cuda:x".

Save on GPU, Load on GPU

Save:

torch.save(model.state_dict(), PATH)

Load:

device = torch.device("cuda")
model = TheModelClass(*args, **kwargs)
model.load_state_dict(torch.load(PATH))
model.to(device)
# Make sure to call input = input.to(device) on any input tensors that you feed to the model

NOTE:
When loading a model on a GPU that was trained and saved on GPU, simply convert the initialized model to a CUDA optimized model using model.to(torch.device('cuda')). Also, be sure to use the .to(torch.device('cuda')) function on all model inputs to prepare the data for the model. Note that calling my_tensor.to(device) returns a new copy of my_tensor on GPU. It does NOT overwrite my_tensor. Therefore, remember to manually overwrite tensors: my_tensor = my_tensor.to(torch.device('cuda')).

Save on CPU, Load on GPU

Save:

torch.save(model.state_dict(), PATH)

Load:

device = torch.device("cuda")
model = TheModelClass(*args, **kwargs)
model.load_state_dict(torch.load(PATH, map_location="cuda:0"))  # Choose whatever GPU device number you want
model.to(device)
# Make sure to call input = input.to(device) on any input tensors that you feed to the model
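
The cross-device cases can be folded into one device-agnostic sketch. It picks the GPU when one is available and falls back to the CPU otherwise; the in-memory buffer is an assumption standing in for PATH so that the example leaves no file behind.

```python
import io

import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(4, 2)
buffer = io.BytesIO()  # in-memory stand-in for PATH
torch.save(model.state_dict(), buffer)
buffer.seek(0)

restored = nn.Linear(4, 2)
# map_location remaps the saved storages onto the chosen device while loading
restored.load_state_dict(torch.load(buffer, map_location=device))
restored.to(device)

inputs = torch.randn(3, 4).to(device)  # .to() returns a new tensor; keep the result
outputs = restored(inputs)
print(outputs.device.type == device.type, tuple(outputs.shape))
```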

Saving torch.nn.DataParallel Models

Save:

torch.save(model.module.state_dict(), PATH)

Load:

# Load to whatever device you want

NOTE:
torch.nn.DataParallel is a model wrapper that enables parallel GPU utilization. To save a DataParallel model generically, save the model.module.state_dict(). This way, you have the flexibility to load the model any way you want to any device you want.
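
The reason for saving model.module.state_dict() becomes visible when the key names are compared. This is a small sketch with a toy layer; it only inspects keys and does not run a forward pass.

```python
import torch
import torch.nn as nn

model = nn.DataParallel(nn.Linear(4, 2))

wrapped_keys = list(model.state_dict().keys())       # keys carry a 'module.' prefix
plain_keys = list(model.module.state_dict().keys())  # keys of the underlying model
print(wrapped_keys)
print(plain_keys)

# The unprefixed state_dict loads straight into an unwrapped model
restored = nn.Linear(4, 2)
restored.load_state_dict(model.module.state_dict())
print(torch.equal(restored.weight.cpu(), model.module.weight.cpu()))
```

Saving the wrapper's own state_dict instead would force every later load to either wrap the model in DataParallel again or strip the prefix from each key.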