Using GPUs in PyTorch

尧景

Category column: 深度之眼pytorch. Tags: pytorch, natural language processing

Article link: https://blog.csdn.net/Ying_M/article/details/117947278


CPU and GPU

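CPU (Central Processing Unit): consists mainly of a control unit and an arithmetic unit.
GPU (Graphics Processing Unit): handles large-scale, uniform computations over data with no dependencies between elements.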

Moving data to the GPU

The data being moved between the CPU and the GPU usually takes one of two forms:

Tensor
Module (model)

**The to function:** converts the data type or the device

tensor.to(*args, **kwargs)
module.to(*args, **kwargs)
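Difference: for a tensor, to is not an in-place operation (when a conversion is needed it builds a new tensor, so the result must be assigned back); for a module, to modifies the module in place.
[Example]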

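import torch
import torch.nn as nn

# Check whether CUDA is available (is_available must be called, with parentheses)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

x = torch.ones((3, 3))
x = x.to(torch.float64)  # dtype conversion; assignment is required for tensors

x = torch.ones((3, 3))
x = x.to(device)  # device move; assignment is required for tensors

linear = nn.Linear(2, 2)
linear.to(torch.double)  # modules are converted in place; no assignment needed

gpu1 = device  # falls back to the CPU when no GPU is available
linear.to(gpu1)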

Common torch.cuda functions

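torch.cuda.device_count() : returns the number of GPUs that are currently visible and available
torch.cuda.get_device_name() : returns the name of a GPU
torch.cuda.manual_seed() : sets the random seed for the current GPU
torch.cuda.manual_seed_all() : sets the random seed for all visible and available GPUs
torch.cuda.set_device() : selects which physical GPU acts as the primary GPU (not recommended)
[Recommended] Set the environment variable instead: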

	os.environ.setdefault("CUDA_VISIBLE_DEVICES", "2,3")

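CUDA_VISIBLE_DEVICES is an environment variable that controls which GPUs, and therefore how many, are visible to the current script. It has to be set before CUDA is initialized, and setdefault leaves the value unchanged if the variable is already set.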

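Logical GPU indices can change, but physical GPU indices cannot. The code above makes physical GPUs 2 and 3 visible, so there are only two logical GPUs: gpu0 and gpu1. Logical gpu2 and gpu3 do not exist. Logical gpu0 maps to physical gpu2, and logical gpu1 maps to physical gpu3.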
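The logical-to-physical mapping described above can be sketched in plain Python. This is only an illustration: the helper name logical_to_physical is hypothetical, and it assumes CUDA_VISIBLE_DEVICES holds a comma-separated list of integer indices (the variable can also hold device UUIDs, which this sketch does not handle). It also shows that setdefault does not overwrite a value that is already set.

```python
import os


def logical_to_physical(visible_devices):
    """Map logical GPU indices to physical ones.

    Assumes a comma-separated list of integer indices, e.g. "2,3".
    """
    physical_ids = [int(tok) for tok in visible_devices.split(",") if tok.strip()]
    return dict(enumerate(physical_ids))


# Start from a clean state so the demo is reproducible
os.environ.pop("CUDA_VISIBLE_DEVICES", None)

os.environ.setdefault("CUDA_VISIBLE_DEVICES", "2,3")  # variable unset: value is written
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")    # already set: no effect

mapping = logical_to_physical(os.environ["CUDA_VISIBLE_DEVICES"])
print(mapping)  # {0: 2, 1: 3}
```

If the value must be forced regardless of the surrounding shell, a plain assignment to os.environ["CUDA_VISIBLE_DEVICES"] would be needed instead of setdefault.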

Multi-GPU parallel computation

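The scatter-and-parallel mechanism for multi-GPU computation: scatter ----> parallel computation -----> gather results
Start with a batch of batchsize training samples. The samples are split evenly and scattered to the GPUs, each GPU computes on its share in parallel, and the results are then gathered back onto the primary GPU.
How is multi-GPU computation implemented in PyTorch?
Wrapping the model in DataParallel is enough.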

torch.nn.DataParallel(module, device_ids = None, output_device = None, dim = 0)

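torch.nn.DataParallel
Purpose: wraps a model to implement the scatter-and-parallel mechanism
Main parameters:

module : the model to wrap and replicate
device_ids : the GPUs to scatter to; defaults to all visible and available GPUs
output_device : the device on which results are gathered; defaults to device_ids[0]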

# Example code
import os
import torch
import torch.nn as nn

# Must be set before CUDA is initialized; "2,3" assumes a machine with at least 4 GPUs
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "2,3")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class FooNet(nn.Module):
    def __init__(self, neural_num, layers=3):
        super().__init__()
        self.linears = nn.ModuleList(
            [nn.Linear(neural_num, neural_num, bias=False) for _ in range(layers)])

    def forward(self, x):
        # The input to forward is the slice of the batch scattered to this GPU
        print("\nbatch size in forward: {}".format(x.size(0)))
        for linear in self.linears:
            x = torch.relu(linear(x))
        return x


batch_size = 16

# data
inputs = torch.randn(batch_size, 3)
labels = torch.randn(batch_size, 3)

inputs, labels = inputs.to(device), labels.to(device)

# model
net = FooNet(neural_num=3, layers=3)
net = nn.DataParallel(net)  # net now scatters each batch across the visible GPUs
net.to(device)

# training
for epoch in range(1):
    outputs = net(inputs)
    print("model outputs.size: {}".format(outputs.size()))
print("CUDA_VISIBLE_DEVICES: {}".format(os.environ.get("CUDA_VISIBLE_DEVICES")))
print("device_count: {}".format(torch.cuda.device_count()))

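[Figure: program output with two visible GPUs (left column) and four visible GPUs (right column)]
The left column of the figure shows the run with two visible GPUs. In one forward pass, the batch size seen inside forward is 8: the batch size is set to 16 and GPUs 2 and 3 are visible, so each GPU receives 16/2 = 8 samples.
The right column shows the run with four visible GPUs, where each forward call receives 16/4 = 4 samples.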
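The per-GPU batch sizes discussed above can be reproduced with a small calculation. The sketch below is not DataParallel's actual code: the helper name chunk_sizes is hypothetical, and it assumes the batch is split with ceiling division in the style of torch.chunk, which matters only when the batch does not divide evenly.

```python
import math


def chunk_sizes(batch_size, num_gpus):
    """Per-GPU batch sizes under ceiling-division chunking (assumed torch.chunk-style)."""
    per_chunk = math.ceil(batch_size / num_gpus)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        take = min(per_chunk, remaining)  # the last chunk may be smaller
        sizes.append(take)
        remaining -= take
    return sizes


print(chunk_sizes(16, 2))  # [8, 8]
print(chunk_sizes(16, 4))  # [4, 4, 4, 4]
print(chunk_sizes(10, 4))  # [3, 3, 3, 1]
```

With an even split, every forward call sees batch_size / num_gpus samples, which matches the 8 and 4 reported for the two runs.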
Querying the remaining GPU memory

[Example]

Use np.argsort() to sort the GPUs by their remaining memory.
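As a rough illustration of the sorting step, the sketch below ranks GPUs from most to least free memory using only the standard library. The memory figures are made up, and the helper name rank_gpus_by_free_memory is hypothetical; in practice the numbers would come from a tool such as nvidia-smi or from torch.cuda.mem_get_info. When there are no ties, the result matches np.argsort(free_mib)[::-1].

```python
def rank_gpus_by_free_memory(free_mib):
    """Return GPU indices ordered from most to least free memory."""
    return sorted(range(len(free_mib)), key=lambda i: free_mib[i], reverse=True)


# Hypothetical free memory per logical GPU, in MiB
free_mib = [1024, 8192, 4096, 512]

gpu_order = rank_gpus_by_free_memory(free_mib)
print(gpu_order)  # [1, 2, 0, 3]

# A comma-separated string of this form could be used for CUDA_VISIBLE_DEVICES
gpu_list_str = ",".join(map(str, gpu_order[:2]))
print(gpu_list_str)  # 1,2
```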

Loading GPU models

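Error 1
The error reports an attempt to deserialize a model on a device where CUDA is unavailable. The model was saved in CUDA form: it was trained on a GPU and saved, and loading it on a machine without a usable GPU raises this error.
Fix: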

torch.load(path_state_dict, map_location='cpu')

This loads a model that was saved on a GPU onto a CPU device.

# Load onto the CPU (FooNet is the class defined in the example above)
import os
import torch

# flag = 0
flag = 1
if flag:
    gpu_list = [0]
    gpu_list_str = ','.join(map(str, gpu_list))
    os.environ.setdefault("CUDA_VISIBLE_DEVICES", gpu_list_str)
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    net = FooNet(neural_num=3, layers=3)
    net.to(device)

    # save
    net_state_dict = net.state_dict()
    path_state_dict = './model_in_gpu_0.pkl'
    torch.save(net_state_dict, path_state_dict)

    # load: a file saved from a GPU fails to load here when CUDA is unavailable
    state_dict_load = torch.load(path_state_dict)
    # state_dict_load = torch.load(path_state_dict, map_location='cpu')
    print("state_dict_load:\n{}".format(state_dict_load))

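Error 2
Training used multi-GPU parallel computation, so the model was wrapped in DataParallel. The wrapper adds a "module." prefix to the name of every network layer, for example "module.linears.2.weight", so the names no longer match when the state_dict is loaded into an unwrapped model.
Fix: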

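from collections import OrderedDict

# Strip the 'module.' prefix that DataParallel adds to every key
new_state_dict = OrderedDict()
for k, v in state_dict_load.items():
    namekey = k[7:] if k.startswith('module.') else k  # len('module.') == 7
    new_state_dict[namekey] = v
net.load_state_dict(new_state_dict)  # net here is an unwrapped FooNet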
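To make the effect of the renaming concrete, here is a small self-contained sketch of the same idea that runs without PyTorch. The helper name strip_module_prefix and the sample keys are hypothetical, and plain strings stand in for the weight tensors.

```python
from collections import OrderedDict

PREFIX = "module."


def strip_module_prefix(state_dict):
    """Return a copy of state_dict with a leading 'module.' removed from each key."""
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len(PREFIX):] if key.startswith(PREFIX) else key
        cleaned[new_key] = value
    return cleaned


# Hypothetical keys; the string values stand in for weight tensors
saved = OrderedDict([
    ("module.linears.0.weight", "w0"),
    ("module.linears.1.weight", "w1"),
    ("module.linears.2.weight", "w2"),
])

cleaned = strip_module_prefix(saved)
print(list(cleaned.keys()))
# ['linears.0.weight', 'linears.1.weight', 'linears.2.weight']
```

Keys without the prefix pass through unchanged, so the helper is safe to apply to a state_dict saved from an unwrapped model as well.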