3-4 Reading and Writing Files & GPU

This article shows how to save PyTorch tensors to a file and load them back, using torch.save and torch.load for serialization and deserialization. It also covers saving tensors in the HDF5 format with the h5py library and loading data from such files selectively. Finally, it explains how to move tensors to the GPU for computation and bring them back to the CPU.

1. Serializing tensors
Creating a tensor on the fly is all well and good, but if the data inside is valuable, we will want to save it to a file and load it back at some point.
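
For reference, the points tensor used in this section can be a small 3×2 tensor of coordinates; the values below are the same ones used later in the GPU examples:

import torch

# A small 3x2 tensor of (x, y) coordinates
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])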

Here’s how we can save our points tensor to an ourpoints.t file:


torch.save(points, 'C:/Users/Desktop/1/ourpoints.t')  # serialize the tensor to disk in PyTorch's native format

Note that the file is written in PyTorch's own serialization format, so we can't read the tensor back with software other than PyTorch.
As an alternative, we can pass a file descriptor in lieu of the filename:

with open('C:/Users/Desktop/1/ourpoints.t','wb') as f:  # open the file in binary write mode
    torch.save(points, f)

Loading our points back is similarly a one-liner:

points = torch.load('C:/Users/Desktop/1/ourpoints.t')


or, equivalently,

with open('C:/Users/Desktop/1/ourpoints.t','rb') as f:
    points = torch.load(f)
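
As a quick sanity check (the loaded_points name here is just illustrative), we can verify that the round trip preserved the values:

loaded_points = torch.load('C:/Users/Desktop/1/ourpoints.t')
print(torch.equal(points, loaded_points))  # True: same shape, dtype, and values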


2. Serializing to HDF5 with h5py
The torch.save format is convenient, but it is PyTorch-specific; when we need other software to read our tensors, we should save them in an interoperable format instead. We'll look next at how to do so using HDF5 and the h5py library.

Install h5py from the command line:

conda install h5py

At this point, we can save our points tensor by converting it to a NumPy array (at no cost, as we noted earlier) and passing it to the create_dataset function:

import h5py

# open (or create) an HDF5 file for writing
f = h5py.File('C:/Users/Desktop/1/ourpoints.hdf5', 'w')
# store the tensor's data as a dataset under the key 'coords'
dset = f.create_dataset('coords', data=points.numpy())
f.close()
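
The "at no cost" part refers to the fact that, for CPU tensors, points.numpy() shares memory with the tensor's storage rather than copying it. A quick sketch of what that means in practice:

points_np = points.numpy()   # shares memory with the tensor (CPU tensors only)
points_np[0, 0] = 100.0
print(points[0, 0])          # tensor(100.) -- the tensor sees the change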

Here ‘coords’ is a key into the HDF5 file.
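
An HDF5 file can hold several datasets under different keys, including nested ones. A minimal sketch (the extra keys and values below are made up for illustration):

# re-open the same file in append mode and add more datasets
f = h5py.File('C:/Users/Desktop/1/ourpoints.hdf5', 'a')
f.create_dataset('labels', data=[0, 1, 0])            # another top-level key
f.create_dataset('meta/timestamps', data=[1.0, 2.0])  # nested key; the 'meta' group is created automatically
f.close()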

Let’s suppose we want to load just the last two points in our dataset:

f = h5py.File('C:/Users/Desktop/1/ourpoints.hdf5', 'r')  # open the file read-only
dset = f['coords']       # look up the dataset by key; no data is read yet
last_points = dset[-2:]  # slicing triggers the actual read of the last two rows


The data is not loaded when the file is opened or when the dataset is looked up. Rather, the data stays on disk until we request the last two rows of the dataset. At that point, h5py reads those two rows and returns a NumPy array-like object that encapsulates that region of the dataset, behaves like a NumPy array, and exposes the same API.

Owing to this fact, we can pass the returned object to the torch.from_numpy function to obtain a tensor directly. Note that in this case, the data is copied over to the tensor’s storage:

last_points = torch.from_numpy(last_points)
f.close()


Once we're finished loading data, we close the file. Closing the HDF5 file invalidates the datasets, and trying to access dset afterward will raise an exception.
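
To make sure the file is always closed, a common alternative (a sketch, not from the text above) is to use the file object as a context manager and copy the data into a tensor while the file is still open:

with h5py.File('C:/Users/Desktop/1/ourpoints.hdf5', 'r') as f:
    last_points = torch.from_numpy(f['coords'][-2:])  # slicing returns a NumPy array; from_numpy wraps it

Because the slice returns a regular NumPy array read from disk, last_points stays valid after the with block closes the file.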

3. Moving tensors to the GPU
So far in this chapter, when we’ve talked about storage, we’ve meant memory on the CPU. PyTorch tensors also can be stored on a different kind of processor: a graphics processing unit (GPU). Every PyTorch tensor can be transferred to (one of) the GPU(s) in order to perform massively parallel, fast computations.
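
Before doing so, it's worth checking that PyTorch can actually see a CUDA-capable GPU (a quick sketch):

print(torch.cuda.is_available())   # True if a CUDA GPU is visible to PyTorch
print(torch.cuda.device_count())   # number of visible GPUs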

Here is how we can create a tensor on the GPU by specifying the corresponding argument to the constructor:

points_gpu = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]], device='cuda')

We could instead copy a tensor created on the CPU onto the GPU using the to method:

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points_gpu = points.to(device='cuda')

If our machine has more than one GPU, we can also decide on which GPU we allocate the tensor by passing a zero-based integer identifying the GPU on the machine, such as

points_gpu = points.to(device='cuda:0')

At this point, any operation performed on the tensor, such as multiplying all elements by a constant, is carried out on the GPU:

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]]) 
points = 2 * points  # Multiplication performed on the CPU
points_gpu = 2 * points.to(device='cuda')  # Multiplication performed on the GPU

Note that the points_gpu tensor is not brought back to the CPU once the result has been computed. In order to move the tensor back to the CPU, we need to provide a cpu argument to the to method, such as

points_cpu = points_gpu.to(device='cpu')

We can also use the shorthand methods cpu and cuda instead of the to method to achieve the same goal:

points_gpu = points.cuda() # Defaults to GPU index 0
points_gpu = points.cuda(0)
points_cpu = points_gpu.cpu() # device='cpu'
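
One related detail (not covered above, but easy to trip over): a tensor that lives on the GPU cannot be converted to a NumPy array directly; it has to be moved to the CPU first:

points_np = points_gpu.cpu().numpy()  # .numpy() only works on CPU tensors
# calling points_gpu.numpy() directly would raise an error, since the data lives in GPU memory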