1. Serializing tensors
Creating a tensor on the fly is all well and good, but if the data inside is valuable, we will want to save it to a file and load it back at some point.
Here’s how we can save our points tensor to an ourpoints.t file:
torch.save(points, 'C:/Users/Desktop/1/ourpoints.t')
This file format isn’t interoperable, however: we can’t read the tensor with software other than PyTorch.
As an alternative, we can pass a file descriptor in lieu of the filename:
with open('C:/Users/Desktop/1/ourpoints.t', 'wb') as f:  # 'wb' opens the file for binary writing
    torch.save(points, f)
Loading our points back is similarly a one-liner:
points = torch.load('C:/Users/Desktop/1/ourpoints.t')
or, equivalently,
with open('C:/Users/Desktop/1/ourpoints.t', 'rb') as f:
    points = torch.load(f)
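Putting the two halves together, here is a minimal round-trip sketch. It writes to a temporary file rather than the hard-coded Windows path used above, so it runs anywhere:

```python
import os
import tempfile

import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])

# Save to a temporary file, then load it back to verify the round trip.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'ourpoints.t')
    torch.save(points, path)
    loaded = torch.load(path)

# The loaded tensor has the same shape, dtype, and values as the original.
assert torch.equal(points, loaded)
```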
2. Serializing to HDF5 with h5py
For those times when we do need to read our tensors from other software, we should learn how to save them interoperably. We’ll look next at how to do so with HDF5, a portable, widely supported format for serialized multidimensional arrays.
Install h5py from the command line:
conda install h5py
At this point, we can save our points tensor by converting it to a NumPy array (at no cost, as we noted earlier) and passing it to the create_dataset function:
import h5py
f = h5py.File('C:/Users/Desktop/1/ourpoints.hdf5', 'w')
dset = f.create_dataset('coords', data=points.numpy())
f.close()
Here ‘coords’ is a key into the HDF5 file.
Let’s suppose we want to load just the last two points in our dataset:
f = h5py.File('C:/Users/Desktop/1/ourpoints.hdf5', 'r')
dset = f['coords']
last_points = dset[-2:]
The data is not loaded when the file is opened or the dataset is accessed. Rather, the data stays on disk until we request the second-to-last and last rows of the dataset. At that point, h5py reads those two rows and returns a NumPy array-like object encapsulating that region of the dataset: an object that behaves like a NumPy array and has the same API.
Owing to this fact, we can pass the returned object to the torch.from_numpy function to obtain a tensor directly. Note that in this case, the data is copied over to the tensor’s storage:
last_points = torch.from_numpy(last_points)
f.close()
Once we’re finished loading data, we close the file. Closing the HDF5 file invalidates the datasets, and trying to access dset afterward raises an exception.
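The whole HDF5 round trip, including the partial read, can be sketched as follows. A temporary file again stands in for the hard-coded path, and context managers close the file automatically:

```python
import os
import tempfile

import h5py
import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'ourpoints.hdf5')

    # Write: convert to a NumPy array and store it under the 'coords' key.
    with h5py.File(path, 'w') as f:
        f.create_dataset('coords', data=points.numpy())

    # Read: slicing the dataset pulls only the last two rows off disk;
    # torch.from_numpy then copies them into a tensor.
    with h5py.File(path, 'r') as f:
        last_points = torch.from_numpy(f['coords'][-2:])

assert torch.equal(last_points, points[-2:])
```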
3.Moving tensors to the GPU
So far in this chapter, when we’ve talked about storage, we’ve meant memory on the CPU. PyTorch tensors also can be stored on a different kind of processor: a graphics processing unit (GPU). Every PyTorch tensor can be transferred to (one of) the GPU(s) in order to perform massively parallel, fast computations.
Here is how we can create a tensor on the GPU by specifying the corresponding argument to the constructor:
points_gpu = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]], device='cuda')
We could instead copy a tensor created on the CPU onto the GPU using the to method:
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points_gpu = points.to(device='cuda')
If our machine has more than one GPU, we can also decide which GPU to allocate the tensor on by passing a zero-based integer identifying the GPU, such as
points_gpu = points.to(device='cuda:0')
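Since not every machine has a GPU, a guarded sketch like the following (using only the standard torch API) shows where a tensor lives via its device attribute, transferring it only when CUDA is actually available:

```python
import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
print(points.device)  # prints "cpu"

# Transfer to the first GPU only if CUDA is available on this machine.
if torch.cuda.is_available():
    points_gpu = points.to(device='cuda:0')
    print(points_gpu.device)  # prints "cuda:0"
```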
At this point, any operation performed on the tensor, such as multiplying all elements by a constant, is carried out on the GPU:
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points = 2 * points  # Multiplication performed on the CPU
points_gpu = 2 * points.to(device='cuda')  # Multiplication performed on the GPU
Note that the points_gpu tensor is not brought back to the CPU once the result has been computed. In order to move the tensor back to the CPU, we need to pass a cpu argument to the to method, such as
points_cpu = points_gpu.to(device='cpu')
We can also use the shorthand methods cpu and cuda instead of the to method to achieve the same goal:
points_gpu = points.cuda() # Defaults to GPU index 0
points_gpu = points.cuda(0)
points_cpu = points_gpu.cpu() # device='cpu'
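A common idiom (not shown in the text above, but widely used) is to pick the device once and pass it everywhere, so the same code runs with or without a GPU:

```python
import torch

# Fall back to the CPU when no CUDA device is available.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]], device=device)
result = 2 * points        # runs on whichever device `points` lives on
result_cpu = result.cpu()  # always safe: a no-op if already on the CPU
```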