PyTorch Notes


LanderXX



Tags: deep learning, pytorch

Article link: https://blog.csdn.net/LanderXX/article/details/105142886


Pytorch Notebook

These notes were written in Emacs org-mode, so they are in English for convenience.

tensor
autograd
neural network
1. structured construction
  1. layers (no order)
  2. forward propagate structure (ordered)
2. sequential construction
data load
1. torchvision
optimizer
train
model I/O
1. method 1 (recommended)
2. method 2
evaluate
models
1. attributes
2. pretrained models
  1. torchvision.models
sundry
problem shooting

PyTorch is the NumPy of deep learning.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.utils.data as Data
import torch.optim as optim
import numpy as np

tensor

create

uninitialized tensor:
x = torch.empty(5, 3)
random tensor:
x = torch.rand(5, 3)
zeros:
x = torch.zeros(5, 3)
define dtype:
x = torch.zeros(5, 3, dtype = torch.long)
from known data:
x = torch.tensor([5.5, 3])

cloning

create a new tensor that reuses an existing tensor's properties (dtype, device).

new_* methods:

x = x.new_ones(5, 3, dtype = torch.double)# 64-bit

*_like methods copy the size (and, unless overridden, the dtype and device):

x = torch.randn_like(x, dtype = torch.float)# 32-bit
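A quick check of the two cloning styles, as a small sketch: `new_*` takes an explicit size but inherits dtype/device, while `*_like` copies the size. The starting tensor `x` here is just an example.

```python
import torch

x = torch.zeros(2, 2)                            # float32 on CPU
a = x.new_ones(5, 3, dtype=torch.double)         # explicit size, dtype overridden
b = torch.randn_like(a, dtype=torch.float)       # copies a's size (5, 3)
c = x.new_zeros(3)                               # no dtype given: inherits float32 from x

print(a.shape, a.dtype)  # torch.Size([5, 3]) torch.float64
print(b.shape, b.dtype)  # torch.Size([5, 3]) torch.float32
```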

operation

in-place operations

append '_' to the method name.

e.g. y.add_(x) is the same as y += x

x.t_() transposes x in place
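A minimal sketch contrasting in-place and out-of-place versions of the same operation:

```python
import torch

x = torch.ones(2, 3)
y = torch.zeros(2, 3)

y.add_(x)       # in place, same effect as y += x
z = y.add(x)    # out of place: returns a new tensor, y is unchanged

x.t_()          # transposes x in place: shape (2, 3) -> (3, 2)
print(y[0, 0].item(), z[0, 0].item(), x.shape)  # 1.0 2.0 torch.Size([3, 2])
```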

transpose (permute)

x = x.permute(1, 2, 0)
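permute takes the old dimension indices in their new order. A sketch of the common (assumed) use case, turning a (channels, height, width) image into (height, width, channels):

```python
import torch

img = torch.rand(3, 32, 24)      # (C, H, W)
hwc = img.permute(1, 2, 0)       # new dim i = old dim listed at position i
print(hwc.shape)                 # torch.Size([32, 24, 3])

# permute returns a view: same storage, no data copied
assert hwc[5, 7, 2] == img[2, 5, 7]
```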

about size and indexing

get size: x.size(dim) (x.size() returns the whole torch.Size)

resize with view:

 x = torch.randn(4, 4)
 y = x.view(16)
 z = x.view(-1, 8)
 # the size of the -1 dim is inferred from the other dims

 # use .item() to convert a one-element tensor to a Python number
 x = torch.randn(1)
 num = x.item()
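view only works when the memory layout allows it. A sketch showing that a transposed tensor (for example after permute or .t()) cannot be viewed directly, while reshape or .contiguous().view() works:

```python
import torch

x = torch.randn(4, 4)
xt = x.t()                        # a view with swapped strides
print(xt.is_contiguous())         # False

view_failed = False
try:
    xt.view(16)                   # incompatible size/stride -> RuntimeError
except RuntimeError:
    view_failed = True

flat = xt.reshape(16)             # reshape copies when a view is impossible
flat2 = xt.contiguous().view(16)  # equivalent explicit form
```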

add

 # simply
 x + y
 torch.add(x, y)

 # write the result into a preallocated tensor
 result = torch.empty(5, 3)
 torch.add(x, y, out = result)

 # in-place (+=)
 y.add_(x)

with numpy

a tensor created with torch.from_numpy() and its NumPy array
(or a CPU tensor and the array returned by .numpy())
share the same memory location, so they change together.

torch.from_numpy(npdata)

torchdata.numpy()

 npdata = np.arange(6).reshape(2, 3)
 np2torch = torch.from_numpy(npdata)
 # the dtype follows npdata: int32 on Windows with NumPy 1.x, int64 elsewhere
 '''
 tensor([[0, 1, 2],
         [3, 4, 5]], dtype=torch.int32)
 '''
 torch2np = np2torch.numpy()
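A small sketch of the shared memory in action, and of torch.tensor(), which always copies instead:

```python
import numpy as np
import torch

npdata = np.arange(6).reshape(2, 3)
np2torch = torch.from_numpy(npdata)   # shares memory with npdata
torch2np = np2torch.numpy()           # shares memory too (CPU tensors only)

npdata[0, 0] = 100                    # modify the NumPy array in place
print(np2torch[0, 0].item(), torch2np[0, 0])  # 100 100

safe = torch.tensor(npdata)           # torch.tensor() copies the data
npdata[0, 1] = -1
print(safe[0, 1].item())              # 1 (unchanged)
```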

cuda

 x = torch.randn(1)
 if torch.cuda.is_available():
     device = torch.device('cuda')
     # directly create on GPU
     y = torch.ones_like(x, device = device)
     # copy to GPU
     x = x.to(device)
     # or x.to('cuda')

     z = x + y
     # e.g. tensor([0.1034], device='cuda:0')
     # .to() returns a new tensor, so assign the result
     z = z.to('cpu')

autograd

track and gradient computing

set tensor.requires_grad = True
to track all computations on it (needed for training).
call .backward() to compute all gradients;
they accumulate into the .grad attribute.
stop tracking a tensor: .detach().
prevent tracking for a block of code: wrap it in
with torch.no_grad():
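A minimal sketch of tracking, gradient accumulation, detach() and no_grad() on a toy function, y = sum(x^2), whose gradient is 2x:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
(x * x).sum().backward()          # scalar output, so backward() needs no argument
first_grad = x.grad.tolist()
print(first_grad)                 # [2.0, 4.0, 6.0]

# gradients accumulate: a second backward adds to .grad
(x * x).sum().backward()
second_grad = x.grad.tolist()
print(second_grad)                # [4.0, 8.0, 12.0]
x.grad.zero_()                    # what optimizer.zero_grad() does per parameter

d = x.detach()                    # same data, no longer tracked
with torch.no_grad():
    z = x * 2                     # not recorded in the graph
print(d.requires_grad, z.requires_grad)  # False False
```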

function

for a tensor created by an operation,
tensor.grad_fn refers to the Function that
created the tensor.

for a user-created tensor, .grad_fn is None.
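A two-line check of the difference:

```python
import torch

a = torch.ones(2, requires_grad=True)  # created by the user
b = a * 3                              # created by an operation
print(a.grad_fn)                       # None
print(type(b.grad_fn).__name__)        # MulBackward0
```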

backward()

for a non-scalar output, pass a gradient argument: a tensor
of the same shape as the output.
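A sketch with a non-scalar output. The gradient argument v is weighted into the result (a vector-Jacobian product), so for y = 2x the gradient becomes 2v. The values of v are arbitrary example numbers.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * 2                               # non-scalar output, shape (3,)
# y.backward() alone would raise a RuntimeError (grad only implicit for scalars)
v = torch.tensor([1.0, 0.5, 0.25])      # same shape as y
y.backward(v)                           # x.grad = v^T * dy/dx = 2 * v
print(x.grad.tolist())                  # [2.0, 1.0, 0.5]
```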

torch.no_grad()

use with torch.no_grad(): when testing the model, so no computation graph is built (saving memory and time).

neural network

the typical training procedure:

1. define the network and its learnable parameters.
2. iterate over a dataset of inputs.
3. process each input through the network.
4. compute the loss.
5. back-propagate the gradients.
6. update the parameters:
   weight = weight - learning_rate * gradient
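The six steps can be sketched by hand, without nn or optim, on a hypothetical least-squares problem (random inputs, zero targets):

```python
import torch

torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)     # learnable parameter
x = torch.randn(8, 3)                      # a batch of inputs
target = torch.zeros(8)                    # hypothetical targets
learning_rate = 0.1

losses = []
for step in range(5):
    loss = ((x @ w - target) ** 2).mean()  # forward + loss
    loss.backward()                        # back-propagate
    with torch.no_grad():                  # update without tracking
        w -= learning_rate * w.grad        # weight = weight - lr * gradient
    w.grad.zero_()                         # clear the accumulated gradient
    losses.append(loss.item())
print(losses)
```

In practice optimizer.step() and optimizer.zero_grad() replace the manual update and reset.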

structured construction

layers (no order)

import torch.nn as nn

define the layers in the network class's __init__()

 class LeNet(nn.Module):
     def __init__(self):
         super(LeNet, self).__init__()
         self.conv1 = nn.Conv2d(3, 6, 5)
         self.pool = nn.MaxPool2d(2, 2)
         self.conv2 = nn.Conv2d(6, 16, 5)
         self.fc1 = nn.Linear(16 * 5 * 5, 120)
         self.fc2 = nn.Linear(120, 84)
         self.fc3 = nn.Linear(84, 10)

forward propagate structure (ordered)

import torch.nn.functional as F

define the computation order in the network class's forward()

 class LeNet(nn.Module):
     # __init__(self): define the layers as shown above
     def forward(self, x):
         x = self.conv1(x)
         x = F.relu(x)
         x = self.pool(x)

         # write simply with nested structure
         x = self.pool(F.relu(self.conv2(x)))

         x = x.view(-1, 16 * 5 * 5)
         # flatten each sample into a row vector of 16 * 5 * 5 features
         # -1 lets PyTorch infer the batch size

         x = F.relu(self.fc1(x))
         x = F.relu(self.fc2(x))
         x = self.fc3(x)
         return x
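Putting the two halves together gives a complete class. A sketch that checks the shapes for a batch of 32x32 RGB images (the CIFAR-10 size): 32 → conv5 → 28 → pool → 14 → conv5 → 10 → pool → 5, hence 16 * 5 * 5 inputs to fc1.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LeNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # (N, 6, 14, 14)
        x = self.pool(F.relu(self.conv2(x)))  # (N, 16, 5, 5)
        x = x.view(-1, 16 * 5 * 5)            # (N, 400)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)                    # (N, 10) raw scores (logits)

net = LeNet()
out = net(torch.randn(4, 3, 32, 32))
print(out.shape)  # torch.Size([4, 10])
n_params = sum(p.numel() for p in net.parameters())
print(n_params)   # 62006
```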

sequential construction

 net = nn.Sequential(
     nn.Linear(2, 10),
     nn.ReLU(),# nn.ReLU is a module class, unlike the function F.relu
     nn.Linear(10, 2)
 )
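A sketch of running the sequential net on a batch of 5 two-feature inputs; the modules run in the listed order and can be indexed:

```python
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(2, 10),
    nn.ReLU(),
    nn.Linear(10, 2),
)
out = net(torch.randn(5, 2))
print(out.shape)  # torch.Size([5, 2])
print(net[0])     # Linear(in_features=2, out_features=10, bias=True)
```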

data load

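transforms.ToTensor() and transforms.ToPILImage() convert between PIL images and tensors (in opposite directions).

 import torch.utils.data as Data
 # TensorDataset takes the tensors positionally (older versions used data_tensor/target_tensor keywords)
 mydataset = Data.TensorDataset(x, y)
 mydataloader = Data.DataLoader(
     dataset = mydataset,
     batch_size = BATCH_SIZE,
     shuffle = True,
     num_workers = 2
 )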
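A runnable sketch with synthetic data; `BATCH_SIZE = 4` and the x/y tensors are hypothetical, and num_workers=0 keeps everything in the main process:

```python
import torch
import torch.utils.data as Data

BATCH_SIZE = 4                                  # hypothetical value
x = torch.linspace(-1, 1, 10).unsqueeze(1)      # 10 samples, 1 feature each
y = x.pow(2)                                    # hypothetical targets

mydataset = Data.TensorDataset(x, y)            # tensors share the first dim
mydataloader = Data.DataLoader(mydataset, batch_size=BATCH_SIZE,
                               shuffle=True, num_workers=0)

batch_sizes = []
for batch_x, batch_y in mydataloader:
    batch_sizes.append(batch_x.size(0))
print(batch_sizes)  # [4, 4, 2]: the last batch is smaller (drop_last=False)
```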

torchvision

 import torch
 import torchvision
 import torchvision.transforms as transforms

 transform = transforms.Compose(
     [transforms.ToTensor(),
      transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

 trainset = torchvision.datasets.CIFAR10(root = './data', train = True,
                                         download = True, transform = transform)
 trainloader = torch.utils.data.DataLoader(trainset, batch_size = BATCH_SIZE,
                                           shuffle = True, num_workers = 0)

 # data augmentation: crop a random region and resize it to (height, width)
 transforms.RandomResizedCrop((height, width))

optimizer

 import torch.optim as optim
 optimizer = optim.SGD(net.parameters(), lr = 0.001, momentum = 0.9)
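A sketch of what SGD with momentum does on a single hypothetical parameter whose gradient is always 2. On the first step PyTorch initializes the momentum buffer to the gradient; afterwards buffer = momentum * buffer + grad.

```python
import torch
import torch.optim as optim

w = torch.tensor([1.0], requires_grad=True)
optimizer = optim.SGD([w], lr=0.1, momentum=0.9)

optimizer.zero_grad()
(2 * w).sum().backward()   # d(loss)/dw = 2
optimizer.step()           # buffer = 2, w = 1 - 0.1 * 2
after_first = w.item()     # about 0.8 (float32)

optimizer.zero_grad()
(2 * w).sum().backward()
optimizer.step()           # buffer = 0.9 * 2 + 2 = 3.8, w = 0.8 - 0.1 * 3.8
after_second = w.item()    # about 0.42
print(after_first, after_second)
```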

train

gpu support

 device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
 print(device)
 net = Net()
 net.to(device)
 '''...'''
 for epoch in range(epochs):
     for i, data in enumerate(trainloader, 0):
         inputs, labels = data[0].to(device), data[1].to(device)

loss function

 import torch.nn as nn
 criterion = nn.CrossEntropyLoss()
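A sketch of the expected inputs: raw scores of shape (batch, classes) and int64 class indices of shape (batch,). With equal scores for 2 classes each predicted probability is 0.5, so the loss is ln 2.

```python
import math
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
outputs = torch.zeros(3, 2)          # raw scores (logits): batch of 3, 2 classes
labels = torch.tensor([0, 1, 1])     # class indices, int64 (LongTensor)

loss = criterion(outputs, labels)    # prediction first, label second
print(loss.item())                   # ln(2), about 0.6931
```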

train

 for epoch in range(2):
     trainingloss = 0.0
     for i, data in enumerate(trainloader, 0):
         # for gpu support
         inputs, labels = data[0].to(device), data[1].to(device)
         # clear the gradient buffer
         optimizer.zero_grad()
         # forward
         outputs = net(inputs)
         # loss computing
         loss = criterion(outputs, labels)
         # back propagate
         loss.backward()
         # update weights
         optimizer.step()
         # accumulate the loss for logging
         trainingloss += loss.item()
     print(f'epoch {epoch + 1}: mean loss {trainingloss / (i + 1):.3f}')
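An end-to-end sketch of the same loop on a hypothetical toy task (class 1 when the first feature is positive), using the sequential net from above instead of LeNet so it runs without downloading data:

```python
import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data as Data

torch.manual_seed(0)
x = torch.randn(200, 2)
y = (x[:, 0] > 0).long()                 # hypothetical labels
trainloader = Data.DataLoader(Data.TensorDataset(x, y), batch_size=20, shuffle=True)

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
net = nn.Sequential(nn.Linear(2, 10), nn.ReLU(), nn.Linear(10, 2)).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.1, momentum=0.9)

epoch_losses = []
for epoch in range(10):
    trainingloss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data[0].to(device), data[1].to(device)
        optimizer.zero_grad()
        loss = criterion(net(inputs), labels)
        loss.backward()
        optimizer.step()
        trainingloss += loss.item()
    epoch_losses.append(trainingloss / (i + 1))
    print(f'epoch {epoch + 1}: mean loss {epoch_losses[-1]:.4f}')
```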

about step()s

optimizer.step(self, closure = None)

usually used every mini-batch to update the weights.

closure (callable, optional): a closure that re-evaluates the model,
performs back-propagation, and returns the loss.

if no closure is passed, backward() must be called
before optimizer.step().
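Some optimizers, such as LBFGS, evaluate the loss several times per step and therefore require a closure. A sketch on a hypothetical linear-regression problem:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(3, 1)
x = torch.randn(32, 3)
y = x @ torch.tensor([[1.0], [-2.0], [0.5]])   # hypothetical linear target
criterion = nn.MSELoss()
optimizer = optim.LBFGS(model.parameters(), lr=1.0, line_search_fn='strong_wolfe')

def closure():
    # re-evaluate the model, back-propagate, and return the loss
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    return loss

with torch.no_grad():
    initial_loss = criterion(model(x), y).item()
for _ in range(5):
    optimizer.step(closure)      # may call closure several times per step
with torch.no_grad():
    final_loss = criterion(model(x), y).item()
print(initial_loss, final_loss)
```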

scheduler.step()

usually called once per epoch, after optimizer.step(), to adjust the learning rate.
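A sketch of the usual placement: optimizer.step() per mini-batch, scheduler.step() per epoch. The halving schedule (StepLR with step_size=1, gamma=0.5) and the dummy loss are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(2, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.5)

lrs = []
for epoch in range(3):
    for _ in range(4):                               # mini-batches
        optimizer.zero_grad()
        model(torch.randn(8, 2)).pow(2).mean().backward()
        optimizer.step()                             # every mini-batch
    scheduler.step()                                 # every epoch
    lrs.append(optimizer.param_groups[0]['lr'])
print(lrs)  # approximately [0.05, 0.025, 0.0125]
```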

model I/O

method 1 (recommended)

only save the weights (the state_dict), not the structure.

you need to reconstruct the network before loading, e.g. when evaluating.

 PATH = './example-model.pth'
 # save
 torch.save(net.state_dict(), PATH)
 # load
 net = Net()# reconstruct the network
 net.load_state_dict(torch.load(PATH))
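A round-trip sketch with a temporary file; `make_net` is a hypothetical helper standing in for the network class, which must be identical when saving and loading:

```python
import os
import tempfile
import torch
import torch.nn as nn

def make_net():
    # hypothetical architecture; it must match between saving and loading
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

net = make_net()
PATH = os.path.join(tempfile.mkdtemp(), 'example-model.pth')
torch.save(net.state_dict(), PATH)        # saves only the weights

restored = make_net()                     # reconstruct the structure first
restored.load_state_dict(torch.load(PATH))
print(list(net.state_dict().keys()))      # ['0.weight', '0.bias', '2.weight', '2.bias']
```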

method 2

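save the whole model object (pickled). This is fragile under refactoring or reuse in another project, because the class definition must be importable from the same path when loading.

 PATH = './example-model.pth'
 # save
 torch.save(net, PATH)
 # load (PyTorch 2.6+ defaults to weights_only=True, which rejects full model objects)
 net = torch.load(PATH, weights_only = False)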

evaluate

 # class Net(nn.Module): ...  (copy the same class definition used for training)
 net = Net()
 net.load_state_dict(torch.load(PATH))
 net.eval()  # switch layers such as dropout/batchnorm to evaluation mode
 # evaluate
 class_correct = list(0. for i in range(10))  # 10 classes example
 class_total = list(0. for i in range(10))
 with torch.no_grad():
     for data in testloader:
         images, labels = data
         outputs = net(images)
         _, predicted = torch.max(outputs, 1)
         c = (predicted == labels)

         for i in range(labels.size(0)):  # works for any batch size
             label = labels[i]
             class_correct[label] += c[i].item()
             class_total[label] += 1
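The per-sample loop can also be vectorized with torch.bincount. A sketch on hypothetical scores for 4 samples and 3 classes:

```python
import torch

num_classes = 3
outputs = torch.tensor([[2.0, 1.0, 0.0],
                        [0.0, 3.0, 1.0],
                        [1.0, 0.0, 0.5],
                        [0.0, 0.0, 4.0]])   # hypothetical network scores
labels = torch.tensor([0, 1, 2, 2])

_, predicted = torch.max(outputs, 1)        # tensor([0, 1, 0, 2])
correct = predicted == labels               # tensor([True, True, False, True])
class_total = torch.bincount(labels, minlength=num_classes)
class_correct = torch.bincount(labels[correct], minlength=num_classes)
accuracy = (class_correct.float() / class_total).tolist()
print(accuracy)  # [1.0, 1.0, 0.5]
```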

models

attributes

modules() -> an iterator over all modules in a network, visited recursively and including the network itself.
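A quick look at what modules() yields for a small sequential net, compared with children(), which yields only the direct submodules:

```python
import torch.nn as nn

net = nn.Sequential(nn.Linear(2, 10), nn.ReLU(), nn.Linear(10, 2))
all_names = [type(m).__name__ for m in net.modules()]
child_names = [type(m).__name__ for m in net.children()]
print(all_names)    # ['Sequential', 'Linear', 'ReLU', 'Linear']
print(child_names)  # ['Linear', 'ReLU', 'Linear']
```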

pretrained models

torchvision.models

 import torchvision.models as models
 import torchvision.transforms as transforms
 # torchvision >= 0.13; older versions use models.vgg16(pretrained = True)
 vgg16 = models.vgg16(weights = models.VGG16_Weights.DEFAULT).eval()
 # all ImageNet-pretrained torchvision models expect inputs normalized this way
 normalization = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                      std=[0.229, 0.224, 0.225])

sundry

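normalization: x = (x - mean) / std, using the data's mean and std (per channel for images).

It centers the data at zero mean and scales it to unit standard deviation. It does not make the
distribution Gaussian, but putting all inputs on a comparable scale usually makes training more
stable and improves classification performance.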
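A sketch of the formula in two settings: per-channel normalization with mean = std = 0.5 (what transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) computes), which maps [0, 1] to [-1, 1], and standardization with statistics of hypothetical random data:

```python
import torch

# (C, H, W) image: channel 0 is all 0.0, channel 1 all 0.5, channel 2 all 1.0
img = torch.tensor([0.0, 0.5, 1.0]).view(3, 1, 1).expand(3, 2, 2)
mean = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)  # broadcast over H and W
std = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)
normalized = (img - mean) / std
print(normalized[:, 0, 0].tolist())                 # [-1.0, 0.0, 1.0]

torch.manual_seed(0)
data = torch.randn(1000) * 3 + 5                    # hypothetical data
standardized = (data - data.mean()) / data.std()
print(standardized.mean().item(), standardized.std().item())  # about 0 and 1
```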

torch.max(input, dim) -> (Tensor, LongTensor):
- torch.max(a, 0) returns the max value of each column, then their row indices.
- torch.max(a, 1) returns the max value of each row, then their column indices.
torch.nn.functional.softmax(input, dim) -> Tensor:
- softmax(a, 0) rescales a so that every column sums to 1.
- softmax(a, 1) rescales a so that every row sums to 1.
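A small worked example of both functions on a 2x2 matrix:

```python
import torch
import torch.nn.functional as F

a = torch.tensor([[1.0, 5.0],
                  [3.0, 2.0]])
col_vals, col_idx = torch.max(a, 0)   # max of each column
print(col_vals.tolist(), col_idx.tolist())  # [3.0, 5.0] [1, 0]
row_vals, row_idx = torch.max(a, 1)   # max of each row
print(row_vals.tolist(), row_idx.tolist())  # [5.0, 3.0] [1, 0]

col_sums = F.softmax(a, 0).sum(0)     # each column sums to 1
row_sums = F.softmax(a, 1).sum(1)     # each row sums to 1
```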

Tensor.squeeze(): removes all dimensions of size 1 from the Tensor.

 t = torch.Tensor([[1], [2], [3]])
 t.squeeze()
 # tensor([1., 2., 3.])

torch.bmm(batch1, batch2, out = None) -> Tensor: batched matrix multiplication; if batch1.size() = [2, 3, 4]
and batch2.size() = [2, 4, 5],
the result's size() is [2, 3, 5].
torch.unsqueeze(input, dim) -> Tensor: returns a new tensor with a dimension of size one
inserted at the specified position.

the new tensor shares the same underlying data with the input tensor.
- positive dim: ranges from 0 to input.dim().
- negative dim: counts backward (dim = dim + input.dim() + 1), so -1 inserts the new dimension last.
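prediction first, label second: when calling a loss function, pass the prediction first and the label second.
labels are LongTensor (int64) by default: integer tensors default to int64, and nn.CrossEntropyLoss expects class-index labels of this dtype.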
paddings
- nn.ReflectionPad1d(padding) ~ nn.ReflectionPad3d(padding): pad with the reflection of the input mirrored at the boundary (the boundary value itself is not repeated).
  - padding is a number: pad every side by the same length.
  - or padding is a tuple, e.g. (left_padding, right_padding) for the 1d version.
- nn.ReplicationPad1d(padding) ~ nn.ReplicationPad3d(padding): pad by repeating the boundary value.
- nn.ConstantPad1d(padding, value) ~ nn.ConstantPad3d(padding, value): pad every side with the same constant value.
- F.pad(input, pad, mode = 'constant', value = 0): functional form; mode can also be 'reflect', 'replicate' or 'circular'.
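A sketch exercising the items above: the bmm shape rule, unsqueeze as a view, and the three padding modes on a short (N, C, W) signal:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# torch.bmm: (b, n, m) @ (b, m, p) -> (b, n, p)
out = torch.bmm(torch.randn(2, 3, 4), torch.randn(2, 4, 5))
print(out.shape)                      # torch.Size([2, 3, 5])

# torch.unsqueeze: insert a size-1 dim; the result is a view of the input
x = torch.tensor([1.0, 2.0, 3.0])
row, col = x.unsqueeze(0), x.unsqueeze(-1)
print(row.shape, col.shape)           # torch.Size([1, 3]) torch.Size([3, 1])
row[0, 0] = 10.0                      # writing through the view changes x
print(x.tolist())                     # [10.0, 2.0, 3.0]

# padding a (N, C, W) signal by 2 on each side
s = torch.tensor([[[1.0, 2.0, 3.0]]])
reflect = nn.ReflectionPad1d(2)(s)    # [3, 2, 1, 2, 3, 2, 1]
replicate = nn.ReplicationPad1d(2)(s) # [1, 1, 1, 2, 3, 3, 3]
constant = nn.ConstantPad1d(2, 0.0)(s)  # [0, 0, 1, 2, 3, 0, 0]
functional = F.pad(s, (1, 2), mode='constant', value=0)  # [0, 1, 2, 3, 0, 0]
```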

problem shooting

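BrokenPipeError: encountered on Windows when downloading/loading a dataset with DataLoader worker processes:
set num_workers to 0, or put the training code under if __name__ == '__main__':.
TypeError: 'module' object is not callable: check the capitalization of the class name.

For example, use datasets.MNIST, not datasets.mnist (the latter is a module, not the dataset class).
Adding a softmax layer to the CIFAR10 LeNet makes training slower: nn.CrossEntropyLoss already applies log-softmax internally, so the network should output raw scores.
"Trying to backward through the graph a second time" (the message suggests retain_graph=True): check whether the shapes of mse_loss's input and target match.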