[Follow-up: switching to a CNN] [English] [Andrew Ng programming assignment] Course 2 - Improving Deep Neural Networks - Week 3 assignment implemented in PyTorch

火卫二德莫斯

Published 2023-07-15 02:16:51


Category: Deep Learning. Tags: neural networks, cnn, pytorch

Article link: https://blog.csdn.net/weixin_47967519/article/details/131733945


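Preface: after following the course through CNNs and the LeNet5 model, I tried using that model to address the problems left over from last week's assignment (recognizing hand-gesture digits): 1. high bias; 2. high variance; 3. data mismatch.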

Last week's assignment: [English] [Andrew Ng programming assignment] Course 2 - Improving Deep Neural Networks - Week 3 assignment implemented in PyTorch

⭐Here are the datasets we need for training; please take the time to download them.

Link: https://pan.baidu.com/s/1N39fsrpw5WkAt9aCzY2WuQ?pwd=996s
Extraction code: 996s

If you need the Jupyter notebook, feel free to message me privately.

Assignment

Basic task: train a NN to recognize 6 hand-gesture digits (0~5).

train_x (1080×64×64×3): images of the different gestures
train_y (1080, 1): labels from 0 to 5 giving the digit shown in each image
test_x (120×64×64×3)
test_y (120, 1)

import torch
import numpy as np
import h5py
from torchvision.transforms import ToTensor, Lambda
from torch.utils.data import Dataset, DataLoader
from torch import nn
import cv2
import matplotlib.pyplot as plt
import time
import os

os.environ['KMP_DUPLICATE_LIB_OK'] = 'TRUE'

1. Dataset & Dataloader

1.1 Customize our dataset

We'll write our own dataset class to load the h5 files into train_dataset and test_dataset.
A custom dataset that subclasses torch.utils.data.Dataset should override the __len__() and __getitem__() methods, as well as __init__().
transform / target_transform specify the feature / label transformations.

class MyDataset(Dataset):
    def __init__(self, path, train, transform=None, target_transform=None):
        self.archive = h5py.File(path, 'r')
        self.train = train
        if self.train:
            self.imgs = self.archive['train_set_x'][:]
            self.labels = self.archive['train_set_y'][:]
        else:
            self.imgs = self.archive['test_set_x'][:]
            self.labels = self.archive['test_set_y'][:]

        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        image = self.imgs[idx]
        label = self.labels[idx]
        # Halve 64×64×3 to 32×32×3, the input size LeNet5 expects.
        # Done unconditionally so the size is right even without a transform.
        image = cv2.resize(image, dsize=None, fx=0.5, fy=0.5)
        if self.transform:
            image = self.transform(image)
        if self.target_transform:
            label = self.target_transform(label)

        return image, label

1.2 Convert to one-hot value

To train a classifier with C classes, we convert each label (an integer in [0, C)) to a one-hot vector.

Either of the following works:

y = torch.eye(C)[y, :]

y = torch.zeros(C).scatter_(0, torch.tensor(y), value=1)
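
As a quick sanity check, the two approaches can be compared on a single label. This is only a minimal sketch: the label value 4 is an arbitrary example, and the third variant uses torch.nn.functional.one_hot, which the original post does not use.

```python
import torch

C = 6   # number of classes
y = 4   # an arbitrary example label in [0, C)

# Method 1: pick row y of the C×C identity matrix.
one_hot_eye = torch.eye(C)[y, :]

# Method 2: start from zeros and write a 1 at index y.
one_hot_scatter = torch.zeros(C).scatter_(0, torch.tensor([y]), value=1)

# Built-in alternative (not used in this post): returns an integer tensor,
# so it is cast to float to match the other two.
one_hot_builtin = torch.nn.functional.one_hot(torch.tensor(y), num_classes=C).float()

print(one_hot_eye)
print(torch.equal(one_hot_eye, one_hot_scatter))
print(torch.equal(one_hot_eye, one_hot_builtin))
```

All three give a float vector with a single 1 at position y, so the choice is mostly a matter of taste.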

# Transformation for label: convert key value to one_hot value
tgt_transform = Lambda(lambda y: torch.eye(6)[y, :].flatten())

# Initialize train&test datasets with Class MyDataset
train_dataset = MyDataset(
    path=r'./datasets/train_signs.h5',
    train=True,
    transform=ToTensor(),
    target_transform=tgt_transform
)

test_dataset = MyDataset(
    path=r'./datasets/test_signs.h5',
    train=False,
    transform=ToTensor(),
    target_transform=tgt_transform
)

Note: ToTensor() converts an input of shape [H, W, C] to a tensor of shape [C, H, W]. For uint8 input, the values are scaled from [0, 255] to [0, 1].
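
To make the note concrete, here is a rough hand-written equivalent of what ToTensor() does for a uint8 image. It is a sketch for illustration, not torchvision's actual implementation, and the random image is a stand-in for a real 32×32 gesture photo.

```python
import numpy as np
import torch

# Stand-in for a resized gesture image: [H, W, C], uint8 values in [0, 255].
img = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)

# Move channels first and rescale to [0, 1].
manual = torch.from_numpy(img).permute(2, 0, 1).float() / 255.0

print(manual.shape)                 # channels-first layout
print(manual.min(), manual.max())   # both inside [0, 1]
```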

# take a look at the dataset
idx = torch.randint(1080,(1,)).item()
img, label = train_dataset[idx]
plt.imshow(img.permute(1, 2, 0))  # back to [H, W, C] so all three channels are shown
plt.title(f'The gesture tells: {label.argmax(0)}')
print(label)

tensor([0., 1., 0., 0., 0., 0.])

1.3 Dataloader

train_dataloader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_dataloader = DataLoader(test_dataset, batch_size=120)

Take a look at the dataloader.

imgs, labels = next(iter(train_dataloader))
batch_num = len(train_dataloader)

print(f'Number of batches: {batch_num}')
print(f'Shape of training feature: {imgs.shape}')
print(f'Shape of training labels: {labels.shape}')

Number of batches: 34
Shape of training feature: torch.Size([32, 3, 32, 32])
Shape of training labels: torch.Size([32, 6])
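
The batch count follows directly from the dataset size and the batch size. A small sketch of the arithmetic, assuming the DataLoader default of keeping the final partial batch:

```python
import math

num_examples = 1080   # size of the training set
batch_size = 32

# The last, smaller batch is kept by default, hence the ceiling.
num_batches = math.ceil(num_examples / batch_size)
last_batch_size = num_examples - (num_batches - 1) * batch_size

print(num_batches)       # 34
print(last_batch_size)   # 24
```

So 33 batches hold 32 images each and the final one holds the remaining 24.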

2. Build the NN

The computational graph of LeNet5 is given below.

The logic inside nn.Linear(in_features, out_features) is:

$Z = XW^{\top} + b$, where X has shape (batch_size, in_features) and W has shape (out_features, in_features).

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.forward_seq = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5, padding=0), # no padding by default
            nn.MaxPool2d(kernel_size=2), # stride = kernel_size by default
            nn.Conv2d(6, 16, 5, padding='valid'),  # no padding
            nn.MaxPool2d(2),
            
            nn.Flatten(),
            
            nn.Linear(in_features=400, out_features=120),
            nn.Linear(120, 84),
            nn.Linear(84, 6)
        )

    def forward(self, x):
        logits = self.forward_seq(x)
        return logits

# Initialize my_model and move it to the GPU for faster computation.
my_model = NeuralNetwork().to('cuda')
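
The value in_features=400 for the first linear layer comes from tracking the feature-map size through the convolution and pooling layers. The helper functions below are hypothetical names introduced only for this sketch; they apply the standard output-size formula for unpadded convolutions and non-overlapping pooling.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a square convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel):
    """Output side length of max pooling with stride equal to the kernel size."""
    return (size - kernel) // kernel + 1


side = 32                          # input images are 3×32×32
side = conv_out(side, kernel=5)    # 28 after the first 5×5 convolution
side = pool_out(side, kernel=2)    # 14 after the first 2×2 pooling
side = conv_out(side, kernel=5)    # 10 after the second 5×5 convolution
side = pool_out(side, kernel=2)    # 5 after the second 2×2 pooling

flat_features = 16 * side * side   # 16 channels of 5×5 each
print(flat_features)               # 400
```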

3. Loss function

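nn.CrossEntropyLoss() combines two steps. Given the inputs pred (raw logits) and y (one-hot labels):

$p = \mathrm{Softmax}(pred)$
$loss = -\sum_{c} y_c \log(p_c)$

The per-example losses are averaged over the batch by default. Because softmax is applied inside the loss, we don't need nn.Softmax(dim=1) at the end of the forward sequence.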
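
The two steps can be checked numerically against the built-in loss. This is a sketch with random logits and made-up labels rather than the real data; it assumes a PyTorch version in which nn.CrossEntropyLoss accepts one-hot (probability) targets, as the training code in this post does.

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 6)              # stand-in for model outputs: 4 examples, 6 classes
class_idx = torch.tensor([0, 2, 5, 1])  # made-up labels
one_hot = torch.eye(6)[class_idx]       # same encoding as tgt_transform

# Softmax followed by the negative log-likelihood, averaged over the batch.
manual = -(one_hot * torch.log_softmax(logits, dim=1)).sum(dim=1).mean()

builtin_one_hot = nn.CrossEntropyLoss()(logits, one_hot)
builtin_index = nn.CrossEntropyLoss()(logits, class_idx)

print(manual.item(), builtin_one_hot.item(), builtin_index.item())
```

The three values agree up to floating-point error, which also shows that integer class labels could be passed directly instead of one-hot vectors.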

# Hyperparameters
learning_rate = 0.005
num_epochs = 190


loss_fn = nn.CrossEntropyLoss().to('cuda')

4. Backward propagation

Calling loss.backward() computes the gradients automatically during backward propagation, and the optimizer then uses them to update the parameters. Here we create an SGD (Stochastic Gradient Descent) optimizer and pass it the model's parameters.

optimizer = torch.optim.SGD(my_model.parameters(), lr=learning_rate)
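
To see what one optimizer step does, here is a toy sketch on a two-element parameter with the loss sum(w²). The parameter values and learning rate are arbitrary choices for illustration; plain SGD without momentum applies w ← w − lr · grad.

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()
opt.zero_grad()    # clear any stale gradients
loss.backward()    # gradient is 2w = [2, -4]
opt.step()         # w becomes [1, -2] - 0.1 * [2, -4] = [0.8, -1.6]

print(w.detach())
```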

5. Full implementation

# A train_loop consists of forward prop, computing loss, backward prop and updating parameters within an epoch.
def train_loop(dataloader, model, loss_function, optimizer):
    size = len(dataloader.dataset)
    for batch_num, (X, y) in enumerate(dataloader):
        X, y = X.to('cuda'), y.to('cuda')

        # Compute predictions and loss
        y_predict = model(X)
        loss = loss_function(y_predict, y)

        # Backward Propagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return loss.item()


def test_loop(dataloader, model, loss_fn):
    """
    Runs forward prop on the given dataset with the parameters trained so far
    and returns the accuracy (in %) for this epoch.
    """
    size = len(dataloader.dataset)

    correct = 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to('cuda'), y.to('cuda')
            predict = model(X)
            max_idx = predict.argmax(dim=1)
            predict_norm = torch.eye(6, device='cuda')[max_idx, :]
            correct += (predict_norm * y).sum()
    correct /= size
    correct *= 100

    return correct

start = time.perf_counter()

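losses = []  # record the last-batch training loss of each epoch
for t in range(num_epochs):
    # Intended learning-rate decay for milder oscillation and faster convergence.
    # Caveat: PyTorch optimizers read the rate from optimizer.param_groups, not from
    # an `lr` attribute, so this assignment has no effect and the rate stays constant.
    optimizer.lr = learning_rate * (0.95 ** t)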
    loss = train_loop(train_dataloader, my_model, loss_fn, optimizer)
    losses.append(loss)

    # print train/test accuracy every 10 epochs
    if (t + 1) % 10 == 0:
        train_acc = test_loop(train_dataloader, my_model, loss_fn)
        test_acc = test_loop(test_dataloader, my_model, loss_fn)
        print(f'epoch: {t + 1}    loss:{(losses[-1]):>0.4f}')
        print('--------------------------------------------')
        print(f'train accuracy:{train_acc:.2f}%    test accuracy:{test_acc:.2f}%\n')

end = time.perf_counter()

print(f'iteration finished in {int(end - start)} s')

plt.plot(np.arange(1, num_epochs + 1), losses)
plt.xlabel('iteration')
plt.ylabel('loss')
plt.show()

epoch: 10    loss:1.7899
--------------------------------------------
train accuracy:18.33%    test accuracy:17.50%

epoch: 20    loss:1.7806
--------------------------------------------
train accuracy:21.67%    test accuracy:20.83%

epoch: 30    loss:1.7742
--------------------------------------------
train accuracy:33.43%    test accuracy:33.33%

epoch: 40    loss:1.6641
--------------------------------------------
train accuracy:39.81%    test accuracy:38.33%

epoch: 50    loss:1.3119
--------------------------------------------
train accuracy:51.48%    test accuracy:50.00%

epoch: 60    loss:1.3813
--------------------------------------------
train accuracy:56.02%    test accuracy:50.83%

epoch: 70    loss:1.3718
--------------------------------------------
train accuracy:62.31%    test accuracy:61.67%

epoch: 80    loss:0.7707
--------------------------------------------
train accuracy:73.52%    test accuracy:71.67%

epoch: 90    loss:0.6190
--------------------------------------------
train accuracy:72.59%    test accuracy:73.33%

epoch: 100    loss:0.5236
--------------------------------------------
train accuracy:78.52%    test accuracy:75.83%

epoch: 110    loss:0.4711
--------------------------------------------
train accuracy:82.13%    test accuracy:78.33%

epoch: 120    loss:0.3751
--------------------------------------------
train accuracy:81.39%    test accuracy:78.33%

epoch: 130    loss:0.6110
--------------------------------------------
train accuracy:85.93%    test accuracy:80.83%

epoch: 140    loss:0.4460
--------------------------------------------
train accuracy:88.33%    test accuracy:81.67%

epoch: 150    loss:0.3127
--------------------------------------------
train accuracy:92.13%    test accuracy:86.67%

epoch: 160    loss:0.2890
--------------------------------------------
train accuracy:90.56%    test accuracy:85.83%

epoch: 170    loss:0.2704
--------------------------------------------
train accuracy:93.89%    test accuracy:91.67%

epoch: 180    loss:0.1063
--------------------------------------------
train accuracy:96.85%    test accuracy:90.83%

epoch: 190    loss:0.0662
--------------------------------------------
train accuracy:97.41%    test accuracy:92.50%

iteration finished in 43 s
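
One caveat about the training loop: it assigns optimizer.lr, but PyTorch optimizers read the learning rate from optimizer.param_groups, so that assignment only creates an unused attribute and the run above effectively used a constant rate. The sketch below shows this on a tiny stand-in model and one possible way to apply a 0.95-per-epoch decay with torch.optim.lr_scheduler.ExponentialLR. The stand-in model is an assumption for illustration, and results with real decay would differ from the log above.

```python
import torch
from torch import nn

model = nn.Linear(4, 2)   # tiny stand-in for my_model
learning_rate = 0.005
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

# Setting an `lr` attribute does not touch the rate the optimizer uses.
optimizer.lr = 1.0
unchanged_lr = optimizer.param_groups[0]['lr']
print(unchanged_lr)   # still 0.005

# A scheduler multiplies the real rate by gamma once per epoch.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)
for epoch in range(3):
    optimizer.step()    # training for one epoch would happen here
    scheduler.step()

decayed_lr = optimizer.param_groups[0]['lr']
print(decayed_lr)     # about 0.005 * 0.95 ** 3
```

Note that with gamma=0.95 over 190 epochs the rate would shrink to roughly 6e-5 of its starting value, so a gentler decay factor may be needed if the schedule is actually applied.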

Both training and test accuracies are acceptable. Let's move on.

6. Save the model and parameters

PyTorch models store the learned parameters in an internal state dictionary, called state_dict. These can be persisted via the torch.save method:

weights = my_model.state_dict()
torch.save(weights, 'model_weights.pth')

# show the items in state_dict()
weights.keys()

odict_keys(['forward_seq.0.weight', 'forward_seq.0.bias', 'forward_seq.2.weight', 'forward_seq.2.bias', 'forward_seq.5.weight', 'forward_seq.5.bias', 'forward_seq.6.weight', 'forward_seq.6.bias', 'forward_seq.7.weight', 'forward_seq.7.bias'])

# load saved weights into my model
my_model.load_state_dict(torch.load('model_weights.pth'))

<All keys matched successfully>
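
A round trip can confirm that a freshly built model reproduces the original outputs once the saved weights are loaded. This sketch uses a small stand-in network and an in-memory buffer instead of model_weights.pth; both are assumptions made to keep the example self-contained.

```python
import io
import torch
from torch import nn


def build():
    # Small stand-in architecture; the real code would build NeuralNetwork().
    return nn.Sequential(nn.Linear(4, 3), nn.Linear(3, 2))


model = build()
buffer = io.BytesIO()                      # in-memory stand-in for a .pth file
torch.save(model.state_dict(), buffer)
buffer.seek(0)

restored = build()                         # fresh random weights
restored.load_state_dict(torch.load(buffer))

x = torch.randn(5, 4)
with torch.no_grad():
    same_output = torch.equal(model(x), restored(x))
print(same_output)
```

The architecture has to be rebuilt in code before loading, because state_dict stores only the parameter tensors, not the layer definitions.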

7. Test with my images

Six gesture images are stored in ./datasets/gestures, named 0.jpg ~ 5.jpg.

def customized_test(path):
    my_model.to('cpu')
    plt.figure(figsize=(10, 10))
    for i in range(6):
        img_path = os.path.join(path, f'{i}.jpg')
        img = cv2.imread(img_path)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) # !!!be careful about this

        img2 = cv2.resize(img, dsize=None, fx=0.5, fy=0.5)
        img2 = ToTensor()(img2)
        img2 = img2.unsqueeze(0) # add a batch dimension to match the input shape my model expects

        predict = nn.Softmax(dim=1)(my_model(img2))
        number = predict.argmax().item()
        plt.subplot(1, 6, i+1)
        plt.imshow(img)
        plt.title(f"It's '{number}'\n{'right' if number==(i) else 'wrong'}")
        
# test with the first set of gestures
customized_test('datasets/gestures')

# test with the second set of gestures
customized_test('datasets/gestures2')
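
In customized_test, the softmax only rescales the logits into probabilities; it does not change which class wins. A minimal sketch with random logits standing in for my_model(img2):

```python
import torch

torch.manual_seed(0)
logits = torch.randn(1, 6)              # stand-in for my_model(img2)
probs = torch.softmax(logits, dim=1)    # same result as nn.Softmax(dim=1)(logits)

same_class = logits.argmax().item() == probs.argmax().item()
total = probs.sum().item()

print(same_class)   # the predicted class is identical
print(total)        # probabilities sum to 1 (up to rounding)
```

The softmax is therefore optional for picking the digit and mainly useful if the confidence of each prediction should be displayed.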

8. Summary

The unsatisfying results on our own randomly shot photos may be due to the small size of the original training dataset (180 examples per category). To be honest, I took the photos of gesture 4 from exactly the same angles as in the training dataset, because it was very hard to get a randomly shot photo of 4 recognized correctly.
So this neural network isn't robust at recognizing "4". To address this, we might need to enrich our datasets or switch to another neural network model.
