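【Addendum: switching to a CNN】【English】【Andrew Ng post-course programming assignment】Course 2 - Improving Deep Neural Networks - Week 3 assignment implemented in PyTorch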

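Foreword: after following the course through CNNs and the LeNet5 model, I tried using that model to tackle the problems left over from last week's assignment (recognizing hand-gesture digits): 1. high bias; 2. high variance; 3. data mismatch.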

Last week's assignment: 【English】【Andrew Ng post-course programming assignment】Course 2 - Improving Deep Neural Networks - Week 3 assignment implemented in PyTorch

⭐Here are the datasets we need for training; please take the time to download them.

Link: https://pan.baidu.com/s/1N39fsrpw5WkAt9aCzY2WuQ?pwd=996s 
Extraction code: 996s

Message me privately if you need the Jupyter notebook.

Assignment

Basic task: train a neural network to recognize six hand-gesture digits (0~5).

train_x (1080×64×64×3): images of different gesture digits
train_y (1080, 1): values from 0 to 5 giving the correct digit for each image
test_x (120×64×64×3)
test_y (120, 1)

 

import torch
import numpy as np
import h5py
from torchvision.transforms import ToTensor, Lambda
from torch.utils.data import Dataset, DataLoader
from torch import nn
import cv2
import matplotlib.pyplot as plt
import time
import os

os.environ['KMP_DUPLICATE_LIB_OK'] = 'TRUE'

1. Dataset & Dataloader

1.1 Customize our dataset

  • We'll write our own dataset class to load the h5 files into train_dataset and test_dataset.
  • A custom dataset that subclasses torch.utils.data.Dataset should override __len__() and __getitem__(), as well as __init__().
  • transform / target_transform specify the feature / label transformations.

class MyDataset(Dataset):
    def __init__(self, path, train, transform=None, target_transform=None):
        self.archive = h5py.File(path, 'r')
        self.train = train
        if self.train:
            self.imgs = self.archive['train_set_x'][:]
            self.labels = self.archive['train_set_y'][:]
        else:
            self.imgs = self.archive['test_set_x'][:]
            self.labels = self.archive['test_set_y'][:]

        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        image = self.imgs[idx]
        label = self.labels[idx]
        # LeNet5 expects 32×32×3 inputs, so halve the 64×64×3 image whether or not a transform is set
        image = cv2.resize(image, dsize=None, fx=0.5, fy=0.5)
        if self.transform:
            image = self.transform(image)
        if self.target_transform:
            label = self.target_transform(label)

        return image, label

1.2 Convert to one-hot value

To train a classifier with C classes, we convert each integer label (a value in [0, C)) to a one-hot vector of length C.

Either of the following works:

y = torch.eye(C)[y, :]

or

y = torch.zeros(C).scatter_(0, torch.tensor(y), value=1)
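
As a quick check of the two conversions above, here is a minimal sketch. The label value y = 3 is only an illustration, and the third variant using torch.nn.functional.one_hot is an extra option that the original post does not use.

```python
import torch
import torch.nn.functional as F

C = 6   # number of classes (gesture digits 0..5)
y = 3   # an example integer label, chosen only for illustration

# Method 1: take row y of the C x C identity matrix.
one_hot_eye = torch.eye(C)[y, :]

# Method 2: start from zeros and scatter a 1 at index y.
one_hot_scatter = torch.zeros(C).scatter_(0, torch.tensor([y]), value=1)

# Extra option (not used in this post): the built-in helper, cast to float.
one_hot_builtin = F.one_hot(torch.tensor(y), num_classes=C).float()

print(one_hot_eye)
print(torch.equal(one_hot_eye, one_hot_scatter))
print(torch.equal(one_hot_eye, one_hot_builtin))
```

All three produce the same length-6 vector, so the choice is mostly a matter of taste.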

 

# Transformation for label: convert key value to one_hot value
tgt_transform = Lambda(lambda y: torch.eye(6)[y, :].flatten())

# Initialize train&test datasets with Class MyDataset
train_dataset = MyDataset(
    path=r'./datasets/train_signs.h5',
    train=True,
    transform=ToTensor(),
    target_transform=tgt_transform
)

test_dataset = MyDataset(
    path=r'./datasets/test_signs.h5',
    train=False,
    transform=ToTensor(),
    target_transform=tgt_transform
)

Note: ToTensor() converts an input X of shape [H, W, C] to a tensor of shape [C, H, W]. For uint8 images, each value is also scaled from 0~255 to the range 0~1.
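
To make the note above concrete, here is a rough sketch of the same two steps (axis reordering and scaling) written by hand with NumPy and torch. The random image is a stand-in for a resized gesture photo, and this is only meant to mirror what ToTensor() does for uint8 input, not to replace it.

```python
import numpy as np
import torch

# A random uint8 image in [H, W, C] layout, standing in for a 32x32 RGB photo.
rng = np.random.default_rng(0)
img_hwc = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

# Step 1: reorder the axes from [H, W, C] to [C, H, W].
chw = torch.from_numpy(img_hwc).permute(2, 0, 1)

# Step 2: convert to float and scale 0..255 down to 0..1.
img_chw = chw.float() / 255.0

print(img_chw.shape, img_chw.dtype)
print(img_chw.min().item(), img_chw.max().item())
```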

# take a look at the dataset
idx = torch.randint(1080,(1,)).item()
img, label = train_dataset[idx]
plt.imshow(img.permute(1, 2, 0))  # [C, H, W] -> [H, W, C] so all three colour channels are shown
plt.title(f'The gesture tells: {label.argmax(0).item()}')
print(label)
tensor([0., 1., 0., 0., 0., 0.])

1.3 Dataloader

train_dataloader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_dataloader = DataLoader(test_dataset, batch_size=120)

Take a look at the dataloader.

imgs, labels = next(iter(train_dataloader))
batch_num = len(train_dataloader)

print(f'Number of batches: {batch_num}')
print(f'Shape of training feature: {imgs.shape}')
print(f'Shape of training labels: {labels.shape}')
Number of batches: 34
Shape of training feature: torch.Size([32, 3, 32, 32])
Shape of training labels: torch.Size([32, 6])

2. Build the NN

The computational graph of LeNet5 is shown below.

The computation inside nn.Linear(in_features, out_features) is:

 $Z = XW^{T} + b$, where X has shape (batch_size, in_features) and W has shape (out_features, in_features)
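
A small sketch to confirm the formula for nn.Linear. PyTorch stores the weight with shape (out_features, in_features), so the matrix product uses its transpose. The layer sizes and the random input are arbitrary choices for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(in_features=4, out_features=3)
X = torch.randn(5, 4)   # (batch_size, in_features)

Z_layer = layer(X)                            # what the module returns
Z_manual = X @ layer.weight.T + layer.bias    # the same computation by hand

print(layer.weight.shape)                     # stored as (out_features, in_features)
print(torch.allclose(Z_layer, Z_manual, atol=1e-6))
```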

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.forward_seq = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5, padding=0), # no padding by default
            nn.MaxPool2d(kernel_size=2), # stride = kernel_size by default
            nn.Conv2d(6, 16, 5, padding='valid'),  # no padding
            nn.MaxPool2d(2),
            
            nn.Flatten(),
            
            nn.Linear(in_features=400, out_features=120),
            nn.Linear(120, 84),
            nn.Linear(84, 6)
        )

    def forward(self, x):
        logits = self.forward_seq(x)
        return logits
# Initialize my_model and move it to the GPU for faster computation.
my_model = NeuralNetwork().to('cuda')
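
The in_features=400 of the first linear layer follows from the spatial sizes after each convolution and pooling step. The sketch below traces them for a 32×32 input; conv_out is a hypothetical helper written for this illustration, using the standard output-size formula for unpadded convolution and pooling.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32                                    # input images are 32x32
size = conv_out(size, kernel=5)              # Conv2d(3, 6, 5)   -> 28
size = conv_out(size, kernel=2, stride=2)    # MaxPool2d(2)      -> 14
size = conv_out(size, kernel=5)              # Conv2d(6, 16, 5)  -> 10
size = conv_out(size, kernel=2, stride=2)    # MaxPool2d(2)      -> 5

flat_features = 16 * size * size             # 16 channels of 5x5 feature maps
print(size, flat_features)                   # 5 400
```

This is also why the dataset halves the original 64×64 images: with a 64×64 input the flattened size would no longer be 400.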

3. Loss function

nn.CrossEntropyLoss() combines two steps. Given the inputs pred (raw logits) and y:

  •  $p = \mathrm{Softmax}(pred)$

  •  $loss = -\sum y \cdot \log(p)$

So we don't need to add nn.Softmax(dim=1) to the forward sequence.
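
The two steps can be checked by hand. This is a minimal sketch with made-up logits and labels; it assumes the default reduction='mean', which averages the per-sample losses over the batch.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 6)                 # raw model outputs for 4 samples, 6 classes
targets = torch.tensor([0, 3, 5, 1])       # made-up integer class labels
one_hot = F.one_hot(targets, num_classes=6).float()

# Step 1: (log-)softmax over the class dimension.
log_probs = F.log_softmax(logits, dim=1)

# Step 2: negative sum of y * log(p) per sample, then the batch mean.
manual_loss = -(one_hot * log_probs).sum(dim=1).mean()

builtin_loss = nn.CrossEntropyLoss()(logits, targets)
print(manual_loss.item(), builtin_loss.item())
print(torch.allclose(manual_loss, builtin_loss))
```

Applying nn.Softmax inside the model as well would mean the softmax is applied twice, which is why the model returns raw logits.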

# Hyperparameters
learning_rate = 0.005
num_epochs = 190


loss_fn = nn.CrossEntropyLoss().to('cuda')

4. Backward propagation

Calling loss.backward() makes autograd compute the gradients automatically during backward propagation, and the optimizer then uses them to update the parameters. Here we create an SGD (Stochastic Gradient Descent) optimizer and pass it the model's parameters.

optimizer = torch.optim.SGD(my_model.parameters(), lr=learning_rate)
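
To see what one optimizer step does, here is a toy sketch with a two-element parameter and a quadratic loss, both invented for illustration. With plain SGD (no momentum or weight decay), step() applies w ← w − lr · grad.

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()        # toy loss, so d(loss)/dw = 2w

optimizer.zero_grad()        # clear gradients left from any previous step
loss.backward()              # autograd fills w.grad with 2w
expected = w.detach().clone() - 0.1 * w.grad   # the update rule written by hand
optimizer.step()             # the optimizer applies the same update in place

print(w.detach())            # tensor([ 0.8000, -1.6000])
```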

5. Full implementation

# A train_loop consists of forward prop, computing loss, backward prop and updating parameters within an epoch.
def train_loop(dataloader, model, loss_function, optimizer):
    size = len(dataloader.dataset)
    for batch_num, (X, y) in enumerate(dataloader):
        X, y = X.to('cuda'), y.to('cuda')

        # Compute predictions and loss
        y_predict = model(X)
        loss = loss_function(y_predict, y)

        # Backward Propagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return loss.item()


def test_loop(dataloader, model, loss_fn):
    """
    Run a forward pass over the given dataset with the parameters learned so far
    and return the accuracy (in percent) for the current epoch.
    """
    size = len(dataloader.dataset)

    correct = 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to('cuda'), y.to('cuda')
            predict = model(X)
            max_idx = predict.argmax(dim=1)
            predict_norm = torch.eye(6, device='cuda')[max_idx, :]
            correct += (predict_norm * y).sum()
    correct /= size
    correct *= 100

    return correct
start = time.perf_counter()

losses = []  # record training loss at each iteration
for t in range(num_epochs):
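    # Intended learning-rate decay, for milder oscillation and faster convergence.
    # Note: assigning to `optimizer.lr` only creates a new attribute; SGD reads the rate from
    # optimizer.param_groups, so the run logged below effectively used a constant learning rate.
    # To really decay it, set group['lr'] for each group in optimizer.param_groups instead.
    optimizer.lr = learning_rate * (0.95 ** t)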
    loss = train_loop(train_dataloader, my_model, loss_fn, optimizer)
    losses.append(loss)

    # print train/test accuracy every 10 epochs
    if (t + 1) % 10 == 0:
        train_acc = test_loop(train_dataloader, my_model, loss_fn)
        test_acc = test_loop(test_dataloader, my_model, loss_fn)
        print(f'epoch: {t + 1}    loss:{(losses[-1]):>0.4f}')
        print('--------------------------------------------')
        print(f'train accuracy:{train_acc:.2f}%    test accuracy:{test_acc:.2f}%\n')

end = time.perf_counter()

print(f'iteration finished in {int(end - start)} s')

plt.plot(np.arange(1, num_epochs + 1), losses)
plt.xlabel('iteration')
plt.ylabel('loss')
plt.show()
epoch: 10    loss:1.7899
--------------------------------------------
train accuracy:18.33%    test accuracy:17.50%

epoch: 20    loss:1.7806
--------------------------------------------
train accuracy:21.67%    test accuracy:20.83%

epoch: 30    loss:1.7742
--------------------------------------------
train accuracy:33.43%    test accuracy:33.33%

epoch: 40    loss:1.6641
--------------------------------------------
train accuracy:39.81%    test accuracy:38.33%

epoch: 50    loss:1.3119
--------------------------------------------
train accuracy:51.48%    test accuracy:50.00%

epoch: 60    loss:1.3813
--------------------------------------------
train accuracy:56.02%    test accuracy:50.83%

epoch: 70    loss:1.3718
--------------------------------------------
train accuracy:62.31%    test accuracy:61.67%

epoch: 80    loss:0.7707
--------------------------------------------
train accuracy:73.52%    test accuracy:71.67%

epoch: 90    loss:0.6190
--------------------------------------------
train accuracy:72.59%    test accuracy:73.33%

epoch: 100    loss:0.5236
--------------------------------------------
train accuracy:78.52%    test accuracy:75.83%

epoch: 110    loss:0.4711
--------------------------------------------
train accuracy:82.13%    test accuracy:78.33%

epoch: 120    loss:0.3751
--------------------------------------------
train accuracy:81.39%    test accuracy:78.33%

epoch: 130    loss:0.6110
--------------------------------------------
train accuracy:85.93%    test accuracy:80.83%

epoch: 140    loss:0.4460
--------------------------------------------
train accuracy:88.33%    test accuracy:81.67%

epoch: 150    loss:0.3127
--------------------------------------------
train accuracy:92.13%    test accuracy:86.67%

epoch: 160    loss:0.2890
--------------------------------------------
train accuracy:90.56%    test accuracy:85.83%

epoch: 170    loss:0.2704
--------------------------------------------
train accuracy:93.89%    test accuracy:91.67%

epoch: 180    loss:0.1063
--------------------------------------------
train accuracy:96.85%    test accuracy:90.83%

epoch: 190    loss:0.0662
--------------------------------------------
train accuracy:97.41%    test accuracy:92.50%

iteration finished in 43 s
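
On the learning-rate decay in the loop above: the optimizer reads its rate from param_groups, so a decay schedule has to be written there. The sketch below shows two ways to do that on a throwaway model (the nn.Linear(4, 2) and the three-epoch loop are placeholders). Keep in mind that the log above was produced without any effective decay, and that 0.95 per epoch over 190 epochs would shrink the rate by several orders of magnitude, so the decay factor would likely need retuning.

```python
import torch
from torch import nn

model = nn.Linear(4, 2)      # placeholder model, just to have parameters
base_lr = 0.005
optimizer = torch.optim.SGD(model.parameters(), lr=base_lr)

# Assigning a new attribute leaves the real learning rate untouched.
optimizer.lr = 1e-9
lr_after_attribute = optimizer.param_groups[0]['lr']

# Option 1: write the decayed value into every parameter group.
def set_lr(opt, value):
    for group in opt.param_groups:
        group['lr'] = value

set_lr(optimizer, base_lr * 0.95 ** 3)
lr_manual = optimizer.param_groups[0]['lr']

# Option 2: let a built-in scheduler multiply the rate by gamma once per epoch.
optimizer2 = torch.optim.SGD(model.parameters(), lr=base_lr)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer2, gamma=0.95)
for epoch in range(3):
    optimizer2.step()        # in real training this follows loss.backward()
    scheduler.step()
lr_scheduled = optimizer2.param_groups[0]['lr']

print(lr_after_attribute, lr_manual, lr_scheduled)
```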

Both training and test accuracies are acceptable. Let's move on~

6. Save the model and parameters

PyTorch models store the learned parameters in an internal state dictionary, called state_dict. These can be persisted via the torch.save method:

weights = my_model.state_dict()
torch.save(weights, 'model_weights.pth')

# show the items in state_dict()
weights.keys()
odict_keys(['forward_seq.0.weight', 'forward_seq.0.bias', 'forward_seq.2.weight', 'forward_seq.2.bias', 'forward_seq.5.weight', 'forward_seq.5.bias', 'forward_seq.6.weight', 'forward_seq.6.bias', 'forward_seq.7.weight', 'forward_seq.7.bias'])
# load saved weights into my model
my_model.load_state_dict(torch.load('model_weights.pth'))
<All keys matched successfully>
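
A self-contained sketch of the same save/load round trip, using a small stand-in model and a temporary file so that nothing is left on disk. The names model_a, model_b and demo_weights.pth are made up for this example. Passing map_location='cpu' is an optional extra that lets weights saved on a GPU be loaded on a machine without one.

```python
import os
import tempfile

import torch
from torch import nn

torch.manual_seed(0)
model_a = nn.Sequential(nn.Linear(4, 3), nn.Linear(3, 2))   # "trained" model
model_b = nn.Sequential(nn.Linear(4, 3), nn.Linear(3, 2))   # fresh model, same architecture

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo_weights.pth')
    torch.save(model_a.state_dict(), path)             # persist only the parameters
    state = torch.load(path, map_location='cpu')       # read them back onto the CPU
    result = model_b.load_state_dict(state)            # copy them into the fresh model

x = torch.randn(2, 4)
same_output = torch.allclose(model_a(x), model_b(x))

print(list(state.keys()))
print(same_output)
```

The target model must have the same architecture as the one that was saved; otherwise load_state_dict raises an error about missing or unexpected keys.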

7. Test with my images

Six gesture images are stored in ./datasets/gestures, named 0.jpg ~ 5.jpg.

def customized_test(path):
    my_model.to('cpu')
    plt.figure(figsize=(10, 10))
    for i in range(6):
        img_path = os.path.join(path, f'{i}.jpg')
        img = cv2.imread(img_path)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) # careful: OpenCV loads images as BGR, so convert to RGB first

        img2 = cv2.resize(img, dsize=None, fx=0.5, fy=0.5)
        img2 = ToTensor()(img2)
        img2 = img2.unsqueeze(0) # add a batch dimension to match the input shape the model expects

        predict = nn.Softmax(dim=1)(my_model(img2))
        number = predict.argmax().item()
        plt.subplot(1, 6, i+1)
        plt.imshow(img)
        plt.title(f"It's '{number}'\n{'right' if number==(i) else 'wrong'}")
        
# test with the first set of gestures
customized_test('datasets/gestures')

 

# test with the second set of gestures
customized_test('datasets/gestures2')

 

8. Summary

The unsatisfying results on our randomly shot photos might be due to the small size of the original training dataset (180 examples per category). To be honest, I shot the photos of gesture 4 at exactly the same angles as in the training dataset, because it was so hard to find a randomly shot photo of a 4 that was recognized correctly.
So this neural network is not very robust at recognizing "4". To address this, we might need to enrich our dataset or switch to another neural network model.
