Building a Classification Network with PyTorch

dongdonglele521

Article link: https://blog.csdn.net/dongdonglele521/article/details/111028881

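I have been working with deep learning for a while, but some of the concepts still feel hard to grasp. I want to walk through the whole pipeline once: building a model, training, testing, and deployment. This post starts with building the model. If you spot any mistakes, corrections are welcome.

The plan is to complete the hand-sign recognition assignment from Andrew Ng's course using the PyTorch framework.

Figure 1

Based on my current understanding, I implemented it in the following steps:

1. Data collection. The assignment already provides the data, which can be downloaded via a Baidu Cloud link.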

import h5py
import numpy as np


def load_dataset():
    train_dataset = h5py.File('datasets/train_signs.h5', "r")
    train_set_x_orig = np.array(train_dataset["train_set_x"][:])  # train set features
    train_set_y_orig = np.array(train_dataset["train_set_y"][:])  # train set labels
    test_dataset = h5py.File('datasets/test_signs.h5', "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:])  # test set features
    test_set_y_orig = np.array(test_dataset["test_set_y"][:])  # test set labels
    classes = np.array(test_dataset["list_classes"][:])  # the list of classes
    # Reshape the labels from (m,) to (1, m)
    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))
    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes

This helper function is taken from the assignment materials.

Training data: train_set_x_orig has shape (1080, 64, 64, 3) and train_set_y_orig has shape (1, 1080).

Test data: test_set_x_orig has shape (120, 64, 64, 3) and test_set_y_orig has shape (1, 120).
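PyTorch convolution layers expect images as (N, C, H, W) float tensors, and nn.CrossEntropyLoss expects labels as a 1-D integer tensor, so the arrays above need a conversion before they reach the network. The sketch below shows one way to do it. It uses random stand-in arrays with the same shapes as the real data, not the dataset itself; the names x_np and y_np are made up for illustration.

```python
import numpy as np
import torch

# Random stand-ins with the same shapes as the SIGNS training arrays (not the real data).
x_np = np.random.randint(0, 256, size=(1080, 64, 64, 3), dtype=np.uint8)
y_np = np.random.randint(0, 6, size=(1, 1080))

# (N, H, W, C) uint8 -> (N, C, H, W) float32 scaled to [0, 1]
x = torch.from_numpy(x_np).permute(0, 3, 1, 2).float() / 255.0

# (1, N) -> (N,) int64 class indices
y = torch.from_numpy(y_np).reshape(-1).long()

print(x.shape, y.shape)
```

The full program at the end of the post does the same job with np.transpose followed by torch.from_numpy.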

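2. Data loading.

The model is trained in mini-batches: each step loads a fixed number of samples, because the hardware may not be able to hold the entire dataset at once.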

import math

import numpy as np


def random_mini_batches(X, Y, mini_batch_size=64, seed=0):
    """
    Creates a list of random mini-batches from (X, Y).

    Arguments:
    X -- input data, of shape (m, Hi, Wi, Ci)
    Y -- true label array, of shape (m, n_y)
    mini_batch_size -- size of the mini-batches, integer
    seed -- fixes the shuffle so the mini-batches are reproducible

    Returns:
    mini_batches -- list of (mini_batch_X, mini_batch_Y) tuples
    """
    m = X.shape[0]  # number of training examples
    mini_batches = []
    np.random.seed(seed)

    # Step 1: Shuffle (X, Y) with the same permutation
    permutation = list(np.random.permutation(m))
    shuffled_X = X[permutation, :, :, :]
    shuffled_Y = Y[permutation, :]

    # Step 2: Partition (shuffled_X, shuffled_Y), leaving out the end case
    num_complete_minibatches = math.floor(m / mini_batch_size)
    for k in range(0, num_complete_minibatches):
        mini_batch_X = shuffled_X[k * mini_batch_size : (k + 1) * mini_batch_size, :, :, :]
        mini_batch_Y = shuffled_Y[k * mini_batch_size : (k + 1) * mini_batch_size, :]
        mini_batches.append((mini_batch_X, mini_batch_Y))

    # Handle the end case (last mini-batch smaller than mini_batch_size)
    if m % mini_batch_size != 0:
        mini_batch_X = shuffled_X[num_complete_minibatches * mini_batch_size : m, :, :, :]
        mini_batch_Y = shuffled_Y[num_complete_minibatches * mini_batch_size : m, :]
        mini_batches.append((mini_batch_X, mini_batch_Y))

    return mini_batches

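This helper function is also taken from the assignment materials.

Later I found that PyTorch ships its own data-handling class, Dataset. Subclassing it is enough to load both the data and the labels, so I wrote a MyData class.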

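import glob
import os

import cv2
import torch
from torch.utils.data import Dataset


class MyData(Dataset):
    def __init__(self, traindata, transform=None, train_val="train"):
        super(MyData, self).__init__()
        self.data = traindata
        # Images are expected at <traindata>/<integer label>/<name>.jpg
        self.imagenames = glob.glob(os.path.join(self.data, "*", "*.jpg"))
        self.data_transform = transform  # dict of transforms keyed by "train" / "val"
        self.train_val = train_val

    def __len__(self):
        return len(self.imagenames)

    def __getitem__(self, item):
        img_path = self.imagenames[item]
        img = cv2.imread(img_path)  # H x W x C, BGR, uint8; None if the file cannot be read
        if img is None:
            raise IOError('can not load image :{}'.format(img_path))
        # The parent folder name is the label; this works with both "/" and "\\" separators.
        label = int(os.path.basename(os.path.dirname(img_path)))

        if self.data_transform is not None:
            img = self.data_transform[self.train_val](img)

        # A transform may already return a tensor; convert only raw numpy arrays.
        if not torch.is_tensor(img):
            img = torch.from_numpy(img)
        return img, label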

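The following shows how to use this class: first instantiate it, then use torch.utils.data.DataLoader to load the data in batches. The parameters are explained below.

Note: num_workers is the number of worker subprocesses used to load data; 0 means the data is loaded in the main process. pin_memory enables page-locked (pinned) memory: when True, batches are placed in pinned host memory, which speeds up copying them to the GPU. I ran into an error at this point, and the cause was that my machine did not have enough memory.

Setting num_workers=0 and pin_memory=False resolves it. After closing every other program running on the machine, I could raise num_workers and set pin_memory=True.

Solution:

num_workers = 0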

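import torch
from tqdm import tqdm

# "path/to/train" is a placeholder: the root folder with one sub-folder per integer label.
dataset = MyData("path/to/train")
traindataloader = torch.utils.data.DataLoader(dataset,
                                              batch_size=8,
                                              num_workers=0,
                                              shuffle=True,
                                              pin_memory=False)

nb = len(traindataloader)
pbar = tqdm(enumerate(traindataloader), total=nb)
for i, data in pbar:
    images, label = data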
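MyData reads image files from folders, while the assignment data already sits in memory as arrays. For that case torch.utils.data.TensorDataset can stand in for both a custom Dataset and the hand-written random_mini_batches helper. This is only a sketch of that alternative, using random tensors shaped like the preprocessed training set; the original post does not use it.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Random stand-ins shaped like the preprocessed SIGNS training set (not the real data).
images = torch.rand(1080, 3, 64, 64)
labels = torch.randint(0, 6, (1080,))

dataset = TensorDataset(images, labels)
loader = DataLoader(dataset, batch_size=64, shuffle=True, num_workers=0)

# Collect the size of every batch to see how the last, incomplete batch is handled.
batch_sizes = [batch_labels.shape[0] for _, batch_labels in loader]
print(len(loader), batch_sizes[-1])
```

With 1080 samples and a batch size of 64, the loader yields 16 full batches plus a final batch of 56, the same partition that random_mini_batches produces.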

3. Model construction

The model class is built by subclassing torch.nn.Module. When training on a GPU, move the model onto the GPU.

gesR = GesRecognition().to(device)
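The GesRecognition class in the full listing at the end of the post ends with nn.Linear(784, 6). The 784 comes from the spatial size left after the convolution and pooling layers. Here is a small worked calculation, assuming a 64x64 input and the usual floor-rounding formula for output size; conv_out is a helper name made up for this sketch.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a convolution or pooling layer (floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


side = 64
side = conv_out(side, kernel=5, stride=1, padding=2)  # conv1: 64 -> 64
side = conv_out(side, kernel=2, stride=2)             # max pool: 64 -> 32
side = conv_out(side, kernel=3, stride=2)             # conv2: 32 -> 15
side = conv_out(side, kernel=2, stride=2)             # max pool: 15 -> 7

flat_features = 16 * side * side  # 16 output channels from conv2
print(flat_features)
```

The result is 16 * 7 * 7 = 784, which matches the input size of the fully connected layer.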

4. Model training

1. Choose an optimizer, that is, which optimization algorithm to use.

optimizer = optim.SGD(gesR.parameters(), lr=0.001)

2. Set the number of training iterations and put the model into training mode.

gesR.train()

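For why train() is used during training and eval() during testing, I referred to this post: https://blog.csdn.net/weixin_44760744/article/details/108929528. The model in this post uses neither BN nor dropout modules, so the two modes behave identically here.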
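The difference between the two modes only shows up in layers such as dropout and batch normalization. Here is a minimal illustration with a standalone nn.Dropout layer, which is not part of the model in this post:

```python
import torch
import torch.nn as nn

layer = nn.Dropout(p=0.5)
x = torch.ones(1000)

layer.train()
train_out = layer(x)  # each element is either dropped (0) or scaled by 1 / (1 - p) = 2

layer.eval()
eval_out = layer(x)   # dropout is disabled, so the input passes through unchanged

print(train_out.unique(), torch.equal(eval_out, x))
```

Calling model.train() or model.eval() on a full network flips this mode for every submodule at once.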

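Before each forward pass, clear the gradients. PyTorch accumulates gradients into .grad on every backward() call, so without this step the gradients from earlier batches would be added to the current ones. I still need to study this part further.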

optimizer.zero_grad()
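A tiny experiment makes the accumulation visible. It uses a single hand-made parameter w rather than the real model, and clears the gradient manually with w.grad.zero_(), which has a similar effect to optimizer.zero_grad() for that one parameter.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(w * 3).sum().backward()
first = w.grad.clone()        # gradient of sum(3 * w) is 3 for each element

(w * 3).sum().backward()      # no clearing in between
accumulated = w.grad.clone()  # the new gradient was added to the old one

w.grad.zero_()                # clear the stored gradient
(w * 3).sum().backward()
after_zero = w.grad.clone()   # back to the gradient of a single pass

print(first, accumulated, after_zero)
```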

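3. Choosing the loss function

Classification tasks generally use the cross-entropy loss; I still need to study this part further. The loss is computed as loss = criterion(output, train_label.squeeze().long()). I used to think the network output and the label only needed to have the same shape, and that assumption led to a lot of errors.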

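Fixing this error requires converting the label to an integer tensor: train_label = train_label.long()

With batch_size set to 8 and six classes, output.shape is [8, 6]

and train_label.shape is [8].

The reason is that nn.CrossEntropyLoss takes class indices of shape [N] rather than one-hot vectors. Internally it applies log-softmax to the output and picks the entry of the target class, which is equivalent to comparing against a one-hot label.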
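The equivalence can be checked numerically. The sketch below uses random logits and labels with the shapes mentioned above, and compares nn.CrossEntropyLoss against a manual one-hot computation. The variable names one_hot and manual are made up for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
output = torch.randn(8, 6)                          # raw network outputs (logits), shape [8, 6]
train_label = torch.randint(0, 6, (8, 1)).float()   # float labels of shape [8, 1], as in the mini-batches

# CrossEntropyLoss wants int64 class indices of shape [8]
target = train_label.squeeze(1).long()
criterion = nn.CrossEntropyLoss()
loss = criterion(output, target)

# The same value computed by hand from one-hot labels and log-softmax
one_hot = F.one_hot(target, num_classes=6).float()
manual = -(one_hot * F.log_softmax(output, dim=1)).sum(dim=1).mean()

print(loss.item(), manual.item())
```

squeeze(1) is used here instead of a bare squeeze() so that a batch containing a single sample still yields a tensor of shape [1] rather than a scalar.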

4. Backpropagation: loss.backward()

5. Parameter update by the optimizer: optimizer.step()

6. Saving the model: torch.save()
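torch.save can store either the whole module object or only its weights. Saving the state_dict is generally the more portable of the two, because loading it does not depend on unpickling the class definition. Below is a minimal round trip with a small stand-in layer instead of the real network; the file name weights.pth is a placeholder.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # small stand-in for the real network

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "weights.pth")  # placeholder file name
    torch.save(model.state_dict(), path)     # save the weights only

    restored = nn.Linear(4, 2)               # rebuild the same architecture first
    restored.load_state_dict(torch.load(path, map_location="cpu"))

x = torch.randn(3, 4)
same = torch.equal(model(x), restored(x))
print(same)
```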

The complete program is below. I designed a simple classification model, trained it for a morning, and it reached 90% accuracy on the test set.

import numpy as np
import torch
import torch.nn as nn
import h5py
import math
from torch import optim
from torch.autograd import Variable
import torch.nn.functional as F
import matplotlib.pyplot as plt

def load_dataset():
    train_dataset = h5py.File('datasets/train_signs.h5', "r")
    train_set_x_orig = np.array(train_dataset["train_set_x"][:])  # your train set features
    train_set_y_orig = np.array(train_dataset["train_set_y"][:])  # your train set labels
    test_dataset = h5py.File('datasets/test_signs.h5', "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:])  # your test set features
    test_set_y_orig = np.array(test_dataset["test_set_y"][:])  # your test set labels
    classes = np.array(test_dataset["list_classes"][:])  # the list of classes
    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))
    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes

class GesRecognition(nn.Module):
    def __init__(self):
        super(GesRecognition, self).__init__()
        self.conv1 = nn.Conv2d(3, 8, kernel_size=5, stride=1, padding=2)
        self.relu1 = nn.ReLU()
        self.maxpool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(8, 16, kernel_size=3, stride=2)
        self.relu2 = nn.ReLU()
        self.maxpool2 = nn.MaxPool2d(kernel_size=2)
        # A 64x64 input leaves 16 channels of 7x7 here, i.e. 784 features
        self.fc = nn.Linear(784, 6)

    def forward(self, x):
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.maxpool1(x)
        x = self.conv2(x)
        x = self.relu2(x)
        x = self.maxpool2(x)

        # Flatten the feature maps into a vector for the fully connected layer
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return x

# random_mini_batches is the helper shown in step 2 (data loading).

def train():
    train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes = load_dataset()
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    gesR = GesRecognition().to(device)

    # Scale pixel values to [0, 1]; the test images need the same scaling.
    train_set_x_orig = train_set_x_orig / 255.0
    test_set_x_orig = test_set_x_orig / 255.0
    criterion = nn.CrossEntropyLoss()
    mini_batch_size = 8
    optimizer = optim.SGD(gesR.parameters(), lr=0.001)
    for num in range(1000):
        gesR.train()
        minbatch = random_mini_batches(train_set_x_orig, train_set_y_orig.T, mini_batch_size, seed=num)
        for batch_x, batch_y in minbatch:
            # (N, H, W, C) numpy array -> (N, C, H, W) float tensor
            train_data = torch.from_numpy(np.transpose(batch_x, (0, 3, 1, 2))).float().to(device)
            # (N, 1) -> (N,) integer class indices, as CrossEntropyLoss expects
            train_label = torch.from_numpy(batch_y).reshape(-1).long().to(device)
            optimizer.zero_grad()
            output = gesR(train_data)
            loss = criterion(output, train_label)
            if num % 10 == 0:
                print("loss is :", loss.item())
            loss.backward()
            optimizer.step()
            if loss.item() < 0.1:
                # Save the weights only; test() rebuilds the model and loads them.
                torch.save(gesR.state_dict(), "Gesfy.pkl")
        if num % 10 == 0:
            # Quick check on 10 randomly chosen test images
            gesR.eval()
            minbatchtest = random_mini_batches(test_set_x_orig, test_set_y_orig.T, 10, seed=num)
            test_x, test_y = minbatchtest[0]
            test_data = torch.from_numpy(np.transpose(test_x, (0, 3, 1, 2))).float().to(device)
            test_label = torch.from_numpy(test_y).reshape(-1).long().to(device)
            with torch.no_grad():
                out = gesR(test_data).max(1)[1]
            acc = (out == test_label).float().mean().item()
            print("accuracy on 10 test samples:", acc)


def test():
    train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes = load_dataset()
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = GesRecognition().to(device)
    model.load_state_dict(torch.load("Gesfy.pkl", map_location=device))
    model.eval()
    print(model)
    test_set_x_orig = test_set_x_orig / 255.0
    labels = test_set_y_orig.squeeze()
    count = 0
    for i in range(test_set_x_orig.shape[0]):
        plt.imshow(test_set_x_orig[i])
        plt.pause(5)  # show each image for 5 seconds
        print("label", labels[i])
        test_data = torch.from_numpy(np.transpose(test_set_x_orig[i:i+1], (0, 3, 1, 2))).float().to(device)
        with torch.no_grad():
            out = model(test_data).max(1)[1].item()
        print("out is :", out)
        if out == labels[i]:
            count = count + 1
    accuracy = count / test_set_x_orig.shape[0]
    print("right rate is :", accuracy)


if __name__ == "__main__":
    train()
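The test loop above feeds one image at a time. The same accuracy can be computed in batches under torch.no_grad(), which skips building the autograd graph during evaluation. The sketch below is one possible way to write it; the accuracy helper is a made-up name, and the model and data are random stand-ins rather than the trained network and the real test set.

```python
import torch
import torch.nn as nn


def accuracy(model, images, labels, batch_size=32):
    """Fraction of samples whose arg-max prediction equals the label."""
    model.eval()
    correct = 0
    with torch.no_grad():
        for start in range(0, images.shape[0], batch_size):
            logits = model(images[start:start + batch_size])
            pred = logits.argmax(dim=1)
            correct += (pred == labels[start:start + batch_size]).sum().item()
    return correct / images.shape[0]


# Stand-in model and data with the shapes of the SIGNS test set (not the real ones).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 6))
images = torch.rand(120, 3, 64, 64)
labels = torch.randint(0, 6, (120,))

acc = accuracy(model, images, labels)
print("right rate is :", acc)
```

With an untrained stand-in model the printed value is close to chance; substituting the trained GesRecognition model and the preprocessed test tensors would give the real test accuracy.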
