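I have been working with deep learning for a while, but some of the concepts still feel hard to grasp. I want to walk through the whole pipeline once: building a model, training it, testing it, and deploying it. This post starts with building the model. If you spot any mistakes, corrections are welcome.
The plan is to use the PyTorch framework to complete the hand-sign recognition exercise from Andrew Ng's course.
Figure 1
Based on my current understanding, I implemented it in the following steps:
1. Data collection. The assignment already provides the data, which can be downloaded via the Baidu Netdisk link.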
import h5py
import numpy as np

def load_dataset():
    train_dataset = h5py.File('datasets/train_signs.h5', "r")
    train_set_x_orig = np.array(train_dataset["train_set_x"][:])  # your train set features
    train_set_y_orig = np.array(train_dataset["train_set_y"][:])  # your train set labels
    test_dataset = h5py.File('datasets/test_signs.h5', "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:])  # your test set features
    test_set_y_orig = np.array(test_dataset["test_set_y"][:])  # your test set labels
    classes = np.array(test_dataset["list_classes"][:])  # the list of classes
    # Reshape the label vectors from (m,) to (1, m)
    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))
    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes
This helper is taken from the assignment materials.
Training set: train_set_x_orig has shape (1080, 64, 64, 3) and train_set_y_orig has shape (1, 1080).
Test set: test_set_x_orig has shape (120, 64, 64, 3) and test_set_y_orig has shape (1, 120).
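PyTorch convolution layers expect channels-first input of shape (N, C, H, W), while the arrays above are channels-last, and nn.CrossEntropyLoss expects one class index per example. A minimal sketch of that conversion, using zero-filled stand-in arrays with the same shapes as the training set (placeholders, not the real data):

```python
import numpy as np

# Stand-in arrays with the same shapes as the training set (not real data).
x = np.zeros((1080, 64, 64, 3), dtype=np.uint8)  # (m, H, W, C)
y = np.zeros((1, 1080), dtype=np.int64)          # (1, m)

# Move the channel axis to position 1 and scale pixel values to [0, 1].
x_nchw = np.transpose(x, (0, 3, 1, 2)).astype(np.float32) / 255.0
# Flatten the labels to one class index per example.
y_flat = y.reshape(-1)

print(x_nchw.shape)  # (1080, 3, 64, 64)
print(y_flat.shape)  # (1080,)
```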
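2. Data loading.
Training runs in mini-batches: each step loads a fixed number of examples, because the hardware may not be able to hold the entire dataset at once.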
import math
import numpy as np

def random_mini_batches(X, Y, mini_batch_size=64, seed=0):
    """
    Creates a list of random minibatches from (X, Y)
    Arguments:
    X -- input data, of shape (m, Hi, Wi, Ci)
    Y -- true label vector, of shape (m, n_y)
    mini_batch_size -- size of the mini-batches, integer
    seed -- fixes the shuffle so that the random minibatches are reproducible
    Returns:
    mini_batches -- list of synchronous (mini_batch_X, mini_batch_Y)
    """
    m = X.shape[0]  # number of training examples
    mini_batches = []
    np.random.seed(seed)
    # Step 1: Shuffle (X, Y)
    permutation = list(np.random.permutation(m))
    shuffled_X = X[permutation, :, :, :]
    shuffled_Y = Y[permutation, :]
    # Step 2: Partition (shuffled_X, shuffled_Y). Minus the end case.
    num_complete_minibatches = math.floor(m / mini_batch_size)  # number of full mini-batches in the partition
    for k in range(0, num_complete_minibatches):
        mini_batch_X = shuffled_X[k * mini_batch_size : k * mini_batch_size + mini_batch_size, :, :, :]
        mini_batch_Y = shuffled_Y[k * mini_batch_size : k * mini_batch_size + mini_batch_size, :]
        mini_batch = (mini_batch_X, mini_batch_Y)
        mini_batches.append(mini_batch)
    # Handling the end case (last mini-batch < mini_batch_size)
    if m % mini_batch_size != 0:
        mini_batch_X = shuffled_X[num_complete_minibatches * mini_batch_size : m, :, :, :]
        mini_batch_Y = shuffled_Y[num_complete_minibatches * mini_batch_size : m, :]
        mini_batch = (mini_batch_X, mini_batch_Y)
        mini_batches.append(mini_batch)
    return mini_batches
This helper is also taken from the assignment materials.
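To make the partitioning concrete, here is a small sketch of the arithmetic the function performs for the 1080 training examples. The helper name count_mini_batches is made up for illustration and is not part of the assignment code:

```python
import math

def count_mini_batches(m, mini_batch_size):
    """Return (number of full batches, size of the trailing partial batch)."""
    full = math.floor(m / mini_batch_size)
    remainder = m % mini_batch_size
    return full, remainder

print(count_mini_batches(1080, 64))  # (16, 56): 16 full batches plus one of 56
print(count_mini_batches(1080, 8))   # (135, 0): 135 full batches, no leftover
```

With the default size of 64 the function therefore returns 17 mini-batches; with size 8, as used for training later in this post, it returns exactly 135.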
Later I found that PyTorch ships its own data-handling class, Dataset. Subclassing it is enough to load data together with labels, so I wrote a MyData class.
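import glob
import os

import cv2
import torch
from torch.utils.data import Dataset

class MyData(Dataset):
    def __init__(self, traindata, transform=None, train_val="train"):
        super(MyData, self).__init__()
        self.data = traindata  # root folder laid out as <root>/<label>/<image>.jpg
        self.imagenames = glob.glob(os.path.join(self.data, "*", "*.jpg"))
        self.data_transform = transform  # dict of transforms keyed by "train" / "val"
        self.train_val = train_val

    def __len__(self):
        return len(self.imagenames)

    def __getitem__(self, item):
        img_path = self.imagenames[item]
        img = cv2.imread(img_path)  # returns None if the file cannot be read
        if img is None:
            raise FileNotFoundError('can not load image :{}'.format(img_path))
        # The name of the parent folder is the integer class label
        label = int(os.path.basename(os.path.dirname(img_path)))
        if self.data_transform is not None:
            img = self.data_transform[self.train_val](img)
        # A transform may already return a tensor; convert only raw numpy images
        if not torch.is_tensor(img):
            img = torch.from_numpy(img)
        return img, label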
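Here is how to use the class: instantiate it, then use torch.utils.data.DataLoader to load the data in batches. The parameters are explained below.
Note: num_workers is the number of worker subprocesses used to load data; 0 means the data is loaded in the main process only. pin_memory controls page-locked memory: when True, batches are placed in page-locked (pinned) host memory, which speeds up copying them to the GPU. The error I ran into at this point was caused by my machine not having enough memory.
Setting num_workers=0 and pin_memory=False resolved it. After closing all the other programs running locally, you can raise num_workers and set pin_memory=True.
Fix:
num_workers = 0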
dataset = MyData("path/to/images")  # placeholder path: pass the root folder of your images
traindataloader = torch.utils.data.DataLoader(dataset,
                                              batch_size=8,
                                              num_workers=0,
                                              shuffle=True,
                                              pin_memory=False)
nb = len(traindataloader)
pbar = tqdm(enumerate(traindataloader), total=nb)
for i, data in pbar:
    images, label = data
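For data that already sits in memory, like the arrays read from the h5 files, DataLoader can also be tried out without a custom class. The sketch below assumes random stand-in tensors and wraps them in TensorDataset; it only illustrates how batch_size, num_workers and pin_memory are passed and how the last batch ends up smaller:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Random stand-in tensors: 20 "images" of shape (3, 64, 64), labels in 0..5.
images = torch.rand(20, 3, 64, 64)
labels = torch.randint(0, 6, (20,))

loader = DataLoader(TensorDataset(images, labels),
                    batch_size=8,
                    shuffle=True,
                    num_workers=0,     # load in the main process
                    pin_memory=False)  # do not use page-locked host memory

batch_shapes = [(xb.shape, yb.shape) for xb, yb in loader]
print(len(loader))                      # 3 batches: 8 + 8 + 4 examples
print([s[0][0] for s in batch_shapes])  # [8, 8, 4]
```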
3. Model building
The model class is implemented by subclassing torch.nn.Module. When training on a GPU, move the model onto the GPU.
gesR = GesRecognition().to(device)
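The fully connected layer in the full listing at the end of this post takes 784 input features. That number follows from the standard output-size formula floor((n + 2p - k) / s) + 1 applied to each layer in turn. A small sketch of the arithmetic, where conv_out is a helper written for this illustration:

```python
def conv_out(n, kernel, stride=1, padding=0):
    """Output side length of a conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

n = 64                                          # input images are 64 x 64
n = conv_out(n, kernel=5, stride=1, padding=2)  # conv1   -> 64
n = conv_out(n, kernel=2, stride=2)             # maxpool -> 32
n = conv_out(n, kernel=3, stride=2)             # conv2   -> 15
n = conv_out(n, kernel=2, stride=2)             # maxpool -> 7
flat_features = 16 * n * n                      # conv2 has 16 output channels
print(flat_features)  # 784
```

If the input size or any kernel, stride or padding changes, the input size of the linear layer has to be recomputed the same way.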
4. Model training
1. Choose the optimizer, i.e. which optimization algorithm to use.
optimizer = optim.SGD(gesR.parameters(), lr=0.001)
2. Set the number of training iterations and put the model in training mode.
gesR.train()
For why train() is used during training and eval() during testing, I referred to this post: https://blog.csdn.net/weixin_44760744/article/details/108929528. The model in this post uses neither BN nor dropout layers.
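Even though this model has no such layers, the difference between the two modes is easy to see on a single nn.Dropout layer. A small sketch; the layer and the tensor are illustrative only and not part of the model:

```python
import torch
import torch.nn as nn

layer = nn.Dropout(p=0.5)
x = torch.ones(1000)

layer.train()
y_train = layer(x)  # entries are randomly zeroed; survivors are scaled by 1 / (1 - p) = 2

layer.eval()
y_eval = layer(x)   # in eval mode dropout passes the input through unchanged

print(torch.equal(y_eval, x))  # True
print(y_train.unique())        # only the values 0 and 2 can appear
```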
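Before each forward pass, reset the gradients to zero. PyTorch adds newly computed gradients to whatever is already stored on each parameter every time backward() is called, so without this step the gradients of earlier batches would leak into the current update. I still want to look into this part more closely.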
optimizer.zero_grad()
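The accumulation behaviour can be checked on a single scalar parameter. This is a minimal sketch that clears the gradient by hand; optimizer.zero_grad() plays the same role for every parameter the optimizer manages:

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

(2 * w).backward()
first = w.grad.item()        # 2.0

(2 * w).backward()           # no clearing in between
accumulated = w.grad.item()  # 4.0: added on top of the previous gradient

w.grad.zero_()               # clear the stored gradient
(2 * w).backward()
after_zero = w.grad.item()   # 2.0 again

print(first, accumulated, after_zero)
```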
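3. Choosing the loss function
Classification tasks generally use the cross-entropy loss; I still need to study this part further. loss = criterion(output, train_label.squeeze().long()) I used to think the network output and the label only had to have matching dimensions, which led to a lot of errors.
Fixing this error requires converting the labels to integers: train_label = train_label.long()
With batch_size set to 8 and six classes, output.shape is [8, 6]
and train_label.shape is [8].
The reason is that nn.CrossEntropyLoss takes integer class indices rather than one-hot vectors: it applies log-softmax to the raw network outputs and picks out the entry for the target class, which amounts to comparing against a one-hot label internally.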
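The equivalence between class-index targets and an explicit one-hot encoding can be verified numerically. The sketch below uses random stand-in logits and labels with the shapes quoted above, [8, 6] and [8]:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(8, 6)          # batch of 8, six classes (raw scores)
target = torch.randint(0, 6, (8,))  # class indices, dtype int64, shape [8]

# What nn.CrossEntropyLoss computes from class indices.
loss_indices = F.cross_entropy(logits, target)

# The same quantity written out with an explicit one-hot encoding.
one_hot = F.one_hot(target, num_classes=6).float()
loss_one_hot = -(one_hot * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

print(torch.allclose(loss_indices, loss_one_hot))  # True
```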
4. Backpropagation: loss.backward()
5. Optimizer update: optimizer.step()
6. Saving the model: torch.save()
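torch.save() can store either the whole model object or only its parameters. The program in this post saves the whole object; a commonly used alternative is to save the state_dict and load it into a freshly built model. A minimal sketch of that alternative, assuming a small nn.Linear as a stand-in for the real network and a made-up file name:

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # small stand-in for the real network

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "weights.pt")  # hypothetical file name
    torch.save(model.state_dict(), path)    # save the parameters only

    restored = nn.Linear(4, 2)              # rebuild the same architecture
    restored.load_state_dict(torch.load(path))
    restored.eval()

same = all(torch.equal(a, b) for a, b in
           zip(model.state_dict().values(), restored.state_dict().values()))
print(same)  # True
```

Saving only the parameters keeps the file independent of how the class is pickled, at the cost of having to construct the model in code before loading.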
The complete program is below. It is a simple classification model; training took a whole morning, and the accuracy on the test set is 90%.
import numpy as np
import torch
import torch.nn as nn
import h5py
import math
from torch import optim
from torch.autograd import Variable
import torch.nn.functional as F
import matplotlib.pyplot as plt
def load_dataset():
    train_dataset = h5py.File('datasets/train_signs.h5', "r")
    train_set_x_orig = np.array(train_dataset["train_set_x"][:])  # your train set features
    train_set_y_orig = np.array(train_dataset["train_set_y"][:])  # your train set labels
    test_dataset = h5py.File('datasets/test_signs.h5', "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:])  # your test set features
    test_set_y_orig = np.array(test_dataset["test_set_y"][:])  # your test set labels
    classes = np.array(test_dataset["list_classes"][:])  # the list of classes
    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))
    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes
def random_mini_batches(X, Y, mini_batch_size=64, seed=0):
    """
    Creates a list of random minibatches from (X, Y)
    Arguments:
    X -- input data, of shape (m, Hi, Wi, Ci)
    Y -- true label vector, of shape (m, n_y)
    mini_batch_size -- size of the mini-batches, integer
    seed -- fixes the shuffle so that the random minibatches are reproducible
    Returns:
    mini_batches -- list of synchronous (mini_batch_X, mini_batch_Y)
    """
    m = X.shape[0]  # number of training examples
    mini_batches = []
    np.random.seed(seed)
    # Step 1: Shuffle (X, Y)
    permutation = list(np.random.permutation(m))
    shuffled_X = X[permutation, :, :, :]
    shuffled_Y = Y[permutation, :]
    # Step 2: Partition (shuffled_X, shuffled_Y). Minus the end case.
    num_complete_minibatches = math.floor(m / mini_batch_size)  # number of full mini-batches in the partition
    for k in range(0, num_complete_minibatches):
        mini_batch_X = shuffled_X[k * mini_batch_size : k * mini_batch_size + mini_batch_size, :, :, :]
        mini_batch_Y = shuffled_Y[k * mini_batch_size : k * mini_batch_size + mini_batch_size, :]
        mini_batch = (mini_batch_X, mini_batch_Y)
        mini_batches.append(mini_batch)
    # Handling the end case (last mini-batch < mini_batch_size)
    if m % mini_batch_size != 0:
        mini_batch_X = shuffled_X[num_complete_minibatches * mini_batch_size : m, :, :, :]
        mini_batch_Y = shuffled_Y[num_complete_minibatches * mini_batch_size : m, :]
        mini_batch = (mini_batch_X, mini_batch_Y)
        mini_batches.append(mini_batch)
    return mini_batches
class GesRecognition(nn.Module):
    def __init__(self):
        super(GesRecognition, self).__init__()
        self.conv1 = nn.Conv2d(3, 8, kernel_size=5, stride=1, padding=2)
        self.relu1 = nn.ReLU()
        self.maxpool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(8, 16, kernel_size=3, stride=2)
        self.relu2 = nn.ReLU()
        self.maxpool2 = nn.MaxPool2d(kernel_size=2)
        # 16 channels x 7 x 7 positions = 784 features for a 64 x 64 input
        self.fc = nn.Linear(784, 6)

    def forward(self, x):
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.maxpool1(x)
        x = self.conv2(x)
        x = self.relu2(x)
        x = self.maxpool2(x)
        # Flatten the feature maps into a vector for the fully connected layer
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return x
def train():
    train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes = load_dataset()
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    gesR = GesRecognition().to(device)
    # Scale pixel values to [0, 1]; the test images need the same scaling
    train_set_x_orig = train_set_x_orig / 255.0
    test_set_x_orig = test_set_x_orig / 255.0
    criterion = nn.CrossEntropyLoss()
    mini_batch_size = 8
    optimizer = optim.SGD(gesR.parameters(), lr=0.001)
    for num in range(1000):
        gesR.train()
        minbatch = random_mini_batches(train_set_x_orig, train_set_y_orig.T, mini_batch_size, seed=num)
        for batch_x, batch_y in minbatch:
            # NHWC -> NCHW float inputs; labels become int64 class indices of shape [batch]
            train_data = torch.from_numpy(np.transpose(batch_x, (0, 3, 1, 2))).float().to(device)
            train_label = torch.from_numpy(batch_y).long().squeeze(1).to(device)
            optimizer.zero_grad()
            output = gesR(train_data)
            loss = criterion(output, train_label)
            if num % 10 == 0:
                print("loss is :", loss.item())
            loss.backward()
            optimizer.step()
            if loss.item() < 0.1:
                torch.save(gesR, "Gesfy.pkl")
        if num % 10 == 0:
            # Quick check on one random mini-batch of 10 test images
            gesR.eval()
            minbatchtest = random_mini_batches(test_set_x_orig, test_set_y_orig.T, 10, seed=num)
            test_data = torch.from_numpy(np.transpose(minbatchtest[0][0], (0, 3, 1, 2))).float().to(device)
            test_label = torch.from_numpy(minbatchtest[0][1]).long().squeeze(1).to(device)
            with torch.no_grad():
                out = gesR(test_data).max(1)[1]
            print("out.shape", out.shape)
            print("test_label", test_label)
            acc = (out == test_label).float().mean().item()
            print("test batch accuracy:", acc)
def test():
    train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes = load_dataset()
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # Loads the whole pickled model saved by train(); newer PyTorch releases
    # may additionally require weights_only=False for this kind of file
    model = torch.load("Gesfy.pkl", map_location=device).to(device)
    model.eval()
    print(model)
    test_set_x_orig = test_set_x_orig / 255.0
    labels = test_set_y_orig.squeeze()
    count = 0
    for i in range(test_set_x_orig.shape[0]):
        plt.imshow(test_set_x_orig[i])
        plt.pause(5)
        print("label", labels[i])
        test_data = torch.from_numpy(np.transpose(test_set_x_orig[i:i+1], (0, 3, 1, 2)))
        test_data = test_data.float().to(device)
        with torch.no_grad():
            out = model(test_data).max(1)[1].item()
        print("out is :", out)
        if out == labels[i]:
            count = count + 1
    accuracy = count / test_set_x_orig.shape[0]
    print("right rate is :", accuracy)
if __name__ == "__main__":
    train()