Preface
To improve classification performance, one idea I recently came across is to use image dimensionality reduction as an aid to classification. This post is an experimental reproduction of that idea.
一、Main Method
The VAE + classifier setup in this experiment comes in two flavors:
- Training the VAE and the classifier jointly (referred to below as "one-shot training")
- Training the VAE (the dimensionality reduction) first, then using its encoder to assist classifier training (referred to below as "separate training")
To be honest, the code I initially found was the one-shot version; I had not originally planned on separate training and only tried it later on request. Judging by the results on MNIST, the two differ little, which suggests you can train the VAE first and then design and train a classifier on top of it.
二、Experimental Procedure
Before starting, a quick note on a dataset-loading problem I ran into.
1. Data Loading
For generality across image datasets, I converted MNIST into image files (reference code: "Converting the MNIST dataset to .jpg images (Python)", a CSDN blog post by Oriental_1024); each subdirectory under the dataset path is named after its class.
For image-classification datasets, torchvision ships a handy ImageFolder module for loading; torch.utils.data.random_split() then splits the dataset into train/validation/test sets.
The dataset.py I wrote is as follows:
import numpy as np
import torch
from torch.utils.data import DataLoader
from torchvision import transforms, datasets
import matplotlib.pyplot as plt

def MyDataDivide(custom_dataset, test_rate=0.2, val_rate=0.1, batch_size=8, shuffle=True, num_workers=0):
    n = len(custom_dataset)
    test_size = int(n * test_rate)
    validate_size = int(n * val_rate)
    train_size = n - test_size - validate_size
    # random_split shuffles indices before splitting, so every split covers all classes
    train_dataset, validate_dataset, test_dataset = torch.utils.data.random_split(
        custom_dataset, [train_size, validate_size, test_size])
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=shuffle, num_workers=num_workers)
    validate_loader = DataLoader(validate_dataset, batch_size=batch_size, shuffle=False, num_workers=num_workers)
    test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False, num_workers=num_workers)
    return train_loader, validate_loader, test_loader

if __name__ == '__main__':
    RESIZE_SHAPE = (28, 28)
    batch_size = 16
    num_workers = 8
    shuffle = True
    img_dir = r"path/to/your/pictures/classify/dataset"
    data_transform = transforms.Compose([
        transforms.Resize(RESIZE_SHAPE),  # resize the images
        # transforms.CenterCrop(224),  # center-crop to 224x224
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])  # normalize
    ])
    all_dataset = datasets.ImageFolder(root=img_dir, transform=data_transform)
    train_loader, validate_loader, test_loader = MyDataDivide(
        all_dataset, batch_size=batch_size, shuffle=shuffle, num_workers=num_workers)
    test_iter = iter(test_loader)
    flower_i = 0
    for i in range(10):
        image, label = next(test_iter)  # iter() turns the loader into an iterator; next() fetches a batch
        sample = image[flower_i].squeeze()
        sample = sample.permute((1, 2, 0)).numpy()
        sample *= [0.229, 0.224, 0.225]  # undo the normalization for display
        sample += [0.485, 0.456, 0.406]
        sample = np.clip(sample, 0, 1)
        plt.imshow(sample)
        plt.show()
        print(label)
        print('Label is: {}'.format(label[flower_i].numpy()))
A pitfall I hit:
Splitting the dataset with torch.utils.data.Subset() as below produces splits that do not contain every class; each split only covers a contiguous block of classes, because ImageFolder lists samples grouped by class and Subset takes consecutive indices. For example, with a training ratio of 0.5 on MNIST, the training set ends up containing only the five digits 0-4.
def MyDataDivide(custom_dataset, batch_size=8, shuffle=False, num_workers=0):
    n = len(custom_dataset)
    train_size = int(n * 0.5)
    validate_size = int(n * 0.2)
    test_size = n - validate_size - train_size
    # Contiguous index ranges: each split only ever sees a block of classes!
    test_dataset = torch.utils.data.Subset(custom_dataset, range(test_size))  # first 30%
    validate_dataset = torch.utils.data.Subset(custom_dataset, range(test_size, test_size + validate_size))
    train_dataset = torch.utils.data.Subset(custom_dataset, range(test_size + validate_size, n))
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=shuffle, num_workers=num_workers)
    validate_loader = DataLoader(validate_dataset, batch_size=batch_size, shuffle=shuffle, num_workers=num_workers)
    test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=shuffle, num_workers=num_workers)
    return train_loader, validate_loader, test_loader
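This pitfall can be reproduced without any real dataset. Below is a minimal stdlib sketch (labels simulated, no torch needed) of why contiguous Subset-style slices break on a class-sorted dataset such as ImageFolder's sample list, while a shuffled split does not:

```python
import random

# ImageFolder lists samples grouped by class folder, so the label sequence
# is sorted by class: here, 10 simulated classes x 100 images each.
labels = [c for c in range(10) for _ in range(100)]

n = len(labels)
train_size = int(n * 0.5)

# Contiguous slice (what Subset(dataset, range(...)) does):
# the "training" half only ever sees the first classes.
contig_classes = set(labels[:train_size])
print(sorted(contig_classes))  # [0, 1, 2, 3, 4]

# Shuffled split (what random_split does internally): all classes appear.
random.seed(0)
idx = list(range(n))
random.shuffle(idx)
shuffled_classes = set(labels[i] for i in idx[:train_size])
print(sorted(shuffled_classes))  # all ten classes
```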
2. One-shot Training
2.1 Code
The main script vae_classify.py is below; the dataset.py from the previous section sits in the same directory.
# coding:utf-8
import torch
from matplotlib import pyplot as plt
from torch import nn, optim
import torch.nn.functional as F
from torchvision import transforms, datasets
from sklearn.metrics import accuracy_score
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # GPU index

class VAE_Classifier(nn.Module):
    def __init__(self, input_dim, latent_dim, output_dim, classifier_dim):
        # latent_dim and classifier_dim are lists of length 4 and 2, respectively.
        super(VAE_Classifier, self).__init__()
        # encoder
        self.encode_layers = nn.Sequential(
            nn.Linear(input_dim, latent_dim[0]),
            nn.ReLU(),
            nn.Linear(latent_dim[0], latent_dim[1]),
            nn.ReLU(),
            nn.Linear(latent_dim[1], latent_dim[2]),
            nn.ReLU(),
        )
        self.mean = nn.Linear(latent_dim[2], latent_dim[3])
        self.log_var = nn.Linear(latent_dim[2], latent_dim[3])
        # decoder
        self.decode_layers = nn.Sequential(
            nn.Linear(latent_dim[3], latent_dim[2]),
            nn.ReLU(),
            nn.Linear(latent_dim[2], latent_dim[1]),
            nn.ReLU(),
            nn.Linear(latent_dim[1], latent_dim[0]),
            nn.ReLU(),
            nn.Linear(latent_dim[0], input_dim),
            nn.Sigmoid()
        )
        # classifier
        self.classifier_layers = nn.Sequential(
            nn.Linear(latent_dim[3], classifier_dim[0]),
            nn.ReLU(),
            nn.Linear(classifier_dim[0], classifier_dim[1]),
            nn.ReLU(),
            nn.Linear(classifier_dim[1], output_dim)
        )

    def encode(self, x):
        fore1 = self.encode_layers(x)
        mean = self.mean(fore1)
        log_var = self.log_var(fore1)
        return mean, log_var

    def reparameterization(self, mean, log_var):
        sigma = torch.exp(0.5 * log_var)
        eps = torch.randn_like(sigma)
        return mean + eps * sigma

    def decode(self, z):
        recon_x = self.decode_layers(z)
        return recon_x

    def classifier(self, mean):
        output = self.classifier_layers(mean)
        return output

    def forward(self, x):
        org_size = x.size()
        batch_size = org_size[0]
        x = x.view((batch_size, -1))
        mean, log_var = self.encode(x)
        z = self.reparameterization(mean, log_var)
        classifier_x = mean  # classify on the latent mean
        recon_x = self.decode(z).view(org_size)
        pred_y = self.classifier(classifier_x)
        return z, recon_x, mean, log_var, pred_y

from dataset import MyDataDivide

RESIZE_SHAPE = (28, 28)
batch_size = 64
num_workers = 8
# img_dir = r"/home/hyj/python_files/Generate/VAE/data/flowers"
img_dir = r"/home/hyj/python_files/Generate/VAE/data/MNIST/MNIST_pics/train_images"
input_dim = RESIZE_SHAPE[0] * RESIZE_SHAPE[1] * 3  # 3 channels
latent_dim = [512, 256, 128, 64]
output_dim = 10
classifier_dim = [64, 32]

# 1 Build and load the dataset
data_transform = transforms.Compose([
    transforms.Resize(RESIZE_SHAPE),  # resize the images
    # transforms.CenterCrop(224),  # center-crop to 224x224
    transforms.ToTensor(),
    # transforms.Normalize(mean=[0.485, 0.486, 0.446], std=[0.248, 0.245, 0.266])  # normalize
    # transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
all_dataset = datasets.ImageFolder(root=img_dir, transform=data_transform)
train_data_loader, validate_data_loader, test_data_loader = MyDataDivide(
    all_dataset, batch_size=batch_size, num_workers=num_workers)

# 2 Use CUDA if available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# 3 Build the model
model = VAE_Classifier(input_dim, latent_dim, output_dim, classifier_dim)
# print(model)

# 4 Loss functions
recon_loss = lambda recon_x, x: F.binary_cross_entropy(recon_x, x)
kl_loss = lambda mean, log_var: -0.5 * torch.sum(1 + log_var - mean.pow(2) - log_var.exp())
classifier_loss = lambda pred_y, y: F.cross_entropy(pred_y, y)

# 5 Optimizer
learning_rate = 1e-3
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

# 6 Training parameters
epochs = 400
total_train_step = 0
total_test_step = 0
# record train and test accuracy
train_acc, test_acc = [], []

# 7 Training
# weight save location
weight_classifier_dir = "weights/classifier/"
os.makedirs(weight_classifier_dir, exist_ok=True)
model = model.cuda()
for epoch in range(epochs):
    print("--------epoch: {}------".format(epoch + 1))
    model.train()
    for data in train_data_loader:
        optimizer.zero_grad()  # clear the previous gradients before backprop
        imgs, label = data
        imgs = imgs.cuda()
        label = label.cuda()
        _, recon_data, mean, log_var, pred_y = model(imgs)
        loss1 = recon_loss(recon_data, imgs)
        loss2 = kl_loss(mean, log_var)
        loss3 = classifier_loss(pred_y, label)
        loss = loss1 + loss2 + loss3
        total_train_step += 1
        if total_train_step % 120 == 0:
            label = label.cpu().detach().numpy()
            pred_y = pred_y.cpu().detach().numpy()
            train_acc.append(accuracy_score(label, pred_y.argmax(1)))
        loss.backward()
        optimizer.step()
    print(f"train loss:\t{loss:.4f}, train_accuracy:\t{train_acc[-1]:.4f}")

    model.eval()
    with torch.no_grad():
        for data in test_data_loader:
            imgs, label = data
            imgs = imgs.cuda()
            label = label.cuda()
            _, recon_data, mean, log_var, pred_y = model(imgs)
            loss1 = recon_loss(recon_data, imgs)
            loss2 = kl_loss(mean, log_var)
            loss3 = classifier_loss(pred_y, label)
            loss = loss1 + loss2 + loss3
            total_test_step += 1
            if total_test_step % 20 == 0:
                label = label.cpu().detach().numpy()
                pred_y = pred_y.cpu().detach().numpy()
                test_acc.append(accuracy_score(label, pred_y.argmax(1)))
        if epoch % 20 == 0:
            weight_save_path = os.path.join(weight_classifier_dir, f"{epoch}.pth")
            torch.save(model, weight_save_path)
        print(f"test loss:\t{loss:.4f}, test_accuracy:\t{test_acc[-1]:.4f}")

print(train_acc[-1], test_acc[-1])
plt.plot(train_acc, color="red", linestyle="-", label="train")
plt.plot(test_acc, color="blue", linestyle="--", label="test")
plt.title("Test and Train Accuracy")
plt.xlabel("times")
plt.ylabel("accuracy")
plt.legend(loc="upper right")
plt.show()
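For reference, the two VAE-specific terms in the loss above are the standard ELBO components: reparameterized sampling of the latent code, and the closed-form KL divergence between the diagonal Gaussian posterior and the standard normal prior (note the factor of minus one half, which mirrors the `0.5 * log_var` in the sampling step):

```latex
z = \mu + \sigma \odot \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I), \qquad \sigma = \exp\!\left(\tfrac{1}{2}\log\sigma^{2}\right)

D_{\mathrm{KL}}\!\left(\mathcal{N}\!\left(\mu, \operatorname{diag}(\sigma^{2})\right) \,\middle\|\, \mathcal{N}(0, I)\right)
  = -\tfrac{1}{2}\sum_{j}\left(1 + \log\sigma_{j}^{2} - \mu_{j}^{2} - \sigma_{j}^{2}\right)
```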
2.2 Results
The accuracy curve obtained on MNIST is shown below; accuracy sits around 80%. Here batch_size was 16, which is fairly small, so the training accuracy fluctuates a lot.
With batch_size set to 64, the fluctuation is noticeably smaller.
CS231n, which I have been watching recently, also discusses this effect of batch size.
3. Separate Training
Here the classifier is split out of the one-shot code, and the trained VAE model is imported later when needed.
3.1 Code
The code is as follows:
# coding:utf-8
import torch
from matplotlib import pyplot as plt
from torch import nn, optim
import torch.nn.functional as F
from torchvision import transforms, datasets
from sklearn.metrics import accuracy_score
import os
# os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # GPU index

from dataset import MyDataDivide

RESIZE_SHAPE = (28, 28)
batch_size = 64
num_workers = 8
img_dir = r"path/to/data/MNIST/MNIST_pics/train_images"
input_dim = RESIZE_SHAPE[0] * RESIZE_SHAPE[1] * 3  # 3 channels
latent_dim = [512, 256, 128, 64]
output_dim = 10
classifier_dim = [64, 32, 16, 8]  # two extra layers added

# 1 Build and load the dataset
data_transform = transforms.Compose([
    transforms.Resize(RESIZE_SHAPE),  # resize the images
    transforms.ToTensor(),
    # transforms.Normalize(mean=[0.485, 0.486, 0.446], std=[0.248, 0.245, 0.266])  # normalize
    # transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
all_dataset = datasets.ImageFolder(root=img_dir, transform=data_transform)
train_data_loader, validate_data_loader, test_data_loader = MyDataDivide(
    all_dataset, batch_size=batch_size, shuffle=True, num_workers=num_workers)

# 2 Use CUDA if available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
class VAE(nn.Module):
    def __init__(self, input_dim, latent_dim, output_dim):
        # latent_dim is a list of length 4.
        super(VAE, self).__init__()
        # encoder
        self.encode_layers = nn.Sequential(
            nn.Linear(input_dim, latent_dim[0]),
            nn.ReLU(),
            nn.Linear(latent_dim[0], latent_dim[1]),
            nn.ReLU(),
            nn.Linear(latent_dim[1], latent_dim[2]),
            nn.ReLU(),
        )
        self.mean = nn.Linear(latent_dim[2], latent_dim[3])
        self.log_var = nn.Linear(latent_dim[2], latent_dim[3])
        # decoder
        self.decode_layers = nn.Sequential(
            nn.Linear(latent_dim[3], latent_dim[2]),
            nn.ReLU(),
            nn.Linear(latent_dim[2], latent_dim[1]),
            nn.ReLU(),
            nn.Linear(latent_dim[1], latent_dim[0]),
            nn.ReLU(),
            nn.Linear(latent_dim[0], input_dim),
            nn.Sigmoid()
        )

    def encode(self, x):
        fore1 = self.encode_layers(x)
        mean = self.mean(fore1)
        log_var = self.log_var(fore1)
        return mean, log_var

    def reparameterization(self, mean, log_var):
        sigma = torch.exp(0.5 * log_var)
        eps = torch.randn_like(sigma)
        return mean + eps * sigma

    def decode(self, z):
        recon_x = self.decode_layers(z)
        return recon_x

    def forward(self, x):
        org_size = x.size()
        batch_size = org_size[0]
        x = x.view((batch_size, -1))
        mean, log_var = self.encode(x)
        z = self.reparameterization(mean, log_var)
        recon_x = self.decode(z).view(org_size)
        return z, recon_x, mean, log_var
class Classifier(nn.Module):
    def __init__(self, input_dim, latent_dim, output_dim, classifier_dim):
        super(Classifier, self).__init__()
        self.classifier_layers = nn.Sequential(
            nn.Linear(latent_dim[3], classifier_dim[0]),
            nn.ReLU(),
            nn.Linear(classifier_dim[0], classifier_dim[1]),
            nn.ReLU(),
            nn.Linear(classifier_dim[1], classifier_dim[2]),
            nn.ReLU(),
            nn.Linear(classifier_dim[2], classifier_dim[3]),
            nn.ReLU(),
            nn.Linear(classifier_dim[3], output_dim)
        )

    def classifier(self, mean):
        # move the head to GPU on first use (this model is never moved wholesale)
        self.classifier_layers = self.classifier_layers.cuda()
        output = self.classifier_layers(mean)
        return output

    def forward(self, x, model_vae):  # takes the pretrained VAE as input
        x = x.cuda()
        model_vae = model_vae.cuda()
        org_size = x.size()
        batch_size = org_size[0]
        x = x.view((batch_size, -1))
        mean, log_var = model_vae.encode(x)  # encode with the pretrained VAE
        classifier_x = mean
        pred_y = self.classifier(classifier_x)
        return pred_y
# 3 Build the models
model_vae = VAE(input_dim, latent_dim, output_dim)  # VAE model
model_classifier = Classifier(input_dim, latent_dim, output_dim, classifier_dim)

# 4 Loss functions
recon_loss = lambda recon_x, x: F.binary_cross_entropy(recon_x, x)
kl_loss = lambda mean, log_var: -0.5 * torch.sum(1 + log_var - mean.pow(2) - log_var.exp())
classifier_loss = lambda pred_y, y: F.cross_entropy(pred_y, y)

# 5 Optimizer (VAE only at this stage)
learning_rate = 5e-3
optimizer = optim.Adam(model_vae.parameters(), lr=learning_rate)

# 6 Training parameters
epochs = 100
total_train_step = 0
total_test_step = 0
# record train and test loss
loss_vae_train, loss_vae_test = [], []

# 7 Training
## 7.1 Train the VAE
weight_vae_dir = "weights/vae/"
os.makedirs(weight_vae_dir, exist_ok=True)  # weights
model_vae.cuda()
for epoch in range(epochs):
    print("--------epoch: {}------".format(epoch + 1))
    model_vae.train()
    for data in train_data_loader:
        optimizer.zero_grad()  # clear the previous gradients before backprop
        imgs, label = data
        imgs = imgs.cuda()
        _, recon_data, mean, log_var = model_vae(imgs)
        loss1 = recon_loss(recon_data, imgs)
        loss2 = kl_loss(mean, log_var)
        loss = loss1 + loss2  # VAE loss only
        total_train_step += 1
        if total_train_step % 40 == 0:
            loss_vae_train.append(loss.item())
        loss.backward()
        optimizer.step()
    print(f"train loss:\t{loss:.4f}")

    model_vae.eval()
    with torch.no_grad():
        for data in test_data_loader:
            imgs, label = data
            imgs = imgs.cuda()
            _, recon_data, mean, log_var = model_vae(imgs)
            loss1 = recon_loss(recon_data, imgs)
            loss2 = kl_loss(mean, log_var)
            loss = loss1 + loss2
            total_test_step += 1
            if total_test_step % 20 == 0:
                loss_vae_test.append(loss.item())
        if epoch % 20 == 0:
            weight_save_path = os.path.join(weight_vae_dir, f"{epoch}.pth")
            torch.save(model_vae, weight_save_path)
        print(f"test loss:\t{loss:.4f}")

print(loss_vae_train[-1], loss_vae_test[-1])
plt.plot(loss_vae_train, color="red", linestyle="-", label="train")
plt.plot(loss_vae_test, color="blue", linestyle="--", label="test")
plt.title("VAE Test and Train Loss")
plt.xlabel("times")
plt.ylabel("loss")
plt.legend(loc="upper right")
plt.show()
# # Loading a saved VAE instead of training one:
# model_path = r"path/to/weights/vae/80.pth"
# model_vae_80 = torch.load(model_path)
# model_vae_80.to(device)
# model_vae_80.eval()
# model_vae = model_vae_80

# Classifier training setup
# 1 Build and load the dataset
data_transform = transforms.Compose([
    transforms.Resize(RESIZE_SHAPE),  # resize the images
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.486, 0.446], std=[0.248, 0.245, 0.266])  # normalize
    # transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
batch_size = 64
num_workers = 8
all_dataset = datasets.ImageFolder(root=img_dir, transform=data_transform)
train_data_loader, validate_data_loader, test_data_loader = MyDataDivide(
    all_dataset, batch_size=batch_size, shuffle=True, num_workers=num_workers)
classifier_loss = lambda pred_y, y: F.cross_entropy(pred_y, y)

## 7.2 Train the classifier
# optimizer (classifier only)
learning_rate = 5e-3
optimizer = optim.Adam(model_classifier.parameters(), lr=learning_rate)
# training parameters (classifier)
epochs = 300
total_train_step = 0
total_test_step = 0
loss_classifier_train, loss_classifier_test = [], []
train_acc, test_acc = [], []
# weight save location
weight_classifier_dir = "weights/classifier/"
os.makedirs(weight_classifier_dir, exist_ok=True)
model_vae.eval()  # keep the VAE in eval mode!
for epoch in range(epochs):
    print("--------epoch: {}------".format(epoch + 1))
    # classifier in training mode
    model_classifier.train()
    for data in train_data_loader:
        optimizer.zero_grad()  # clear the previous gradients before backprop
        imgs, label = data
        imgs = imgs.cuda()
        label = label.cuda()
        pred_y = model_classifier(imgs, model_vae)
        loss = classifier_loss(pred_y, label)
        total_train_step += 1
        if total_train_step % 120 == 0:
            loss_classifier_train.append(loss.item())
            pred_y = pred_y.cpu().detach().numpy()
            label = label.cpu().detach().numpy()
            train_acc.append(accuracy_score(label, pred_y.argmax(1)))
        loss.backward()
        optimizer.step()
    print(f"train loss:\t{loss:.4f}, train_accuracy:\t{train_acc[-1]:.4f}")

    # classifier in eval mode
    model_classifier.eval()
    with torch.no_grad():
        for data in test_data_loader:
            imgs, label = data
            imgs = imgs.cuda()
            label = label.cuda()
            pred_y = model_classifier(imgs, model_vae)
            loss = classifier_loss(pred_y, label)
            total_test_step += 1
            if total_test_step % 60 == 0:
                loss_classifier_test.append(loss.item())
                pred_y = pred_y.cpu().detach().numpy()
                label = label.cpu().detach().numpy()
                test_acc.append(accuracy_score(label, pred_y.argmax(1)))
        if epoch % 20 == 0:
            weight_save_path = os.path.join(weight_classifier_dir, f"{epoch}.pth")
            torch.save(model_classifier, weight_save_path)
        print(f"test loss:\t{loss:.4f}, test_accuracy:\t{test_acc[-1]:.4f}")

print(loss_classifier_train[-1], loss_classifier_test[-1])
plt.plot(loss_classifier_train, color="red", linestyle="-", label="train")
plt.plot(loss_classifier_test, color="blue", linestyle="--", label="test")
plt.title("Test and Train Loss")
plt.xlabel("times")
plt.ylabel("loss")
plt.legend(loc="upper right")
plt.show()

print(train_acc[-1], test_acc[-1])
plt.plot(train_acc, color="red", linestyle="-", label="train")
plt.plot(test_acc, color="blue", linestyle="--", label="test")
plt.title("Test and Train Accuracy")
plt.xlabel("times")
plt.ylabel("accuracy")
plt.legend(loc="upper right")
plt.show()
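One design note on the separate-training stage: the optimizer above only receives model_classifier.parameters(), so the VAE weights are never updated, but it can be made explicit by freezing them. A minimal sketch with a stand-in encoder (the tiny nn.Sequential here is hypothetical; the real model_vae would be frozen the same way):

```python
import torch
from torch import nn

# Hypothetical stand-in for the pretrained VAE encoder.
encoder = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))

# Freeze: no gradients are tracked for the pretrained weights, which also
# guards against accidentally handing them to a later optimizer.
for p in encoder.parameters():
    p.requires_grad = False
encoder.eval()  # and keep it in eval mode, as in the script above

x = torch.randn(3, 8)
z = encoder(x)
print(z.shape)          # torch.Size([3, 2])
print(z.requires_grad)  # False
```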
3.2 Results
Loss curve of the separately trained VAE
Loss curve of the separately trained classifier
3.3 Pitfalls
With MNIST and separate training, the VAE loss plateaued around 0.2, and the classifier loss was hard to bring down when using the two-layer head from the one-shot setup; after adding two more linear layers to the classifier it worked much better, with accuracy around 80%-90%.
This problem bugged me for a long time. No matter how I tuned batch_size and learning_rate, the loss simply would not drop, even with learning_rate pushed up to 1e-2, so the cause had to be elsewhere. Once I added two layers to the classifier, things improved immediately.
The takeaway may be that the classifier cannot be too simple, or it becomes hard to train.
4. Update (2023-08-13)
Strangely, running the very same code the next morning gave different behavior: the loss again refused to drop, and this time it no longer depended on the number of linear layers in the classifier head.
I then compared against the very first version of the code (the "template code" below) and started checking whether data loading was the problem.
4.1 Is the ImageFolder-style data loading at fault? No
The template code loaded MNIST like this:
# keep a distinct name so the torchvision.transforms module is not shadowed
mnist_transform = transforms.Compose([transforms.ToTensor()])
train_data = MNIST(r'mnist/', train=True, download=True, transform=mnist_transform)
test_data = MNIST(r'mnist/', train=False, download=True, transform=mnist_transform)
train_data_loader = DataLoader(train_data, batch_size=64, shuffle=True, num_workers=0)
test_data_loader = DataLoader(test_data, batch_size=64, shuffle=True, num_workers=0)
For the first 5 epochs its loss was also slow to drop, with accuracy hovering at 0.2 or even below 0.1, but from epoch 6 the accuracy suddenly climbed above 0.3 and kept rising normally from there, reaching about 0.85 by around epoch 12.
For generality, I then switched to the generic loading approach:
from dataset import MyDataDivide

RESIZE_SHAPE = (28, 28)
batch_size = 64
num_workers = 0
img_dir = r"path/to/MNIST_pics/train_images"
data_transform = transforms.Compose([
    transforms.Resize(RESIZE_SHAPE),  # resize the images
    transforms.ToTensor(),
    # transforms.Normalize(mean=[0.485, 0.486, 0.446], std=[0.248, 0.245, 0.266])  # normalize
    # transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
all_dataset = datasets.ImageFolder(root=img_dir, transform=data_transform)
train_data_loader, validate_data_loader, test_data_loader = MyDataDivide(
    all_dataset, batch_size=batch_size, shuffle=True, num_workers=num_workers)
This time the turning point came at epoch 10, after which accuracy rose normally.
Before that, though, something really was odd: I also tried CIFAR-100 and the flowers dataset, both at their original sizes and resized to (28, 28) like MNIST, and in every case the loss barely dropped over 100 epochs while accuracy stayed below 0.2.
I wonder whether, compared with MNIST, datasets like CIFAR-100 are simply harder to classify because of their varied backgrounds.
Adding two layers to the classifier head did not fix this problem either.
4.2 num_workers: little impact
Matching the template code's num_workers=0, I found that changing num_workers does not affect the overall accuracy trend; it may only bring the turning point forward. For example, with num_workers=8 the run that previously turned around at epoch 10 turned around at epoch 7 instead.
This also suggests my data loading is fine.
4.3 Normalization at load time: probably a negative influence
I wondered whether the normalization here was affecting training, but in fact adding it often made the loss huge, on the order of thousands or more, so in the end I left it out. A likely explanation: the reconstruction loss is F.binary_cross_entropy, which assumes targets in [0, 1], while normalized pixel values fall well outside that range.
transforms.Normalize(mean=[0.485, 0.486, 0.446], std=[0.248, 0.245, 0.266])  # normalize
# transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
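The blow-up can be seen directly from the BCE formula. Below is a small NumPy sketch (the bce helper is mine, mirroring what F.binary_cross_entropy computes per element; note that recent PyTorch versions actually raise an error for targets outside [0, 1]):

```python
import numpy as np

def bce(pred, target):
    # Elementwise binary cross-entropy: -(t*log(p) + (1-t)*log(1-p)).
    eps = 1e-7
    pred = np.clip(pred, eps, 1 - eps)
    return -(target * np.log(pred) + (1 - target) * np.log(1 - pred))

pred = np.array([0.9, 0.01])  # typical sigmoid reconstruction values

# Targets in [0, 1] (un-normalized pixels): loss is sane and non-negative.
ok = bce(pred, np.array([0.8, 0.0]))

# After Normalize, a black pixel becomes (0 - 0.485) / 0.248 ≈ -1.96,
# far outside [0, 1]; the "loss" is no longer meaningful and can even go negative.
bad = bce(pred, np.array([-1.96, -1.96]))
print(ok)   # small positive entries
print(bad)  # huge and/or negative entries
```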
5. Plugging the VAE into ResNet (update 2023-08-17)
5.1 ResNet classification
As everyone knows, ResNet's classification performance is remarkably strong. After repeatedly failing to make progress, and being asked to first guarantee classification accuracy, I picked a generic classification model, and I chose ResNet. The dataset was similar to the earlier flowers dataset: roughly 10,000 images of size (256, 256), again in 5 classes.
The results amazed me. With default, reasonable values for batch_size, learning_rate, and so on, with nothing to tune, accuracy rose quickly; after thirty or forty epochs it was already around 99%, and by around epoch 100 it reached 99.59% (2197 of 2207 test images correct), only 10 misclassified out of over 2000!
I knew ResNet classified well, but not this well. Is there even room for improvement here?
So now I do not understand what the final requirement actually is. Classification is already at the ceiling; is the hope that dimensionality reduction will improve it further? Is the target 100% test accuracy?
5.2 Replacing the separate-training classifier with ResNet
Back when accuracy refused to improve after days of tuning, and after the two extra classifier layers clearly helped, I had already thought of swapping the classifier for ResNet. But at first I could not see how: ResNet takes images as input, while the VAE encoder (after dimensionality reduction) outputs a flat vector, where out_dim is the target dimensionality.
The answer is simply to reshape the encoder output into something ResNet can consume:
(batch_size, out_dim) → (batch_size, C, H, W)
Here C is the channel count; 3 is the natural choice, mimicking 3-channel images. Then H = W = sqrt(out_dim / 3).
Also, image pixel values are all non-negative, and in tensor form must lie in [0, 1], so I mapped the reduced data into [0, 1] with a sigmoid. Watch the data distribution before mapping, though: my reduced values were tiny, on the order of 1e-8, and a raw sigmoid collapses everything to about 0.5000, erasing the differences. It is better to rescale the data first and then map it into [0, 1].
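The reshape-and-rescale step described above can be sketched as follows (NumPy for brevity; latent_to_image is a name of my own, and standardizing before the sigmoid is one way to do the "rescale first" suggestion, assuming roughly Gaussian latents):

```python
import numpy as np

def latent_to_image(z, channels=3, side=16):
    """Map VAE latents (batch, channels*side*side) to image-like (batch, C, H, W) in (0, 1)."""
    batch, out_dim = z.shape
    assert out_dim == channels * side * side
    # Standardize per feature first: if the latents are tiny (e.g. ~1e-8),
    # a raw sigmoid maps everything to ~0.5 and erases the differences.
    z = (z - z.mean(axis=0)) / (z.std(axis=0) + 1e-8)
    z = 1.0 / (1.0 + np.exp(-z))  # sigmoid into (0, 1), like pixel values
    return z.reshape(batch, channels, side, side)

z = np.random.randn(4, 768) * 1e-8   # tiny latents, as in the experiment
imgs = latent_to_image(z)
print(imgs.shape)  # (4, 3, 16, 16)
```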
The result was very bad: the training loss stuck around 1.6 and would not drop, the test loss was enormous, up to about 1e30, and accuracy was accordingly useless. My take is that ResNet's capacity is just too high, and with my initial target dimensionality of 48 this amounts to extremely severe overfitting.
I then raised the target dimensionality to 768, i.e. small (3, 16, 16) images. The training loss still barely changed, but the test loss dropped to around 1e15, which is at least much better. This seems to confirm that ResNet overfits badly on such tiny, low-dimensional images because they are too easy to fit. Can the dimensionality go higher? Not really: GPU memory is the limit, and besides, a large target dimensionality defeats the point of dimensionality reduction.
6. Replacing the VAE encoder and decoder with ResNet structures (2023-08-23)
The program runs, but I have not yet found the cause of the long-standing problem: the loss does not drop. The model simply is not learning anything.
Current accuracy? 20%, on a 5-class problem, which means the model is essentially guessing.
Summary
I have been at this since 2023-08-04, with many days of work and many detours; only after adding two linear layers to the classifier last night did I make a breakthrough, so I am recording it here.