Notes from a VAE+Classifier Experiment

Code adapted from: 深度学习完整的模型训练(以VAE+Classifier为例) ("Complete Deep Learning Model Training, Using VAE+Classifier as an Example") - 知乎 (zhihu.com)

Contents

Preface

1. Approach

2. Experiments

1. Data loading

2. Joint training

2.1 Code

2.2 Results 

3. Separate training

3.1 Code

3.2 Results 

3.3 Pitfalls

Summary


Preface

To improve classification performance, an idea I recently heard about is to use image dimensionality reduction to aid classification. This post reproduces that idea experimentally.


1. Approach

This experiment uses VAE+Classifier in two ways:

  1. Fuse the VAE and the classifier (referred to below as joint training).
  2. Train the VAE first (for dimensionality reduction), then use its encoder to aid classifier training (referred to below as separate training).

To be honest, the code I first found did joint training, and separate training had not occurred to me; it was tried later on request. Judging by the MNIST results, the two differ little, which suggests one can indeed train the VAE first and then design and train the classifier on top of it.

2. Experiments

Before starting, a quick note on a data-loading issue I ran into.

1. Data loading

To keep the pipeline image-generic, I converted MNIST into image files (reference code: 将MNIST数据集转换成.jpg图片(python)_Oriental_1024的博客-CSDN博客); each subdirectory under the dataset path is named after its class.
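For reference, a minimal sketch of such a conversion (my own illustration, not the linked blog's code): each image is written into a subdirectory named after its class, which is exactly the layout ImageFolder expects. The helper name and output paths are made up.

```python
import os
import numpy as np
from PIL import Image

def save_class_images(images, labels, out_dir):
    # layout: out_dir/<class_name>/<index>.jpg, the structure ImageFolder expects
    for i, (arr, label) in enumerate(zip(images, labels)):
        class_dir = os.path.join(out_dir, str(label))
        os.makedirs(class_dir, exist_ok=True)
        Image.fromarray(arr).save(os.path.join(class_dir, f"{i}.jpg"))

# two dummy 28x28 grayscale "digits", one per class
imgs = [np.zeros((28, 28), dtype=np.uint8), np.full((28, 28), 255, dtype=np.uint8)]
save_class_images(imgs, [0, 1], "MNIST_pics/demo")
```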

For image-classification datasets, torchvision's handy ImageFolder module handles loading, and torch.utils.data.random_split() handles the train/validation/test split.

The dataset.py I wrote is as follows:

import os
import shutil
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms, datasets
import matplotlib.pyplot as plt


def MyDataDivide(custom_dataset, test_rate = 0.2, val_rate = 0.1, batch_size=8, shuffle=True, num_workers=0):
    # custom_dataset = MyDataset()
    n = len(custom_dataset)
    
    test_size = int(n*test_rate)
    validate_size = int(n * val_rate)
    train_size =  n - test_size - validate_size
    
    
    train_dataset, validate_dataset, test_dataset = torch.utils.data.random_split(custom_dataset, [train_size, validate_size, test_size])


    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=shuffle, num_workers=num_workers)
    validate_loader = DataLoader(validate_dataset, batch_size=batch_size, shuffle=False, num_workers=num_workers)
    test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False, num_workers=num_workers)

    # print(len(train_loader))
    # print(len(validate_loader))
    # print(len(test_loader))

    return train_loader, validate_loader, test_loader

    
if __name__ == '__main__':
    RESIZE_SHAPE = (28, 28)
    batch_size = 16
    num_workers = 8
    shuffle = True
    img_dir = r"path/to/your/pictures/classify/dataset"
    
    data_transform = transforms.Compose([
        transforms.Resize(RESIZE_SHAPE),         # resize images
        # transforms.CenterCrop(224),            # center-crop to 224x224
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])   # normalize with ImageNet stats
    ])
    all_dataset = datasets.ImageFolder(root=img_dir, transform=data_transform) 

    train_loader, validate_loader, test_loader = MyDataDivide(all_dataset, batch_size=batch_size, shuffle=shuffle, num_workers=num_workers)

    test_iter = iter(test_loader)
    flower_i = 0
    for i in range(10):
        image, label = next(test_iter)  # iter() turns the DataLoader into an iterator; next() yields one batch
        sample = image[flower_i].squeeze()
        sample = sample.permute((1, 2, 0)).numpy()
        sample *= [0.229, 0.224, 0.225]
        sample += [0.485, 0.456, 0.406]
        sample = np.clip(sample, 0, 1)
        plt.imshow(sample)
        plt.show()
        print(label)
        print('Label is: {}'.format(label[flower_i].numpy()))

Pitfalls:

Splitting the dataset with torch.utils.data.Subset() as below yields subsets that do not each contain every class; each subset instead covers a contiguous block of classes, because ImageFolder orders samples by class directory. For example, with a train ratio of 0.5 on MNIST, the training set ends up containing only 5 of the 10 digit classes.

def MyDataDivide(custom_dataset, batch_size=8, shuffle=False, num_workers=0):
    # custom_dataset = MyDataset()
    n = len(custom_dataset)
    train_size = int(n * 0.5)
    validate_size = int(n * 0.2)
    test_size = n - validate_size - train_size
    # train_dataset, validate_dataset, test_dataset = torch.utils.data.random_split(custom_dataset, [train_size, validate_size, test_size])

    test_dataset = torch.utils.data.Subset(custom_dataset, range(test_size))  # contiguous indices: with ImageFolder these cover only the first class folders
    validate_dataset = torch.utils.data.Subset(custom_dataset, range(test_size, test_size+validate_size))
    train_dataset = torch.utils.data.Subset(custom_dataset, range(test_size+validate_size, n))  

    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=shuffle, num_workers=num_workers)
    validate_loader = DataLoader(validate_dataset, batch_size=batch_size, shuffle=shuffle, num_workers=num_workers)
    test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=shuffle, num_workers=num_workers)
 
    # print(len(train_loader))
    # print(len(validate_loader))
    # print(len(test_loader))

    return train_loader, validate_loader, test_loader
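One way to fix this, sketched below (my own suggestion, names illustrative), is to shuffle the indices once before slicing, so every subset mixes all classes — which is essentially what random_split() does internally:

```python
import torch

def split_indices(n, test_rate=0.2, val_rate=0.1, seed=0):
    # shuffle all indices once so each slice contains a mix of every class
    g = torch.Generator().manual_seed(seed)
    perm = torch.randperm(n, generator=g).tolist()
    test_size = int(n * test_rate)
    val_size = int(n * val_rate)
    test_idx = perm[:test_size]
    val_idx = perm[test_size:test_size + val_size]
    train_idx = perm[test_size + val_size:]
    return train_idx, val_idx, test_idx

# then e.g. train_dataset = torch.utils.data.Subset(custom_dataset, train_idx)
```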

2. Joint training

2.1 Code

The main script vae_classify.py is below; the dataset.py file above sits in the same directory.

# coding:utf-8

import torch
from matplotlib import pyplot as plt
from torch import nn, optim
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import MNIST
from torchvision import transforms, datasets
from sklearn.metrics import accuracy_score
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0" # GPU index

class VAE_Classifier(nn.Module):
    def __init__(self, input_dim, latent_dim, output_dim, classifier_dim):
        # latent_dim and classifier_dim are lists of length 4 and 2, respectively.
        super(VAE_Classifier, self).__init__()
        # encoder
        self.encode_layers = nn.Sequential(
            nn.Linear(input_dim, latent_dim[0]),
            nn.ReLU(),
            nn.Linear(latent_dim[0], latent_dim[1]),
            nn.ReLU(),
            nn.Linear(latent_dim[1], latent_dim[2]),
            nn.ReLU(),
        )
        self.mean = nn.Linear(latent_dim[2], latent_dim[3])
        self.log_var = nn.Linear(latent_dim[2], latent_dim[3])
        # decoder
        self.decode_layers = nn.Sequential(
            nn.Linear(latent_dim[3], latent_dim[2]),
            nn.ReLU(),
            nn.Linear(latent_dim[2], latent_dim[1]),
            nn.ReLU(),
            nn.Linear(latent_dim[1], latent_dim[0]),
            nn.ReLU(),
            nn.Linear(latent_dim[0], input_dim),
            nn.Sigmoid()
        )
        # classifier
        self.classifier_layers = nn.Sequential(
            nn.Linear(latent_dim[3], classifier_dim[0]),
            nn.ReLU(),
            nn.Linear(classifier_dim[0], classifier_dim[1]),
            nn.ReLU(),
            nn.Linear(classifier_dim[1], output_dim)
        )
 
    def encode(self, x):
        fore1 = self.encode_layers(x)
        mean = self.mean(fore1)
        log_var = self.log_var(fore1)
        return mean, log_var

    def reparameterization(self, mean, log_var):
        sigma = torch.exp(0.5 * log_var)
        eps = torch.randn_like(sigma)
        return mean + eps * sigma

    def decode(self, z):
        recon_x = self.decode_layers(z)
        return recon_x

    def classifier(self, mean):
        # note: this .cuda() call is redundant once the whole model is moved to the GPU
        self.classifier_layers = self.classifier_layers.cuda()
        output = self.classifier_layers(mean)
        return output

    def forward(self, x):
        org_size = x.size()
        batch_size = org_size[0]
        x = x.view((batch_size, -1))
        mean, log_var = self.encode(x)
        z = self.reparameterization(mean, log_var)
        classifier_x = mean
        recon_x = self.decode(z).view(org_size)
        pred_y = self.classifier(classifier_x)
        return z, recon_x, mean, log_var, pred_y


from dataset import MyDataDivide

RESIZE_SHAPE = (28, 28)
batch_size = 64
num_workers = 8
# img_dir = r"/home/hyj/python_files/Generate/VAE/data/flowers"
img_dir = r"/home/hyj/python_files/Generate/VAE/data/MNIST/MNIST_pics/train_images"

input_dim = RESIZE_SHAPE[0]*RESIZE_SHAPE[1]*3   # 3 channels (images are loaded as RGB)
latent_dim = [512, 256, 128, 64]
output_dim = 10
classifier_dim = [64, 32]



# 1 Build and load the dataset
data_transform = transforms.Compose([
    transforms.Resize(RESIZE_SHAPE),         # resize images
    # transforms.CenterCrop(224),            # center-crop to 224x224
    transforms.ToTensor(),
    # transforms.Normalize(mean=[0.485, 0.486, 0.446], std=[0.248, 0.245, 0.266])   # normalize
    # transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
# train_data = MNIST(r'mnist/', train=True, download=True, transform=transforms)
# test_data = MNIST(r'mnist/', train=False, download=True, transform=transforms)
all_dataset = datasets.ImageFolder(root=img_dir, transform=data_transform) 
train_data_loader, validate_data_loader, test_data_loader = MyDataDivide(all_dataset, batch_size = batch_size, num_workers = num_workers)

# train_data_loader = DataLoader(train_data, batch_size=64, shuffle=True, num_workers=0)
# test_data_loader = DataLoader(test_data, batch_size=64, shuffle=True, num_workers=0)

# 2 Use CUDA if available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# 3 Build the model
model = VAE_Classifier(input_dim, latent_dim, output_dim, classifier_dim)
# print(model)

# quick model sanity check
# input = torch.rand((1, 1, 784))
# output = model(input)
# print(output)

# 4 Loss functions
recon_loss = lambda recon_x, x: F.binary_cross_entropy(recon_x, x)
kl_loss = lambda mean, log_var: -0.5 * torch.sum(1 + log_var - mean.pow(2) - log_var.exp())   # was -5, a typo: the KL term uses -0.5
classifier_loss = lambda pred_y, y: F.cross_entropy(pred_y, y)



# 5 Optimizer
learning_rate = 1e-3
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

# 6 Training parameters
epochs = 400
total_train_step = 0
total_test_step = 0

# record train accuracy and test accuracy
train_acc, test_acc = [], []

# 7 Training loop
# checkpoint directory
weight_classifier_dir = "weights/classifier/"
os.makedirs(weight_classifier_dir, exist_ok=True)
# model.to(device)
model = model.cuda()
for epoch in range(epochs):
    print("--------epoch: {}------".format(epoch+1))
    model.train()
    for data in train_data_loader:
        optimizer.zero_grad()   # clear gradients from the previous step
        imgs, label = data
        imgs = imgs.cuda()
        label = label.cuda()
        # print(imgs.shape)
        _, recon_data, mean, log_var, pred_y = model(imgs)
        loss1 = recon_loss(recon_data, imgs)
        loss2 = kl_loss(mean, log_var)
        loss3 = classifier_loss(pred_y, label)
        loss = loss1 + loss2 + loss3
        total_train_step += 1
        if total_train_step % 120 == 0:
            label = label.cpu().detach().numpy()
            pred_y = pred_y.cpu().detach().numpy()
            train_acc.append(accuracy_score(label, pred_y.argmax(1)))
        loss.backward()

        optimizer.step()
    print(f"train loss:\t{loss:.4f}, train_accuracy:\t{train_acc[-1]:.4f}")

    model.eval()
    with torch.no_grad():
        for data in test_data_loader:
            imgs, label = data
            imgs = imgs.cuda()
            label = label.cuda()
            _, recon_data, mean, log_var, pred_y = model(imgs)
            loss1 = recon_loss(recon_data, imgs)
            loss2 = kl_loss(mean, log_var)
            loss3 = classifier_loss(pred_y, label)
            loss = loss1 + loss2 + loss3
            total_test_step += 1
            if total_test_step % 20 == 0:
                label = label.cpu().detach().numpy()
                pred_y = pred_y.cpu().detach().numpy()

                test_acc.append(accuracy_score(label, pred_y.argmax(1)))
        if epoch % 20 == 0:
            weight_save_path = os.path.join(weight_classifier_dir, f"{epoch}.pth")
            torch.save(model, weight_save_path)
        # print("testing loss: {}".format(loss))
        print(f"test loss:\t{loss:.4f}, test_accuracy:\t{test_acc[-1]:.4f}")

print(train_acc[-1], test_acc[-1])
plt.plot(train_acc, color="red", linestyle="-", label="train")
plt.plot(test_acc, color="blue", linestyle="--", label="test")
plt.title("Test and Train Accuracy")
plt.xlabel("times")
plt.ylabel("accuracy")
plt.legend(loc="upper right")
plt.show()
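One subtlety worth flagging (my own observation, not something the reference code addresses): F.binary_cross_entropy defaults to reduction='mean', averaging over every element, while the KL term is a raw sum over the batch and latent dimensions, so the two terms live on very different scales. A sketch of a scale-consistent VAE loss, assuming flattened inputs in [0, 1]:

```python
import torch
import torch.nn.functional as F

def vae_loss(recon_x, x, mean, log_var):
    # sum the BCE over pixels, then average over the batch
    recon = F.binary_cross_entropy(recon_x, x, reduction="sum") / x.size(0)
    # standard KL(q(z|x) || N(0, I)), also averaged over the batch
    kl = -0.5 * torch.sum(1 + log_var - mean.pow(2) - log_var.exp()) / x.size(0)
    return recon + kl

# dummy batch of flattened 28x28 "images"
x = torch.rand(4, 784)
recon_x = torch.rand(4, 784)
mean, log_var = torch.zeros(4, 64), torch.zeros(4, 64)
loss = vae_loss(recon_x, x, mean, log_var)
```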





2.2 Results 

The accuracy curves on MNIST are shown below; accuracy sits around 80%. That run used a batch_size of 16, which is fairly small, so the accuracy fluctuates a lot during training.

With batch_size set to 64, the fluctuation is clearly smaller.

The CS231n notes I have been reading make the same point about small batches producing noisy training curves.

3. Separate training

This mainly splits the classifier out of the joint-training code; the trained VAE model is loaded back in when needed.

3.1 Code

The code is as follows: 

# coding:utf-8

import torch
from matplotlib import pyplot as plt
from torch import nn, optim
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import transforms
# from torchvision.datasets import MNIST
from torchvision import transforms, datasets
from sklearn.metrics import accuracy_score
import os
# os.environ["CUDA_VISIBLE_DEVICES"] = "0" # GPU index
# os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

from dataset import MyDataDivide

RESIZE_SHAPE = (28, 28)
batch_size = 64
num_workers = 8
img_dir = r"path/to/data/MNIST/MNIST_pics/train_images"

input_dim = RESIZE_SHAPE[0]*RESIZE_SHAPE[1]*3   # 3 channels
latent_dim = [512, 256, 128, 64]
output_dim = 10  
classifier_dim = [64, 32, 16, 8]    # two layers added compared with joint training


# 1 Build and load the dataset
data_transform = transforms.Compose([
    transforms.Resize(RESIZE_SHAPE),         # resize images
    # transforms.CenterCrop(224),            # center-crop to 224x224
    transforms.ToTensor(),
    # transforms.Normalize(mean=[0.485, 0.486, 0.446], std=[0.248, 0.245, 0.266])   # normalize
    # transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
# train_data = MNIST(r'mnist/', train=True, download=True, transform=transforms)
# test_data = MNIST(r'mnist/', train=False, download=True, transform=transforms)
all_dataset = datasets.ImageFolder(root=img_dir, transform=data_transform) 
train_data_loader, validate_data_loader, test_data_loader = MyDataDivide(all_dataset, batch_size = batch_size, shuffle=True, num_workers = num_workers)

# train_data_loader = DataLoader(train_data, batch_size=64, shuffle=True, num_workers=0)
# test_data_loader = DataLoader(test_data, batch_size=64, shuffle=True, num_workers=0)


# 2 Use CUDA if available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

class VAE(nn.Module):
    def __init__(self, input_dim, latent_dim, output_dim):
        # latent_dim is a list of length 4 (output_dim is unused here)
        super(VAE, self).__init__()
        # encoder
        self.encode_layers = nn.Sequential(
            nn.Linear(input_dim, latent_dim[0]),
            nn.ReLU(),
            nn.Linear(latent_dim[0], latent_dim[1]),
            nn.ReLU(),
            nn.Linear(latent_dim[1], latent_dim[2]),
            nn.ReLU(),
        )
        self.mean = nn.Linear(latent_dim[2], latent_dim[3])
        self.log_var = nn.Linear(latent_dim[2], latent_dim[3])
        # decoder
        self.decode_layers = nn.Sequential(
            nn.Linear(latent_dim[3], latent_dim[2]),
            nn.ReLU(),
            nn.Linear(latent_dim[2], latent_dim[1]),
            nn.ReLU(),
            nn.Linear(latent_dim[1], latent_dim[0]),
            nn.ReLU(),
            nn.Linear(latent_dim[0], input_dim),
            nn.Sigmoid()
        )
 
    def encode(self, x):
        fore1 = self.encode_layers(x)
        mean = self.mean(fore1)
        log_var = self.log_var(fore1)
        
        mean = mean.cuda()  # redundant once the model is on the GPU
        log_var = log_var.cuda()
        return mean, log_var

    def reparameterization(self, mean, log_var):
        sigma = torch.exp(0.5 * log_var)
        eps = torch.randn_like(sigma)
        return mean + eps * sigma

    def decode(self, z):
        recon_x = self.decode_layers(z)
        return recon_x


    # def forward(self, x):
    #     org_size = x.size()
    #     batch_size = org_size[0]
    #     x = x.view((batch_size, -1))
    #     mean, log_var = self.encode(x)
    #     z = self.reparameterization(mean, log_var)
    #     classifier_x = mean
    #     recon_x = self.decode(z).view(org_size)
    #     pred_y = self.classifier(classifier_x)
    #     return z, recon_x, mean, log_var, pred_y
    
    def forward(self, x):
        org_size = x.size()
        batch_size = org_size[0]
        x = x.view((batch_size, -1))
        mean, log_var = self.encode(x)
        z = self.reparameterization(mean, log_var)
        # classifier_x = mean
        recon_x = self.decode(z).view(org_size)
        # pred_y = self.classifier(classifier_x)
        return z, recon_x, mean, log_var

class Classifier(nn.Module):
    def __init__(self, input_dim, latent_dim, output_dim, classifier_dim):
        super(Classifier, self).__init__()

        self.classifier_layers = nn.Sequential(
            nn.Linear(latent_dim[3], classifier_dim[0]),
            nn.ReLU(),
            nn.Linear(classifier_dim[0], classifier_dim[1]),
            nn.ReLU(),
            nn.Linear(classifier_dim[1], classifier_dim[2]),
            nn.ReLU(),
            nn.Linear(classifier_dim[2], classifier_dim[3]),
            nn.ReLU(),
            nn.Linear(classifier_dim[3], output_dim)
        )
    
    def classifier(self, mean):
        # print("mean:", mean.is_cuda)
        # mean = mean.cuda()
        # print("mean:", mean.is_cuda)
        self.classifier_layers = self.classifier_layers.cuda()  # redundant once the model is on the GPU
        output = self.classifier_layers(mean)
        # output = output.cpu().detach().numpy()
        return output
    
    def forward(self, x, model_vae):    # the trained VAE is passed in
        x = x.cuda()
        model_vae = model_vae.cuda()
        # print(type(x), x.is_cuda)
    
        org_size = x.size()

        batch_size = org_size[0]
        x = x.view((batch_size, -1))

        mean, log_var = model_vae.encode(x)
        # print("mean:", mean.is_cuda)
        # mean = mean.cuda()

        classifier_x = mean
        # print("classifier_x:", classifier_x.is_cuda)
        # classifier_x = classifier_x.cuda()  # cuda

        pred_y = self.classifier(classifier_x)
        return pred_y

# 3 Build the models
model_vae = VAE(input_dim, latent_dim, output_dim)  # the VAE
# print(model_vae)
model_classifier = Classifier(input_dim, latent_dim, output_dim, classifier_dim)

# quick model sanity check
# input = torch.rand((1, 1, 784))
# output = model_vae(input)
# print(output)

# 4 Loss functions
recon_loss = lambda recon_x, x: F.binary_cross_entropy(recon_x, x)
kl_loss = lambda mean, log_var: -0.5 * torch.sum(1 + log_var - mean.pow(2) - log_var.exp())   # was -5, a typo: the KL term uses -0.5
classifier_loss = lambda pred_y, y: F.cross_entropy(pred_y, y)

# 5 Optimizer
learning_rate = 5e-3
optimizer = optim.Adam(model_vae.parameters(), lr=learning_rate)

# 6 Training parameters
epochs = 100
total_train_step = 0
total_test_step = 0

# record train and test loss
loss_vae_train, loss_vae_test = [], []

# 7 Training
## 7.1 Train the VAE
weight_vae_dir = "weights/vae/"
os.makedirs(weight_vae_dir, exist_ok=True)   # checkpoint directory
# model_vae.to(device)
model_vae.cuda()

for epoch in range(epochs):
    print("--------epoch: {}------".format(epoch+1))
    model_vae.train()
    for data in train_data_loader:
        optimizer.zero_grad()   # clear gradients from the previous step
        imgs, label = data
        imgs = imgs.cuda()
        # label = label.cuda()
        # print(imgs.shape)
        _, recon_data, mean, log_var = model_vae(imgs)
        loss1 = recon_loss(recon_data, imgs)
        loss2 = kl_loss(mean, log_var)

        loss = loss1 + loss2    # VAE losses only
        total_train_step += 1
        if total_train_step % 40 == 0:
            loss_vae_train.append(loss.detach())    # detach so the computation graph is not retained

        loss.backward()

        optimizer.step()
    print(f"train loss:\t{loss:.4f}")

    model_vae.eval()
    with torch.no_grad():
        for data in test_data_loader:
            imgs, label = data
            imgs = imgs.cuda()
            _, recon_data, mean, log_var = model_vae(imgs)
            loss1 = recon_loss(recon_data, imgs)
            loss2 = kl_loss(mean, log_var)

            loss = loss1 + loss2
            total_test_step += 1
            if total_test_step % 20 == 0:
                loss_vae_test.append(loss)

        if epoch%20 == 0:
            weight_save_path = os.path.join(weight_vae_dir, f"{epoch}.pth")
            torch.save(model_vae, weight_save_path)
        # print("VAE testing loss: {}".format(loss))
        print(f"test loss:\t{loss:.4f}")

loss_vae_train = [loss_i.item() for loss_i in loss_vae_train]
loss_vae_test = [loss_i.item() for loss_i in loss_vae_test]
print(loss_vae_train[-1], loss_vae_test[-1])

plt.plot(loss_vae_train, color="red", linestyle="-", label="train")
plt.plot(loss_vae_test, color="blue", linestyle="--", label="test")
plt.title("VAE Test and Train Loss")
plt.xlabel("times")
plt.ylabel("loss")
plt.legend(loc="upper right")
plt.show()

# # load a saved VAE checkpoint
# model_path = r"path/to/weights/vae/80.pth"
# model_vae_80 = torch.load(model_path)
# model_vae_80.to(device)
# model_vae_80.eval()
# print(next(model_vae_80.parameters()).device)
# model_vae = model_vae_80
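A side note on the commented-out loading above (my suggestion, not from the reference code): torch.save(model, path) pickles the entire module and ties the checkpoint to the exact class definition, so loading can break after code changes. Saving only the state_dict is the more robust PyTorch idiom; a small sketch with an illustrative filename:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)                    # stand-in for the trained VAE
torch.save(model.state_dict(), "vae.pth")  # save the weights only

restored = nn.Linear(4, 2)                 # rebuild the architecture first
restored.load_state_dict(torch.load("vae.pth"))
restored.eval()
```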


# Classifier training setup
# 1 Rebuild the dataset (this time with normalization)
data_transform = transforms.Compose([
    transforms.Resize(RESIZE_SHAPE),         # resize images
    # transforms.CenterCrop(224),            # center-crop to 224x224
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.486, 0.446], std=[0.248, 0.245, 0.266])   # normalize
    # transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
# train_data = MNIST(r'mnist/', train=True, download=True, transform=transforms)
# test_data = MNIST(r'mnist/', train=False, download=True, transform=transforms)
batch_size = 64
num_workers = 8

all_dataset = datasets.ImageFolder(root=img_dir, transform=data_transform) 
train_data_loader, validate_data_loader, test_data_loader = MyDataDivide(all_dataset, batch_size = batch_size, shuffle=True, num_workers = num_workers)
classifier_loss = lambda pred_y, y: F.cross_entropy(pred_y, y)
## 7.2 Train the classifier

# Optimizer (classifier)

learning_rate = 5e-3
optimizer = optim.Adam(model_classifier.parameters(), lr=learning_rate)

# 6 Classifier training parameters
epochs = 300
total_train_step = 0
total_test_step = 0

loss_classifier_train, loss_classifier_test = [], []
train_acc, test_acc = [], []

# checkpoint directory
weight_classifier_dir = "weights/classifier/"
os.makedirs(weight_classifier_dir, exist_ok=True)


# model_vae = model_vae.cuda()
model_vae.eval()    # keep the VAE in eval mode!
for epoch in range(epochs):
    print("--------epoch: {}------".format(epoch+1))
    # classifier in train mode
    model_classifier.train()
    for data in train_data_loader:

        optimizer.zero_grad()   # clear gradients from the previous step
        imgs, label = data
        imgs = imgs.cuda()  # cuda
        label = label.cuda()
        # print(imgs.shape)

        pred_y = model_classifier(imgs, model_vae)
        loss = classifier_loss(pred_y, label)

        total_train_step += 1
        if total_train_step % 120 == 0:
            loss_classifier_train.append(loss)
            pred_y = pred_y.cpu().detach().numpy()
            label = label.cpu().detach().numpy()
            train_acc.append(accuracy_score(label, pred_y.argmax(1)))
        loss.backward()

        optimizer.step()
    # print("Classifier train loss: {}".format(loss))
    # print("Classifier train accuracy: {}".format(train_acc[-1]))
    print(f"train loss:\t{loss:.4f}, train_accuracy:\t{train_acc[-1]:.4f}")

    # classifier in eval mode
    model_classifier.eval()
    with torch.no_grad():
        for data in test_data_loader:

            imgs, label = data
            imgs = imgs.cuda()  # cuda
            label = label.cuda()
            
            pred_y = model_classifier(imgs, model_vae)
            loss = classifier_loss(pred_y, label)
            total_test_step += 1
            if total_test_step % 60 == 0:
                loss_classifier_test.append(loss)
                pred_y = pred_y.cpu().detach().numpy()
                label = label.cpu().detach().numpy()
                test_acc.append(accuracy_score(label, pred_y.argmax(1)))
        if epoch%20 == 0:
            weight_save_path = os.path.join(weight_classifier_dir, f"{epoch}.pth")
            torch.save(model_classifier, weight_save_path)
        # print("Classifier testing loss: {}".format(loss))
        # print("Classifier train accuracy: {}".format(test_acc[-1]))
        print(f"test loss:\t{loss:.4f}, test_accuracy:\t{test_acc[-1]:.4f}")

loss_classifier_train = [loss_i.item() for loss_i in loss_classifier_train]   # convert to floats before printing/plotting
loss_classifier_test = [loss_i.item() for loss_i in loss_classifier_test]
print(loss_classifier_train[-1], loss_classifier_test[-1])
plt.plot(loss_classifier_train, color="red", linestyle="-", label="train")
plt.plot(loss_classifier_test, color="blue", linestyle="--", label="test")
plt.title("Test and Train Loss")
plt.xlabel("times")
plt.ylabel("loss")
plt.legend(loc="upper right")
plt.show()

print(train_acc[-1], test_acc[-1])
plt.plot(train_acc, color="red", linestyle="-", label="train")
plt.plot(test_acc, color="blue", linestyle="--", label="test")
plt.title("Test and Train Accuracy")
plt.xlabel("times")
plt.ylabel("accuracy")
plt.legend(loc="upper right")
plt.show()

3.2 Results 

VAE loss curves for separate training (figure)

Classifier loss curves for separate training (figure)

3.3 Pitfalls

With MNIST and separate training, the VAE loss hovered around 0.2 and barely dropped. Using the same two-layer classifier as in the joint-training code, the classifier loss was also hard to bring down; after adding two more linear layers, it trained well, with accuracy around 80%–90%.

This bothered me for a long time. No matter how I tuned batch_size and learning_rate (I pushed learning_rate up to 1e-2), the loss simply would not decrease, so something else had to be going on. Once the classifier got its two extra layers, everything suddenly worked!

The classifier probably cannot be too simple, or it becomes hard to train!

4. Update, 2023-08-13

Strangely, the same code behaved differently the next morning: the loss once again refused to drop, and this time it had nothing to do with how many linear layers the classifier had.

I then compared against the very first version of the code (called the template code below) and started checking whether data loading was the problem.

4.1 Is the ImageFolder-style loading the problem? No

The template code loaded MNIST like this:

# 1 Build and load the dataset
mnist_transform = transforms.Compose([transforms.ToTensor()])   # renamed: the template assigned to `transforms`, shadowing the module
train_data = MNIST(r'mnist/', train=True, download=True, transform=mnist_transform)
test_data = MNIST(r'mnist/', train=False, download=True, transform=mnist_transform)

train_data_loader = DataLoader(train_data, batch_size=64, shuffle=True, num_workers=0)
test_data_loader = DataLoader(test_data, batch_size=64, shuffle=True, num_workers=0)

For the first 5 epochs its loss was also hard to bring down, with accuracy hovering at 0.2 or even below 0.1, but from epoch 6 the accuracy suddenly rose above 0.3 and kept climbing normally, reaching about 0.85 by around epoch 12.

For generality, I then used the generic loading approach:

from dataset import MyDataDivide
RESIZE_SHAPE = (28, 28)
batch_size = 64
num_workers = 0
img_dir = r"path/to/MNIST_pics/train_images"
data_transform = transforms.Compose([
    transforms.Resize(RESIZE_SHAPE),         # resize images
    # transforms.CenterCrop(224),            # center-crop to 224x224
    transforms.ToTensor(),
    # transforms.Normalize(mean=[0.485, 0.486, 0.446], std=[0.248, 0.245, 0.266])   # normalize
    # transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
all_dataset = datasets.ImageFolder(root=img_dir, transform=data_transform) 
train_data_loader, validate_data_loader, test_data_loader = MyDataDivide(all_dataset, batch_size = batch_size, shuffle=True, num_workers = num_workers)

This time the turning point came at epoch 10, after which accuracy rose normally.

Before that it really was baffling. I also tried the cifar100 and flowers datasets, both at their original sizes and resized to (28, 28) like MNIST; in every case, even after 100 epochs, the loss barely decreased and accuracy stayed below 0.2.

I wonder whether, compared with MNIST, datasets like cifar100 are simply harder to classify because of their varied backgrounds.

Adding two extra classifier layers did not fix it there either...

4.2 num_workers: little effect

Mirroring the template code's num_workers=0, changing num_workers does not alter the overall accuracy trend; it may just bring the turning point forward. For instance, with num_workers=8, the run whose turning point was at epoch 10 now turned at epoch 7.

This also suggests my data loading is fine.

4.3 Normalization during loading: probably a negative influence

I suspected normalization might be affecting training, but with normalization enabled the loss tended to blow up instead, into the thousands and beyond, so I left it out in the end.

transforms.Normalize(mean=[0.485, 0.486, 0.446], std=[0.248, 0.245, 0.266])   # 标准化
# transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
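A likely explanation (my reading; the reference code does not discuss it): F.binary_cross_entropy requires its target to lie in [0, 1], but transforms.Normalize shifts pixel values outside that range, so the reconstruction loss against normalized inputs is no longer well defined and can blow up. A minimal illustration:

```python
import torch
import torch.nn.functional as F

recon = torch.full((2, 4), 0.5)                    # decoder output, in (0, 1) via Sigmoid
target_ok = torch.linspace(0, 1, 8).reshape(2, 4)  # un-normalized pixels: valid BCE targets
target_bad = (target_ok - 0.485) / 0.229           # after Normalize: values leave [0, 1]

loss = F.binary_cross_entropy(recon, target_ok)    # finite, well defined
# F.binary_cross_entropy(recon, target_bad) is invalid: BCE assumes targets in [0, 1]
# (recent PyTorch versions raise an error for such targets)
```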

5. Feeding the VAE into a ResNet [updated 2023-08-17]

5.1 ResNet classification

ResNet's classification performance is famously strong. After repeatedly failing to break through, and being asked to first guarantee classification accuracy, I picked a general-purpose classification model: ResNet. The experiment used a dataset similar to the earlier flowers one: roughly 10,000 images of size (256, 256), again in 5 classes.

The results stunned me. With default, untuned values for batch_size, learning_rate and so on, accuracy climbed fast: around 99% within thirty to forty epochs, and about 99.59% by roughly epoch 100 (2197 of 2207 test images correct). Only 10 mistakes in over 2,000 images!

I knew ResNet classified well, but not this well; is there even room left to improve?

Which left me unsure what the final requirement actually is. Classification is effectively at the ceiling; is the hope that accuracy improves further after dimensionality reduction, all the way to 100% on the test set?

5.2 Replacing the separate-training classifier with ResNet

After days of tuning with accuracy stuck, and then the clear gain from the two extra classifier layers, I wanted to swap the classifier for a ResNet outright. At first I could not see how, because ResNet's input is an image while the VAE encoder (after dimensionality reduction) outputs a tensor of shape (batch_size, out_dim), where out_dim is the target dimension.

So the idea is simply to reshape the encoder output into something ResNet accepts:

(batch_size, out_dim) → (batch_size, C, H, W)

Here C is the channel count; fixing it at 3 mimics RGB images, and then H = W = sqrt(out_dim/3).

Also, image pixel values are all non-negative and, as tensors, must lie in [0, 1], so I used a sigmoid to map the reduced features into [0, 1]. But inspect the feature distribution before mapping: in my case the values were tiny, on the order of 1e-8, and a direct sigmoid maps everything to about 0.5000 with no discrimination. Better to rescale first, then map into [0, 1].
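The reshape-and-rescale step above can be sketched as follows (the standardize-then-sigmoid combination is my illustration of the rescaling idea; names are made up):

```python
import torch

def latent_to_image(z, channels=3):
    # z: (batch_size, out_dim) from the VAE encoder
    batch_size, out_dim = z.shape
    side = int((out_dim // channels) ** 0.5)
    assert channels * side * side == out_dim, "out_dim must equal C*H*W"
    # standardize first, so tiny latents (e.g. ~1e-8) do not all collapse to sigmoid(0) = 0.5
    z = (z - z.mean()) / (z.std() + 1e-12)
    z = torch.sigmoid(z)  # map into [0, 1], like pixel values
    return z.view(batch_size, channels, side, side)

z = torch.randn(8, 768) * 1e-8   # out_dim = 768 reshapes to (3, 16, 16)
img = latent_to_image(z)
```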

The result was very poor: training loss stuck around 1.6, test loss enormous (on the order of 1e30), and accuracy accordingly not worth looking at. My take is that ResNet's capacity is simply too high here, and with my initial reduced dimension of 48 it overfit extremely severely.

I then raised the reduced dimension to 768, i.e. small (3, 16, 16) images. The training loss still barely moved, but the test loss came down to around 1e15, a substantial improvement. This again suggests ResNet overfits badly on such tiny, low-dimensional images because they are too easy to fit. Can the dimension go higher still? No: GPU memory is the limit, and a large reduced dimension would defeat the purpose of dimensionality reduction anyway.

6. Swapping the VAE encoder and decoder for ResNet structures [2023-08-23]

The program runs, but I have not yet found the cause of the long-standing problem: the loss does not decrease. The model simply is not learning anything.

Where is accuracy now? 20%, on a 5-class problem; in other words, the model is guessing...


Summary

I have been grinding at this since 2023-08-04, with plenty of detours; only last night, after adding two linear layers to the classifier, did things finally break through. Recording it here.
