Reproducing a Wavelet-like Autoencoder

# Basic Idea

Source of the idea:

> https://arxiv.org/abs/1712.07493

An autoencoder decomposes the input into a low-frequency and a high-frequency component. The low-frequency part is fed into a standard classification network to extract features; the high-frequency part goes through a similar but lightweight network and is fused with the low-frequency branch to produce the final features.

![Overall outline](https://img-blog.csdnimg.cn/cc0c3486c2b54a1598e9d9903961e2cf.png)
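In code terms, the pipeline looks roughly like this (my own summary, using the module names defined later in this post):

```python
# Rough pipeline sketch; encoder, decoder, and classifier are defined below
xL, xH = encoder(images)       # split the input into low- and high-frequency maps
recon = decoder((xL, xH))      # reconstruction, used only for the autoencoder loss
logits = classifier(xL, xH)    # classification from the fused branches
```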


# Basic Structure
## Imports

```python
import time

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.optim import SGD
from torch.optim.lr_scheduler import CosineAnnealingLR, CyclicLR
from torch.utils.data import DataLoader
from torch.nn.utils import prune

import numpy as np
import matplotlib.pyplot as plt

import torchvision
from torchvision import datasets, models, transforms
from torchvision.datasets import ImageFolder  # used by the Intel data loading below
```


## Move to GPU

```python
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print("yes" if torch.cuda.is_available() else "no")  # quick CUDA availability check
```

## Data Loading

### Kaggle Intel dataset

The Intel Image Classification dataset, released by Intel, contains roughly 25,000 color images in 6 classes: buildings, forest, glacier, mountain, sea, and street. It is used to train and evaluate computer-vision and machine-learning algorithms on natural-scene classification.

The dataset has three subsets. The training set (14,034 images) is used to train the model, the validation set (3,000 images) to monitor performance during training, and the test set (7,500 images) to evaluate generalization after training.

Images are 150x150 pixels.

```python
transform_test = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor()
])

transform_train = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(15),
    transforms.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1, hue=0.1),
    transforms.Resize((128, 128)),
    transforms.ToTensor()
])

data_dir = '/kaggle/input/intel-image-classification'

# Batch size
batch_size = 64

# Training set
trainset = ImageFolder(root=data_dir + '/seg_train/seg_train', transform=transform_train)
trainloader = DataLoader(trainset, batch_size=batch_size, shuffle=True, num_workers=2)

# Test set
testset = ImageFolder(root=data_dir + '/seg_test/seg_test', transform=transform_test)
test_loader = DataLoader(testset, batch_size=batch_size, shuffle=False, num_workers=2)
```

### CIFAR-100

CIFAR-100 contains 60,000 32x32 color images in 100 classes, drawn from real-world scenes such as landscapes, animals, food, and vehicles. Each class has 600 images: 500 for training and 100 for testing.

```python
transform_test = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor()
])

transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(15),
    transforms.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1, hue=0.1),
    transforms.Resize((128, 128)),
    transforms.ToTensor()
])

trainset = torchvision.datasets.CIFAR100(root='./data', train=True,
                                         download=True, transform=transform)
trainloader = DataLoader(trainset, batch_size=64, shuffle=True, num_workers=2)

test_dataset = torchvision.datasets.CIFAR100(root='./data', train=False,
                                             download=True, transform=transform_test)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False, num_workers=2)
```
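As a quick sanity check (my addition), one batch from the loader should have the expected shapes:

```python
# One CIFAR-100 batch resized to 128x128: (64, 3, 128, 128) images, (64,) labels
images, labels = next(iter(trainloader))
print(images.shape, labels.shape)
```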

## Autoencoder

```python
class base_encoder(nn.Module):
    def __init__(self):
        super(base_encoder, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1, padding=1)
        self.conv3 = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, stride=1, padding=1)
        # Two stride-2 heads split the features into low- and high-frequency
        # 3-channel maps at half the input resolution
        self.convsplit1 = nn.Conv2d(in_channels=64, out_channels=3, kernel_size=2, stride=2, padding=0)
        self.convsplit2 = nn.Conv2d(in_channels=64, out_channels=3, kernel_size=2, stride=2, padding=0)
        self.bn1 = nn.BatchNorm2d(3)
        self.bn2 = nn.BatchNorm2d(3)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.relu(self.conv3(x))
        x1 = torch.sigmoid(self.bn1(self.convsplit1(x)))  # low-frequency branch xL
        x2 = torch.sigmoid(self.bn2(self.convsplit2(x)))  # high-frequency branch xH
        return (x1, x2)


class dec(nn.Module):
    def __init__(self):
        super(dec, self).__init__()
        # Each decoder upsamples one branch back to the input resolution;
        # the two reconstructions are summed and squashed in forward()
        self.decoder1 = nn.Sequential(
            nn.ConvTranspose2d(in_channels=3, out_channels=64, kernel_size=2, stride=2, padding=0),
            nn.ReLU(),
            nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=64, out_channels=32, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=32, out_channels=3, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(3))
        self.decoder2 = nn.Sequential(
            nn.ConvTranspose2d(in_channels=3, out_channels=64, kernel_size=2, stride=2, padding=0),
            nn.ReLU(),
            nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=64, out_channels=32, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=32, out_channels=3, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(3))

    def forward(self, x):
        out1 = self.decoder1(x[0])
        out2 = self.decoder2(x[1])
        return torch.sigmoid(out1 + out2)
```
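A quick shape check (my addition, assuming 128x128 inputs as in the data pipeline above): the encoder should emit two (3, 64, 64) maps and the decoder should reconstruct (3, 128, 128).

```python
encoder = base_encoder().to(device)
decoder = dec().to(device)

x = torch.randn(2, 3, 128, 128).to(device)
xL, xH = encoder(x)
recon = decoder((xL, xH))
print(xL.shape, xH.shape, recon.shape)
# torch.Size([2, 3, 64, 64]) torch.Size([2, 3, 64, 64]) torch.Size([2, 3, 128, 128])
```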

## Autoencoder Loss Function

```python
mse_loss = nn.MSELoss()

def wavelet_loss(x_true, x, x0):
    # l1: reconstruction error between the input and the decoder output
    # l2: mean energy of the high-frequency component x0
    l2 = torch.mean(torch.norm(x0, dim=1))
    l1 = mse_loss(x_true, x)
    return (l1, l2)
```
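A sketch of how the two terms might be combined when pre-training the autoencoder (`lambda_hf` and the Adam optimizer are my assumptions, not from the original post):

```python
# Hypothetical autoencoder pre-training step
lambda_hf = 0.1  # assumed weight on the high-frequency energy term
ae_optimizer = optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

for inputs, _ in trainloader:
    inputs = inputs.to(device)
    xL, xH = encoder(inputs)
    recon = decoder((xL, xH))
    l1, l2 = wavelet_loss(inputs, recon, xH)
    loss = l1 + lambda_hf * l2
    ae_optimizer.zero_grad()
    loss.backward()
    ae_optimizer.step()
```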

## ResNet-50 Networks

### Pretrained ResNet-50

Using a pretrained ResNet-50 as the standard classification network:

```python
class FusionNetwork(nn.Module):
    def __init__(self, num_classes):
        super(FusionNetwork, self).__init__()

        # Standard network (ResNet50)
        resnet50 = models.resnet50(pretrained=False)
        model_path = '/kaggle/input/pretrained-pytorch/resnet50-19c8e357.pth'
        resnet50.load_state_dict(torch.load(model_path))
        # Replace the first convolution to accept 6-channel input
        # (this re-initializes conv1, discarding its pretrained weights)
        resnet50.conv1 = nn.Conv2d(6, 64, kernel_size=3, stride=1, padding=1, bias=False)
        num_ftrs = resnet50.fc.in_features
        resnet50.fc = nn.Linear(num_ftrs, num_classes)
        resnet50 = resnet50.to(device)
        self.standard_network = resnet50

    def forward(self, xL, xH):
        # Concatenate xL and xH along the channel dimension
        x = torch.cat((xL, xH), dim=1)
        # Pass the concatenated map through the ResNet classifier
        x = self.standard_network(x)

        return x
```
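A shape check (my addition; `num_classes=100` matches CIFAR-100): the classifier should accept the two 3-channel encoder outputs and return one logit per class.

```python
classifier = FusionNetwork(num_classes=100).to(device)
out = classifier(torch.randn(2, 3, 64, 64).to(device),
                 torch.randn(2, 3, 64, 64).to(device))
print(out.shape)  # torch.Size([2, 100])
```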

### Custom ResNet-50 at 1/4 width

A custom ResNet-50 with every conv width divided by four. The channel counts are quartered because the paper calls for the high-frequency component to pass through a lightweight network of the same topology to extract its features.

```python
# Despite the name, this is a ResNet bottleneck block
# (1x1 -> 3x3 -> 1x1 convolutions with expansion 4)
class BasicBlock(nn.Module):
    expansion = 4

    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.conv3 = nn.Conv2d(out_channels, out_channels * self.expansion, kernel_size=1, stride=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels * self.expansion)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)
        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)
        return out
```


```python
class ResNet50_modified(nn.Module):
    def __init__(self, num_classes):
        super(ResNet50_modified, self).__init__()

        self.in_channels = 16
        self.conv1 = nn.Conv2d(3, 16, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(16)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        self.layer1 = self.make_layer(16, 3)
        self.layer2 = self.make_layer(32, 4, stride=2)
        self.layer3 = self.make_layer(64, 6, stride=2)
        self.layer4 = self.make_layer(128, 3, stride=2)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(128 * 4, num_classes)

    def make_layer(self, out_channels, num_blocks, stride=1):
        downsample = None
        if stride != 1 or self.in_channels != out_channels * 4:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channels, out_channels * 4, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels * 4),
            )
        layers = []
        layers.append(BasicBlock(self.in_channels, out_channels, stride, downsample))
        self.in_channels = out_channels * 4
        for i in range(1, num_blocks):
            layers.append(BasicBlock(self.in_channels, out_channels))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return x
```

### Input Dimension Changes

The relevant formula is:

```
output_size = floor((input_size - kernel_size + 2 * padding) / stride) + 1
```

With a 16x16 input and "same"-style padding (padding = (kernel_size - 1) / 2), the stride-2 conv1 halves the spatial size regardless of kernel size: (16 - 7 + 2*3), (16 - 5 + 2*2), and (16 - 3 + 2*1) all equal 15, and floor(15 / 2) + 1 = 8. The trace is therefore identical for kernel_size 3, 5, and 7:

```
conv1:   (3, 16, 16) -> (16, 8, 8)
bn1:     (16, 8, 8)
relu:    (16, 8, 8)
maxpool: (16, 4, 4)
layer1:  (16, 4, 4) -> (64, 4, 4)
layer2:  (64, 4, 4) -> (128, 2, 2)
layer3:  (128, 2, 2) -> (256, 1, 1)
layer4:  (256, 1, 1) -> (512, 1, 1)
avgpool: (512, 1, 1)
fc:      (num_classes,)
```
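A small helper (my addition) to verify these numbers:

```python
def conv_out(n, k, s, p):
    # floor((n - k + 2p) / s) + 1
    return (n - k + 2 * p) // s + 1

for k, p in [(3, 1), (5, 2), (7, 3)]:
    print(k, conv_out(16, k, 2, p))  # all three kernel sizes give 8
```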

### Custom ResNet-50 (full width)

Network structure:

`BasicBlock` is the same bottleneck block defined above.


```python
class ResNet50(nn.Module):
    def __init__(self, num_classes):
        super(ResNet50, self).__init__()

        self.in_channels = 64
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        self.layer1 = self.make_layer(64, 3)
        self.layer2 = self.make_layer(128, 4, stride=2)
        self.layer3 = self.make_layer(256, 6, stride=2)
        self.layer4 = self.make_layer(512, 3, stride=2)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * 4, num_classes)

    def make_layer(self, out_channels, num_blocks, stride=1):
        downsample = None
        if stride != 1 or self.in_channels != out_channels * 4:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channels, out_channels * 4, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels * 4),
            )
        layers = []
        layers.append(BasicBlock(self.in_channels, out_channels, stride, downsample))
        self.in_channels = out_channels * 4
        for i in range(1, num_blocks):
            layers.append(BasicBlock(self.in_channels, out_channels))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return x
```

### Per-stage Dimension Changes in the Classifier

For a 64x64 input:

```
conv1:   (3, 64, 64) -> (64, 32, 32)
bn1:     (64, 32, 32)
relu:    (64, 32, 32)
maxpool: (64, 16, 16)
layer1:  (64, 16, 16) -> (256, 16, 16)
layer2:  (256, 16, 16) -> (512, 8, 8)
layer3:  (512, 8, 8) -> (1024, 4, 4)
layer4:  (1024, 4, 4) -> (2048, 2, 2)
avgpool: (2048, 1, 1)
fc:      (num_classes,)
```

For a 128x128 input:

```
conv1:   (3, 128, 128) -> (64, 64, 64)
bn1:     (64, 64, 64)
relu:    (64, 64, 64)
maxpool: (64, 32, 32)
layer1:  (64, 32, 32) -> (256, 32, 32)
layer2:  (256, 32, 32) -> (512, 16, 16)
layer3:  (512, 16, 16) -> (1024, 8, 8)
layer4:  (1024, 8, 8) -> (2048, 4, 4)
avgpool: (2048, 1, 1)
fc:      (num_classes,)
```

For a 32x32 input:

```
conv1:   (3, 32, 32) -> (64, 16, 16)
bn1:     (64, 16, 16)
relu:    (64, 16, 16)
maxpool: (64, 8, 8)
layer1:  (64, 8, 8) -> (256, 8, 8)
layer2:  (256, 8, 8) -> (512, 4, 4)
layer3:  (512, 4, 4) -> (1024, 2, 2)
layer4:  (1024, 2, 2) -> (2048, 1, 1)
avgpool: (2048, 1, 1)
fc:      (num_classes,)
```

For a 16x16 input:

```
conv1:   (3, 16, 16) -> (64, 8, 8)
bn1:     (64, 8, 8)
relu:    (64, 8, 8)
maxpool: (64, 4, 4)
layer1:  (64, 4, 4) -> (256, 4, 4)
layer2:  (256, 4, 4) -> (512, 2, 2)
layer3:  (512, 2, 2) -> (1024, 1, 1)
layer4:  (1024, 1, 1) -> (2048, 1, 1)
avgpool: (2048, 1, 1)
fc:      (num_classes,)
```
### Custom ResNet-50 (16x16 input)

```python
class ModifiedResNet50(nn.Module):
    def __init__(self, num_classes):
        super(ModifiedResNet50, self).__init__()

        self.in_channels = 32
        # Stride-1 stem and no maxpool: a 16x16 input is too small for early downsampling
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(32)
        self.relu = nn.ReLU(inplace=True)

        self.layer1 = self.make_layer(32, 2)
        self.layer2 = self.make_layer(64, 2, stride=2)
        self.layer3 = self.make_layer(128, 2, stride=2)
        self.layer4 = self.make_layer(256, 2, stride=2)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(256 * 4, num_classes)

    def make_layer(self, out_channels, num_blocks, stride=1):
        downsample = None
        if stride != 1 or self.in_channels != out_channels * 4:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channels, out_channels * 4, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels * 4),
            )
        layers = []
        layers.append(BasicBlock(self.in_channels, out_channels, stride, downsample))
        self.in_channels = out_channels * 4
        for i in range(1, num_blocks):
            layers.append(BasicBlock(self.in_channels, out_channels))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        # Global pooling and classifier head
        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return x
```

For a 3x16x16 classifier input, the per-stage dimensions (channels first) are:

```
conv1:     (3, 16, 16) -> (32, 16, 16)
bn1, relu: (32, 16, 16)
layer1:    (32, 16, 16) -> (128, 16, 16)
layer2:    (128, 16, 16) -> (256, 8, 8)   # one stride-2 conv
layer3:    (256, 8, 8) -> (512, 4, 4)     # one stride-2 conv
layer4:    (512, 4, 4) -> (1024, 2, 2)    # one stride-2 conv
avgpool:   (1024, 2, 2) -> (1024, 1, 1)
fc:        (num_classes,)
```

## ResNet-9 Networks

### Standard ResNet-9

```python
def conv_block(in_channels, out_channels, pool=False):
    layers = [nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
              nn.BatchNorm2d(out_channels),
              nn.ReLU(inplace=True)]
    if pool:
        layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)


class ResNet9(nn.Module):
    def __init__(self, in_channels, num_classes):
        super().__init__()

        self.conv1 = conv_block(in_channels, 64)
        self.conv2 = conv_block(64, 128, pool=True)
        self.res1 = nn.Sequential(conv_block(128, 128), conv_block(128, 128))

        self.conv3 = conv_block(128, 256, pool=True)
        self.conv4 = conv_block(256, 512, pool=True)
        self.res2 = nn.Sequential(conv_block(512, 512), conv_block(512, 512))

        self.classifier = nn.Sequential(nn.MaxPool2d(4),
                                        nn.Flatten(),
                                        nn.Linear(512, num_classes))

    def forward(self, xb):
        out = self.conv1(xb)
        out = self.conv2(out)
        out = self.res1(out) + out
        out = self.conv3(out)
        out = self.conv4(out)
        out = self.res2(out) + out
        out = self.classifier(out)
        return out
```

### ResNet-9 at 1/4 width

```python
def conv_block(in_channels, out_channels, pool=False):
    layers = [nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
              nn.BatchNorm2d(out_channels),
              nn.ReLU(inplace=True)]
    if pool:
        layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)


class ResNet9(nn.Module):
    def __init__(self, in_channels, num_classes):
        super().__init__()

        self.conv1 = conv_block(in_channels, 16)         # 64 -> 16
        self.conv2 = conv_block(16, 32, pool=True)       # 128 -> 32
        self.res1 = nn.Sequential(conv_block(32, 32), conv_block(32, 32))

        self.conv3 = conv_block(32, 64, pool=True)       # 256 -> 64
        self.conv4 = conv_block(64, 128, pool=True)      # 512 -> 128
        self.res2 = nn.Sequential(conv_block(128, 128), conv_block(128, 128))

        self.classifier = nn.Sequential(nn.MaxPool2d(4),
                                        nn.Flatten(),
                                        nn.Linear(128, num_classes))  # 512 -> 128

    def forward(self, xb):
        out = self.conv1(xb)
        out = self.conv2(out)
        out = self.res1(out) + out
        out = self.conv3(out)
        out = self.conv4(out)
        out = self.res2(out) + out
        out = self.classifier(out)
        return out
```

## Fusion Network

### Simple fusion

### Weighted fusion
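The post gives no code for the fusion variants; simple fusion is the plain channel concatenation used in FusionNetwork above. As a rough illustration of the weighted case (my own sketch, not the original implementation):

```python
class WeightedFusion(nn.Module):
    """Hypothetical weighted fusion: a learnable scalar blends xL and xH
    before they enter the classifier backbone."""
    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(0.5))

    def forward(self, xL, xH):
        w = torch.sigmoid(self.alpha)  # keep the weight in (0, 1)
        return torch.cat((w * xL, (1 - w) * xH), dim=1)
```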

### Complex fusion

# Training

## Fusion network 1

```python
def train_classifier(classifier, trainloader, test_loader, num_epochs=200):
    classifier.to(device)

    criterion = nn.CrossEntropyLoss()

    learning_rate = 0.1
    momentum = 0.9
    optimizer = SGD(classifier.parameters(), lr=learning_rate, momentum=momentum)
    base_lr = 0.0001
    max_lr = 0.1
    step_size_up = 200
    scheduler = CyclicLR(optimizer, base_lr=base_lr, max_lr=max_lr, step_size_up=step_size_up)

    for epoch in range(num_epochs):
        start_time_epoch = time.time()
        classifier.train()
        running_loss = 0.0
        for i, (inputs, labels) in enumerate(trainloader):
            inputs, labels = inputs.to(device), labels.to(device)

            # Encode the inputs; only the two frequency branches are needed
            # for classification, so the decoder is not called here
            xL, xH = encoder(inputs)

            # Forward pass through the fusion classifier
            sL = classifier(xL, xH)

            # Classification loss
            loss = criterion(sL, labels)

            # Update the classifier
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            running_loss += loss.item()

        # Record the end time of the epoch and print the average loss
        end_time_epoch = time.time()
        epoch_time = end_time_epoch - start_time_epoch
        print(f"Epoch {epoch + 1}/{num_epochs}, Time: {epoch_time:.2f} seconds, "
              f"Loss: {running_loss / len(trainloader)}")
        evaluate_model(classifier, test_loader)
        # Update the learning rate
        scheduler.step()
    return classifier


# Call train_classifier after the models have been built
trained_classifier = train_classifier(classifier, trainloader, test_loader)
```
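`train_classifier` calls `evaluate_model`, which the post does not show. A minimal sketch consistent with the printed output format (my own assumption of its behavior, reusing the encoder to build classifier inputs):

```python
def evaluate_model(classifier, test_loader):
    # Hypothetical evaluation helper: Top-1/Top-5 accuracy on the test set
    classifier.eval()
    top1, top5, total = 0, 0, 0
    with torch.no_grad():
        for inputs, labels in test_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            xL, xH = encoder(inputs)
            outputs = classifier(xL, xH)
            _, top5_pred = outputs.topk(5, dim=1)     # (batch, 5) class indices
            labels_v = labels.view(-1, 1)
            top1 += (top5_pred[:, :1] == labels_v).sum().item()
            top5 += (top5_pred == labels_v).sum().item()
            total += labels.size(0)
    print(f"Accuracy of the model on the test images : "
          f"Top-1 {100 * top1 / total:.2f}%, Top-5 {100 * top5 / total:.2f}%")
```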

# Results

## CIFAR dataset

### ResNet-9

Input 224x224 (classifier input 112x112):

Input 128x128 (classifier input 64x64):

```
# Simple fusion
Epoch 24/200, Loss: 0.4646315973852297
Accuracy of the model on the test images : Top-1 63.95%, Top-5 87.67%
```

Input 64x64 (classifier input 32x32):

Input 32x32 (classifier input 16x16):

```
# Simple fusion
Epoch 28/200, Loss: 1.0412922296530145
Accuracy of the model on the test images : Top-1 60.39%, Top-5 86.07%
```

### ResNet-50

Input 224x224 (classifier input 112x112):

Input 128x128, fed directly (the first ten epochs averaged 59.83 seconds each); classifier input 64x64:

```
# Simple fusion
Epoch 16, Loss: 0.5023146427490495
Accuracy of the model on the train images : Top-1 82.08%, Top-5 99.86%
Accuracy of the model on the test images : Top-1 80.20%, Top-5 99.87%

Epoch 17, Loss: 0.49591231617060577
Accuracy of the model on the train images : Top-1 82.14%, Top-5 99.91%
Accuracy of the model on the test images : Top-1 79.97%, Top-5 99.87%
```

Input 64x64 (classifier input 32x32):

```
# Simple fusion
Accuracy of the model on the test images : Top-1 38.31%, Top-5 67.64%
```

Input 32x32 (classifier input 16x16):

```
# Pretrained
Epoch 28/200, Loss: 2.571069859482748
Accuracy of the model on the test images : Top-1 33.80%, Top-5 63.27%

# Custom
Epoch 52/200, Time: 28.56 seconds, Loss: 2.742427248021831
Accuracy of the model on the test images : Top-1 28.52%, Top-5 57.98%
```

# Model Complexity

## Parameter count

```python
def count_parameters(model):
    # Count trainable parameters only
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

encoder_params = count_parameters(encoder)
decoder_params = count_parameters(decoder)
classifier_params = count_parameters(classifier)
total_params = encoder_params + decoder_params + classifier_params
```
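The counts can then be printed for comparison (my addition):

```python
print(f"Encoder parameters: {encoder_params:,}")
print(f"Decoder parameters: {decoder_params:,}")
print(f"Classifier parameters: {classifier_params:,}")
print(f"Total parameters: {total_params:,}")
```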

## FLOPs

Computational cost is usually measured in floating-point operations (FLOPs): the number of floating-point operations one forward pass requires. The count covers the network's convolutional layers, fully connected layers, and so on.

Install thop:

```bash
pip install thop
```

```python
from thop import profile

def calculate_flops(model, inputs):
    # thop calls model(*inputs), so pass the forward() arguments as a tuple
    flops, _ = profile(model, inputs=inputs)
    return flops

# For a 3x32x32 input, the encoder above outputs two 3x16x16 maps;
# the decoder takes the pair as one tuple, the classifier as two arguments
x = torch.randn(1, 3, 32, 32).to(device)
xL = torch.randn(1, 3, 16, 16).to(device)
xH = torch.randn(1, 3, 16, 16).to(device)

encoder_flops = calculate_flops(encoder, (x,))
decoder_flops = calculate_flops(decoder, ((xL, xH),))
classifier_flops = calculate_flops(classifier, (xL, xH))
total_flops = encoder_flops + decoder_flops + classifier_flops
```

## Training and Inference Time

During training, Python's time module can time each epoch to gauge training speed. At inference, time a single forward pass to assess inference speed. The evaluation code above already measures its own runtime, available as end_time_eval - start_time_eval.

- Training time:

```python
import time

start_time_train = time.time()

# Training loop here

end_time_train = time.time()
train_time = end_time_train - start_time_train
print(f"Training time: {train_time:.2f} seconds")
```

- Inference time: put the model in evaluation mode, record start and end times with the time library, and average over many forward passes for a stable estimate.

```python
def inference_time(model, sample_inputs, num_samples=100):
    # Average latency of single-sample forward passes
    model.eval()
    start_time_inference = time.time()
    with torch.no_grad():
        for _ in range(num_samples):
            _ = model(*sample_inputs)
    end_time_inference = time.time()
    return (end_time_inference - start_time_inference) / num_samples

# Same input shapes as in the FLOPs measurement above
encoder_inference_time = inference_time(encoder, (x,))
decoder_inference_time = inference_time(decoder, ((xL, xH),))
classifier_inference_time = inference_time(classifier, (xL, xH))
total_inference_time = (encoder_inference_time + decoder_inference_time
                        + classifier_inference_time)

print(f"Encoder inference time: {encoder_inference_time:.6f} seconds")
print(f"Decoder inference time: {decoder_inference_time:.6f} seconds")
print(f"Classifier inference time: {classifier_inference_time:.6f} seconds")
print(f"Total inference time: {total_inference_time:.6f} seconds")
```
