【Mini-experiment 1】Comparing the inductive biases of ResNet, ViT, and Swin Transformer (the expected result was not reached)

Up front: this experiment did not produce the expected result; treat it mostly as an experiment log.

1. Idea

1.1 Experimental idea

The idea is this: take randomly initialized (normal distribution), untrained ResNet, ViT, and Swin Transformer models, run them on the ImageNet-1k (2012) validation set (val, 50,000 images, 1,000 classes), and compare their accuracy against random guessing (which would score 1‰).

1.2 Source of inspiration

The inspiration comes from the paper Deep Clustering for Unsupervised Learning of Visual Features, which contains the following passage:

(figure: quoted paragraph from the paper)

The passage says, roughly: when the parameters θ of a model such as a CNN are randomly initialized from a Gaussian distribution, its untrained predictions are poor, but still considerably better than random guessing (for ImageNet-1k, chance is one in a thousand).

The authors' explanation is that the model architecture itself (e.g. a CNN) injects prior knowledge — the CNN's inductive biases such as translation equivariance and locality — so even without any training on data, a randomly initialized model beats random guessing.

So I wondered: if we use untrained, randomly initialized ResNet, ViT, and Swin Transformer models to predict ImageNet-1k (val), then, to some extent, higher accuracy would indicate a stronger inductive bias.

2. Experimental setup

Environment: PyTorch 1.12, 2x RTX 3090 (24 GB)
Models: ResNet, ViT, Swin Transformer
Test set: ImageNet-1k (2012, val, 50,000 images, 1,000 classes)

Model parameter counts:

| Model | Params |
| --- | --- |
| ResNet50 | 25.56M |
| ResNet101 | 44.55M |
| ViT_B_16 | 86.57M |
| ViT_L_16 | 304.33M |
| swin_t | 28.29M |
| swin_s | 49.61M |

Initialization mostly uses various normal-distribution schemes, for example:

nn.init.trunc_normal_
nn.init.normal_
nn.init.kaiming_normal_

See the official PyTorch documentation for details on these functions.
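As a minimal sketch (my own, mirroring torchvision's usual defaults rather than the exact code paths), the three schemes can be applied to small layers like this:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

linear = nn.Linear(64, 10)
# truncated normal (ViT/Swin style): values are redrawn until they fall
# inside [a, b] (defaults a=-2, b=2)
nn.init.trunc_normal_(linear.weight, std=0.02)
# plain Gaussian
nn.init.normal_(linear.bias, mean=0.0, std=0.01)

conv = nn.Conv2d(3, 16, kernel_size=3)
# Kaiming/He normal (ResNet style for conv layers)
nn.init.kaiming_normal_(conv.weight, mode='fan_out', nonlinearity='relu')

print(f'linear weight std ≈ {linear.weight.std().item():.3f}')
```

The specific `std` values here are illustrative; torchvision picks them per architecture.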

3. Experimental results

3.1 Results

The result: the untrained, randomly initialized ResNet, ViT, and Swin Transformer all predict at roughly chance level (about one in a thousand).

| Model | Correct | Total | Accuracy | Time | batch_size |
| --- | --- | --- | --- | --- | --- |
| resnet50 | 50 | 50000 | 1‰ | 35.485s | 256 |
| resnet101 | 50 | 50000 | 1‰ | 45.556s | 256 |
| vit_b_16 | 50 | 50000 | 1‰ | 83.587s | 256 |
| vit_l_16 | 50 | 50000 | 1‰ | 370.091s | 64 |
| swin_t | 43 | 50000 | ~1‰ | 44.616s | 256 |
| swin_s | 49 | 50000 | ~1‰ | 67.754s | 256 |
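A sanity check of my own (not from the original run): under pure random guessing on a balanced 1000-class set of 50,000 images, the number of correct predictions follows Binomial(n=50000, p=1/1000), so the spread around 50 is easy to compute:

```python
import math

n, p = 50_000, 1 / 1000
mean = n * p                       # expected number correct: 50
std = math.sqrt(n * p * (1 - p))   # standard deviation ≈ 7.07
print(f'chance level: {mean:.0f} ± {std:.2f} correct')
```

So swin_t's 43 and swin_s's 49 correct predictions are both well within one standard deviation of chance.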

3.2 Analysis of the results

3.2.1 A curious phenomenon

One interesting observation: the ResNet and ViT models each got exactly 50 samples right. This is because an untrained, randomly initialized ResNet or ViT predicts the same label regardless of the input.

Note: this does not mean the raw outputs are identical, but that in the 1000-dimensional softmax output, the index of the maximum value is always the same. That is why exactly 50 predictions are correct (the ImageNet-1k val set has 50,000 images in 1,000 classes, i.e. 50 images per class).

For Swin Transformer, the predicted labels do look randomly distributed, yet the accuracy is still no better than guessing (~1‰) — not "considerably better than chance" as the Deep Cluster authors reported.
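A quick simulation of my own (not from the post) makes both behaviors concrete: on a balanced val set, a constant predictor (what the untrained ResNet/ViT effectively did) and a uniform random predictor (what the untrained Swin effectively did) both land at ~1/1000 accuracy.

```python
import random

random.seed(0)

# balanced val set: 1000 classes x 50 images = 50,000 labels
labels = [c for c in range(1000) for _ in range(50)]

# constant predictor: always emits one arbitrary class (7 here)
constant_correct = sum(y == 7 for y in labels)
# uniform random predictor
random_correct = sum(y == random.randrange(1000) for y in labels)

print(constant_correct / len(labels))  # exactly 0.001 (50 correct)
print(random_correct / len(labels))    # ≈ 0.001
```

The constant predictor is *exactly* 1‰ on a balanced set, which explains the suspiciously identical "50 correct" rows in the table above.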

3.2.2 Analysis

I believe the Deep Cluster authors' claim is probably correct (it is, after all, a top-conference paper by well-known researchers), so the problem is likely in some detail of my implementation — perhaps the initialization scheme? I used the default initialization from the TorchVision source, which seems fine to me.

I do not plan to dig further into this. I only set out to read Deep Cluster, found the claim interesting, and wanted to try it, but did not reproduce the effect. If you want to track down the cause, the references cited by Deep Cluster may be a good starting point.

4. Code

import os
import time

import torch
import torch.nn as nn
from torchvision.models import resnet50, resnet101, vit_b_16, vit_l_16, swin_t, swin_s
from torchvision import datasets, transforms
from tqdm import tqdm


def get_dataloader(data_dir=None, batch_size=64):
    '''
    :param data_dir: path to the ImageNet val directory
    :return: DataLoader over the val set
    '''
    assert data_dir is not None

    transform = transforms.Compose([
        transforms.RandomResizedCrop(224, scale=(0.2, 1.)),
        transforms.ToTensor(),
        # debatable whether to normalize here; could be checked with an ablation
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225])
    ])

    val_dataset = datasets.ImageFolder(
        data_dir,
        transform,
    )
    val_loader = torch.utils.data.DataLoader(
        val_dataset,
        batch_size=batch_size,
        num_workers=8,
        shuffle=True,
    )

    return val_loader


def get_models():
    '''
    :return: (model_list, model_names) — randomly initialized models and their names
    '''
    # load model with random initialization
    res50 = resnet50()
    res101 = resnet101()
    vit_B_16 = vit_b_16()
    vit_L_16 = vit_l_16()
    swin_T = swin_t()  # parameter count close to res50
    swin_S = swin_s()  # parameter count close to res101
    model_list = [res50, res101, vit_B_16, vit_L_16, swin_T, swin_S]
    model_names = ['res50', 'res101', 'vit_B_16', 'vit_L_16', 'swin_T', 'swin_S']
    for name, model in zip(model_names, model_list):
        print(f'{name:10} parameter size is {compute_params(model):.2f}M')

    return model_list, model_names


def compute_params(model):
    '''
    :param model: nn.Module, model
    :return: float, parameter count in millions (M)
    '''
    total = sum(p.numel() for p in model.parameters())
    size = total / 1e6
    # print("Total params: %.2fM" % (total / 1e6))
    return size


def model_evaluate(model, data_loader):
    '''
    :param model: model under test
    :param data_loader: val_loader
    :return: None (prints a running accuracy via tqdm)
    '''
    device = torch.device('cuda:1' if torch.cuda.is_available() else 'cpu')
    device_ids = [1, 2]  # use 2 GPUs; adjust the ids to the GPUs available on your machine
    model = nn.DataParallel(model, device_ids=device_ids)
    model.to(device)
    model.eval()
    total = 0
    correct = 0
    loop = tqdm((data_loader), total=len(data_loader))
    for imgs, labels in loop:
        imgs = imgs.to(device)  # Tensor.to() is not in-place; the result must be reassigned
        outputs = model(imgs)
        outputs = outputs.argmax(dim=1)
        labels = labels.to(device)

        # print(outputs.shape, '\n', outputs, outputs.argmax(dim=1))
        # print(labels.shape, '\n', labels)
        total += len(labels)
        res = outputs == labels
        correct += res.sum().item()
        loop.set_description('inference test')
        loop.set_postfix(total=total, correct=correct, acc=f'{correct/total:.2f}')



if __name__ == '__main__':
    seed = 2022
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

    data_dir = os.path.join('..', '..', '..', 'data', 'ImageNet2012', 'imagenet', 'val')
    val_loader = get_dataloader(data_dir, batch_size=256)

    get_models()  # print model sizes; I originally wanted to loop over all models automatically, but that runs out of GPU memory, so each model is tested individually
    net = swin_s()  # swap in the model under test: resnet50, resnet101, vit_b_16, vit_l_16, swin_t, swin_s
    t1 = time.time()
    model_evaluate(net, val_loader)
    print(f'total time: {time.time() - t1:.3f}s')

    # nets, net_names = get_models()
    # for net, name in zip(nets, net_names):
    #     if name == 'vit_L_16':
    #         val_loader = get_dataloader(data_dir, batch_size=16)
    #     else:
    #         val_loader = get_dataloader(data_dir, batch_size=128)
    #
    #     t1 = time.time()
    #     model_evaluate(net, val_loader)
    #     print(f'{name:10} total time: {time.time()-t1:.3f}s')

References:
1. Deep Clustering for Unsupervised Learning of Visual Features

2. PyTorch official documentation

