2021 Qualcomm AI Innovation Contest, Garbage Classification Track: Fifth Seminar


GT Fifth Seminar

I. A few notes:

Automatic testing (OLD)

To produce comprehensive and fair contest/project results, the platform fetches the trained model, runs the developer's test code to generate outputs, computes the evaluation metric from those outputs, and ranks all developers' algorithms. Re-enter the online coding environment and, under the /project/ev_sdk path, write the test code, i.e. standardize the test code's inputs and outputs according to the contest/project rules.

code env select

EV_SDK is the company's in-house standard model interface used for automatic testing and later model deployment. To simplify contestants' work, the SDK used for the contest's automatic testing has been pared down, and it can be wrapped in either C++ or Python. For details on wrapping the SDK, see the README.md under /project/ev_sdk in the online coding environment.

For simplicity, this section covers wrapping the SDK in Python. When a test is launched through the Python interface, the system runs only the code in /project/ev_sdk/src/ji.py, so you need to modify src/ji.py to match your model's name, its inputs/outputs, and its inference logic.

Shown here is a ji.py for resnet18/resnet50 with the usual transforms.

  • Implement model initialization:
# src/ji.py
def init():
    # model file selected at test time
    pth = '/usr/local/ev_sdk/model/models.pkl'
    model = torch.load(pth)

    return model
  • Implement model inference:
# src/ji.py
def process_image(net, input_image, args=None):

    img = input_image
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = Image.fromarray(img)
    # apply the image transforms: Resize, ToTensor, Normalize
    # (ToTensor must come before Normalize)
    norm_mean = [0.485, 0.456, 0.406]
    norm_std = [0.229, 0.224, 0.225]
    pic_transform = transforms.Compose([transforms.Resize(112), transforms.ToTensor(), transforms.Normalize(norm_mean, norm_std)])
    img = pic_transform(img)
    img = np.array(img)
    img = img.transpose(0, 2, 1)
    img = torch.tensor([img])
    img = img.to(device)
    net.eval()
    with torch.no_grad():
        out = net(img)
        print(out)
        _, pred = torch.max(out.data, 1)
        data = json.dumps({'class': class_dict[pred[0].item()]}, indent=4)
    return data

The return value of the process_image interface must be a JSON-formatted string, and its format must meet the requirements.

Wrap the result into the input/output format specified by the actual project.

The sample code targets an object-detection project, so you need to add the detection class information according to your actual project:
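As a concrete illustration, a classification-style return value can be built like this (the key 'class' and the class name 'battery' are placeholders; the real keys come from the project's output specification):

```python
import json

# Hypothetical result for a classification project; the exact keys are
# dictated by the competition's output specification.
result = {'class': 'battery'}
data = json.dumps(result, indent=4)

# round-tripping through json.loads confirms the string is valid JSON
parsed = json.loads(data)
```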

Current best performance score + accuracy


The new ji.py

import cv2
import json
import numpy as np
import torch
from skimage import io, transform, color
from PIL import Image
from torchvision import transforms, models
from data_Augmentation import *
# your own model

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def init():
    # model file selected at test time
    pth = '/usr/local/ev_sdk/model/models.pkl'
    model = torch.load(pth)
    print('load model finish')

    return model

# built from the labels used during training
with open('/usr/local/ev_sdk/src/class.txt', 'r') as f:
    class_dict = eval(f.read())
class_dict = {value: key for key, value in class_dict.items()}

def process_image(net, input_image, args=None):
    print('begin process')
    transform = get_transforms(input_size=224, test_size=224, backbone=None)
    print('get transform')
    img = input_image
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = Image.fromarray(img)
    # apply the image transforms: Resize, ToTensor, Normalize
    # (ToTensor must come before Normalize)
    norm_mean = [0.485, 0.456, 0.406]
    norm_std = [0.229, 0.224, 0.225]
#     pic_transform = transforms.Compose([transforms.Resize(112), transforms.ToTensor(), transforms.Normalize(norm_mean, norm_std)])
    pic_transform = transform['val']  # must match a key returned by get_transforms
    img = pic_transform(img)
    img = np.array(img)
    img = img.transpose(0, 2, 1)
    img = torch.tensor([img])
    img = img.to(device)
    net.eval()
    with torch.no_grad():
        out = net(img)
        print(out)
        _, pred = torch.max(out.data, 1)
        data = json.dumps({'class': class_dict[pred[0].item()]}, indent=4)
        print(data)
    return data


if __name__ == '__main__':
    net = init()
    x = cv2.imread('../pic/0.jpg')
    process_image(net, x)

The main change here is the transform; the new transform used by the new model is explained in the next section.

A few caveats:

  • When training a model, put this first:
model.train()
# when testing, use this first:
model.eval()
  • Note that the code also runs without these two calls. That is because they only matter for layers that behave differently between training and testing, such as Batch Normalization and Dropout.

  • When training and testing with PyTorch, always remember to set the instantiated model to train/eval. With eval(), the framework freezes BN and Dropout: instead of batch averages, BN uses the statistics learned during training. Otherwise, if the test batch_size is small, the BN layers can easily produce badly distorted results (e.g. severe color shifts in generated images)!

class Inpaint_Network():

    ......

model = Inpaint_Network()

# train:

model.train(mode=True)

.....
# test:

model.eval()

Training operates on mini-batches, but testing is usually done on a single image, so there is no mini-batch. Since the network's parameters are fixed after training, the per-batch mean and variance no longer change, so the statistics computed over all training batches are used directly.
So Batch Normalization behaves differently at training and test time.
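A small numeric sketch of this difference (the running statistics below are made-up illustrative values):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
eps = 1e-5

# train mode: normalize with the current mini-batch's statistics
train_out = (x - x.mean()) / np.sqrt(x.var() + eps)

# eval mode: normalize with the stored running statistics
# (accumulated during training; illustrative values here)
running_mean, running_var = 0.5, 2.0
eval_out = (x - running_mean) / np.sqrt(running_var + eps)

# the two outputs differ, which is why forgetting model.eval()
# at test time changes the predictions
```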

Another important point, and the reason our earlier results were close to random guessing: our normalize step did not match the one used when loading data at training time.

Finally, a note on convert_model.sh: be sure to remove the unquantized model.dlc, otherwise the quantization output will keep being the GPU version.

II. Models tried

    1. resnet18, 50 epochs, standard transforms

Backbone: resnet18

train_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomAffine(degrees=0, translate=(0.05, 0.05)),
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(norm_mean, norm_std),
])
  • Optimizer choice:
optimizer = optim.SGD(resnet18.parameters(), lr=LR, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=lr_decay_step, gamma=0.1)
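StepLR multiplies the learning rate by gamma every step_size epochs. A quick sketch of the schedule (LR and lr_decay_step are not shown above, so the values here are assumptions for illustration):

```python
# assumed values for illustration only
LR, gamma, step_size = 0.01, 0.1, 6

# the learning rate StepLR would apply at each epoch
lrs = [LR * gamma ** (epoch // step_size) for epoch in range(12)]
# epochs 0-5 train at 0.01, epochs 6-11 at 0.001
```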

Training visualization

    2. resnet34: essentially a doubled resnet18; slightly better, but not by much, so not shown in detail.
    3. resnet50, 10 epochs, with extra data augmentation added; overfitting was very severe.


    4. efficient_net_b4: a lighter and better-performing architecture than resnet. The code is below; the transform uses the matching recommended version. Trained for 30 epochs, without careful hyperparameter tuning (only one run so far). The network is relatively complex, so the performance score cannot be maxed out, and early versions also overfit badly; a new loss function, LabelSmoothSoftmaxCE, was therefore used for label smoothing.
#!/usr/bin/python
# -*- encoding: utf-8 -*-
import torch
import torch.nn as nn
class LabelSmoothSoftmaxCE(nn.Module):
    def __init__(self,
                 lb_pos=0.9,
                 lb_neg=0.005,
                 reduction='mean',
                 lb_ignore=255,
                 ):
        super(LabelSmoothSoftmaxCE, self).__init__()
        self.lb_pos = lb_pos
        self.lb_neg = lb_neg
        self.reduction = reduction
        self.lb_ignore = lb_ignore
        self.log_softmax = nn.LogSoftmax(1)

    def forward(self, logits, label):
        logs = self.log_softmax(logits)
        ignore = label.data.cpu() == self.lb_ignore
        n_valid = (ignore == 0).sum()
        label = label.clone()
        label[ignore] = 0
        lb_one_hot = logits.data.clone().zero_().scatter_(1, label.unsqueeze(1), 1)
        label = self.lb_pos * lb_one_hot + self.lb_neg * (1-lb_one_hot)
        ignore = ignore.nonzero()
        _, M = ignore.size()
        a, *b = ignore.chunk(M, dim=1)
        label[[a, torch.arange(label.size(1)), *b]] = 0

        if self.reduction == 'mean':
            loss = -torch.sum(torch.sum(logs*label, dim=1)) / n_valid
        elif self.reduction == 'none':
            loss = -torch.sum(logs*label, dim=1)
        return loss


if __name__ == '__main__':
    torch.manual_seed(15)
    criteria = LabelSmoothSoftmaxCE(lb_pos=0.9, lb_neg=5e-3)
    net1 = nn.Sequential(
        nn.Conv2d(3, 3, kernel_size=3, stride=2, padding=1),
    )
    net1.cuda()
    net1.train()
    net2 = nn.Sequential(
        nn.Conv2d(3, 3, kernel_size=3, stride=2, padding=1),
    )
    net2.cuda()
    net2.train()

    with torch.no_grad():
        inten = torch.randn(2, 3, 5, 5).cuda()
        lbs = torch.randint(0, 3, [2, 5, 5]).cuda()
        lbs[1, 3, 4] = 255
        lbs[1, 2, 3] = 255
        print(lbs)

    import torch.nn.functional as F
    logits1 = net1(inten)
    logits1 = F.interpolate(logits1, inten.size()[2:], mode='bilinear')
    logits2 = net2(inten)
    logits2 = F.interpolate(logits2, inten.size()[2:], mode='bilinear')

    #  loss1 = criteria1(logits1, lbs)
    loss = criteria(logits1, lbs)
    #  print(loss.detach().cpu())
    loss.backward()
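With the default lb_pos=0.9 and lb_neg=0.005, the smoothing in LabelSmoothSoftmaxCE turns a hard one-hot target into a soft one:

```python
lb_pos, lb_neg = 0.9, 0.005

# a hard one-hot target for class 1 of 3
one_hot = [0.0, 1.0, 0.0]

# the smoothed target used in the loss: lb_pos where the label is 1,
# lb_neg everywhere else
smoothed = [lb_pos * v + lb_neg * (1 - v) for v in one_hot]
# smoothed is [0.005, 0.9, 0.005]
```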

(Figures: training results including acc.png; the original images are no longer available.)

efficient_net.py (b3, 288 input); results with b0 at 224 input were also tried.

(Figures: valid_acc, valid_loss, train_acc, train_loss curves; the original images are no longer available.)

EfficientNet code

# -*- coding: utf-8 -*-
from torchvision import datasets, transforms
import torch
import numpy as np
import matplotlib.pyplot as plt
from torch import nn
import torch.optim as optim
import argparse
import warnings
import torch.optim.lr_scheduler as lr_scheduler
from torch.utils.data.dataloader import default_collate  # default collate function
from efficientnet_pytorch import EfficientNet
from trash_dataloader import TrashDataset
from label_smooth import LabelSmoothSoftmaxCE
import os
from torch.utils.data import DataLoader
from ev_toolkit import plot_tool
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'

warnings.filterwarnings("ignore")

#   to the ImageFolder structure
# Number of classes in the dataset
num_classes = 146

# Batch size for training (change depending on how much memory you have)
batch_size = 64

# Number of epochs to train for
EPOCH = 50

# Flag for feature extracting. When False, we finetune the whole model,
#   when True we only update the reshaped layer params
# feature_extract = True
feature_extract = False
# hyperparameters
pre_epoch = 0  # number of epochs already trained

def my_collate_fn(batch):
    '''
    each element of batch is a (data, label) pair
    '''
    # drop samples whose data is None
    batch = list(filter(lambda x: x[0] is not None, batch))
    if len(batch) == 0: return torch.Tensor()
    return default_collate(batch)  # collate the filtered batch the default way


# build the network from a pretrained EfficientNet
net = EfficientNet.from_pretrained('efficientnet-b0')

num_ftrs = net._fc.in_features
net._fc = nn.Linear(num_ftrs, num_classes)

# print the network
print(net)

# Detect if we have a GPU available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# multiple GPUs for training, a single GPU for testing
if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    net = nn.DataParallel(net)

# Send the model to GPU
net = net.to(device)

norm_mean = [0.485, 0.456, 0.406]
norm_std = [0.229, 0.224, 0.225]
train_transform = transforms.Compose([
    transforms.Resize((112,112)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(norm_mean, norm_std),
])

val_transform = transforms.Compose([
    transforms.Resize((112,112)),
    transforms.ToTensor(),
    transforms.Normalize(norm_mean, norm_std),
])
# Create training and validation datasets
train_dir = '../../../../home/data'

# build the DataLoaders
train_data= TrashDataset(data_dir=train_dir, transform=train_transform)
valid_data = TrashDataset(data_dir=train_dir, transform=val_transform)
train_loader = DataLoader(dataset=train_data, batch_size=batch_size, shuffle=True)
valid_loader = DataLoader(dataset=valid_data, batch_size=batch_size)

# argparse lets hyperparameters be passed on the command line, Linux-style
parser = argparse.ArgumentParser(description='PyTorch DeepNetwork Training')
parser.add_argument('--outf', default='./model/model', help='folder to output images and model checkpoints')  # output path

args = parser.parse_args()
params_to_update = net.parameters()

print("Params to learn:")
if feature_extract:
    params_to_update = []
    for name, param in net.named_parameters():
        if param.requires_grad == True:
            params_to_update.append(param)
            print("\t", name)
else:
    for name, param in net.named_parameters():
        if param.requires_grad == True:
            print("\t", name)


def main():
    train_curve = list()
    train_acc = list()
    ii = 0
    LR = 1e-3  # learning rate
    best_acc = 0  # best validation accuracy so far
    print("Start Training, DeepNetwork!")


    # criterion
    criterion = LabelSmoothSoftmaxCE()

    # optimizer
    optimizer = optim.Adam(params_to_update, lr=LR, betas=(0.9, 0.999), eps=1e-9)


    # scheduler
    scheduler = lr_scheduler.ReduceLROnPlateau(optimizer, mode='max', factor=0.7, patience=3, verbose=True)

    with open("./log/acc.txt", "w") as f:
        with open("./log/log.txt", "w")as f2:
            for epoch in range(pre_epoch, EPOCH):
                # scheduler.step(epoch)

                print('\nEpoch: %d' % (epoch + 1))
                net.train()
                sum_loss = 0.0
                correct = 0.0
                total = 0.0

                for i, data in enumerate(train_loader):
                    # fetch the data
                    length = len(train_loader)

                    input, target = data
                    input, target = input.to(device), target.to(device)

                    # train
                    optimizer.zero_grad()
                    # forward + backward
                    output = net(input)
                    loss = criterion(output, target)

                    loss.backward()
                    optimizer.step()

                    # accumulate loss and accuracy
                    sum_loss += loss.item()

                    _, predicted = torch.max(output.data, 1)
                    total += target.size(0)
                    correct += predicted.eq(target.data).cpu().sum()
                    train_curve.append(loss.item())
                    train_acc.append(correct / total)
                    # print loss and accuracy every 50 iterations
                    if (i + 1 + epoch * length) % 50 == 0:
                        print('[epoch:%d, iter:%d] Loss: %.03f | Acc: %.3f%% '
                              % (epoch + 1, (i + 1 + epoch * length), sum_loss / (i + 1),
                                 100. * float(correct) / float(total)))
                    f2.write('%03d  %05d |Loss: %.03f | Acc: %.3f%% '
                             % (epoch + 1, (i + 1 + epoch * length), sum_loss / (i + 1),
                                100. * float(correct) / float(total)))
                    f2.write('\n')
                    f2.flush()

                # evaluate accuracy after each epoch
                print("Waiting Test!")
                with torch.no_grad():
                    correct = 0
                    total = 0
                    for data in valid_loader:
                        net.eval()
                        images, labels = data
                        images, labels = images.to(device), labels.to(device)
                        outputs = net(images)
                        # take the class with the highest score (index into outputs.data)
                        _, predicted = torch.max(outputs.data, 1)
                        total += labels.size(0)
                        correct += (predicted == labels).cpu().sum()
                    print('validation accuracy: %.3f%%' % (100. * float(correct) / float(total)))
                    acc = 100. * float(correct) / float(total)
                    scheduler.step(acc)

                    # append each epoch's result to acc.txt
                    if (ii % 1 == 0):
                        print('Saving model......')
#                         torch.save(net, '%s/net_%03d.pth' % (args.outf, epoch + 1))
                    f.write("EPOCH=%03d,Accuracy= %.3f%%" % (epoch + 1, acc))
                    f.write('\n')
                    f.flush()
                    # track the best validation accuracy in best_acc.txt
                    if acc > best_acc:
                        f3 = open("./log/best_acc.txt", "w")
                        f3.write("EPOCH=%d,best_acc= %.3f%%" % (epoch + 1, acc))
                        f3.close()
                        best_acc = acc
            print("Training Finished, TotalEPOCH=%d" % EPOCH)
            path_model = '../models/202159efficient_20epoch/models.pkl'
            torch.save(net, path_model)
            print('{} is save!'.format(path_model))
    return train_curve,train_acc

def save_plot(train_curve,train_acc):
    train_x = list(range(len(train_curve)))
    train_loss = np.array(train_curve)
    train_acc = np.array(train_acc)
    train_iters = len(train_loader)
    fig_loss = plt.figure(figsize = (10,6))
    plt.plot(train_x, train_loss)
    plot_tool.update_plot(name='loss', img=plt.gcf())
    fig_loss.savefig('../result-graphs/loss.png')
    
    fig_acc = plt.figure(figsize = (10,6))
    plt.plot(train_x, train_acc)
    plot_tool.update_plot(name='acc', img=plt.gcf())
    fig_acc = plt.gcf()
    fig_acc.savefig('../result-graphs/acc.png')
    print('acc-loss curves saved')

if __name__ == "__main__":
    train_curve,train_acc = main()
    save_plot(train_curve,train_acc)

Optimizer: same as resnet18 above.

    5. Backbone resnext50_32x4d + improved optimizer and data augmentation. ResNeXt (2016) is a refinement of residual networks: it has roughly the same parameter count as resnet50, but because its blocks fuse Inception-style parallel (grouped) convolutions, its accuracy is often closer to resnet101's.
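A rough parameter count shows why the 32x4d grouped convolutions stay cheap (the numbers are illustrative, for a single 3x3 layer with 128 channels):

```python
# one dense 3x3 convolution, 128 -> 128 channels
dense = 128 * 128 * 3 * 3        # 147456 weights

# the same layer split into 32 groups of 4 channels (ResNeXt's 32x4d style)
grouped = 32 * (4 * 4 * 3 * 3)   # 4608 weights

# the grouped version is 32x smaller, which is what lets ResNeXt
# widen the block without blowing up the parameter count
```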

data_Augmentation.py

# data augmentation
import random
import math
import torch

from PIL import Image, ImageOps, ImageFilter
from torchvision import transforms

class Resize(object):
    def __init__(self, size, interpolation=Image.BILINEAR):
        self.size = size
        self.interpolation = interpolation

    def __call__(self, img):
        # padding
        ratio = self.size[0] / self.size[1]
        w, h = img.size
        if w / h < ratio:
            t = int(h * ratio)
            w_padding = (t - w) // 2
            img = img.crop((-w_padding, 0, w+w_padding, h))
        else:
            t = int(w / ratio)
            h_padding = (t - h) // 2
            img = img.crop((0, -h_padding, w, h+h_padding))

        img = img.resize(self.size, self.interpolation)

        return img

class RandomRotate(object):
    def __init__(self, degree, p=0.5):
        self.degree = degree
        self.p = p

    def __call__(self, img):
        if random.random() < self.p:
            rotate_degree = random.uniform(-1*self.degree, self.degree)
            img = img.rotate(rotate_degree, Image.BILINEAR)
        return img

class RandomGaussianBlur(object):
    def __init__(self, p=0.5):
        self.p = p
    def __call__(self, img):
        if random.random() < self.p:
            img = img.filter(ImageFilter.GaussianBlur(
                radius=random.random()))
        return img

def get_train_transform(mean, std, size):
    train_transform = transforms.Compose([
        Resize((int(size * (256 / 224)), int(size * (256 / 224)))),
        transforms.RandomCrop(size),

        transforms.RandomHorizontalFlip(),
        # RandomRotate(15, 0.3),
        # RandomGaussianBlur(),
        transforms.ToTensor(),
        transforms.Normalize(mean=mean, std=std),
    ])
    return train_transform

def get_test_transform(mean, std, size):
    return transforms.Compose([
        Resize((int(size * (256 / 224)), int(size * (256 / 224)))),
        transforms.CenterCrop(size),
        transforms.ToTensor(),
        transforms.Normalize(mean=mean, std=std),
    ])

def get_transforms(input_size=288, test_size=288, backbone=None):
    mean, std = [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]
    if backbone is not None and backbone in ['pnasnet5large', 'nasnetamobile']:
        mean, std = [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]
    transformations = {}
    transformations['train'] = get_train_transform(mean, std, input_size)
    transformations['val'] = get_test_transform(mean, std, test_size)
    return transformations
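The padding arithmetic in the Resize class above can be checked by hand; for a (288, 288) target (ratio 1.0) and a 100x200 portrait image:

```python
size = (288, 288)
ratio = size[0] / size[1]        # 1.0

w, h = 100, 200                  # a portrait image: w / h = 0.5 < ratio
t = int(h * ratio)               # target width after padding: 200
w_padding = (t - w) // 2         # 50 pixels added on each side

# the crop box (-w_padding, 0, w + w_padding, h) == (-50, 0, 150, 200)
# yields a square 200x200 canvas, which is then resized to 288x288
box = (-w_padding, 0, w + w_padding, h)
```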

self_optimizer.py

# new optimizers
import errno
import os
import sys
import time
import math

import torch.nn as nn
import torch.nn.init as init
from torch.autograd import Variable
import torch
import shutil
# import adabound
# from utils.radam import RAdam, AdamW
import torchvision.transforms as transforms
import math
import torch
from torch.optim.optimizer import Optimizer, required


class RAdam(Optimizer):

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=0):
        defaults = dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay)
        self.buffer = [[None, None, None] for ind in range(10)]
        super(RAdam, self).__init__(params, defaults)

    def __setstate__(self, state):
        super(RAdam, self).__setstate__(state)

    def step(self, closure=None):

        loss = None
        if closure is not None:
            loss = closure()

        for group in self.param_groups:

            for p in group['params']:
                if p.grad is None:
                    continue
                grad = p.grad.data.float()
                if grad.is_sparse:
                    raise RuntimeError('RAdam does not support sparse gradients')

                p_data_fp32 = p.data.float()

                state = self.state[p]

                if len(state) == 0:
                    state['step'] = 0
                    state['exp_avg'] = torch.zeros_like(p_data_fp32)
                    state['exp_avg_sq'] = torch.zeros_like(p_data_fp32)
                else:
                    state['exp_avg'] = state['exp_avg'].type_as(p_data_fp32)
                    state['exp_avg_sq'] = state['exp_avg_sq'].type_as(p_data_fp32)

                exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq']
                beta1, beta2 = group['betas']

                exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad)
                exp_avg.mul_(beta1).add_(1 - beta1, grad)

                state['step'] += 1
                buffered = self.buffer[int(state['step'] % 10)]
                if state['step'] == buffered[0]:
                    N_sma, step_size = buffered[1], buffered[2]
                else:
                    buffered[0] = state['step']
                    beta2_t = beta2 ** state['step']
                    N_sma_max = 2 / (1 - beta2) - 1
                    N_sma = N_sma_max - 2 * state['step'] * beta2_t / (1 - beta2_t)
                    buffered[1] = N_sma

                    # more conservative since it's an approximated value
                    if N_sma >= 5:
                        step_size = math.sqrt(
                            (1 - beta2_t) * (N_sma - 4) / (N_sma_max - 4) * (N_sma - 2) / N_sma * N_sma_max / (
                                        N_sma_max - 2)) / (1 - beta1 ** state['step'])
                    else:
                        step_size = 1.0 / (1 - beta1 ** state['step'])
                    buffered[2] = step_size

                if group['weight_decay'] != 0:
                    p_data_fp32.add_(-group['weight_decay'] * group['lr'], p_data_fp32)

                # more conservative since it's an approximated value
                if N_sma >= 5:
                    denom = exp_avg_sq.sqrt().add_(group['eps'])
                    p_data_fp32.addcdiv_(-step_size * group['lr'], exp_avg, denom)
                else:
                    p_data_fp32.add_(-step_size * group['lr'], exp_avg)

                p.data.copy_(p_data_fp32)

        return loss


class PlainRAdam(Optimizer):

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=0):
        defaults = dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay)

        super(PlainRAdam, self).__init__(params, defaults)

    def __setstate__(self, state):
        super(PlainRAdam, self).__setstate__(state)

    def step(self, closure=None):

        loss = None
        if closure is not None:
            loss = closure()

        for group in self.param_groups:

            for p in group['params']:
                if p.grad is None:
                    continue
                grad = p.grad.data.float()
                if grad.is_sparse:
                    raise RuntimeError('RAdam does not support sparse gradients')

                p_data_fp32 = p.data.float()

                state = self.state[p]

                if len(state) == 0:
                    state['step'] = 0
                    state['exp_avg'] = torch.zeros_like(p_data_fp32)
                    state['exp_avg_sq'] = torch.zeros_like(p_data_fp32)
                else:
                    state['exp_avg'] = state['exp_avg'].type_as(p_data_fp32)
                    state['exp_avg_sq'] = state['exp_avg_sq'].type_as(p_data_fp32)

                exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq']
                beta1, beta2 = group['betas']

                exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad)
                exp_avg.mul_(beta1).add_(1 - beta1, grad)

                state['step'] += 1
                beta2_t = beta2 ** state['step']
                N_sma_max = 2 / (1 - beta2) - 1
                N_sma = N_sma_max - 2 * state['step'] * beta2_t / (1 - beta2_t)

                if group['weight_decay'] != 0:
                    p_data_fp32.add_(-group['weight_decay'] * group['lr'], p_data_fp32)

                # more conservative since it's an approximated value
                if N_sma >= 5:
                    step_size = group['lr'] * math.sqrt(
                        (1 - beta2_t) * (N_sma - 4) / (N_sma_max - 4) * (N_sma - 2) / N_sma * N_sma_max / (
                                    N_sma_max - 2)) / (1 - beta1 ** state['step'])
                    denom = exp_avg_sq.sqrt().add_(group['eps'])
                    p_data_fp32.addcdiv_(-step_size, exp_avg, denom)
                else:
                    step_size = group['lr'] / (1 - beta1 ** state['step'])
                    p_data_fp32.add_(-step_size, exp_avg)

                p.data.copy_(p_data_fp32)

        return loss


class AdamW(Optimizer):

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=0, warmup=0):
        defaults = dict(lr=lr, betas=betas, eps=eps,
                        weight_decay=weight_decay, warmup=warmup)
        super(AdamW, self).__init__(params, defaults)

    def __setstate__(self, state):
        super(AdamW, self).__setstate__(state)

    def step(self, closure=None):
        loss = None
        if closure is not None:
            loss = closure()

        for group in self.param_groups:

            for p in group['params']:
                if p.grad is None:
                    continue
                grad = p.grad.data.float()
                if grad.is_sparse:
                    raise RuntimeError('Adam does not support sparse gradients, please consider SparseAdam instead')

                p_data_fp32 = p.data.float()

                state = self.state[p]

                if len(state) == 0:
                    state['step'] = 0
                    state['exp_avg'] = torch.zeros_like(p_data_fp32)
                    state['exp_avg_sq'] = torch.zeros_like(p_data_fp32)
                else:
                    state['exp_avg'] = state['exp_avg'].type_as(p_data_fp32)
                    state['exp_avg_sq'] = state['exp_avg_sq'].type_as(p_data_fp32)

                exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq']
                beta1, beta2 = group['betas']

                state['step'] += 1

                exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad)
                exp_avg.mul_(beta1).add_(1 - beta1, grad)

                denom = exp_avg_sq.sqrt().add_(group['eps'])
                bias_correction1 = 1 - beta1 ** state['step']
                bias_correction2 = 1 - beta2 ** state['step']

                if group['warmup'] > state['step']:
                    scheduled_lr = 1e-8 + state['step'] * group['lr'] / group['warmup']
                else:
                    scheduled_lr = group['lr']

                step_size = scheduled_lr * math.sqrt(bias_correction2) / bias_correction1

                if group['weight_decay'] != 0:
                    p_data_fp32.add_(-group['weight_decay'] * scheduled_lr, p_data_fp32)

                p_data_fp32.addcdiv_(-step_size, exp_avg, denom)

                p.data.copy_(p_data_fp32)

        return loss


__all__ = ['get_mean_and_std', 'init_params', 'mkdir_p', 'AverageMeter', 'get_optimizer', 'save_checkpoint']


def get_mean_and_std(dataset):
    '''Compute the mean and std value of dataset.'''
    dataloader = trainloader = torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=True, num_workers=2)

    mean = torch.zeros(3)
    std = torch.zeros(3)
    print('==> Computing mean and std..')
    for inputs, targets in dataloader:
        for i in range(3):
            mean[i] += inputs[:,i,:,:].mean()
            std[i] += inputs[:,i,:,:].std()
    mean.div_(len(dataset))
    std.div_(len(dataset))
    return mean, std

def init_params(net):
    '''Init layer parameters.'''
    for m in net.modules():
        if isinstance(m, nn.Conv2d):
            init.kaiming_normal(m.weight, mode='fan_out')
            if m.bias:
                init.constant(m.bias, 0)
        elif isinstance(m, nn.BatchNorm2d):
            init.constant(m.weight, 1)
            init.constant(m.bias, 0)
        elif isinstance(m, nn.Linear):
            init.normal(m.weight, std=1e-3)
            if m.bias:
                init.constant(m.bias, 0)

def mkdir_p(path):
    '''make dir if not exist'''
    try:
        os.makedirs(path)
    except OSError as exc:  # Python >2.5
        if exc.errno == errno.EEXIST and os.path.isdir(path):
            pass
        else:
            raise

class AverageMeter(object):
    """Computes and stores the average and current value
       Imported from https://github.com/pytorch/examples/blob/master/imagenet/main.py#L247-L262
    """
    def __init__(self):
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

def get_optimizer(model, args):
    parameters = []
    for name, param in model.named_parameters():
        if 'fc' in name or 'class' in name or 'last_linear' in name or 'ca' in name or 'sa' in name:
            parameters.append({'params': param, 'lr': args.lr * args.lr_fc_times})
        else:
            parameters.append({'params': param, 'lr': args.lr})

    if args.optimizer == 'sgd':
        return torch.optim.SGD(parameters,
                            # model.parameters(),
                               args.lr,
                               momentum=args.momentum, nesterov=args.nesterov,
                               weight_decay=args.weight_decay)
    elif args.optimizer == 'rmsprop':
        return torch.optim.RMSprop(parameters,
                                # model.parameters(),
                                   args.lr,
                                   alpha=args.alpha,
                                   weight_decay=args.weight_decay)
    elif args.optimizer == 'adam':
        return torch.optim.Adam(parameters,
                                # model.parameters(),
                                args.lr,
                                betas=(args.beta1, args.beta2),
                                weight_decay=args.weight_decay)
    elif args.optimizer == 'AdaBound':
        return adabound.AdaBound(parameters,
                                # model.parameters(),
                                lr=args.lr, final_lr=args.final_lr)
    elif args.optimizer == 'radam':
        return RAdam(parameters, lr=args.lr, betas=(args.beta1, args.beta2),
                          weight_decay=args.weight_decay)

    else:
        raise NotImplementedError


def save_checkpoint(state, is_best, single=True, checkpoint='checkpoint', filename='checkpoint.pth.tar'):
    if single:
        fold = ''
    else:
        fold = str(state['fold']) + '_'
    cur_name = 'checkpoint.pth.tar'
    filepath = os.path.join(checkpoint, fold + cur_name)
    curpath = os.path.join(checkpoint, fold + 'model_cur.pth')

    torch.save(state, filepath)
    torch.save(state['state_dict'], curpath)

    if is_best and state['epoch'] >= 5:
        model_name = 'model_' + str(state['epoch']) + '_' + str(int(round(state['train_acc']*100, 0))) + '_' + str(int(round(state['acc']*100, 0))) + '.pth'
        model_path = os.path.join(checkpoint, fold + model_name)
        torch.save(state['state_dict'], model_path)


def save_checkpoint2(state, is_best, checkpoint='checkpoint', filename='checkpoint.pth.tar'):
    # best_model = '/application/search/qlmx/clover/garbage/code/image_classfication/predict/'
    fold = str(state['fold']) + '_'
    filepath = os.path.join(checkpoint, fold + filename)
    model_path = os.path.join(checkpoint, fold + 'model_cur.pth')

    torch.save(state, filepath)
    torch.save(state['state_dict'], model_path)
    if is_best:
        shutil.copyfile(filepath, os.path.join(checkpoint, fold + 'model_best.pth.tar'))
        shutil.copyfile(model_path, os.path.join(checkpoint, fold + 'model_best.pth'))
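The checkpoint functions above save two artifacts: the full training state (a dict with bookkeeping fields) and the bare `state_dict` for inference. A minimal sketch of that round-trip, with illustrative values rather than real training state:

```python
import os
import tempfile
import torch

# A dict mimicking the `state` passed to save_checkpoint: bookkeeping
# fields plus the model weights under 'state_dict'.
state = {'epoch': 7, 'acc': 0.91, 'state_dict': {'w': torch.zeros(2)}}

with tempfile.TemporaryDirectory() as ckpt_dir:
    # full training state -> one file; bare weights -> a second file
    torch.save(state, os.path.join(ckpt_dir, 'checkpoint.pth.tar'))
    torch.save(state['state_dict'], os.path.join(ckpt_dir, 'model_cur.pth'))

    loaded = torch.load(os.path.join(ckpt_dir, 'checkpoint.pth.tar'))
    print(loaded['epoch'], loaded['acc'])  # 7 0.91
```

Resuming training reads the first file (for the epoch counter and optimizer state, when saved); deployment only needs the second.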
  • This version also re-splits the training and validation sets. Note that a command running the dataset-preprocessing script preprocess.py must be added to start_train.sh.

preprocess.py

# Utility script: split the raw dataset into train/val folders
import os
import random
from shutil import copy2


def data_set_split(src_data_folder, target_data_folder, train_scale=0.9, val_scale=0.1):
    '''
    Read the source data folder and create a split copy with train and val subfolders.
    :param src_data_folder: source folder, e.g. E:/biye/gogogo/note_book/torch_note/data/utils_test/data_split/src_data
    :param target_data_folder: target folder, e.g. E:/biye/gogogo/note_book/torch_note/data/utils_test/data_split/target_data
    :param train_scale: training-set fraction
    :param val_scale: validation-set fraction
    :return: None
    '''
    print("Starting dataset split")
    class_names = os.listdir(src_data_folder)
    # Create the split folders under the target directory
    split_names = ['train', 'val']
    for split_name in split_names:
        split_path = os.path.join(target_data_folder, split_name)
        os.makedirs(split_path, exist_ok=True)
        # Then create one folder per class under split_path
        for class_name in class_names:
            class_split_path = os.path.join(split_path, class_name)
            os.makedirs(class_split_path, exist_ok=True)

    # Split the data by the given ratios and copy the images, class by class
    for class_name in class_names:
        current_class_data_path = os.path.join(src_data_folder, class_name)
        current_all_data = os.listdir(current_class_data_path)
        current_data_length = len(current_all_data)
        current_data_index_list = list(range(current_data_length))
        random.shuffle(current_data_index_list)

        train_folder = os.path.join(target_data_folder, 'train', class_name)
        val_folder = os.path.join(target_data_folder, 'val', class_name)
        train_stop_flag = current_data_length * train_scale
        val_stop_flag = current_data_length * (train_scale + val_scale)
        current_idx = 0
        train_num = 0
        val_num = 0
        for i in current_data_index_list:
            src_img_path = os.path.join(current_class_data_path, current_all_data[i])
            # strict < (not <=) so the train split gets exactly its share
            if current_idx < train_stop_flag:
                copy2(src_img_path, train_folder)
                train_num = train_num + 1
            elif current_idx < val_stop_flag:
                copy2(src_img_path, val_folder)
                val_num = val_num + 1

            current_idx = current_idx + 1

        print("*********************************{}*************************************".format(class_name))
        print("train set {}: {} images".format(train_folder, train_num))
        print("val set {}: {} images".format(val_folder, val_num))


if __name__ == '__main__':
    base_dir = "../../../../home/data"
    sub_dirs = os.listdir(base_dir)
    src_data_folder = os.path.join(base_dir, sub_dirs[0])
    target_data_folder = "./split_data/"
    os.makedirs(target_data_folder, exist_ok=True)
    data_set_split(src_data_folder, target_data_folder)
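The core of data_set_split is a shuffled index list cut proportionally into two parts; a tiny self-contained sketch of that logic (illustrative helper, not part of preprocess.py):

```python
import random

def split_indices(n, train_scale=0.9, seed=0):
    # shuffle the indices, then cut at n * train_scale
    idx = list(range(n))
    random.seed(seed)
    random.shuffle(idx)
    stop = n * train_scale
    train = [i for pos, i in enumerate(idx) if pos < stop]
    val = [i for pos, i in enumerate(idx) if pos >= stop]
    return train, val

train, val = split_indices(100)
print(len(train), len(val))  # 90 10
```

Every index lands in exactly one split, so no image is shared between train and val.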
    1. Not yet tested: the ResNet + attention mechanism (CBAM) model in the existing code.

Note: since the ResNet architecture must not be changed, CBAM cannot be inserted inside the residual blocks (it can be, but then the pretrained weights no longer load, because the structure has changed). Placing it after the first convolution and after the last convolutional stage leaves the backbone unchanged, so the pretrained weights can still be used.

import torch.nn as nn
import math
try:
    from torch.hub import load_state_dict_from_url
except ImportError:
    from torch.utils.model_zoo import load_url as load_state_dict_from_url
# conv1x1, BasicBlock and Bottleneck are reused from torchvision's resnet module
from torchvision.models.resnet import conv1x1, BasicBlock, Bottleneck
import torch

## Channel attention module
class ChannelAttention(nn.Module):
    def __init__(self, in_planes, ratio=16):
        super(ChannelAttention, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)

        # use the ratio argument instead of a hard-coded 16
        self.fc1   = nn.Conv2d(in_planes, in_planes // ratio, 1, bias=False)
        self.relu1 = nn.ReLU()
        self.fc2   = nn.Conv2d(in_planes // ratio, in_planes, 1, bias=False)

        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_out = self.fc2(self.relu1(self.fc1(self.avg_pool(x))))
        max_out = self.fc2(self.relu1(self.fc1(self.max_pool(x))))
        out = avg_out + max_out
        return self.sigmoid(out)
    
## Spatial attention module
class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super(SpatialAttention, self).__init__()

        assert kernel_size in (3, 7), 'kernel size must be 3 or 7'
        padding = 3 if kernel_size == 7 else 1

        self.conv1 = nn.Conv2d(2, 1, kernel_size, padding=padding, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_out = torch.mean(x, dim=1, keepdim=True)
        max_out, _ = torch.max(x, dim=1, keepdim=True)
        x = torch.cat([avg_out, max_out], dim=1)
        x = self.conv1(x)
        return self.sigmoid(x)

class ResNet(nn.Module):

    def __init__(self, block, layers, num_classes=1000, zero_init_residual=False,
                 groups=1, width_per_group=64, replace_stride_with_dilation=None,
                 norm_layer=None):
        super(ResNet, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        self._norm_layer = norm_layer

        self.inplanes = 64
        self.dilation = 1
        if replace_stride_with_dilation is None:
            # each element in the tuple indicates if we should replace
            # the 2x2 stride with a dilated convolution instead
            replace_stride_with_dilation = [False, False, False]
        if len(replace_stride_with_dilation) != 3:
            raise ValueError("replace_stride_with_dilation should be None "
                             "or a 3-element tuple, got {}".format(replace_stride_with_dilation))
        self.groups = groups
        self.base_width = width_per_group
        self.conv1 = nn.Conv2d(3, self.inplanes, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = norm_layer(self.inplanes)
        self.relu = nn.ReLU(inplace=True)

        # attention added after the network's first conv layer
        self.ca = ChannelAttention(self.inplanes)
        self.sa = SpatialAttention()

        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2,
                                       dilate=replace_stride_with_dilation[0])
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2,
                                       dilate=replace_stride_with_dilation[1])
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2,
                                       dilate=replace_stride_with_dilation[2])
        # attention added after the last convolutional stage
        self.ca1 = ChannelAttention(self.inplanes)
        self.sa1 = SpatialAttention()

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

        # Zero-initialize the last BN in each residual branch,
        # so that the residual branch starts with zeros, and each residual block behaves like an identity.
        # This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677
        if zero_init_residual:
            for m in self.modules():
                if isinstance(m, Bottleneck):
                    nn.init.constant_(m.bn3.weight, 0)
                elif isinstance(m, BasicBlock):
                    nn.init.constant_(m.bn2.weight, 0)

    def _make_layer(self, block, planes, blocks, stride=1, dilate=False):
        norm_layer = self._norm_layer
        downsample = None
        previous_dilation = self.dilation
        if dilate:
            self.dilation *= stride
            stride = 1
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                conv1x1(self.inplanes, planes * block.expansion, stride),
                norm_layer(planes * block.expansion),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample, self.groups,
                            self.base_width, previous_dilation, norm_layer))
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.inplanes, planes, groups=self.groups,
                                base_width=self.base_width, dilation=self.dilation,
                                norm_layer=norm_layer))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)

        x = self.ca(x) * x
        x = self.sa(x) * x

        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.ca1(x) * x
        x = self.sa1(x) * x


        x = self.avgpool(x)
        x = x.reshape(x.size(0), -1)
        x = self.fc(x)

        return x
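One practical consequence of adding CBAM outside the residual blocks is that pretrained backbone weights still load for every original parameter; only the new attention modules start from scratch. A minimal stand-alone sketch of that non-strict loading (the two toy modules here are illustrative, not the classes above):

```python
import torch
import torch.nn as nn

# A plain backbone, and the same backbone with an extra module added
# outside the original layers (stand-in for the ca/sa attention modules).
class Backbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, 3, padding=1)

class BackboneWithAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, 3, padding=1)
        self.ca = nn.Conv2d(8, 8, 1)  # stand-in for ChannelAttention

pretrained = Backbone().state_dict()
model = BackboneWithAttention()
# strict=False loads every matching backbone weight and merely reports
# the new parameters as missing, leaving them at their fresh init
result = model.load_state_dict(pretrained, strict=False)
print(sorted(result.missing_keys))  # ['ca.bias', 'ca.weight']
print(result.unexpected_keys)      # []
```

Had the extra module been inserted inside an existing block with changed layer names, the backbone keys would no longer match and the pretrained weights would be useless, which is exactly the constraint noted above.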

三、Solutions for data imbalance


import os
import random
import torch
from torch.utils.data import Dataset
import torchvision
from PIL import Image


class MedicalDataset(Dataset):
    def __init__(self, root, split, data_ratio=1.0, ret_name=False):
        assert split in ['train', 'val', 'test']
        self.ret_name = ret_name
        self.cls_to_ind_dict = dict()
        self.ind_to_cls_dict = list()
        self.img_list = list()
        self.cls_list = list()
        self.cls_num = dict()
        classes = ['WA', 'WKY']
        if split == 'test':
            for idx, cls in enumerate(classes):
                self.cls_to_ind_dict[cls] = idx
                self.ind_to_cls_dict.append(cls)
                img_list = sorted(os.listdir(os.path.join(root, split, cls)))
                self.cls_num[cls] = len(img_list)
                for img_fp in img_list:
                    self.img_list.append(os.path.join(root, split, cls, img_fp))
                    self.cls_list.append(idx)
        else:
            img_list_temp, cls_list_temp = [], []
            for idx, cls in enumerate(classes):
                self.cls_to_ind_dict[cls] = idx
                self.ind_to_cls_dict.append(cls)
                if cls == 'WA':  # the WA training set does not need to be enlarged
                    img_list = sorted(os.listdir(os.path.join(root, split, cls)))
                    self.cls_num[cls] = len(img_list)
                    for img_fp in img_list:
                        self.img_list.append(os.path.join(root, split, cls, img_fp))
                        self.cls_list.append(idx)
                    print(cls, '=======================')
                    print(len(self.img_list), len(self.cls_list))
                else:
                    img_list = sorted(os.listdir(os.path.join(root, split, cls)))
                    for img_fp in img_list:
                        img_list_temp.append(os.path.join(root, split, cls, img_fp))
                        cls_list_temp.append(idx)
                    # repeat the original img_list three times; this must run once,
                    # after the loop, so each minority-class sample appears exactly
                    # three times
                    img_list_temp = [val for val in img_list_temp for _ in range(3)]
                    cls_list_temp = [val for val in cls_list_temp for _ in range(3)]
                    self.cls_num[cls] = len(img_list_temp)  # record the new per-class count
                    print(cls, '=======================')
                    print(len(img_list_temp), len(cls_list_temp))

            self.img_list = self.img_list + img_list_temp
            self.cls_list = self.cls_list + cls_list_temp
            print(len(self.img_list), len(self.cls_list))

        # forced horizontal flip
        self.trans0 = torchvision.transforms.Compose([torchvision.transforms.Resize(256),
                                                      torchvision.transforms.RandomCrop(224),
                                                      torchvision.transforms.RandomHorizontalFlip(p=1),
                                                      torchvision.transforms.ToTensor(),
                                                      torchvision.transforms.Normalize([0.485, 0.456, 0.406],
                                                                                       [0.229, 0.224, 0.225])
                                                      ])
        # forced vertical flip
        self.trans1 = torchvision.transforms.Compose([torchvision.transforms.Resize(256),
                                                      torchvision.transforms.RandomCrop(224),
                                                      torchvision.transforms.RandomVerticalFlip(p=1),
                                                      torchvision.transforms.ToTensor(),
                                                      torchvision.transforms.Normalize([0.485, 0.456, 0.406],
                                                                                       [0.229, 0.224, 0.225])
                                                      ])
        # random rotation in [-90, 90] degrees
        self.trans2 = torchvision.transforms.Compose([torchvision.transforms.Resize(256),
                                                      torchvision.transforms.RandomCrop(224),
                                                      torchvision.transforms.RandomRotation(90),
                                                      torchvision.transforms.ToTensor(),
                                                      torchvision.transforms.Normalize([0.485, 0.456, 0.406],
                                                                                       [0.229, 0.224, 0.225])
                                                      ])
        # brightness factor sampled from [0, 2]; a factor of 1 is the original image
        self.trans3 = torchvision.transforms.Compose([torchvision.transforms.Resize(256),
                                                      torchvision.transforms.RandomCrop(224),
                                                      torchvision.transforms.ColorJitter(brightness=1),
                                                      torchvision.transforms.ToTensor(),
                                                      torchvision.transforms.Normalize([0.485, 0.456, 0.406],
                                                                                       [0.229, 0.224, 0.225])
                                                      ])
        # contrast factor sampled from [0, 3]; a factor of 1 is the original image
        self.trans4 = torchvision.transforms.Compose([torchvision.transforms.Resize(256),
                                                      torchvision.transforms.RandomCrop(224),
                                                      torchvision.transforms.ColorJitter(contrast=2),
                                                      torchvision.transforms.ToTensor(),
                                                      torchvision.transforms.Normalize([0.485, 0.456, 0.406],
                                                                                       [0.229, 0.224, 0.225])
                                                      ])
        # hue jitter
        self.trans5 = torchvision.transforms.Compose([torchvision.transforms.Resize(256),
                                                      torchvision.transforms.RandomCrop(224),
                                                      torchvision.transforms.ColorJitter(hue=0.5),
                                                      torchvision.transforms.ToTensor(),
                                                      torchvision.transforms.Normalize([0.485, 0.456, 0.406],
                                                                                       [0.229, 0.224, 0.225])
                                                      ])
        # combined brightness/contrast/hue jitter
        self.trans6 = torchvision.transforms.Compose([torchvision.transforms.Resize(256),
                                                      torchvision.transforms.RandomCrop(224),
                                                      torchvision.transforms.ColorJitter(brightness=1, contrast=2, hue=0.5),
                                                      torchvision.transforms.ToTensor(),
                                                      torchvision.transforms.Normalize([0.485, 0.456, 0.406],
                                                                                       [0.229, 0.224, 0.225])
                                                      ])
        self.trans_list = [self.trans0, self.trans1, self.trans2, self.trans3,
                           self.trans4, self.trans5, self.trans6]

    def __getitem__(self, index):
        name = self.img_list[index]
        img = Image.open(name).convert('RGB')
        num = random.randint(0, 6)  # pick one of the seven augmentations at random
        img = self.trans_list[num](img)
        label = self.cls_list[index]
        if self.ret_name:
            return img, label, name
        else:
            return img, label

    def __len__(self):
        return len(self.img_list)
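The oversampling in the dataset above boils down to repeating each minority-class sample three times before appending it to the majority-class list; a one-line sketch (file names are made up):

```python
# Triplicate every minority-class entry, preserving order.
minority = ['a.jpg', 'b.jpg']
oversampled = [val for val in minority for _ in range(3)]
print(oversampled)  # ['a.jpg', 'a.jpg', 'a.jpg', 'b.jpg', 'b.jpg', 'b.jpg']
```

Combined with the randomly chosen augmentation in `__getitem__`, the three copies of each image rarely produce identical tensors, which softens the overfitting risk of plain duplication.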






四、Further plans

    1. The competition deadline is next Thursday, May 20.
    2. We are considering combining the fairly mature label-smoothing technique with the attention mechanism and training without loading a pretrained model, using resnext50 as the backbone.
    3. Using the misclassifications from the automatic tests, go through the class distribution of the data and check class by class where the problems lie. The most obvious finding so far is that images of solar-panel-type garbage are not recognized correctly; here we need sampling methods targeted at imbalanced data to improve accuracy and recall on such classes.
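The label-smoothing technique mentioned above can be sketched as a drop-in loss. This is one common formulation (a hedged sketch, not necessarily the exact code used in the competition): the target distribution is a mix of the one-hot label and a uniform distribution over classes.

```python
import torch
import torch.nn.functional as F

def label_smoothing_ce(logits, target, eps=0.1):
    # (1 - eps) * standard NLL + eps * cross-entropy against the uniform distribution
    logp = F.log_softmax(logits, dim=-1)
    nll = -logp.gather(-1, target.unsqueeze(1)).squeeze(1)
    smooth = -logp.mean(dim=-1)
    return ((1 - eps) * nll + eps * smooth).mean()

logits = torch.tensor([[2.0, 0.5, 0.1]])
target = torch.tensor([0])
# with eps=0 this reduces to the ordinary cross-entropy loss
print(torch.isclose(label_smoothing_ce(logits, target, eps=0.0),
                    F.cross_entropy(logits, target)))  # tensor(True)
```

With eps > 0 the model is penalized for over-confident predictions, which tends to help when labels are noisy, as they often are in crowd-labeled garbage images.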