YOLOv4 in PyTorch, v1 (data processing + training/testing + model conversion)

尼古拉斯·two_dog

Article link: https://blog.csdn.net/gm_Ergou/article/details/118599118

References:

https://blog.csdn.net/qq_44876051/article/details/107665310?utm_medium=distribute.pc_relevant.none-task-blog-baidujs_baidulandingword-2&spm=1001.2101.3001.4242

https://www.cnblogs.com/wujianming-110117/p/13845974.html

PyTorch implementation: https://github.com/bubbliiiing/yolov4-pytorch

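Main changes:
1. Re-implement the upsample operator with interpolate (edit yolo4.py).
2. Change how the model weights are loaded so that the upsample weights are skipped; otherwise loading raises an error (edit train.py).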
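
The two changes above can be sketched as follows. This is a minimal illustration, not the author's yolo4.py: the names `InterpUpsample`, `TinyNet` and `filter_pretrained`, the made-up checkpoint keys, and the `scale_factor=2.0` / `mode="nearest"` arguments are all assumptions chosen for the example. It shows an upsample block built on `F.interpolate`, and a filter that drops checkpoint entries whose key contains `upsample` (or that are missing or mis-shaped in the model) before loading.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class InterpUpsample(nn.Module):
    """1x1 convolution followed by 2x upsampling done with F.interpolate."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)

    def forward(self, x):
        x = self.conv(x)
        # scale_factor and mode are assumed values, not taken from the article
        return F.interpolate(x, scale_factor=2.0, mode="nearest")


def filter_pretrained(pretrained_dict, model_dict):
    """Keep only checkpoint entries that can be copied into the model safely."""
    return {
        k: v
        for k, v in pretrained_dict.items()
        if "upsample" not in k and k in model_dict and v.shape == model_dict[k].shape
    }


class TinyNet(nn.Module):
    """Hypothetical two-layer network standing in for the real YoloBody."""

    def __init__(self):
        super().__init__()
        self.stem = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.upsample1 = InterpUpsample(8, 4)

    def forward(self, x):
        return self.upsample1(self.stem(x))


model = TinyNet()
model_dict = model.state_dict()

# Made-up checkpoint: one key that matches and one stale upsample key
checkpoint = {
    "stem.weight": torch.ones_like(model_dict["stem.weight"]),
    "upsample1.upsample.0.conv.weight": torch.zeros(4, 8, 1, 1),
}

kept = filter_pretrained(checkpoint, model_dict)
model_dict.update(kept)
model.load_state_dict(model_dict)

out = model(torch.zeros(1, 3, 13, 13))
print(sorted(kept), tuple(out.shape))
```

Because the filtered entries are merged into the model's own state dict before `load_state_dict` is called, every key the model expects is still present, so the skipped upsample layers simply keep their freshly initialized weights.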

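Training and inference steps:
1. Download the code and the pretrained model, and prepare the data.
2. Preprocess the data: run json2xml.py, kmeans_for_anchors.py, voc2yolo4.py and voc_annotation.py on your own annotated data.
3. Train and test the model (edit train.py and predict.py).
4. Compute mAP (edit get_dr_txt.py, get_gt_txt.py and get_map.py; pay attention to the dataset format).
5. Convert the PyTorch model to ONNX (edit pth2onnx.py).
6. Test the .pth model and the ONNX model (edit test_pth.py and test_onnx.py).
7. Convert the ONNX model to an om model (the atc command is given below).
8. Test the om model (modify the pyacl code so that it does not use aspect-ratio-preserving scaling).
9. Compare results: take the om model's outputs and feed them to test_om.py / test_om2.py to compare the local model with the Atlas model.
Note: test_om.py takes as input the data the om model receives after DVPP + AIPP, while test_om2.py takes the three feature maps.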

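Conclusions from the comparison:
1. The PyTorch and ONNX models were compared locally and give consistent results.
2. Capturing the om model's input and feeding it to the .pth or ONNX model gives results close to the om model's actual output, which shows that the om conversion succeeded.
3. The om model's results are close to those of the local .pth or ONNX model, which shows that the model was migrated successfully.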
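
The article does not say which metric was used to decide that two outputs "differ only slightly". One possible way to quantify it, sketched here under that assumption, is to flatten both outputs and report the maximum absolute difference together with the cosine similarity. The function name `compare_outputs` and the sample values are hypothetical.

```python
import math


def compare_outputs(a, b):
    """Return (max absolute difference, cosine similarity) of two flat sequences."""
    if len(a) != len(b):
        raise ValueError("outputs must have the same number of elements")
    max_abs = max(abs(x - y) for x, y in zip(a, b))
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cosine = dot / (norm_a * norm_b) if norm_a and norm_b else float("nan")
    return max_abs, cosine


# Hypothetical flattened outputs of two models for the same input
reference = [1.0, 2.0, 3.0]
candidate = [1.0, 2.0, 3.5]
max_abs, cosine = compare_outputs(reference, candidate)
print(max_abs, cosine)
```

A small maximum difference with a cosine similarity close to 1 would support the conclusions above; what counts as "small" depends on the numeric precision of the target device.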

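I. Data processing

1. Data processing steps:

Annotate the data -> convert to YOLO format -> compute anchors

For the code behind each step, see the previous article: https://blog.csdn.net/gm_Ergou/article/details/118570318

2. Data provided (a 25-image coin dataset):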

train.txt

data/dataset2/coins/P00524-151911.jpg 1660,402,2145,894,0 2546,714,3025,1205,0 2929,1121,3408,1612,0 2175,1205,2642,1666,0 1840,2091,2307,2552,0 1444,870,1929,1349,1 2642,1923,3037,2331,1 941,1810,1301,2175,2 1762,1301,2139,1660,1
data/dataset2/coins/P00524-151918.jpg 923,534,1301,894,0 2786,343,3151,726,0 1882,798,2235,1157,0 1888,1666,2265,2043,0 2666,1780,3025,2139,0 2331,2001,2666,2355,0 1013,1450,1325,1768,1 989,2432,1277,2696,2 2738,965,3025,1253,1
data/dataset2/coins/P00524-151929.jpg 2678,295,3073,714,0 2211,558,2618,965,0 2450,1109,2822,1492,0 3265,1444,3648,1804,0 989,965,1373,1361,0 774,1552,1157,1911,0 1085,1995,1420,2313,0 1666,2121,2007,2450,0 1935,349,2247,678,1 1253,2355,1540,2618,1 1756,1540,2043,1792,2
data/dataset2/coins/P00524-151944.jpg 1444,355,1828,762,0 2402,343,2786,756,0 690,678,1085,1061,0 941,1193,1283,1540,0 1504,1265,1852,1588,0 1947,1241,2307,1564,0 2103,1660,2450,1959,0 989,2115,1283,2402,0 1756,2546,1995,2768,1 1001,2391,1277,2600,1 3049,917,3325,1193,2 1666,2091,1911,2307,2
data/dataset2/coins/P00524-151953.jpg 822,810,1205,1211,0 1283,642,1684,1037,0 1636,1049,2019,1397,0 1349,1349,1732,1660,0 1205,1911,1516,2211,0 1947,2097,2259,2379,0 2594,1067,2953,1420,0 2558,1738,2905,2049,0 3145,1349,3456,1660,1 774,2319,1049,2546,1 2450,630,2750,929,2 2211,1426,2474,1684,2
data/dataset2/coins/P00524-152004.jpg 1696,223,2145,702,0 2810,684,3289,1133,0 1301,1444,1684,1834,0 834,1522,1253,1899,0 1349,2391,1684,2714,0 2127,1013,2474,1349,1 630,1205,953,1540,1 2127,1468,2414,1744,2 1043,1888,1337,2139,2
data/dataset2/coins/P00524-152030.jpg 1211,732,1618,1073,0 2858,588,3229,917,0 2498,852,2888,1181,0 2187,1187,2618,1540,0 1756,1205,2175,1564,0 923,1540,1385,1971,0 666,2271,1085,2720,1 2043,798,2379,1061,1 1684,1995,2031,2367,2 3169,1492,3504,1792,2
data/dataset2/coins/P00524-152038.jpg 1780,355,2151,702,0 2714,726,3085,1085,0 1738,1229,2115,1612,0 2379,1876,2768,2283,0 1420,1876,1804,2259,0 1373,1397,1732,1762,0 1109,1235,1420,1540,1 2502,2299,2806,2635,1 2630,1229,2911,1504,2 2187,2385,2474,2690,2
data/dataset2/coins/P00524-152052.jpg 1977,582,2438,1025,0 2744,564,3193,1013,0 3241,1043,3708,1516,0 2570,989,3037,1432,0 2840,1738,3349,2235,0 2546,2241,3073,2768,0 1095,1307,1478,1684,1 1636,1289,2007,1684,1 1385,2067,1768,2474,2 2355,1756,2702,2097,2
data/dataset2/coins/P00524-152108.jpg 2235,229,2642,612,0 2534,702,2983,1109,0 2630,1193,3097,1636,0 2067,1426,2534,1876,0 2031,810,2474,1229,0 1929,2211,2402,2696,0 1522,612,1864,941,1 1636,1516,1995,1876,1 1636,1121,1947,1432,2 2474,1732,2810,2055,2
data/dataset2/coins/P00524-152122.jpg 708,714,1205,1091,0 1043,995,1540,1397,0 2786,852,3289,1259,0 1402,2163,2043,2822,0 2289,1732,2870,2283,0 2786,2067,3408,2690,0 2091,1277,2522,1636,1 756,1756,1229,2199,1 1756,738,2091,995,2 1612,1492,2019,1852,2
data/dataset2/coins/P00524-152144.jpg 438,1349,786,1732,0 1277,1109,1672,1468,0 1450,612,1828,947,0 1708,1061,2091,1444,0 1097,1804,1540,2235,0 2582,1636,3055,2091,0 2163,1013,2510,1307,1 941,1456,1259,1780,1 1684,1588,1983,1899,2 1947,1852,2265,2187,2
data/dataset2/coins/P00524-152155.jpg 1528,738,1953,1115,0 2870,941,3325,1337,0 2762,1426,3253,1876,0 2426,2295,2953,2840,0 1714,1379,2163,1804,0 1385,1732,1840,2187,0 810,1552,1163,1899,1 2295,1420,2666,1768,1 1474,253,1780,498,2 2163,307,2474,558,2
data/dataset2/coins/P00524-152208.jpg 822,498,1253,798,0 1253,684,1732,989,0 2450,965,2941,1325,0 989,1145,1516,1540,0 1756,1474,2313,1953,0 2019,1947,2666,2546,0 1157,1762,1684,2211,1 2834,1792,3337,2265,1 1720,606,2043,798,2 2073,714,2414,965,2
data/dataset2/coins/P00524-152213.jpg 678,2139,1325,2726,0 2379,1738,2929,2247,0 2894,1379,3397,1810,0 2169,1139,2648,1516,0 2534,702,2953,995,0 1504,618,1923,894,0 1337,1995,1828,2450,1 2642,1085,3001,1349,1 2283,822,2594,1061,2 1522,438,1804,606,2
data/dataset2/coins/P00524-152222.jpg 2365,544,2748,874,0 1025,1636,1420,2007,0 1660,1642,2043,2043,0 2283,1444,2690,1834,0 2570,1816,2977,2235,0 1690,2343,2115,2786,0 1876,1253,2187,1564,1 965,2313,1307,2666,1 1397,391,1684,630,2 1019,870,1277,1115,2
data/dataset2/coins/P00524-152244.jpg 1426,870,1852,1307,0 2426,564,2840,995,0 2690,1115,3121,1540,0 1402,1624,1804,2043,0 1905,1971,2331,2402,0 1814,1546,2149,1882,1 1331,2187,1684,2546,1 2498,1444,2816,1756,2 1133,1492,1450,1804,2
data/dataset2/coins/P00524-152253.jpg 1660,235,2031,630,0 2450,929,2810,1307,0 750,1516,1133,1888,0 1408,1397,1804,1780,0 2600,1301,2983,1666,0 2570,2385,2935,2786,0 1971,1642,2283,1947,1 2726,1828,3037,2139,1 1636,1145,1911,1432,2 2283,2067,2558,2343,2
data/dataset2/coins/P00524-152300.jpg 2331,379,2696,750,0 2199,756,2570,1109,0 1852,929,2211,1289,0 1540,1163,1899,1516,0 1109,1468,1498,1834,0 1899,1923,2265,2289,0 2846,1277,3157,1564,1 1911,1379,2211,1660,1 3121,1852,3385,2139,2 1283,1816,1564,2091,2
data/dataset2/coins/P00524-152329.jpg 1792,211,2223,654,0 1115,989,1570,1444,0 971,1684,1432,2175,0 1804,1612,2259,2073,0 2762,1552,3217,1977,0 2271,2402,2690,2840,0 2067,1109,2426,1468,1 1355,1397,1732,1756,1 2582,1253,2911,1588,2 2624,2019,2989,2343,2
data/dataset2/coins/P00524-152335.jpg 1720,247,2151,678,0 1133,1253,1540,1708,0 1780,1301,2187,1732,0 2259,1696,2666,2139,0 1037,1899,1450,2355,0 1876,2247,2307,2678,0 2139,1025,2480,1349,1 1516,1995,1882,2331,1 534,1349,846,1684,2 2917,1708,3265,2031,2
data/dataset2/coins/P00524-152343.jpg 1738,349,2139,750,0 810,654,1253,1085,0 582,1474,1061,1923,0 1840,1408,2271,1858,0 1642,1995,2115,2474,0 3289,1492,3702,1905,0 2163,905,2480,1229,1 1241,1061,1612,1408,1 953,1929,1325,2283,2 2576,870,2894,1205,2
data/dataset2/coins/P00524-152351.jpg 1259,343,1672,798,0 2642,582,3025,1013,0 3181,1504,3576,1882,0 2283,1576,2666,1971,0 1301,1313,1714,1744,0 397,2283,822,2690,0 1073,899,1444,1253,1 2025,1061,2367,1397,1 3325,1181,3624,1468,2 1720,1546,2019,1852,2
data/dataset2/coins/P00524-152401.jpg 1253,486,1684,923,0 2103,462,2546,894,0 1450,894,1876,1325,0 1947,1133,2367,1570,0 1905,1600,2337,2019,0 965,1905,1397,2355,0 1037,882,1397,1235,1 1037,1528,1402,1876,1 3181,630,3492,965,2 1540,2199,1858,2546,2
data/dataset2/coins/P00524-152409.jpg 1163,564,1660,1061,0 1923,103,2355,570,0 2313,349,2696,822,0 2666,804,3013,1241,0 965,1756,1474,2283,0 2534,1564,2905,2007,0 582,510,1037,965,1 1504,1444,1876,1828,1 2259,2211,2570,2546,2 3001,582,3265,894,2
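
Each line of train.txt is an image path followed by space-separated boxes in the form x_min,y_min,x_max,y_max,class_id. The sketch below parses one entry copied from the listing above and counts the boxes per class. The helper name `parse_annotation_line` is hypothetical, and mapping class ids to names by their 0-based line position in coins.names is an assumption.

```python
from collections import Counter

# Assumed mapping: class id = 0-based line index in coins.names
CLASS_NAMES = ["1yuan", "5jiao", "1jiao"]


def parse_annotation_line(line):
    """Split one train.txt line into (image_path, list of (x_min, y_min, x_max, y_max, class_id))."""
    parts = line.split()
    image_path = parts[0]
    boxes = [tuple(int(v) for v in item.split(",")) for item in parts[1:]]
    return image_path, boxes


# The P00524-152004.jpg entry from train.txt
sample = (
    "data/dataset2/coins/P00524-152004.jpg "
    "1696,223,2145,702,0 2810,684,3289,1133,0 1301,1444,1684,1834,0 "
    "834,1522,1253,1899,0 1349,2391,1684,2714,0 2127,1013,2474,1349,1 "
    "630,1205,953,1540,1 2127,1468,2414,1744,2 1043,1888,1337,2139,2"
)

image_path, boxes = parse_annotation_line(sample)
class_counts = Counter(CLASS_NAMES[box[4]] for box in boxes)
print(image_path, len(boxes), dict(class_counts))
```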

coco_anchors.names

12, 16,  19, 36,  40, 28,  36, 75,  76, 55,  72, 146,  142, 110,  192, 243,  459, 401
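
The `get_anchors` function used later reshapes this single line into groups of three (width, height) pairs and reverses the group order, so the largest anchors come first. The pure-Python sketch below mirrors that `reshape([-1,3,2])[::-1]` step without NumPy; the name `group_anchors` is hypothetical.

```python
ANCHOR_TEXT = "12, 16,  19, 36,  40, 28,  36, 75,  76, 55,  72, 146,  142, 110,  192, 243,  459, 401"


def group_anchors(text, anchors_per_scale=3):
    """Group a comma-separated anchor line per detection scale, largest anchors first."""
    values = [float(x) for x in text.split(",")]
    pairs = [tuple(values[i:i + 2]) for i in range(0, len(values), 2)]
    groups = [pairs[i:i + anchors_per_scale] for i in range(0, len(pairs), anchors_per_scale)]
    return groups[::-1]


anchor_groups = group_anchors(ANCHOR_TEXT)
for index, group in enumerate(anchor_groups):
    print(index, group)
```

In post_process.py, group `i` is paired with network output `i`, so the first output is presumably the coarsest feature map, which handles the largest objects.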

coins.names

1yuan
5jiao
1jiao

Test images: taken from the dataset.

II. Model training

1.train.py

# -*- coding: utf-8 -*-  
#-------------------------------------#
#       Train on the dataset
#-------------------------------------#
import os
import time

import numpy as np
import torch
import torch.backends.cudnn as cudnn
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable
from torch.utils.data import DataLoader
from tqdm import tqdm

from yolo4 import YoloBody
from yolo_training import Generator, YOLOLoss
from dataloader import YoloDataset, yolo_dataset_collate


#---------------------------------------------------#
#   Load the class names and anchor boxes
#---------------------------------------------------#
def get_classes(classes_path):
    '''loads the classes'''
    with open(classes_path) as f:
        class_names = f.readlines()
    class_names = [c.strip() for c in class_names]
    return class_names

def get_anchors(anchors_path):
    '''loads the anchors from a file'''
    with open(anchors_path) as f:
        anchors = f.readline()
    anchors = [float(x) for x in anchors.split(',')]
    return np.array(anchors).reshape([-1,3,2])[::-1,:,:]

def get_lr(optimizer):
    for param_group in optimizer.param_groups:
        return param_group['lr']

        
def fit_one_epoch(net,yolo_losses,epoch,epoch_size,epoch_size_val,gen,genval,Epoch,cuda):
    total_loss = 0
    val_loss = 0

    net.train()
    with tqdm(total=epoch_size,desc=f'Epoch {epoch + 1}/{Epoch}',postfix=dict,mininterval=0.3) as pbar:
        for iteration, batch in enumerate(gen):
            if iteration >= epoch_size:
                break
            images, targets = batch[0], batch[1]
            with torch.no_grad():
                if cuda:
                    images = Variable(torch.from_numpy(images).type(torch.FloatTensor)).cuda()
                    targets = [Variable(torch.from_numpy(ann).type(torch.FloatTensor)) for ann in targets]
                else:
                    images = Variable(torch.from_numpy(images).type(torch.FloatTensor))
                    targets = [Variable(torch.from_numpy(ann).type(torch.FloatTensor)) for ann in targets]

            #----------------------#
            #   Zero the gradients
            #----------------------#
            optimizer.zero_grad()
            #----------------------#
            #   Forward pass
            #----------------------#
            outputs = net(images)
            losses = []
            num_pos_all = 0
            #----------------------#
            #   Compute the loss
            #----------------------#
            for i in range(3):
                loss_item, num_pos = yolo_losses[i](outputs[i], targets)
                losses.append(loss_item)
                num_pos_all += num_pos

            loss = sum(losses) / num_pos_all
            #----------------------#
            #   Backward pass
            #----------------------#
            loss.backward()
            optimizer.step()

            total_loss += loss.item()
            pbar.set_postfix(**{'total_loss': total_loss / (iteration + 1), 'lr'        : get_lr(optimizer)})
            pbar.update(1)


    net.eval()
    print('Start Validation')
    with tqdm(total=epoch_size_val, desc=f'Epoch {epoch + 1}/{Epoch}',postfix=dict,mininterval=0.3) as pbar:
        for iteration, batch in enumerate(genval):
            if iteration >= epoch_size_val:
                break
            images_val, targets_val = batch[0], batch[1]

            with torch.no_grad():
                if cuda:
                    images_val = Variable(torch.from_numpy(images_val).type(torch.FloatTensor)).cuda()
                    targets_val = [Variable(torch.from_numpy(ann).type(torch.FloatTensor)) for ann in targets_val]
                else:
                    images_val = Variable(torch.from_numpy(images_val).type(torch.FloatTensor))
                    targets_val = [Variable(torch.from_numpy(ann).type(torch.FloatTensor)) for ann in targets_val]
                # forward pass only: no gradients are computed or applied during validation
                outputs = net(images_val)
                losses = []
                num_pos_all = 0
                for i in range(3):
                    loss_item, num_pos = yolo_losses[i](outputs[i], targets_val)
                    losses.append(loss_item)
                    num_pos_all += num_pos
                loss = sum(losses) / num_pos_all
                val_loss += loss.item()
            pbar.set_postfix(**{'total_loss': val_loss / (iteration + 1)})
            pbar.update(1)
    print('Finish Validation')
    print('Epoch:'+ str(epoch+1) + '/' + str(Epoch))
    print('Total Loss: %.4f || Val Loss: %.4f ' % (total_loss/(epoch_size+1),val_loss/(epoch_size_val+1)))

    if (epoch+1)%20==0:
        print('Saving state, iter:', str(epoch+1))
        # torch.save(model.state_dict(), 'data/model3/Epoch%d-Total_Loss%.4f-Val_Loss%.4f.pth'%((epoch+1),total_loss/(epoch_size+1),val_loss/(epoch_size_val+1)))
        torch.save(model, 'data/model1/Epoch%d-Total_Loss%.4f-Val_Loss%.4f.pth'%((epoch+1),total_loss/(epoch_size+1),val_loss/(epoch_size_val+1)))



if __name__ == "__main__":
    Cuda = False
    #   Whether to use the PyTorch DataLoader
    Use_Data_Loader = True
    normalize = False
    input_shape = (416,416)
    anchors_path = 'data/dataset2/coco_anchors.names'
    classes_path = 'data/dataset2/coins.names'

    #   Load the classes and anchors
    class_names = get_classes(classes_path)
    anchors = get_anchors(anchors_path)
    num_classes = len(class_names)
    print("class_num", num_classes)
    
    #------------------------------------------------------#
    mosaic = False  # mosaic data augmentation; it proved unstable in practice, so it is off by default
    Cosine_lr = False  # cosine-annealing learning-rate schedule, True or False
    smoooth_label = 0  # label smoothing; usually 0.01 or less, e.g. 0.01 or 0.005

    #------------------------------------------------------#
    model_path = "data/model/yolo4_weights.pth"
    print('Loading weights into state dict...')

    model = YoloBody(len(anchors[0]), num_classes)
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model_dict = model.state_dict()
    pretrained_dict = torch.load(model_path, map_location=device)
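    # The upsample layers were re-implemented, so their keys cannot be found when loading;
    # skip them, along with any key that is missing from the model or has a different shape
    pretrained_dict = {k: v for k, v in pretrained_dict.items()
                       if 'upsample' not in k and k in model_dict and np.shape(model_dict[k]) == np.shape(v)}
    # Original weight-loading code:
    # pretrained_dict = {k: v for k, v in pretrained_dict.items() if np.shape(model_dict[k]) ==  np.shape(v)}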
    model_dict.update(pretrained_dict)
    model.load_state_dict(model_dict)
    # print(model)

    net = model.train()
    if Cuda:
        net = torch.nn.DataParallel(model)
        cudnn.benchmark = True
        net = net.cuda()

    # Build the loss functions
    yolo_losses = []
    for i in range(3):
        yolo_losses.append(YOLOLoss(np.reshape(anchors,[-1,2]),num_classes, \
                                (input_shape[1], input_shape[0]), smoooth_label, Cuda, normalize))

    #-----------------dataset------------------------#
    annotation_path = 'data/dataset2/coins/train.txt'
    val_split = 0.1
    with open(annotation_path) as f:
        lines = f.readlines()
    np.random.seed(10101)
    np.random.shuffle(lines)
    np.random.seed(None)
    num_val = int(len(lines)*val_split)
    num_train = len(lines) - num_val
    

    #------------------------------------------------------#
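    #   The backbone's features are generic, so freezing it speeds up training
    #   and protects the pretrained weights early in training.
    #   Init_Epoch: starting epoch
    #   Freeze_Epoch: epoch at which the frozen-training stage ends
    #   Epoch: total number of training epochs
    #   If you hit OOM or run out of GPU memory, reduce Batch_size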
    #------------------------------------------------------#
    if True:
        lr = 1e-3
        Batch_size = 2
        Init_Epoch = 0
        Freeze_Epoch = 200
        
        optimizer = optim.Adam(net.parameters(),lr)
        if Cosine_lr:
            lr_scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=5, eta_min=1e-5)
        else:
            lr_scheduler = optim.lr_scheduler.StepLR(optimizer,step_size=1,gamma=0.92)

        if Use_Data_Loader:
            train_dataset = YoloDataset(lines[:num_train], (input_shape[0], input_shape[1]), mosaic=mosaic, is_train=True)
            val_dataset = YoloDataset(lines[num_train:], (input_shape[0], input_shape[1]), mosaic=False, is_train=False)

            gen = DataLoader(train_dataset, shuffle=True, batch_size=Batch_size, num_workers=4, pin_memory=True,
                                    drop_last=True, collate_fn=yolo_dataset_collate)
            gen_val = DataLoader(val_dataset, shuffle=True, batch_size=Batch_size, num_workers=4,pin_memory=True, 
                                    drop_last=True, collate_fn=yolo_dataset_collate)
        else:
            gen = Generator(Batch_size, lines[:num_train],
                            (input_shape[0], input_shape[1])).generate(train=True, mosaic = mosaic)
            gen_val = Generator(Batch_size, lines[num_train:],
                            (input_shape[0], input_shape[1])).generate(train=False, mosaic = mosaic)

        
        #------------------------------------#
        #   Backbone freezing switch: True keeps the backbone trainable, False would freeze it
        #------------------------------------#
        for param in model.backbone.parameters():
            param.requires_grad = True

        epoch_size = max(1, num_train//Batch_size)
        epoch_size_val = num_val//Batch_size
        for epoch in range(Init_Epoch,Freeze_Epoch):
            fit_one_epoch(net,yolo_losses,epoch,epoch_size,epoch_size_val,gen,gen_val,Freeze_Epoch,Cuda)
            lr_scheduler.step()
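
The numbers that train.py derives for this 25-image dataset can be worked out by hand. The sketch below repeats the split arithmetic and the StepLR decay (step_size=1, gamma=0.92, initial lr 1e-3) with the values from the script; the helper names `split_sizes` and `step_lr` are hypothetical.

```python
def split_sizes(num_lines, val_split, batch_size):
    """Reproduce the train/validation split and per-epoch batch counts from train.py."""
    num_val = int(num_lines * val_split)
    num_train = num_lines - num_val
    epoch_size = max(1, num_train // batch_size)
    epoch_size_val = num_val // batch_size
    return num_train, num_val, epoch_size, epoch_size_val


def step_lr(initial_lr, gamma, epoch):
    """Learning rate during `epoch` (0-based) when StepLR steps once per epoch."""
    return initial_lr * gamma ** epoch


sizes = split_sizes(num_lines=25, val_split=0.1, batch_size=2)
print("num_train, num_val, epoch_size, epoch_size_val =", sizes)

for epoch in (0, 1, 50, 100, 199):
    print(epoch, step_lr(1e-3, 0.92, epoch))

# Epochs (1-based) at which a checkpoint is written: every 20th epoch
saved_epochs = [e + 1 for e in range(200) if (e + 1) % 20 == 0]
print(saved_epochs)
```

With these settings there are 23 training images and 2 validation images, giving 11 training batches and 1 validation batch per epoch. The learning rate shrinks by 8% every epoch, so by the last of the 200 epochs it is below 1e-9.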

2.predict.py

The listing posted under this heading is identical, line for line, to train.py above. The inference path used in the following steps (preprocessing, forward pass, box decoding and drawing) is implemented in post_process.py in the next section.

3.post_process.py

from yolo4 import YoloBody
import torch
from PIL import Image
from torchvision import transforms
import cv2
import numpy as np
from utils import (DecodeBox, bbox_iou, letterbox_image,non_max_suppression, yolo_correct_boxes)
import os
import colorsys
from PIL import Image, ImageDraw, ImageFont


def get_class(classes_path):
    classes_path = os.path.expanduser(classes_path)
    with open(classes_path) as f:
        class_names = f.readlines()
    class_names = [c.strip() for c in class_names]
    return class_names

def get_anchors(anchors_path):
    anchors_path = os.path.expanduser(anchors_path)
    with open(anchors_path) as f:
        anchors = f.readline()
    anchors = [float(x) for x in anchors.split(',')]
    return np.array(anchors).reshape([-1, 3, 2])[::-1,:,:]

def get_letterbox_image(image, size):
    iw, ih = image.size
    w, h = size
    scale = min(w/iw, h/ih)
    nw = int(iw*scale)
    nh = int(ih*scale)

    image = image.resize((nw,nh), Image.BICUBIC)
    new_image = Image.new('RGB', size, (128,128,128))
    new_image.paste(image, ((w-nw)//2, (h-nh)//2))
    return new_image


confidence=0.5
letterbox_image=False
anchors_path='data/dataset2/coco_anchors.names'
classes_path='data/dataset2/coins.names'
model_path="data/model1/test.pth"

class_names = get_class(classes_path)
# Assign a distinct box color to each class
hsv_tuples = [(x / len(class_names), 1., 1.)
              for x in range(len(class_names))]
colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples))
colors = list(map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)),colors))


def result(outputs, image):
    # Post-process the model outputs: decode each of the three heads
    output_list = []
    for i in range(3):
        print(type(outputs[i]),outputs[i].shape)
        decodeBox=DecodeBox(get_anchors(anchors_path)[i], len(class_names),  (416, 416))
        output_list.append(decodeBox(outputs[i]))
        print(outputs[i].size(),decodeBox(outputs[i]).shape)


    output = torch.cat(output_list, 1)
    batch_detections = non_max_suppression(output, len(class_names), conf_thres=confidence, nms_thres=0.3)
    print(output.shape,batch_detections)
    try:
        batch_detections = batch_detections[0].cpu().numpy()
    except Exception:
        # nothing was detected in this image, so return it unchanged
        return image

    # Keep the boxes whose objectness * class confidence exceeds the threshold
    top_index = batch_detections[:,4] * batch_detections[:,5] > confidence
    top_conf = batch_detections[top_index,4]*batch_detections[top_index,5]
    top_label = np.array(batch_detections[top_index,-1],np.int32)
    top_bboxes = np.array(batch_detections[top_index,:4])
    top_xmin, top_ymin, top_xmax, top_ymax = np.expand_dims(top_bboxes[:,0],-1),np.expand_dims(top_bboxes[:,1],-1),np.expand_dims(top_bboxes[:,2],-1),np.expand_dims(top_bboxes[:,3],-1)

    #-----------------------------------------------------------------#
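    #   When letterbox_image is enabled, gray bars are added around the image
    #   before it is fed to the network, so top_bboxes are relative to the
    #   padded image and the gray bars have to be removed from the coordinates.
    #   Otherwise the boxes are simply rescaled from 416x416 to the image size.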
    #-----------------------------------------------------------------#
    
    image_shape = np.array(np.shape(image)[0:2])
    if letterbox_image:
        boxes = yolo_correct_boxes(top_ymin,top_xmin,top_ymax,top_xmax,np.array([416,416]),image_shape)
    else:
        top_xmin = top_xmin / 416 * image_shape[1]
        top_ymin = top_ymin / 416 * image_shape[0]
        top_xmax = top_xmax / 416 * image_shape[1]
        top_ymax = top_ymax / 416 * image_shape[0]
        boxes = np.concatenate([top_ymin,top_xmin,top_ymax,top_xmax], axis=-1)
        
    # font = ImageFont.truetype(font='/usr/share/fonts/truetype/lyx/cmr10.ttf',size=np.floor(3e-2 * np.shape(image)[1] + 0.5).astype('int32'))
    font = ImageFont.truetype(font='data/simhei.ttf',size=np.floor(3e-2 * np.shape(image)[1] + 0.5).astype('int32'))

    thickness = max((np.shape(image)[0] + np.shape(image)[1]) // 416, 1)

    for i, c in enumerate(top_label):
        predicted_class = class_names[c]
        score = top_conf[i]

        top, left, bottom, right = boxes[i]
        top = top - 5
        left = left - 5
        bottom = bottom + 5
        right = right + 5

        top = max(0, np.floor(top + 0.5).astype('int32'))
        left = max(0, np.floor(left + 0.5).astype('int32'))
        bottom = min(np.shape(image)[0], np.floor(bottom + 0.5).astype('int32'))
        right = min(np.shape(image)[1], np.floor(right + 0.5).astype('int32'))

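        # Draw the box and its label
        label = '{} {:.2f}'.format(predicted_class, score)
        draw = ImageDraw.Draw(image)
        # Note: ImageDraw.textsize was removed in Pillow 10; on newer versions use draw.textbbox instead
        label_size = draw.textsize(label, font)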
        label = label.encode('utf-8')
        print(label, top, left, bottom, right)
        
        if top - label_size[1] >= 0:
            text_origin = np.array([left, top - label_size[1]])
        else:
            text_origin = np.array([left, top + 1])

        for i in range(thickness):
            draw.rectangle(
                [left + i, top + i, right - i, bottom - i],
                outline=colors[class_names.index(predicted_class)])
        draw.rectangle(
            [tuple(text_origin), tuple(text_origin + label_size)],
            fill=colors[class_names.index(predicted_class)])
        draw.text(text_origin, str(label,'UTF-8'), fill=(0, 0, 0), font=font)
    # image.show()

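def prediect(img):
    # Load the model (a whole model object saved with torch.save(model, ...))
    device = torch.device('cpu')
    model = torch.load(model_path, map_location=device)
    model = model.to(device)
    model.eval()  # use BatchNorm running statistics at inference time

    # Run inference without building the autograd graph
    img = torch.tensor(img, dtype=torch.float32)
    with torch.no_grad():
        outputs = model(img)

    return outputs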

def get_imgges(image):
    # Preprocess the image
    image_shape = np.array(np.shape(image)[0:2])
    print(type(image))

    if letterbox_image:
        crop_img = np.array(get_letterbox_image(image, (416,416)))
    else:
        crop_img = image.convert('RGB')
        crop_img = crop_img.resize((416,416), Image.BICUBIC)
    photo = np.array(crop_img,dtype = np.float32) / 255.0
    photo = np.transpose(photo, (2, 0, 1))
    img = [photo]
    img=np.asarray(img)

    return img

if __name__ == '__main__':
    img_path="data/img/test1.jpg"
    image = Image.open(img_path)
    img=get_imgges(image)
    
    outputs=prediect(img)
    result(outputs, image)

4.test_pth.py

import os
from PIL import Image, ImageDraw, ImageFont
from post_process import *

img_path="data/img/test1.jpg"
image = Image.open(img_path)
img=get_imgges(image)

outputs=prediect(img)
result(outputs, image)

print(outputs[0].shape, outputs[1].shape, outputs[2].shape)
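
The shapes printed by test_pth.py can be sanity-checked by hand. The sketch below assumes the usual YOLO head layout (strides 32, 16 and 8, three anchors per scale) and 3 classes, which is what the label indices 0 to 2 in train.txt suggest. The helper name `expected_head_shapes` is hypothetical and not part of the original repository; the order of the three outputs depends on the model definition.

```python
# Hypothetical helper (not from the original repo): expected raw head shapes
# for a YOLO-style detector, ASSUMING strides 32/16/8 and 3 anchors per scale.
def expected_head_shapes(input_size=416, num_classes=3, anchors_per_scale=3,
                         strides=(32, 16, 8), batch_size=1):
    # Each anchor predicts 4 box terms + 1 objectness score + num_classes scores
    channels = anchors_per_scale * (5 + num_classes)
    return [(batch_size, channels, input_size // s, input_size // s)
            for s in strides]


for shape in expected_head_shapes():
    print(shape)
```

If the printed shapes from the real model differ in the channel dimension, the number of classes or anchors used at training time is probably different from the assumption made here.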

5.pth2onnx.py

import torch


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

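# Load the full PyTorch model onto the same device as the dummy input
model = torch.load("data/model1/test.pth", map_location=device)
model.to(device)
model.eval()

input_shape=list(map(int, "1,3,416,416".split(",")))
x = torch.randn(input_shape)   # dummy input tensor used for tracing
x = x.to(device)

export_onnx_file = "data/model1/test.onnx"      # path of the output ONNX file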
#torch.onnx.export(model, x, export_onnx_file, verbose=True)
torch.onnx.export(model, x, export_onnx_file, verbose=True, export_params=True, do_constant_folding=True, opset_version=11)

6.test_onnx_v1.py

import cv2
import numpy as np
import onnxruntime as rt
import torch
from PIL import Image
from post_process import *


def to_numpy(tensor):
    return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()

def onnx_runtime(img):
    sess = rt.InferenceSession("data/model1/test.onnx")
    inputs = {sess.get_inputs()[0].name: img}
    output = sess.run(None, inputs)

    outputs=[]
    for i in range(3):
        outputs.append(torch.from_numpy(output[i]))

    outputs=tuple(outputs)
    return outputs

img_path="data/img/test1.jpg"
image = Image.open(img_path)
img=get_imgges(image)
print(img.shape)

outputs=onnx_runtime(img)
result(outputs, image)
print(type(outputs), outputs[0].shape, outputs[1].shape, outputs[2].shape)

7.test_onnx_v2.py

import numpy as np
import torch
import onnx
import onnxruntime as rt
import pickle

# Test data
x = torch.randn(1,3,416,416, requires_grad=False)

# Validate the ONNX model with the ONNX checker API
onnx_model = onnx.load("data/model1/test.onnx")
onnx.checker.check_model(onnx_model)

# Run the ONNX model
sess = rt.InferenceSession("data/model1/test.onnx")
def to_numpy(tensor):
    return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()

# ONNX Runtime output
ort_inputs = {sess.get_inputs()[0].name: to_numpy(x)}
ort_outs = sess.run(None, ort_inputs)
print(x.shape, ort_outs[0].shape)

# Run the PyTorch model
model=torch.load("data/model1/test.pth",map_location='cpu')
model.eval()
with torch.no_grad():
    torch_out = model(x)
print(x.shape, torch_out[0].shape)

# Compare the ONNX and PyTorch results
np.testing.assert_allclose(to_numpy(torch_out[0]), ort_outs[0], rtol=1e-03, atol=1e-05)
print("The ONNX and PyTorch outputs match within tolerance!")
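
`np.testing.assert_allclose` accepts an element when `|actual - desired| <= atol + rtol * |desired|`. When the check fails, it helps to see how far off the outputs are. The following is a minimal sketch of such a report; `diff_report` is a hypothetical helper, and the two small arrays stand in for real model outputs.

```python
import numpy as np


def diff_report(actual, desired, rtol=1e-3, atol=1e-5):
    """Summarize the element-wise difference between two arrays."""
    actual = np.asarray(actual, dtype=np.float64)
    desired = np.asarray(desired, dtype=np.float64)
    abs_err = np.abs(actual - desired)
    # Same per-element criterion that np.testing.assert_allclose applies
    allowed = atol + rtol * np.abs(desired)
    return {
        "max_abs_err": float(abs_err.max()),
        "num_violations": int((abs_err > allowed).sum()),
    }


# Toy data standing in for the PyTorch (desired) and ONNX (actual) outputs
desired = np.array([1.0, 2.0, 3.0])
actual = np.array([1.0005, 2.0, 3.01])
print(diff_report(actual, desired))
```

The same report could be applied to each of the three heads, not only to the first one that the script above compares.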

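III. Modified post-processing (using YOLOv3's post-processing)

This part switches to the YOLOv3-style post-processing: the network is cut off at its three outputs (large, medium and small), and those outputs are then fed to the decode layer and the NMS layer. Reference: https://blog.csdn.net/gm_Ergou/article/details/118573834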
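
As a rough illustration of what the decode layer does for a single prediction, here is a minimal sketch under the standard YOLOv3 parameterization (sigmoid offsets inside a grid cell, exponential scaling of the anchor). The function name `decode_cell` and the anchor size in the example are made up for illustration.

```python
import numpy as np


def decode_cell(t, cell_xy, anchor_wh, stride):
    """Decode one raw prediction (tx, ty, tw, th) into a box
    (center x, center y, width, height) in network-input pixels."""
    tx, ty, tw, th = t
    cx, cy = cell_xy          # column and row index of the grid cell
    aw, ah = anchor_wh        # anchor size in network-input pixels
    # Center: sigmoid offset inside the cell, then scaled by the stride
    bx = (1.0 / (1.0 + np.exp(-tx)) + cx) * stride
    by = (1.0 / (1.0 + np.exp(-ty)) + cy) * stride
    # Size: anchor scaled by exp of the raw prediction
    bw = aw * np.exp(tw)
    bh = ah * np.exp(th)
    return bx, by, bw, bh


# A zero prediction in cell (6, 6) of a 13x13 grid (stride 32) with a
# hypothetical 100x120 anchor lands in the middle of that cell.
print(decode_cell((0.0, 0.0, 0.0, 0.0), (6, 6), (100.0, 120.0), 32))
```

The full implementation in post_process2.py does the same computation for every cell and anchor at once, with the anchors first divided by the stride and the result multiplied back.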

1.post_process2.py

import cv2
import numpy as np
import os
import colorsys
from PIL import Image, ImageDraw, ImageFont



def get_class(classes_path):
    classes_path = os.path.expanduser(classes_path)
    with open(classes_path) as f:
        class_names = f.readlines()
    class_names = [c.strip() for c in class_names]
    return class_names

def get_anchors(anchors_path):
    anchors_path = os.path.expanduser(anchors_path)
    with open(anchors_path) as f:
        anchors = f.readline()
    anchors = [float(x) for x in anchors.split(',')]
    return np.array(anchors).reshape([-1, 3, 2])[::-1,:,:]

def sigmoid(x):
    x_ravel = x.ravel()  # flatten the NumPy array
    length = len(x_ravel)
    y = []
    for index in range(length):
        if x_ravel[index] >= 0:
            y.append(1.0 / (1 + np.exp(-x_ravel[index])))
        else:
            y.append(np.exp(x_ravel[index]) / (np.exp(x_ravel[index]) + 1))
    return np.array(y).reshape(x.shape)

def letterbox_image(image, size):
    iw, ih = image.size
    w, h = size
    scale = min(w/iw, h/ih)
    nw = int(iw*scale)
    nh = int(ih*scale)

    image = image.resize((nw,nh), Image.BICUBIC)
    new_image = Image.new('RGB', size, (128,128,128))
    new_image.paste(image, ((w-nw)//2, (h-nh)//2))
    return new_image

# Data preprocessing
def get_imgges(image, letterbox):
    if letterbox:
        crop_img = np.array(letterbox_image(image, (416,416)))
    else:
        crop_img = image.convert('RGB')
        crop_img = crop_img.resize((416,416), Image.BICUBIC)

    photo = np.array(crop_img,dtype = np.float32) / 255.0
    photo = np.transpose(photo, (2, 0, 1))
    img = [photo]
    img=np.asarray(img)

    return img


def DecodeBox2(anchors, num_classes, img_size, input):
    anchors = anchors
    num_anchors = len(anchors)
    num_classes = num_classes
    bbox_attrs = 5 + num_classes
    img_size = img_size

    batch_size = input.shape[0]
    input_height = input.shape[2]
    input_width = input.shape[3]
    # print(batch_size, input_height, input_width, input.shape)

    stride_h = img_size[1] / input_height
    stride_w = img_size[0] / input_width

    scaled_anchors = [(anchor_width / stride_w, anchor_height / stride_h) for anchor_width, anchor_height in anchors]

    # prediction = input.view(batch_size, num_anchors, bbox_attrs, input_height, input_width).permute(0, 1, 3, 4, 2).contiguous()
    a = input.reshape(batch_size, num_anchors, bbox_attrs, input_height, input_width).transpose(0, 1, 3, 4, 2)
    prediction = np.copy(a)
    # print(prediction, prediction.shape)

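    # Offsets of the box center relative to the grid cell
    x = sigmoid(prediction[..., 0])
    y = sigmoid(prediction[..., 1])
    # Width and height adjustment factors for the anchor
    w = prediction[..., 2]
    h = prediction[..., 3]
    # Objectness confidence (whether the box contains an object)
    conf = sigmoid(prediction[..., 4])
    # Per-class confidence
    pred_cls = sigmoid(prediction[..., 5:])

    # Build the grid: anchor centers, given as the top-left corner of each grid cell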
    grid_x = np.linspace(0, input_width - 1, input_width)
    grid_x = np.tile(np.tile(grid_x, (input_height, 1)), (batch_size * num_anchors, 1, 1))
    grid_x = grid_x.reshape(x.shape).astype(np.float16)

    grid_y = np.linspace(0, input_height - 1, input_height)
    grid_y = np.tile(np.tile(grid_y, (input_width, 1)).T, (batch_size * num_anchors, 1, 1))
    grid_y = grid_y.reshape(y.shape).astype(np.float16)
    # print(grid_y, grid_y.shape)

    # Broadcast the anchor widths and heights to the grid layout
    anchor_w = np.array(scaled_anchors).astype(np.float16)[:,0].reshape(len(scaled_anchors),1)  # len(scaled_anchors)=3
    anchor_h = np.array(scaled_anchors).astype(np.float16)[:,1].reshape(len(scaled_anchors),1)
    anchor_w = np.tile(np.tile(anchor_w, (batch_size, 1)), (1, 1, input_height * input_width)).reshape(w.shape)
    anchor_h = np.tile(np.tile(anchor_h, (batch_size, 1)), (1, 1, input_height * input_width)).reshape(h.shape)
    # print(anchor_w,anchor_h)
    # print(anchor_w.shape, anchor_h.shape)

    #----------------------------------------------------------#
    #   Adjust the anchors with the predictions:
    #   first shift the center from the cell's top-left corner
    #   toward the bottom-right, then rescale width and height.
    #----------------------------------------------------------#
    pred_boxes = np.zeros(shape=prediction[..., :4].shape)
    pred_boxes[..., 0] = x + grid_x
    pred_boxes[..., 1] = y + grid_y
    pred_boxes[..., 2] = np.exp(w) * anchor_w
    pred_boxes[..., 3] = np.exp(h) * anchor_h
    # print(pred_boxes)

    #----------------------------------------------------------#
    #   Rescale the outputs to the size of the network input image
    #----------------------------------------------------------#
    _scale=np.array([stride_w, stride_h] * 2).astype(np.float16)
    output = np.concatenate((pred_boxes.reshape(batch_size, -1, 4) * _scale,
        conf.reshape(batch_size, -1, 1), pred_cls.reshape(batch_size, -1, num_classes)), -1)
    
    return output      

def yolo_correct_boxes(top, left, bottom, right, input_shape, image_shape):
    new_shape = image_shape*np.min(input_shape/image_shape)

    offset = (input_shape-new_shape)/2./input_shape
    scale = input_shape/new_shape

    box_yx = np.concatenate(((top+bottom)/2,(left+right)/2),axis=-1)/input_shape
    box_hw = np.concatenate((bottom-top,right-left),axis=-1)/input_shape

    box_yx = (box_yx - offset) * scale
    box_hw *= scale

    box_mins = box_yx - (box_hw / 2.)
    box_maxes = box_yx + (box_hw / 2.)
    boxes =  np.concatenate([
        box_mins[:, 0:1],
        box_mins[:, 1:2],
        box_maxes[:, 0:1],
        box_maxes[:, 1:2]
    ],axis=-1)
    boxes *= np.concatenate([image_shape, image_shape],axis=-1)
    return boxes


def bbox_iou2(box1, box2, x1y1x2y2=True):
    """
        Compute the IoU between box1 and box2.
    """
    if not x1y1x2y2:
        b1_x1, b1_x2 = box1[:, 0] - box1[:, 2] / 2, box1[:, 0] + box1[:, 2] / 2
        b1_y1, b1_y2 = box1[:, 1] - box1[:, 3] / 2, box1[:, 1] + box1[:, 3] / 2
        b2_x1, b2_x2 = box2[:, 0] - box2[:, 2] / 2, box2[:, 0] + box2[:, 2] / 2
        b2_y1, b2_y2 = box2[:, 1] - box2[:, 3] / 2, box2[:, 1] + box2[:, 3] / 2
    else:
        b1_x1, b1_y1, b1_x2, b1_y2 = box1[:, 0], box1[:, 1], box1[:, 2], box1[:, 3]
        b2_x1, b2_y1, b2_x2, b2_y2 = box2[:, 0], box2[:, 1], box2[:, 2], box2[:, 3]

    inter_rect_x1 = np.maximum(b1_x1, b2_x1)
    inter_rect_y1 = np.maximum(b1_y1, b2_y1)
    inter_rect_x2 = np.minimum(b1_x2, b2_x2)
    inter_rect_y2 = np.minimum(b1_y2, b2_y2)

    data1=inter_rect_x2 - inter_rect_x1 + 1
    data2=inter_rect_y2 - inter_rect_y1 + 1
    # Clamp negative extents (non-overlapping boxes) to zero
    inter_area = np.clip(data1, a_min=0, a_max=None) * np.clip(data2, a_min=0, a_max=None)
    
    b1_area = (b1_x2 - b1_x1 + 1) * (b1_y2 - b1_y1 + 1)
    b2_area = (b2_x2 - b2_x1 + 1) * (b2_y2 - b2_y1 + 1)

    iou = inter_area / (b1_area + b2_area - inter_area + 1e-16)

    return iou
     
def non_max_suppression2(prediction, num_classes, conf_thres=0.5, nms_thres=0.3):
    box_corner = np.zeros(shape=prediction.shape)
    box_corner[:, :, 0] = prediction[:, :, 0] - prediction[:, :, 2] / 2
    box_corner[:, :, 1] = prediction[:, :, 1] - prediction[:, :, 3] / 2
    box_corner[:, :, 2] = prediction[:, :, 0] + prediction[:, :, 2] / 2
    box_corner[:, :, 3] = prediction[:, :, 1] + prediction[:, :, 3] / 2
    prediction[:, :, :4] = box_corner[:, :, :4]

    output = [None for _ in range(len(prediction))]
    for image_i, image_pred in enumerate(prediction):
        data=image_pred[:, 5:5 + num_classes]
        class_conf=np.max(data, axis=1).reshape(len(data),1)
        class_pred=data.argmax(axis=1).reshape(len(data),1)

        #----------------------------------------------------------#
        #   First-pass filter using the confidence score
        #----------------------------------------------------------#
        conf_mask = (image_pred[:, 4] * class_conf[:, 0] >= conf_thres).squeeze()

        #----------------------------------------------------------#
        #   Keep only the predictions that pass the threshold
        #----------------------------------------------------------#
        image_pred = image_pred[conf_mask]
        class_conf = class_conf[conf_mask]
        class_pred = class_pred[conf_mask]

        if len(image_pred)<=0:
            continue

        # detections  [num_anchors, 7]   the 7 columns are: x1, y1, x2, y2, obj_conf, class_conf, class_pred
        detections = np.concatenate((image_pred[:, :5], class_conf.astype(np.float16), class_pred.astype(np.float16)), 1)

        # Collect all classes present in the predictions
        unique_labels = np.unique(detections[:, -1])

        for c in unique_labels:
            detections_class = detections[detections[:, -1] == c]
            
            # Sort by objectness confidence times class confidence, descending
            conf_sort_index = np.argsort(-(detections_class[:, 4]*detections_class[:, 5]), axis=0)
            detections_class = detections_class[conf_sort_index]

            # Non-maximum suppression
            max_detections = []
            while detections_class.shape[0]>0:
                # Take the highest-scoring box of this class, then drop every remaining box whose IoU with it is at least nms_thres
                max_detections.append(np.expand_dims(detections_class[0],axis=0))
                if len(detections_class) == 1:
                    break
                
                ious = bbox_iou2(max_detections[-1], detections_class[1:])
                detections_class = detections_class[1:][ious < nms_thres]
                
            # Stack the kept boxes
            max_detections = np.concatenate(max_detections)

            # Add max detections to outputs
            output[image_i] = max_detections if output[image_i] is None else np.concatenate((output[image_i], max_detections))

    return output

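def prediect(img):
    # torch is imported here because this module does not import it at the top;
    # model_path is expected to be defined as a module-level variable
    import torch

    # Load the model onto the CPU
    device = torch.device('cpu')
    model = torch.load(model_path, map_location=device)
    model = model.to(device)
    model.eval()

    # Run inference without tracking gradients
    # img = torch.from_numpy(img)
    img = torch.tensor(img, dtype=torch.float32)
    with torch.no_grad():
        outputs = model(img)

    return outputs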

def Regression(batch_detections, confidence, image, letterbox):
    # Process the detection boxes
    top_index = batch_detections[:,4] * batch_detections[:,5] > confidence
    top_conf = batch_detections[top_index,4]*batch_detections[top_index,5]
    top_label = np.array(batch_detections[top_index,-1],np.int32)
    top_bboxes = np.array(batch_detections[top_index,:4])
    top_xmin, top_ymin, top_xmax, top_ymax = np.expand_dims(top_bboxes[:,0],-1),np.expand_dims(top_bboxes[:,1],-1),np.expand_dims(top_bboxes[:,2],-1),np.expand_dims(top_bboxes[:,3],-1)

    #-----------------------------------------------------------------#
    image_shape = np.array(np.shape(image)[0:2])
    if letterbox:
        boxes = yolo_correct_boxes(top_ymin,top_xmin,top_ymax,top_xmax,np.array([416,416]),image_shape)
    else:
        top_xmin = top_xmin / 416 * image_shape[1]
        top_ymin = top_ymin / 416 * image_shape[0]
        top_xmax = top_xmax / 416 * image_shape[1]
        top_ymax = top_ymax / 416 * image_shape[0]
        boxes = np.concatenate([top_ymin,top_xmin,top_ymax,top_xmax], axis=-1)

    # np.int was removed in NumPy 1.24; use an explicit integer dtype
    return boxes.astype(np.int32), top_conf, top_label

def draw_box(boxes, top_conf, top_label, class_names, image):
    font = ImageFont.truetype(font='model_data/simhei.ttf',size=np.floor(3e-2 * np.shape(image)[1] + 0.5).astype('int32'))
    thickness = max((np.shape(image)[0] + np.shape(image)[1]) // 416, 1)

    # Use a different color for each class
    hsv_tuples = [(x / len(class_names), 1., 1.)
                  for x in range(len(class_names))]
    colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples))
    colors = list(map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)),colors))

    for i, c in enumerate(top_label):
        predicted_class = class_names[c]
        score = top_conf[i]

        top, left, bottom, right = boxes[i]
        top = top - 5
        left = left - 5
        bottom = bottom + 5
        right = right + 5

        top = max(0, np.floor(top + 0.5).astype('int32'))
        left = max(0, np.floor(left + 0.5).astype('int32'))
        bottom = min(np.shape(image)[0], np.floor(bottom + 0.5).astype('int32'))
        right = min(np.shape(image)[1], np.floor(right + 0.5).astype('int32'))

        # Draw the box and its label
        label = '{} {:.2f}'.format(predicted_class, score)
        draw = ImageDraw.Draw(image)
        label_size = draw.textsize(label, font)
        label = label.encode('utf-8')
        print(label, top, left, bottom, right)
        
        if top - label_size[1] >= 0:
            text_origin = np.array([left, top - label_size[1]])
        else:
            text_origin = np.array([left, top + 1])

        for i in range(thickness):
            draw.rectangle(
                [left + i, top + i, right - i, bottom - i],
                outline=colors[class_names.index(predicted_class)])
        draw.rectangle(
            [tuple(text_origin), tuple(text_origin + label_size)],
            fill=colors[class_names.index(predicted_class)])
        draw.text(text_origin, str(label,'UTF-8'), fill=(0, 0, 0), font=font)
    # image.show()
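
The `sigmoid` in post_process2.py loops over every element in Python, which is numerically safe but slow on the 52x52 head. A vectorized variant with the same two-branch stability trick might look like the sketch below; `sigmoid_vec` is a hypothetical name, not a function from the original code.

```python
import numpy as np


def sigmoid_vec(x):
    """Numerically stable sigmoid applied to a whole array at once."""
    x = np.asarray(x, dtype=np.float64)
    out = np.empty_like(x)
    pos = x >= 0
    # For x >= 0, exp(-x) cannot overflow
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    # For x < 0, exp(x) cannot overflow
    exp_x = np.exp(x[~pos])
    out[~pos] = exp_x / (exp_x + 1.0)
    return out


print(sigmoid_vec(np.array([[-1000.0, 0.0], [1000.0, 2.0]])))
```

The output keeps the shape of the input, so it could be dropped in wherever `sigmoid` is called on a slice of `prediction`.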

2.test_om.py

import sys
import onnx
import os
import argparse
import numpy as np
import cv2
import onnxruntime
import torch

import colorsys
from PIL import Image, ImageDraw, ImageFont
import post_process2 as post_process


def letterbox_image2(image, size, letterbox):
    # INTER_NEAREST: nearest-neighbor; INTER_LINEAR: bilinear; INTER_CUBIC: bicubic over a 4x4 pixel neighborhood; INTER_LANCZOS4: Lanczos over an 8x8 pixel neighborhood
    if letterbox:
        ih, iw = image.shape[0:2]
        w, h = size
        scale = min(w/iw, h/ih)
        nw = int(iw*scale)
        nh = int(ih*scale)

        image = cv2.resize(image, (nw,nh), interpolation=cv2.INTER_LINEAR)
        # Gray canvas in (height, width, channels) order
        img = np.full((h, w, 3), 128, dtype=np.uint8)
        img[(h-nh)//2:(h-nh)//2+nh, (w-nw)//2:(w-nw)//2+nw]=image
    else:
        img = cv2.resize(image, size, interpolation=cv2.INTER_LINEAR) 

    # cv2.imshow('img',img)
    # cv2.waitKey(0) 
    return img

def letterbox_image(image, size):
    iw, ih = image.size
    w, h = size
    scale = min(w/iw, h/ih)
    nw = int(iw*scale)
    nh = int(ih*scale)

    image = image.resize((nw,nh), Image.BICUBIC)
    new_image = Image.new('RGB', size, (128,128,128))
    new_image.paste(image, ((w-nw)//2, (h-nh)//2))
    # new_image.show()

    return new_image


if __name__ == '__main__':
    # Parameters
    conf_thres=0.5
    nms_thres=0.3
    anchors_path='data/dataset2/coco_anchors.names'
    classes_path='data/dataset2/coins.names'

    image_path="data/img/test1.jpg"
    weight_file='data/model1/test.pth'
    onnx_file_name = 'data/model1/test.onnx'
    

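    # Note: img1 uses the project's preprocessing, img2 is a hand-written version, and img3 is the input data of the Atlas om model
    # With letterbox=True, img1=img2!=img3; with letterbox=False, img2=img3!=img1 (cv2 and PIL resize differently, so there are small errors)
    # img1: preprocessing from the original code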
    letterbox=True
    image_src = cv2.imread(image_path)
    img1 = cv2.cvtColor(image_src, cv2.COLOR_BGR2RGB)
    img1 = letterbox_image2(img1, (416,416), letterbox)
    img1 = np.transpose(img1, (2, 0, 1)).astype(np.float32) / 255.0
    img1 = np.expand_dims(img1, axis=0)
    print(img1.shape)

    # img2: hand-written preprocessing, based on the YOLOv3 one
    image_src2 = Image.open(image_path)
    if letterbox:
        crop_img = np.array(letterbox_image(image_src2, (416,416)))
    else:
        crop_img = image_src2.convert('RGB')
        crop_img = crop_img.resize((416,416), Image.BILINEAR)  #NEAREST: lowest quality, BILINEAR: bilinear, BICUBIC: cubic spline interpolation, ANTIALIAS: highest quality
    
    photo = np.array(crop_img,dtype = np.float32) / 255.0
    photo = np.transpose(photo, (2, 0, 1))
    img2 = np.expand_dims(photo, axis=0)
    print(img2.shape)
    
    # img3: input data of the om model, obtained by truncating at the input layer during atc conversion
    img3=np.load("data/data2/input.npy")
    print(img3.shape)


    # Compute
    session = onnxruntime.InferenceSession(onnx_file_name)
    input_name = session.get_inputs()[0].name
    outputs = session.run(None, {input_name: img3})
    # print(len(outputs))
    conv_sbbox=outputs[0]
    conv_mbbox=outputs[1]
    conv_lbbox=outputs[2]

    input_size=(416, 416)
    class_names = post_process.get_class(classes_path)
    decode_sbbox=post_process.DecodeBox2(post_process.get_anchors(anchors_path)[0], len(class_names),  input_size, conv_sbbox)
    decode_mbbox=post_process.DecodeBox2(post_process.get_anchors(anchors_path)[1], len(class_names),  input_size, conv_mbbox)
    decode_lbbox=post_process.DecodeBox2(post_process.get_anchors(anchors_path)[2], len(class_names),  input_size, conv_lbbox)
    output = np.concatenate([decode_sbbox, decode_mbbox, decode_lbbox], 1)
    print(decode_sbbox.shape, decode_mbbox.shape, decode_lbbox.shape, output.shape)

    batch_detections = post_process.non_max_suppression2(output, len(class_names), conf_thres=conf_thres, nms_thres=nms_thres)
    
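    # non_max_suppression2 leaves None for an image without any detection
    if batch_detections[0] is None:
        print("No detections!")
        exit()
    batch_detections = np.array(batch_detections[0])
    bbox_nums = batch_detections.shape[0]  # number of boxes kept after NMS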
        
    image = Image.open(image_path)
    boxes, top_conf, top_label=post_process.Regression(batch_detections, conf_thres, image, letterbox)
    post_process.draw_box(boxes, top_conf, top_label, class_names, image)
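
When `letterbox=True`, the boxes come out in the coordinates of the padded 416x416 input and have to be mapped back to the original image, which is what `yolo_correct_boxes` does. The sketch below shows the same idea for a single box in plain Python. `unletterbox_box` is a hypothetical helper, and it ignores the sub-pixel rounding introduced by `int(iw*scale)` in the preprocessing.

```python
def unletterbox_box(box, input_size, image_size):
    """Map (x1, y1, x2, y2) from the letterboxed network input back to the
    original image. input_size and image_size are (width, height)."""
    in_w, in_h = input_size
    img_w, img_h = image_size
    # Same scale factor as the letterbox resize
    scale = min(in_w / img_w, in_h / img_h)
    # Gray padding added on each side of the resized image
    pad_x = (in_w - img_w * scale) / 2.0
    pad_y = (in_h - img_h * scale) / 2.0
    x1, y1, x2, y2 = box
    return ((x1 - pad_x) / scale, (y1 - pad_y) / scale,
            (x2 - pad_x) / scale, (y2 - pad_y) / scale)


# An 832x416 image is scaled by 0.5 and padded by 104 px at top and bottom
print(unletterbox_box((104, 104, 312, 312), (416, 416), (832, 416)))
```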

3.test_om2.py

import cv2
import numpy as np
import os
import colorsys
from PIL import Image, ImageDraw, ImageFont
import post_process2 as post_process


conf_thres=0.5
nms_thres=0.3
letterbox=False
anchors_path='data/dataset2/coco_anchors.names'
classes_path='data/dataset2/coins.names'


if __name__ == '__main__':
    img_path="data/img/test1.jpg"
    image = Image.open(img_path)
    img=post_process.get_imgges(image, letterbox)

    # model_path="data/model4/test.pth"
    # outputs=prediect(img)
    # conv_sbbox=outputs[0].detach().numpy()
    # conv_mbbox=outputs[1].detach().numpy()
    # conv_lbbox=outputs[2].detach().numpy()
    # np.save("data/test/conv_sbbox.npy", conv_sbbox)
    # np.save("data/test/conv_mbbox.npy", conv_mbbox)
    # np.save("data/test/conv_lbbox.npy", conv_lbbox)

    conv_sbbox=np.load("data/data2/conv_sbbox.npy")
    conv_mbbox=np.load("data/data2/conv_mbbox.npy")
    conv_lbbox=np.load("data/data2/conv_lbbox.npy")
    print(conv_sbbox.shape, conv_mbbox.shape, conv_lbbox.shape)

    
    input_size=(416, 416)
    class_names = post_process.get_class(classes_path)
    decode_sbbox=post_process.DecodeBox2(post_process.get_anchors(anchors_path)[0], len(class_names),  input_size, conv_sbbox)
    decode_mbbox=post_process.DecodeBox2(post_process.get_anchors(anchors_path)[1], len(class_names),  input_size, conv_mbbox)
    decode_lbbox=post_process.DecodeBox2(post_process.get_anchors(anchors_path)[2], len(class_names),  input_size, conv_lbbox)
    output = np.concatenate([decode_sbbox, decode_mbbox, decode_lbbox], 1)
    print(decode_sbbox.shape, decode_mbbox.shape, decode_lbbox.shape, output.shape)

    batch_detections = post_process.non_max_suppression2(output, len(class_names), conf_thres=conf_thres, nms_thres=nms_thres)
    print(batch_detections)
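    # non_max_suppression2 leaves None for an image without any detection
    if batch_detections[0] is None:
        print("No detections!")
        exit()
    batch_detections = np.array(batch_detections[0])
    bbox_nums = batch_detections.shape[0]  # number of boxes kept after NMS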

    boxes, top_conf, top_label=post_process.Regression(batch_detections, conf_thres, image, letterbox)
    post_process.draw_box(boxes, top_conf, top_label, class_names, image)
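
The NMS step used by both test scripts can be tried in isolation on a few hand-made boxes. This is a simplified sketch of greedy NMS for a single class, not the exact `non_max_suppression2` above: it omits the `+ 1` pixel convention used by `bbox_iou2`, and the names `iou_one_to_many` and `greedy_nms` are hypothetical.

```python
import numpy as np


def iou_one_to_many(box, boxes):
    """IoU between one box and an array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    # Non-overlapping boxes give negative extents, clamp them to zero
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter + 1e-16)


def greedy_nms(boxes, scores, iou_thres=0.3):
    """Return the indices of the boxes kept by greedy NMS."""
    order = np.argsort(-scores)  # highest score first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        if order.size == 1:
            break
        ious = iou_one_to_many(boxes[best], boxes[order[1:]])
        # Keep only the boxes that overlap the best one by less than the threshold
        order = order[1:][ious < iou_thres]
    return keep


boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(greedy_nms(boxes, scores))
```

In this toy input the second box overlaps the first heavily and is suppressed, while the third box does not overlap either and is kept.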
