Configuring the DETR model on Windows for object detection

DEtection TRansformer (DETR)
DEtection TRansformer (DETR) is the vision version of the Transformer proposed by researchers at Facebook AI for object detection and panoptic segmentation. It is the first object detection framework to successfully integrate a Transformer as a central building block of the detection pipeline (paper linked below).

This article consolidates the reproduction work of several bloggers online, together with notes from my own reproduction; it is detailed enough that you can complete the reproduction by following it step by step.

1. Code (GitHub): https://github.com/facebookresearch/detr
2. Paper: End-to-End Object Detection with Transformers

Step 1: download the code, open it in PyCharm, open the terminal, and run

pip install -r requirements.txt

Installing the second dependency, cocoapi, failed.

Solution: install it locally using a Windows-compatible build that someone else has already prepared. Download: https://github.com/philferriere/cocoapi

Open the Anaconda command prompt and run activate your_env_name (this activates the Anaconda virtual environment; you can also do this in PyCharm's terminal);
go to the directory containing setup.py in the downloaded source, cocoapi-master\PythonAPI;
run python setup.py build_ext install and the installation completes.
With this problem solved, installing the remaining libraries is straightforward; if you hit a VC++ 2015 style error, it is best to also install the C++ build tools workload in Visual Studio.
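
Putting the three steps together (your_env_name and the cocoapi-master location are placeholders for your own environment name and download path):

activate your_env_name
cd cocoapi-master\PythonAPI
python setup.py build_ext install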

Step 2: modify the pretrained .pth file. The released weights were trained on the COCO dataset, while we only need to train on our own dataset. The file in question is detr-r50-e632da11.pth (a download link is given in the README.md of the GitHub repository).

Then run change.py (contents below) to change the number of classes stored in the .pth file: set num_class to your number of categories + 1 so it matches the dataset you are going to train on.

import torch
pretrained_weights  = torch.load('detr-r50-e632da11.pth')

num_class = 3 # your number of object classes + 1, because background counts as a class
pretrained_weights["model"]["class_embed.weight"].resize_(num_class+1, 256)
pretrained_weights["model"]["class_embed.bias"].resize_(num_class+1)
torch.save(pretrained_weights, "detr-r50_%d.pth" % num_class)

After it runs, a new file (detr-r50_3.pth for num_class = 3) is generated; keep this file in the current project directory.
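
An optional sanity check (a minimal sketch, assuming num_class = 3 as above): reload the new checkpoint and confirm that class_embed now has num_class + 1 rows.

import torch

ckpt = torch.load('detr-r50_3.pth')
print(ckpt['model']['class_embed.weight'].shape)  # expect torch.Size([4, 256])
print(ckpt['model']['class_embed.bias'].shape)    # expect torch.Size([4])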

Step 3: prepare your own training set. First label your data as a VOC-style dataset; I won't go into detail here, you can download labelImg to do the labelling, then arrange the files into the folders described below.

DETR expects the dataset in COCO format: images and label files are kept in four folders (training set, test set, validation set, and annotations), where the annotations folder holds the label files in JSON format.

(Place the dataset according to this file structure and do not change the names; see the layout sketch below.)
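
The original layout figure is not reproduced here; a typical layout with the standard COCO/DETR naming (the top-level folder name is only an example) looks like this:

NWPUVHR-10/                  # the folder later passed as --coco_path
    train2017/               # training images
    val2017/                 # validation images
    test2017/                # test images
    annotations/             # JSON label files
        instances_train2017.json
        instances_val2017.json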

Step 4: convert the labels to JSON format. Remember to name the generated files in the instances_train2017.json style, matching the annotations folder shown above.


The code below supports converting label files from several datasets (RSOD, NWPU, DIOR, and YOLO-format) to JSON. Create a new file tojson.py and use the following code to generate the JSON files you need.

To generate instances_train2017.json:
(a) change the image_path default (line 29) to the path of train2017;
(b) change the annotation_path default (line 31) to the label folder (the train and val labels both live in this folder, so this path does not need to change again when generating instances_val2017.json);
(c) change dataset (line 33) to your dataset name, e.g. NWPU;
(d) change the save default (line 34) to the JSON output path, e.g. …/NWPUVHR-10/annotations/instances_train2017.json.
 

import os
import cv2
import json
import argparse
from tqdm import tqdm
import xml.etree.ElementTree as ET

COCO_DICT=['images','annotations','categories']
IMAGES_DICT=['file_name','height','width','id']

ANNOTATIONS_DICT=['image_id','iscrowd','area','bbox','category_id','id']

CATEGORIES_DICT=['id','name']
## {'supercategory': 'person', 'id': 1, 'name': 'person'}
## {'supercategory': 'vehicle', 'id': 2, 'name': 'bicycle'}
YOLO_CATEGORIES=['person']
RSOD_CATEGORIES=['aircraft','playground','overpass','oiltank']
NWPU_CATEGORIES=['airplane','ship','storage tank','baseball diamond','tennis court',\
					'basketball court','ground track field','harbor','bridge','vehicle']

VOC_CATEGORIES=['aeroplane','bicycle','bird','boat','bottle','bus','car','cat','chair','cow',\
					'diningtable','dog','horse','motorbike','person','pottedplant','sheep','sofa','train','tvmonitor']

DIOR_CATEGORIES=['golffield','Expressway-toll-station','vehicle','trainstation','chimney','storagetank',\
					'ship','harbor','airplane','groundtrackfield','tenniscourt','dam','basketballcourt',\
					'Expressway-Service-area','stadium','airport','baseballfield','bridge','windmill','overpass']

parser=argparse.ArgumentParser(description='2COCO')
#parser.add_argument('--image_path',type=str,default=r'T:/shujuji/DIOR/JPEGImages-trainval/',help='config file')
parser.add_argument('--image_path',type=str,default=r'G:/NWPU VHR-10 dataset/positive image set/',help='config file')
#parser.add_argument('--annotation_path',type=str,default=r'T:/shujuji/DIOR/Annotations/',help='config file')
parser.add_argument('--annotation_path',type=str,default=r'G:/NWPU VHR-10 dataset/ground truth/',help='config file')
parser.add_argument('--dataset',type=str,default='NWPU',help='config file')
parser.add_argument('--save',type=str,default='G:/NWPU VHR-10 dataset/instances_train2017.json',help='config file')
args=parser.parse_args()
def load_json(path):
	with open(path,'r') as f:
		json_dict=json.load(f)
		for i in json_dict:
			print(i)
		print(json_dict['annotations'])
def save_json(dict,path):
	print('SAVE_JSON...')
	with open(path,'w') as f:
		json.dump(dict,f)
	print('SUCCESSFUL_SAVE_JSON:',path)
def load_image(path):
	img=cv2.imread(path)
	return img.shape[0],img.shape[1]
def generate_categories_dict(category):       # CATEGORIES_DICT=['id','name']
	print('GENERATE_CATEGORIES_DICT...')
	return [{CATEGORIES_DICT[0]:category.index(x)+1,CATEGORIES_DICT[1]:x} for x in category]  #CATEGORIES_DICT=['id','name']
def generate_images_dict(imagelist,image_path,start_image_id=11725):  #IMAGES_DICT=['file_name','height','width','id']
	print('GENERATE_IMAGES_DICT...')
	images_dict=[]
	with tqdm(total=len(imagelist)) as load_bar:
		for x in imagelist:  # x is the image file name
			#print(start_image_id)
			dict={IMAGES_DICT[0]:x,IMAGES_DICT[1]:load_image(image_path+x)[0],\
					IMAGES_DICT[2]:load_image(image_path+x)[1],IMAGES_DICT[3]:imagelist.index(x)+start_image_id}
			load_bar.update(1)
			images_dict.append(dict)
	return images_dict

def DIOR_Dataset(image_path,annotation_path,start_image_id=11725,start_id=0):
	categories_dict=generate_categories_dict(DIOR_CATEGORIES)    # CATEGORIES_DICT entries look like {'id': 1, 'name': 'golffield'}, ...; ids start from 1
	imgname=os.listdir(image_path)
	images_dict=generate_images_dict(imgname,image_path,start_image_id)  # IMAGES_DICT=['file_name','height','width','id']; ids start from start_image_id
	print('GENERATE_ANNOTATIONS_DICT...')  # build the COCO annotations: ANNOTATIONS_DICT=['image_id','iscrowd','area','bbox','category_id','id']
	annotations_dict=[]
	id=start_id
	for i in images_dict:
		image_id=i['id']
		print(image_id)
		image_name=i['file_name']
		annotation_xml=annotation_path+image_name.split('.')[0]+'.xml'
		tree=ET.parse(annotation_xml)
		root=tree.getroot()
		for j in root.findall('object'):
			category=j.find('name').text
			category_id=DIOR_CATEGORIES.index(category)+1  # category ids in the categories dict start from 1
			x_min=float(j.find('bndbox').find('xmin').text)
			y_min=float(j.find('bndbox').find('ymin').text)
			w=float(j.find('bndbox').find('xmax').text)-x_min
			h=float(j.find('bndbox').find('ymax').text)-y_min
			area = w * h
			bbox = [x_min, y_min, w, h]
			dict = {'image_id': image_id, 'iscrowd': 0, 'area': area, 'bbox': bbox, 'category_id': category_id,
					'id': id}
			annotations_dict.append(dict)
			id=id+1
	print('SUCCESSFUL_GENERATE_DIOR_JSON')
	return {COCO_DICT[0]:images_dict,COCO_DICT[1]:annotations_dict,COCO_DICT[2]:categories_dict}
def NWPU_Dataset(image_path,annotation_path,start_image_id=0,start_id=0):
	categories_dict=generate_categories_dict(NWPU_CATEGORIES)
	imgname=os.listdir(image_path)
	images_dict=generate_images_dict(imgname,image_path,start_image_id)
	print('GENERATE_ANNOTATIONS_DICT...')
	annotations_dict=[]
	id=start_id
	for i in images_dict:
		image_id=i['id']
		image_name=i['file_name']
		annotation_txt=annotation_path+image_name.split('.')[0]+'.txt'
		txt=open(annotation_txt,'r')
		lines=txt.readlines()
		for j in lines:
			if j=='\n':
				continue
			category_id=int(j.split(',')[4])
			category=NWPU_CATEGORIES[category_id-1]
			print(category_id,'        ',category)
			x_min=float(j.split(',')[0].split('(')[1])
			y_min=float(j.split(',')[1].split(')')[0])
			w=float(j.split(',')[2].split('(')[1])-x_min
			h=float(j.split(',')[3].split(')')[0])-y_min
			area=w*h
			bbox=[x_min,y_min,w,h]
			dict = {'image_id': image_id, 'iscrowd': 0, 'area': area, 'bbox': bbox, 'category_id': category_id,
					'id': id}
			id=id+1
			annotations_dict.append(dict)
	print('SUCCESSFUL_GENERATE_NWPU_JSON')
	return {COCO_DICT[0]:images_dict,COCO_DICT[1]:annotations_dict,COCO_DICT[2]:categories_dict}

def YOLO_Dataset(image_path,annotation_path,start_image_id=0,start_id=0):
	categories_dict=generate_categories_dict(YOLO_CATEGORIES)
	imgname=os.listdir(image_path)
	images_dict=generate_images_dict(imgname,image_path)
	print('GENERATE_ANNOTATIONS_DICT...')
	annotations_dict=[]
	id=start_id
	for i in images_dict:
		image_id=i['id']
		image_name=i['file_name']
		W,H=i['width'],i['height']
		annotation_txt=annotation_path+image_name.split('.')[0]+'.txt'
		txt=open(annotation_txt,'r')
		lines=txt.readlines()
		for j in lines:
			category_id=int(j.split(' ')[0])+1
			category=YOLO_CATEGORIES[category_id-1]
			x=float(j.split(' ')[1])
			y=float(j.split(' ')[2])
			w=float(j.split(' ')[3])
			h=float(j.split(' ')[4])
			x_min=(x-w/2)*W
			y_min=(y-h/2)*H
			w=w*W
			h=h*H
			area=w*h
			bbox=[x_min,y_min,w,h]
			dict={'image_id':image_id,'iscrowd':0,'area':area,'bbox':bbox,'category_id':category_id,'id':id}
			annotations_dict.append(dict)
			id=id+1
	print('SUCCESSFUL_GENERATE_YOLO_JSON')
	return {COCO_DICT[0]:images_dict,COCO_DICT[1]:annotations_dict,COCO_DICT[2]:categories_dict}
def RSOD_Dataset(image_path,annotation_path,start_image_id=0,start_id=0):
	categories_dict=generate_categories_dict(RSOD_CATEGORIES)
	imgname=os.listdir(image_path)
	images_dict=generate_images_dict(imgname,image_path,start_image_id)
	print('GENERATE_ANNOTATIONS_DICT...')
	annotations_dict=[]
	id=start_id
	for i in images_dict:
		image_id=i['id']
		image_name=i['file_name']
		annotation_txt=annotation_path+image_name.split('.')[0]+'.txt'
		txt=open(annotation_txt,'r')
		lines=txt.readlines()
		for j in lines:
			category=j.split('\t')[1]
			category_id=RSOD_CATEGORIES.index(category)+1
			x_min=float(j.split('\t')[2])
			y_min=float(j.split('\t')[3])
			w=float(j.split('\t')[4])-x_min
			h=float(j.split('\t')[5])-y_min
			area = w * h
			bbox = [x_min, y_min, w, h]
			dict = {'image_id': image_id, 'iscrowd': 0, 'area': area, 'bbox': bbox, 'category_id': category_id,
					'id': id}
			annotations_dict.append(dict)
			id=id+1
	print('SUCCESSFUL_GENERATE_RSOD_JSON')

	return {COCO_DICT[0]:images_dict,COCO_DICT[1]:annotations_dict,COCO_DICT[2]:categories_dict}
if __name__=='__main__':
	dataset=args.dataset   # dataset name
	save=args.save  # path to save the COCO-format json
	image_path=args.image_path     # image folder
	annotation_path=args.annotation_path   # original annotation folder
	if dataset=='RSOD':
		json_dict=RSOD_Dataset(image_path,annotation_path,0)
	if dataset=='NWPU':
		json_dict=NWPU_Dataset(image_path,annotation_path,0)
	if dataset=='DIOR':
		json_dict=DIOR_Dataset(image_path,annotation_path,11725)
	if dataset=='YOLO':
		json_dict=YOLO_Dataset(image_path,annotation_path,0)
	save_json(json_dict,save)
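
If you prefer not to edit the defaults inside tojson.py, the same settings can be passed on the command line; the paths below simply mirror the defaults above and are only examples, so adjust them to your own folders:

python tojson.py --image_path "G:/NWPU VHR-10 dataset/positive image set/" --annotation_path "G:/NWPU VHR-10 dataset/ground truth/" --dataset NWPU --save "G:/NWPU VHR-10 dataset/annotations/instances_train2017.json"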

At this point the dataset is fully prepared.

Step 5: in detr.py (under models/), change num_classes around line 305 to suit your own dataset; it should be the same value as num_class in change.py so that the resized checkpoint loads correctly.
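
For reference, the line being changed sits in the build() function of detr.py; a sketch of what it looks like in the repo (the exact line number may differ across versions):

# original line in build(args) of detr.py:
num_classes = 20 if args.dataset_file != 'coco' else 91
# change it to the value used for num_class in change.py, e.g.:
num_classes = 3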

Step 6: run main.py to train.

Adjust training parameters such as epochs, lr, and batch_size in main.py. There are two ways: modify the default values, or pass them on the command line.

1. Modify the defaults: in main.py, change the arguments shown in the sketch after the command below.

2. Run from the command line:

python main.py --dataset_file "coco" --coco_path data/coco --epochs 100 --lr=1e-4 --batch_size=2 --num_workers=4 --output_dir="outputs" --resume="detr-r50_3.pth"
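
For option 1, the defaults live in get_args_parser() in main.py and look roughly like the lines below (copied from the DETR repo; exact values and positions may differ in your copy). For example, lower epochs and point resume at the resized checkpoint:

parser.add_argument('--lr', default=1e-4, type=float)
parser.add_argument('--lr_backbone', default=1e-5, type=float)
parser.add_argument('--batch_size', default=2, type=int)
parser.add_argument('--epochs', default=300, type=int)                                            # e.g. 100
parser.add_argument('--output_dir', default='', help='path where to save, empty for no saving')   # e.g. 'outputs'
parser.add_argument('--resume', default='', help='resume from checkpoint')                        # e.g. 'detr-r50_3.pth'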

After training, checkpoint files and a log file are generated in outputs; the log file records information for every epoch.

That completes the whole training process.

In fact, DETR automatically runs an accuracy evaluation at every epoch during training; you can read the results from the console output or the generated intermediate files. You can also evaluate again with main.py: in the command below, change --resume to the path of the model saved at the last epoch and --coco_path to the path of your own dataset.

python main.py --batch_size 6 --no_aux_loss --eval --resume /home/detr-main/outputs/checkpoint0299.pth --coco_path /home/NWPUVHR-10

Finally, test with the model you trained yourself. Change the image path here to the path of the images you want to test, and remember to change CLASSES = [] (line 19) to your own class names!

Save all the images to predict in a single folder; prediction outputs the results for every image in that folder in one run.

The parameters that need to be modified are:

backbone: the backbone I downloaded was ResNet-50, so change the default accordingly (the DETR weight file with the ResNet-50 backbone was already downloaded for training and placed in the project root folder).

Dataset-related parameters:
--coco_path: change to your own dataset path
--output_dir: change to the folder you created for saving the prediction images
--resume: change to the path of your trained model file

Also change image_file_path and image_path, the paths of the folder holding the images to predict.

Here is an example of an image after detection (figure omitted).

Prediction script:

import argparse
import random
import time
from pathlib import Path
import numpy as np
import torch
from models import build_model
from PIL import Image
import os
import torchvision
from torchvision.ops.boxes import batched_nms
import cv2

#------------------------------------------------------------------------- argument settings
def get_args_parser():
    parser = argparse.ArgumentParser('Set transformer detector', add_help=False)
    parser.add_argument('--lr', default=1e-4, type=float)
    parser.add_argument('--lr_backbone', default=1e-5, type=float)
    parser.add_argument('--batch_size', default=2, type=int)
    parser.add_argument('--weight_decay', default=1e-4, type=float)
    parser.add_argument('--epochs', default=300, type=int)
    parser.add_argument('--lr_drop', default=200, type=int)
    parser.add_argument('--clip_max_norm', default=0.1, type=float,
                        help='gradient clipping max norm')

    # Model parameters
    parser.add_argument('--frozen_weights', type=str, default=None,
                        help="Path to the pretrained model. If set, only the mask head will be trained")
    # * Backbone
    parser.add_argument('--backbone', default='resnet101', type=str,
                        help="Name of the convolutional backbone to use")
    parser.add_argument('--dilation', action='store_true',
                        help="If true, we replace stride with dilation in the last convolutional block (DC5)")
    parser.add_argument('--position_embedding', default='sine', type=str, choices=('sine', 'learned'),
                        help="Type of positional embedding to use on top of the image features")

    # * Transformer
    parser.add_argument('--enc_layers', default=6, type=int,
                        help="Number of encoding layers in the transformer")
    parser.add_argument('--dec_layers', default=6, type=int,
                        help="Number of decoding layers in the transformer")
    parser.add_argument('--dim_feedforward', default=2048, type=int,
                        help="Intermediate size of the feedforward layers in the transformer blocks")
    parser.add_argument('--hidden_dim', default=256, type=int,
                        help="Size of the embeddings (dimension of the transformer)")
    parser.add_argument('--dropout', default=0.1, type=float,
                        help="Dropout applied in the transformer")
    parser.add_argument('--nheads', default=8, type=int,
                        help="Number of attention heads inside the transformer's attentions")
    parser.add_argument('--num_queries', default=100, type=int,
                        help="Number of query slots")
    parser.add_argument('--pre_norm', action='store_true')

    # * Segmentation
    parser.add_argument('--masks', action='store_true',
                        help="Train segmentation head if the flag is provided")

    # Loss
    parser.add_argument('--no_aux_loss', dest='aux_loss', action='store_false',
                        help="Disables auxiliary decoding losses (loss at each layer)")
    # * Matcher
    parser.add_argument('--set_cost_class', default=1, type=float,
                        help="Class coefficient in the matching cost")
    parser.add_argument('--set_cost_bbox', default=5, type=float,
                        help="L1 box coefficient in the matching cost")
    parser.add_argument('--set_cost_giou', default=2, type=float,
                        help="giou box coefficient in the matching cost")
    # * Loss coefficients
    parser.add_argument('--mask_loss_coef', default=1, type=float)
    parser.add_argument('--dice_loss_coef', default=1, type=float)
    parser.add_argument('--bbox_loss_coef', default=5, type=float)
    parser.add_argument('--giou_loss_coef', default=2, type=float)
    parser.add_argument('--eos_coef', default=0.1, type=float,
                        help="Relative classification weight of the no-object class")

    # dataset parameters
    parser.add_argument('--dataset_file', default='coco')
    parser.add_argument('--coco_path', type=str,default="coco")
    parser.add_argument('--coco_panoptic_path', type=str)
    parser.add_argument('--remove_difficult', action='store_true')

    parser.add_argument('--output_dir', default='inference_demo/inference_output',
                        help='path where to save, empty for no saving')
    parser.add_argument('--device', default='cuda',
                        help='device to use for training / testing')
    parser.add_argument('--seed', default=42, type=int)
    parser.add_argument('--resume', default='inference_demo/weights/detr-r101-2c7b67e5.pth', help='resume from checkpoint')
    parser.add_argument('--start_epoch', default=0, type=int, metavar='N',
                        help='start epoch')
    parser.add_argument('--eval', default="True")
    parser.add_argument('--num_workers', default=2, type=int)

    # distributed training parameters
    parser.add_argument('--world_size', default=1, type=int,
                        help='number of distributed processes')
    parser.add_argument('--dist_url', default='env://', help='url used to set up distributed training')
    return parser


def box_cxcywh_to_xyxy(x):
    # convert DETR box format (x_center, y_center, w, h) into COCO-style corner format (x0, y0, x1, y1)
    x_c, y_c, w, h = x.unbind(1)
    b = [(x_c - 0.5 * w), (y_c - 0.5 * h),
         (x_c + 0.5 * w), (y_c + 0.5 * h)]
    return torch.stack(b, dim=1)

def rescale_bboxes(out_bbox, size):
    # scale the normalized box coordinates by the image width and height to get pixel coordinates
    img_w, img_h = size
    b = box_cxcywh_to_xyxy(out_bbox)
    b = b * torch.tensor([img_w, img_h, img_w, img_h], dtype=torch.float32)
    return b

def filter_boxes(scores, boxes, confidence=0.7, apply_nms=True, iou=0.5):
    # keep only boxes whose confidence is high enough (optionally followed by class-wise NMS)
    keep = scores.max(-1).values > confidence
    scores, boxes = scores[keep], boxes[keep]

    if apply_nms:
        top_scores, labels = scores.max(-1)
        keep = batched_nms(boxes, top_scores, labels, iou)
        scores, boxes = scores[keep], boxes[keep]

    return scores, boxes

# COCO classes
CLASSES = [
    'N/A', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
    'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A',
    'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse',
    'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack',
    'umbrella', 'N/A', 'N/A', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis',
    'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove',
    'skateboard', 'surfboard', 'tennis racket', 'bottle', 'N/A', 'wine glass',
    'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich',
    'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake',
    'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table', 'N/A',
    'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard',
    'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A',
    'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier',
    'toothbrush'
]

def plot_one_box(x, img, color=None, label=None, line_thickness=1):
    # draw one detection box (and optional label) on the image
    tl = line_thickness or round(0.002 * (img.shape[0] + img.shape[1]) / 2) + 1  # line/font thickness
    color = color or [random.randint(0, 255) for _ in range(3)]
    c1, c2 = (int(x[0]), int(x[1])), (int(x[2]), int(x[3]))
    cv2.rectangle(img, c1, c2, color, thickness=tl, lineType=cv2.LINE_AA)
    if label:
        tf = max(tl - 1, 1)  # font thickness
        t_size = cv2.getTextSize(label, 0, fontScale=tl / 3, thickness=tf)[0]
        c2 = c1[0] + t_size[0], c1[1] - t_size[1] - 3
        cv2.rectangle(img, c1, c2, color, -1, cv2.LINE_AA)  # filled
        cv2.putText(img, label, (c1[0], c1[1] - 2), 0, tl / 3, [225, 255, 255], thickness=tf, lineType=cv2.LINE_AA)


def main(args):
    print(args)
    device = torch.device(args.device)

    #------------------------------------ build the model
    # criterion (used for the loss) and postprocessors (used for decoding) are also returned, but neither is needed for this inference script
    model, criterion, postprocessors = build_model(args) 

    #------------------------------------ load the checkpoint weights
    checkpoint = torch.load(args.resume, map_location='cuda')
    model.load_state_dict(checkpoint['model'])

    #------------------------------------ move the model to GPU or CPU
    model.to(device)

    #------------------------------------ print the number of trainable parameters
    n_parameters = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print("parameters:",n_parameters)
    
    #------------------------------------ set up the folder where results are saved
    output_dir = Path(args.output_dir)

    #----------------------------------- read the images and run inference
    image_Totensor=torchvision.transforms.ToTensor()
    image_file_path = os.listdir("inference_demo/detect_demo")
    image_set = []

    for image_item in image_file_path:
        print("inference_image:",image_item)
        image_path = os.path.join("inference_demo/detect_demo",image_item)
        image = Image.open(image_path)
        image_tensor = image_Totensor(image)
        image_tensor = torch.reshape(image_tensor,[-1,image_tensor.shape[0],image_tensor.shape[1],image_tensor.shape[2]])
        image_tensor=image_tensor.to(device)
        time1 = time.time()
        inference_result = model(image_tensor)
        time2 = time.time()
        print("inference_time:",time2-time1)
        probas = inference_result['pred_logits'].softmax(-1)[0, :, :-1].cpu()
        bboxes_scaled = rescale_bboxes(inference_result['pred_boxes'][0, ].cpu(),(image_tensor.shape[3],image_tensor.shape[2]))
        scores, boxes = filter_boxes(probas,bboxes_scaled)
        scores = scores.data.numpy()
        boxes = boxes.data.numpy()
        for i in range(boxes.shape[0]):
            class_id = scores[i].argmax()
            label = CLASSES[class_id]
            confidence = scores[i].max()
            text = f"{label} {confidence:.3f}"
            image = np.array(image)
            plot_one_box(boxes[i],image,label=text)
        cv2.imshow("images",image)
        cv2.waitKey(1)
        image=Image.fromarray(image)
        image.save(os.path.join(args.output_dir,image_item))

if __name__ == '__main__':
    parser = argparse.ArgumentParser('DETR training and evaluation script', parents=[get_args_parser()])
    args = parser.parse_args()
    if args.output_dir:
        Path(args.output_dir).mkdir(parents=True, exist_ok=True)
    main(args)
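
Assuming the script above is saved as predict.py in the DETR project root (the file name is only an example), it can be run after adjusting the image folder and the argument defaults described earlier, for instance:

python predict.py --backbone resnet50 --resume outputs/checkpoint0299.pth --output_dir inference_demo/inference_output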

With that, the reproduction is completely finished. Thanks to the CSDN bloggers w1520039381 and 暮已深.

Reference blog posts:
【DETR】训练自己的数据集-实践笔记
pytorch实现DETR的推理程序
windows10复现DEtection TRansformers(DETR)并实现自己的数据集
