PyTorch Implementation of YOLOv3: A Detailed Code Walkthrough (Part 4)

Most recently recommended article published on 2022-01-13 10:43:30

小小小绿叶


Category: Deep Learning. Tags: pytorch, deep learning

Article link: https://blog.csdn.net/litt1e/article/details/89499345


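At the end of the previous part we obtained the predictions as a tensor of shape D×8, where D is the number of detections and each detection has 8 attributes: the index of the image in the batch that the detection belongs to, the 4 corner coordinates, the objectness score, the score of the class with the highest confidence, and the index of that class.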

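In this part we build the input and output pipeline for our detector: reading images, making predictions, drawing bounding boxes on the images from those predictions, and saving the results.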

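Creating the command-line arguments
In projects that span several files or several languages, Python scripts often need to read parameters directly from the command line. Python ships with the argparse module, which makes this simple and standardized.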

import argparse

import torch


def arg_parse():
    """Parse arguments to the detect module."""
    parser = argparse.ArgumentParser(description='YOLO v3 Detection Module')

    parser.add_argument("--images", dest='images', default="imgs", type=str,
                        help="Image / Directory containing images to perform detection upon")
    parser.add_argument("--det", dest='det', default="det", type=str,
                        help="Image / Directory to store detections to")
    parser.add_argument("--bs", dest="bs", default=1, type=int, help="Batch size")
    parser.add_argument("--confidence", dest="confidence", default=0.5, type=float,
                        help="Object Confidence to filter predictions")
    parser.add_argument("--nms_thresh", dest="nms_thresh", default=0.4, type=float,
                        help="NMS Threshold")
    parser.add_argument("--cfg", dest='cfgfile', default="cfg/yolov3.cfg", type=str,
                        help="Config file")
    parser.add_argument("--weights", dest='weightsfile', default="yolov3.weights", type=str,
                        help="weightsfile")
    parser.add_argument("--reso", dest='reso', default="416", type=str,
                        help="Input resolution of the network. Increase to increase accuracy. "
                             "Decrease to increase speed")
    parser.add_argument("--scales", dest="scales", default="1,2,3", type=str,
                        help="Scales to use for detection")

    return parser.parse_args()


args = arg_parse()
images = args.images
batch_size = int(args.bs)
confidence = float(args.confidence)
nms_thesh = float(args.nms_thresh)
start = 0
CUDA = torch.cuda.is_available()  # use the GPU when one is available

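Among these arguments, the important ones are images (the input image or directory of images), det (the directory where detections are saved), reso (the input resolution of the network, which trades speed against accuracy), cfg (an alternative configuration file), and weights (the weights file).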
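
As a quick check of how these flags are parsed, the sketch below rebuilds a reduced parser with three of the options above and feeds it an explicit argument list instead of reading sys.argv. The helper name build_parser and the sample values are illustrative assumptions, not part of the original script.

```python
import argparse


def build_parser():
    # Reduced, illustrative parser: only three of the flags from arg_parse()
    parser = argparse.ArgumentParser(description="YOLO v3 Detection Module")
    parser.add_argument("--images", dest="images", default="imgs", type=str)
    parser.add_argument("--bs", dest="bs", default=1, type=int)
    parser.add_argument("--reso", dest="reso", default="416", type=str)
    return parser


# Passing a list to parse_args() bypasses sys.argv, which is handy for testing
demo_args = build_parser().parse_args(["--images", "imgs/dog.jpg", "--reso", "608"])
print(demo_args.images, demo_args.bs, int(demo_args.reso))
```

Flags that are not given on the command line (here --bs) fall back to their defaults. Assuming the script is saved as detect.py, the equivalent shell call would be python detect.py --images imgs/dog.jpg --reso 608.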

Loading the network
Download the coco.names file from https://raw.githubusercontent.com/ayooshkathuria/YOLO_v3_tutorial_from_scratch/master/data/coco.names. This file contains the names of the object classes in the COCO dataset.

classes = load_classes("data/coco.names")
num_classes = 80  # COCO has 80 classes; passed to write_results later

This loads the class names with load_classes, which is defined as follows:

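def load_classes(namesfile):
    # Read one class name per line; [:-1] drops the empty string
    # left after the file's trailing newline
    with open(namesfile, "r") as fp:
        names = fp.read().split("\n")[:-1]
    return names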

load_classes is a function defined in util.py. It returns a list of class names, so the index of each class maps to its name string.
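
To make the index-to-name lookup concrete, here is a small sketch that writes a three-line stand-in for coco.names to a temporary file and reads it back. The function load_classes_safe is a hypothetical variant, not the one in util.py: it skips blank lines rather than assuming the file ends with exactly one trailing newline.

```python
import os
import tempfile


def load_classes_safe(namesfile):
    # Hypothetical variant of load_classes: ignores blank lines instead of
    # relying on a single trailing newline at the end of the file
    with open(namesfile, "r") as fp:
        return [line.strip() for line in fp if line.strip()]


# Tiny stand-in for coco.names (the real file lists 80 classes)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.names")
    with open(path, "w") as fp:
        fp.write("person\nbicycle\ncar\n")
    classes_demo = load_classes_safe(path)

# A predicted class index is turned into a label by plain list indexing
print(classes_demo[2])
```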

# Set up the neural network
print("Loading network.....")
model = Darknet(args.cfgfile)
model.load_weights(args.weightsfile)
print("Network successfully loaded")

# Override the input resolution from the cfg file with the --reso value
model.net_info["height"] = args.reso
inp_dim = int(model.net_info["height"])
assert inp_dim % 32 == 0  # the input size must be a multiple of 32
assert inp_dim > 32

# If there's a GPU available, put the model on the GPU
if CUDA:
    model.cuda()

# Set the model in evaluation mode
model.eval()

This initializes the network and loads the weights. args.cfgfile and args.weightsfile hold the paths that were set through the command-line arguments above.
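
The two assertions on inp_dim are easier to understand with numbers. YOLOv3 predicts at three scales with strides 32, 16 and 8 and 3 anchor boxes per grid cell, so the input size has to divide cleanly by 32. The sketch below (the helper name yolo_grid_summary is an illustrative assumption) computes the grid sizes and the total number of candidate boxes for a given resolution.

```python
def yolo_grid_summary(inp_dim, strides=(32, 16, 8), anchors_per_cell=3):
    # Same checks as in the loading code above
    assert inp_dim % 32 == 0 and inp_dim > 32
    # Each detection scale divides the input by its stride
    grids = [inp_dim // s for s in strides]
    total_boxes = sum(g * g * anchors_per_cell for g in grids)
    return grids, total_boxes


grids, total = yolo_grid_summary(416)
print(grids, total)  # [13, 26, 52] 10647
```

Raising --reso to 608 gives grids of 19, 38 and 76 cells and more than twice as many candidate boxes, which is one way to see why a higher resolution is slower.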

Reading the input images

import os
import os.path as osp

# Detection phase
try:
    imlist = [osp.join(osp.realpath('.'), images, img)
              for img in os.listdir(images)
              if osp.splitext(img)[1] in ('.png', '.jpeg', '.jpg')]
except NotADirectoryError:
    # images points to a single file rather than a directory
    imlist = [osp.join(osp.realpath('.'), images)]
except FileNotFoundError:
    print("No file or directory with the name {}".format(images))
    exit()

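os.listdir(images) returns a list of the names of the files and folders inside the given directory. Since images = "imgs" by default, it returns a list of all file names in imgs.
os.path.splitext(img)[1] separates the file name from its extension: for a file name such as dog.jpg, os.path.splitext returns ('dog', '.jpg'), and index [1] picks out '.jpg'.
osp.realpath('.') returns the canonical path of the current directory, for example G:\pycharm\yolov3\pytorch-yolo-v3-master. Putting it together, imlist is the list of full paths of all images in the imgs folder that have an accepted extension.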
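
The same collection logic can be sketched without relying on exceptions for control flow. The helper list_images below is a hypothetical rewrite, not the tutorial's code; unlike the original it also accepts upper-case extensions, which is an assumption about what is convenient rather than a documented requirement.

```python
import os
import os.path as osp
import tempfile

IMAGE_EXTS = {".png", ".jpg", ".jpeg"}


def list_images(images):
    # Directory: keep files with an image extension
    if osp.isdir(images):
        return sorted(
            osp.join(osp.realpath(images), name)
            for name in os.listdir(images)
            if osp.splitext(name)[1].lower() in IMAGE_EXTS
        )
    # Single file: wrap it in a list
    if osp.isfile(images):
        return [osp.realpath(images)]
    raise FileNotFoundError("No file or directory with the name {}".format(images))


with tempfile.TemporaryDirectory() as tmp:
    for name in ("dog.jpg", "eagle.PNG", "notes.txt"):
        open(osp.join(tmp, name), "w").close()
    found = [osp.basename(p) for p in list_images(tmp)]

print(found)
```

Sorting the result makes the image order, and therefore the image indices used later, reproducible across runs.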

batches = list(map(prep_image, imlist, [inp_dim for x in range(len(imlist))]))

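import cv2
import torch
# letterbox_image is defined in util.py


def prep_image(img, inp_dim):
    """
    Prepare image for inputting to the neural network.

    Returns the input tensor, the original image, and its (width, height).
    """
    orig_im = cv2.imread(img)                           # BGR image, H x W x C
    dim = orig_im.shape[1], orig_im.shape[0]            # (width, height)
    img = letterbox_image(orig_im, (inp_dim, inp_dim))  # resize with padding
    img_ = img[:, :, ::-1].transpose((2, 0, 1)).copy()  # BGR -> RGB, HWC -> CHW
    # Convert to float, scale to [0, 1] and add the batch dimension
    img_ = torch.from_numpy(img_).float().div(255.0).unsqueeze(0)
    return img_, orig_im, dim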

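map passes the corresponding elements of the two lists as arguments to prep_image, which returns img_, orig_im and dim: the image as a PyTorch input tensor, the original image, and its (width, height). These results are collected into a list and assigned to batches.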
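
The array manipulations inside prep_image can be followed on a tiny fake image. The sketch below uses NumPy only, as a stand-in for the cv2 and torch calls (np.newaxis plays the role of unsqueeze(0)); the 2×3 image and its pixel values are made up for illustration.

```python
import numpy as np

# Fake 2x3 BGR image (height=2, width=3), standing in for cv2.imread output
bgr = np.zeros((2, 3, 3), dtype=np.uint8)
bgr[..., 0] = 255  # channel 0 is blue in OpenCV's BGR order

rgb = bgr[:, :, ::-1]                    # BGR -> RGB (reverse the channel axis)
chw = rgb.transpose((2, 0, 1)).copy()    # HWC -> CHW, the layout PyTorch expects
scaled = chw.astype(np.float32) / 255.0  # pixel values from [0, 255] to [0, 1]
batch = scaled[np.newaxis, ...]          # add the batch dimension: 1 x C x H x W

print(batch.shape)
```

After the reversal the blue values sit in the last channel, and the result has the 1 x 3 x H x W layout that the network takes as input.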

im_batches = [x[0] for x in batches]
orig_ims = [x[1] for x in batches]
im_dim_list = [x[2] for x in batches]
# Repeat (w, h) as (w, h, w, h) so it lines up with the 4 corner coordinates
im_dim_list = torch.FloatTensor(im_dim_list).repeat(1, 2)

The img_ entries of batches go into im_batches, the orig_im entries into orig_ims, and the dim entries into im_dim_list.

leftover = 0

# One extra batch is needed when the images do not divide evenly
if len(im_dim_list) % batch_size:
    leftover = 1

if batch_size != 1:
    num_batches = len(imlist) // batch_size + leftover
    im_batches = [torch.cat(im_batches[i*batch_size : min((i + 1)*batch_size, len(im_batches))])
                  for i in range(num_batches)]

im_batches: the PyTorch tensors of the images to be predicted are concatenated so that each batch becomes a single tensor.
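
The batching arithmetic is independent of PyTorch, so it can be checked on a plain list. This sketch (make_batches is an illustrative name) uses the same leftover and slicing logic; in the real code each slice is additionally merged with torch.cat into one tensor.

```python
def make_batches(items, batch_size):
    # Ceiling division: one extra batch when the items do not divide evenly
    leftover = 1 if len(items) % batch_size else 0
    num_batches = len(items) // batch_size + leftover
    return [items[i * batch_size : min((i + 1) * batch_size, len(items))]
            for i in range(num_batches)]


batches_demo = make_batches(["img0", "img1", "img2", "img3", "img4"], 2)
print(batches_demo)
```

With 5 images and a batch size of 2 this yields three batches, the last one holding the single leftover image.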

        # Body of the loop over im_batches: batch is the current batch, i its index
        with torch.no_grad():
            prediction = model(Variable(batch), CUDA)

        # prediction = prediction[:,scale_indices]

        # Get the boxes with object confidence > threshold,
        # convert the coordinates to absolute coordinates,
        # perform NMS on these boxes, and save the results.
        # NMS and saving could have been done separately for a better abstraction,
        # but both operations require looping, so they are
        # combined in one loop instead of two
        # (loops are slower than vectorised operations).

        prediction = write_results(prediction, confidence, num_classes, nms = True, nms_conf = nms_thesh)

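The images in each batch are fed into the model to obtain prediction. In this step, write_results discards the boxes whose confidence is below the threshold, converts the coordinates to absolute coordinates, and performs NMS for each class.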
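
For readers who want to see the filtering idea in isolation, the following is a minimal pure-Python sketch of confidence thresholding followed by greedy per-class NMS. It is not the tensor-based write_results from the previous part; the function names, the tuple layout of a detection and the sample boxes are all assumptions made for illustration.

```python
def iou(a, b):
    # Boxes are (x1, y1, x2, y2) corner coordinates
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms_per_class(dets, conf_thresh=0.5, nms_thresh=0.4):
    # dets: list of (box, objectness, class_index)
    dets = [d for d in dets if d[1] > conf_thresh]
    kept = []
    for cls in sorted({d[2] for d in dets}):
        # Highest-scoring box first, then drop boxes that overlap it too much
        cands = sorted((d for d in dets if d[2] == cls),
                       key=lambda d: d[1], reverse=True)
        while cands:
            best = cands.pop(0)
            kept.append(best)
            cands = [d for d in cands if iou(best[0], d[0]) < nms_thresh]
    return kept


dets = [
    ((10, 10, 110, 110), 0.9, 16),
    ((20, 20, 120, 120), 0.8, 16),   # overlaps the box above, same class
    ((200, 200, 260, 260), 0.7, 1),
    ((0, 0, 50, 50), 0.3, 1),        # below the confidence threshold
]
kept = nms_per_class(dets)
print(len(kept))
```

Here the second box is suppressed because its IoU with the higher-scoring box of the same class is about 0.68, and the last box never reaches NMS because its score is below the 0.5 threshold.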

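        # write_results returns an int instead of a tensor when nothing was detected
        if type(prediction) == int:
            i += 1
            continue

        end = time.time()

        # print(end - start)

        # Turn the batch-local image index into an index over all images
        prediction[:,0] += i*batch_size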

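The if statement checks whether any box was detected; if not, i is incremented and the batch is skipped. Otherwise i*batch_size is added to the first column of prediction, so that it holds the index of the image among all input images.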

        if not write:
            output = prediction
            write = 1
        else:
            output = torch.cat((output,prediction))

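The write flag records whether output has been initialized. For the first batch that produces detections, output is set to prediction and write is set to 1; for every later batch, prediction is concatenated onto output.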
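
The index shift and the accumulation can be sketched together on plain lists. In this illustrative version (merge_batch_outputs and the sample rows are assumptions), None takes the place of the write flag and enumerate replaces the manual i += 1; each row starts with the batch-local image index, as in the prediction tensor.

```python
def merge_batch_outputs(per_batch_preds, batch_size):
    # per_batch_preds[i] is a list of rows [local_index, ...] for batch i,
    # or the int 0 when that batch produced no detections
    output = None
    for i, prediction in enumerate(per_batch_preds):
        if isinstance(prediction, int):
            continue  # nothing detected in this batch
        # Shift the batch-local image index to a global image index
        shifted = [[row[0] + i * batch_size] + row[1:] for row in prediction]
        output = shifted if output is None else output + shifted
    return output


preds = [
    [[0, "dog"], [1, "person"]],  # batch 0: images 0 and 1
    0,                            # batch 1: no detections
    [[1, "car"]],                 # batch 2: second image of that batch
]
merged = merge_batch_outputs(preds, batch_size=2)
print(merged)
```

The detection from the second image of batch 2 ends up with global index 1 + 2*2 = 5, which is the position of that image in imlist.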

        for im_num, image in enumerate(imlist[i*batch_size: min((i +  1)*batch_size, len(imlist))]):
            im_id = i*batch_size + im_num
            # Class names of all detections that belong to this image
            objs = [classes[int(x[-1])] for x in output if int(x[0]) == im_id]
            # osp.basename also handles Windows path separators
            print("{0:20s} predicted in {1:6.3f} seconds".format(osp.basename(image), (end - start)/batch_size))
            print("{0:20s} {1:s}".format("Objects Detected:", " ".join(objs)))
            print("----------------------------------------------------------")

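The check int(x[0]) == im_id uses x[0], the image index stored in each row of output, to select the detections that belong to the current image. For each matching row, x[-1] (the index of the highest-scoring class) is used to look up the class name in classes, and the names are collected in objs.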

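output_recast = time.time()
# Clamp the corner coordinates to the network input frame
output[:,1:5] = torch.clamp(output[:,1:5], 0.0, float(inp_dim))

# Per-detection scale factors: (w, h, w, h) of the source image / inp_dim
im_dim_list = torch.index_select(im_dim_list, 0, output[:,0].long())/inp_dim
output[:,1:5] *= im_dim_list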

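The predictions in our output tensor are expressed in the network's input size, not the original size of the image. So before drawing the bounding boxes, we convert the corner attributes of each box to the original image dimensions.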
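
Note that the snippet above scales each axis independently, which matches a plain resize to inp_dim × inp_dim. Since prep_image calls letterbox_image, the mapping may also need to remove the padding. The sketch below shows one way to do that for a single box, assuming the letterbox keeps the aspect ratio and centers the resized image on the canvas; to_original_coords is a hypothetical helper, not code from the tutorial.

```python
def to_original_coords(box, orig_w, orig_h, inp_dim):
    # box: (x1, y1, x2, y2) in the letterboxed inp_dim x inp_dim frame
    scale = min(inp_dim / orig_w, inp_dim / orig_h)
    pad_x = (inp_dim - scale * orig_w) / 2  # horizontal padding on each side
    pad_y = (inp_dim - scale * orig_h) / 2  # vertical padding on each side
    x1, y1, x2, y2 = box
    # Remove the padding, then undo the resize
    xs = [(x - pad_x) / scale for x in (x1, x2)]
    ys = [(y - pad_y) / scale for y in (y1, y2)]
    # Clamp to the borders of the original image
    xs = [min(max(x, 0.0), float(orig_w)) for x in xs]
    ys = [min(max(y, 0.0), float(orig_h)) for y in ys]
    return xs[0], ys[0], xs[1], ys[1]


# An 800x400 image letterboxed into 416x416: scale 0.52, 104 px of padding
# above and below, so the image occupies rows 104..312 of the canvas
restored = to_original_coords((0, 104, 416, 312), 800, 400, 416)
print(restored)
```

A box that covers the whole non-padded region of the canvas maps back to (approximately) the full 800×400 image, which is a quick sanity check for the padding arithmetic.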

def write(x, results, color):
    # Convert tensor coordinates to plain ints, as OpenCV expects
    c1 = tuple(int(v) for v in x[1:3])  # top-left corner
    c2 = tuple(int(v) for v in x[3:5])  # bottom-right corner
    img = results[int(x[0])]
    cls = int(x[-1])
    label = "{0}".format(classes[cls])
    cv2.rectangle(img, c1, c2, color, 1)  # bounding box outline
    t_size = cv2.getTextSize(label, cv2.FONT_HERSHEY_PLAIN, 1, 1)[0]
    c2 = c1[0] + t_size[0] + 3, c1[1] + t_size[1] + 4
    cv2.rectangle(img, c1, c2, color, -1)  # filled background for the label
    cv2.putText(img, label, (c1[0], c1[1] + t_size[1] + 4), cv2.FONT_HERSHEY_PLAIN, 1, [225,255,255], 1)
    return img

This draws the boxes onto the original image.
