PyTorch Learning (5): Fine-tuning Mask R-CNN

lzrrrrr

Category: pytorch. Tags: pytorch, deep learning, autonomous driving, neural networks

Article link: https://blog.csdn.net/lzr_ps/article/details/103864563

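Summary (generated automatically by CSDN): This post shows how to fine-tune Mask R-CNN in PyTorch to train a pedestrian detection and segmentation model. It first covers dataset preparation, including the dataset layout and preprocessing. It then defines the model, adding a fully connected layer on top of the pretrained Mask R-CNN to fit the new classes. Next it walks through the training and evaluation code, using pretrained weights to speed up training. Finally, it visualizes the model's results.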

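I came across an object detection tutorial in the official tutorials, so I decided to use it for practice.
The goal of the tutorial is to fine-tune Mask R-CNN to train a pedestrian detection and segmentation model.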

Dataset preparation

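First, download the pedestrian dataset (Penn-Fudan). Its directory layout should look like this:
[Image: directory tree of the dataset]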

Annotation holds the annotation information for each image
PedMasks holds the segmentation labels (instance masks)
PNGImages holds the images
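
Each file in PedMasks stores one integer per pixel: 0 for background and a distinct value for every pedestrian. As a minimal sketch of how such a mask can be split into per-instance binary masks and bounding boxes, here is a toy example on a hand-written array. The array values are made up for illustration, and the [xmin, ymin, xmax, ymax] box ordering is my assumption about the target format, not something taken from the dataset files.

```python
import numpy as np

# Toy 4x5 instance mask (hypothetical values):
# 0 is background, 1 and 2 are two separate instances.
mask = np.array(
    [
        [0, 0, 0, 0, 0],
        [0, 1, 1, 0, 2],
        [0, 1, 1, 0, 2],
        [0, 0, 0, 0, 0],
    ],
    dtype=np.uint8,
)

# np.unique returns sorted ids, so the first one is the background (0).
# This assumes the mask actually contains background pixels.
obj_ids = np.unique(mask)[1:]

# Broadcasting an (N, 1, 1) array against an (H, W) array
# yields an (N, H, W) stack of boolean masks, one per instance.
masks = mask == obj_ids[:, None, None]

boxes = []
for m in masks:
    ys, xs = np.where(m)  # row indices (y) and column indices (x)
    # Assumed ordering: [xmin, ymin, xmax, ymax]
    boxes.append([int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())])

print(masks.shape)
print(boxes)
```

The same broadcasting comparison appears in the dataset class that follows, applied to the real mask images instead of a toy array.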
Next, we need to write a class that wraps this dataset:

import os
import numpy as np
import torch
from PIL import Image


class PennFudanDataset(object):
    def __init__(self, root, transforms):
        self.root = root
        self.transforms = transforms
        # load all image files, sorting them to
        # ensure that they are aligned
        self.imgs = list(sorted(os.listdir(os.path.join(root, "PNGImages"))))
        self.masks = list(sorted(os.listdir(os.path.join(root, "PedMasks"))))

    def __getitem__(self, idx):
        # load the image and its mask
        img_path = os.path.join(self.root, "PNGImages", self.imgs[idx])
        mask_path = os.path.join(self.root, "PedMasks", self.masks[idx])
        img = Image.open(img_path).convert("RGB")
        # note that we haven't converted the mask to RGB,
        # because each color corresponds to a different instance
        # with 0 being background
        mask = Image.open(mask_path)
        # convert the PIL Image into a numpy array
        mask = np.array(mask)
        # instances are encoded as different colors
        obj_ids = np.unique(mask)
        # first id is the background, so remove it
        obj_ids = obj_ids[1:]

        # split the color-encoded mask into a set
        # of binary masks
        masks = mask == obj_ids[:, None, None]

        # get bounding box coordinates for each mask
        num_objs = len(obj_ids)
        boxes = []
        for i in range(num_objs):
            pos = np.where(masks[i])
            xmin = np.min(pos[1])
            xmax = np.max(pos[1])
            ymin = np.min(pos[0])
            ymax = np.max(pos[0])
            boxes.append(