Preface: a course experiment required implementing Mask R-CNN with PyTorch, so I recently ran the PyTorch version of Mask R-CNN again. The official tutorial is already quite detailed, but while it supports CPU inference, it does not cover CPU training, and all I have is a CPU-only laptop with an Intel integrated GPU and no NVIDIA card. So here is a write-up of my Mask R-CNN training process on CPU.
Environment:
Ubuntu 16.04
torch == 1.5.0+cpu
torchvision == 0.6.0+cpu
Note that torchvision >= 0.3.0 is required (the detection models were first shipped in that release), and the torch and torchvision versions must match each other; here both are the CPU builds. For how to pick a matching pair, see: https://pytorch.org/
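For reference, the CPU wheels above can usually be installed with a command like the following (my assumption, based on PyTorch's archived install instructions for 1.5.0; check the site above for your platform):
pip3 install torch==1.5.0+cpu torchvision==0.6.0+cpu -f https://download.pytorch.org/whl/torch_stable.html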
The concrete setup steps are covered in another blog post of mine:
[YoloV3-pytorch] Part One: Training YoloV3 on your own dataset with PyTorch -- preparing the dataset and config files and downloading the pretrained weights
1. Setting up the data layout
Create a new folder named rcnntest. Under it, create a data folder, and under data create two folders named mask and ori.
The mask folder holds the mask images produced by labelme annotation, and the ori folder holds the original RGB images; the resulting layout is sketched below.
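Putting it together, the project tree looks like this (train.py and the three helper scripts are added in the next section):

rcnntest/
├── data/
│   ├── ori/        # original RGB images
│   └── mask/       # instance masks exported from labelme
├── engine.py       # copied from torchvision references (section 2)
├── utils.py
├── transforms.py
└── train.py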
2. Model training
PyTorch's official references/detection/ directory provides ready-made helpers for model training and evaluation; we need engine.py, utils.py, and transforms.py, so copy them straight into the rcnntest root:
git clone https://github.com/pytorch/vision.git
cd vision
cp references/detection/utils.py ../
cp references/detection/transforms.py ../
cp references/detection/engine.py ../
If the clone is too slow, you can open the files on GitHub and copy their contents by hand.
Then open engine.py and comment out the torch.cuda.synchronize() call at line 87, otherwise training will error out later on a CPU-only machine.
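Rather than deleting the line outright, you can also guard it so the same file keeps working on GPU machines; a minimal sketch of that tweak (my variant, not part of the official file):
if torch.cuda.is_available():
    torch.cuda.synchronize()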
Create a new train.py as follows:
import utils
import transforms as T
from engine import train_one_epoch, evaluate
import sys
# drop the ROS Kinetic python2.7 path so the python3 cv2 module is imported
sys.path.remove('/opt/ros/kinetic/lib/python2.7/dist-packages')
import cv2
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor
import os
import torch
import numpy as np
import torch.utils.data
from PIL import Image
class MyDataset(torch.utils.data.Dataset):
    def __init__(self, root, transforms=None):
        self.root = root
        self.transforms = transforms
        # load all image files, sorting them to ensure that they are aligned
        self.imgs = list(sorted(os.listdir(os.path.join(root, "ori"))))
        self.masks = list(sorted(os.listdir(os.path.join(root, "mask"))))

    def __getitem__(self, idx):
        # load images and masks
        img_path = os.path.join(self.root, "ori", self.imgs[idx])
        mask_path = os.path.join(self.root, "mask", self.masks[idx])
        img = Image.open(img_path).convert("RGB")
        # note that we haven't converted the mask to RGB,
        # because each color corresponds to a different instance with 0 being background
        mask = Image.open(mask_path)
        mask = np.array(mask)
        # instances are encoded as different colors
        obj_ids = np.unique(mask)
        # first id is the background, so remove it
        obj_ids = obj_ids[1:]
        # split the color-encoded mask into a set of binary masks
        masks = mask == obj_ids[:, None, None]
        # get bounding box coordinates for each mask
        num_objs = len(obj_ids)
        boxes = []
        for i in range(num_objs):
            pos = np.where(masks[i])
            xmin = np.min(pos[1])
            xmax = np.max(pos[1])
            ymin = np.min(pos[0])
            ymax = np.max(pos[0])
            boxes.append([xmin, ymin, xmax, ymax])
        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        # there is only one class
        labels = torch.ones((num_objs,), dtype=torch.int64)
        # +0 turns the boolean mask into 0/1 values before the tensor conversion
        masks = torch.as_tensor(masks + 0, dtype=torch.uint8)
        image_id = torch.tensor([idx])
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        # suppose all instances are not crowd
        iscrowd = torch.zeros((num_objs,), dtype=torch.int64)
        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["masks"] = masks
        target["image_id"] = image_id
        target["area"] = area
        target["iscrowd"] = iscrowd
        if self.transforms is not None:
            img, target = self.transforms(img, target)
        return img, target

    def __len__(self):
        return len(self.imgs)
def get_instance_segmentation_model(num_classes):
    # load an instance segmentation model pre-trained on COCO
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
    # get the number of input features for the classifier
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # replace the pre-trained head with a new one
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    # now get the number of input features for the mask classifier
    in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    hidden_layer = 256
    # and replace the mask predictor with a new one
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask,
                                                       hidden_layer,
                                                       num_classes)
    return model
def get_transform(train):
    transforms = []
    # converts the image, a PIL image, into a PyTorch Tensor
    transforms.append(T.ToTensor())
    if train:
        # during training, randomly flip the training images
        # and ground-truth for data augmentation
        transforms.append(T.RandomHorizontalFlip(0.5))
    return T.Compose(transforms)
# load our own dataset with the transformations defined above
dataset = MyDataset('./data/', get_transform(train=True))
dataset_test = MyDataset('./data/', get_transform(train=False))
# split the dataset into a train and a test set
torch.manual_seed(1)
indices = torch.randperm(len(dataset)).tolist()
dataset = torch.utils.data.Subset(dataset, indices[:-10])
dataset_test = torch.utils.data.Subset(dataset_test, indices[-10:])
# define training and validation data loaders
data_loader = torch.utils.data.DataLoader(
    dataset, batch_size=1, shuffle=True, num_workers=0,
    collate_fn=utils.collate_fn)
data_loader_test = torch.utils.data.DataLoader(
    dataset_test, batch_size=1, shuffle=False, num_workers=0,
    collate_fn=utils.collate_fn)
# device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
device = torch.device('cpu')
# the dataset has two classes only - background and person
num_classes = 2
# get the model using the helper function
model = get_instance_segmentation_model(num_classes)
# move model to the right device
model.to(device)
# construct an optimizer
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.005,
                            momentum=0.9, weight_decay=0.0005)
# the learning rate scheduler decreases the learning rate by 10x every 3 epochs
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                               step_size=3,
                                               gamma=0.1)
# training
num_epochs = 100
for epoch in range(num_epochs):
    # train for one epoch, printing every 10 iterations
    train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
    # update the learning rate
    lr_scheduler.step()
    # evaluate on the test dataset
    evaluate(model, data_loader_test, device=device)
    # save a checkpoint every 5 epochs
    if (epoch + 1) % 5 == 0:
        model_name = "./model_" + str(epoch + 1) + ".pth"
        torch.save(model, model_name)
        print("save model!!")
A quick rundown of what the pieces above do:
The MyDataset class loads your own dataset; when using it, just change the path to your data directory. Pay special attention to the line below. The original demo has no +0; I added it because the mask images here are boolean, booleans cannot be converted to a tensor, and +0 turns them into 0/1 values first.
masks = torch.as_tensor(masks+0, dtype=torch.uint8)
After preparing the dataset, I suggest loading it with MyDataset directly first to check for errors; for the concrete method, see "reference blog 1" in the main references below.
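A minimal sketch of such a sanity check, assuming the MyDataset and get_transform definitions above are in scope (the comments show the shapes I would expect):
dataset = MyDataset('./data/', get_transform(train=False))
img, target = dataset[0]
print(img.shape)              # torch.Size([3, H, W])
print(target["boxes"].shape)  # torch.Size([num_objs, 4])
print(target["masks"].shape)  # torch.Size([num_objs, H, W])
print(target["labels"])       # all ones: a single foreground class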
The get_instance_segmentation_model function loads a Mask R-CNN pretrained on COCO; maskrcnn_resnet50_fpn is used here, and you can change it yourself (see the sketch below).
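torchvision does not ship other prebuilt Mask R-CNN variants in this version, but you can assemble one around a different backbone by hand. A rough sketch with a ResNet-18 FPN backbone (my assumption about how the torchvision 0.6 detection API fits together, not from the original post):
from torchvision.models.detection import MaskRCNN
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

# FPN-wrapped ResNet-18 with ImageNet weights, then a 2-class Mask R-CNN on top
backbone = resnet_fpn_backbone('resnet18', pretrained=True)
model = MaskRCNN(backbone, num_classes=2)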
For single-threaded data loading, set num_workers to 0:
data_loader = torch.utils.data.DataLoader(
    dataset, batch_size=1, shuffle=True, num_workers=0,
    collate_fn=utils.collate_fn)
With all that in place, run the following from the project root and training starts:
python3 train.py
3. Model testing
Let's check how the model performs on the test images; a single picture is enough.
# load a saved checkpoint (train.py saved the whole model object, so torch.load restores it directly)
model = torch.load('./model_10.pth')
# move model to the right device
model.to(device)
# pick one image from the test set
img, _ = dataset_test[2]
# put the model in evaluation mode
model.eval()
with torch.no_grad():
    prediction = model([img.to(device)])
# print(prediction)
image = Image.fromarray(img.mul(255).permute(1, 2, 0).byte().numpy())
image_mask = Image.fromarray(prediction[0]['masks'][0, 0].mul(255).byte().cpu().numpy())
image_mask.show()
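This displays only the first predicted mask. As a small follow-up (my own addition, not from the original post), here is a sketch that keeps only confident detections and saves the merged mask to disk; both 0.5 thresholds are assumed defaults:
scores = prediction[0]['scores'].cpu().numpy()
masks = prediction[0]['masks'][:, 0].cpu().numpy()  # (N, H, W), floats in [0, 1]
keep = scores > 0.5                                 # assumed confidence threshold
merged = (masks[keep] > 0.5).any(axis=0).astype(np.uint8) * 255
Image.fromarray(merged).save('merged_mask.png')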
Test result:
The result looks decent.