2.1 [PyTorch (CPU build) Mask R-CNN: training on your own dataset] (no CUDA build of torch required; runs on a machine without an NVIDIA GPU)

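Preface: a course lab required me to implement Mask R-CNN in PyTorch, so I recently ran the PyTorch version of Mask R-CNN again. The official tutorial is detailed, but although it works for CPU inference, its training code did not run on CPU as shipped. All I have is a CPU-only laptop with Intel integrated graphics and no NVIDIA GPU, so this post records how I got Mask R-CNN training working on it.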

Environment
Ubuntu 16.04
torch == 1.5.0+cpu
torchvision == 0.6.0+cpu

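Note that the exact versions are not critical: any reasonably recent release should work (the detection models were introduced in torchvision 0.3.0). What matters is that torch and torchvision are a matching pair and that both are CPU builds. To pick a matching pair, see: https://pytorch.org/
The installation steps are covered in another post of mine:
[YoloV3-pytorch] Part One: Training YoloV3 on your own dataset with PyTorch: preparing the dataset and config files and downloading the pretrained weights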

1. Preparing the data

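Create a folder named rcnntest. Inside it create a data folder, and inside data create two folders named mask and ori.
The mask folder holds the mask images produced by labelme annotation. Each instance is encoded as a distinct pixel value, with 0 as the background.
The ori folder holds the original RGB images. Images and masks are paired by sorted filename order, so give each mask the same base name as its image.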
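Because the dataset class later pairs images and masks purely by sorted order, a missing or misnamed file silently shifts every pair after it. The sketch below is one way to catch that before training. The helper names pair_files and check_pairs are hypothetical, and it assumes each mask shares its image's base name (the extension may differ).

```python
import os


def pair_files(imgs, masks):
    """Pair image and mask filenames by sorted order and verify they match.

    Assumption: a mask has the same base name as its image,
    e.g. 0001.jpg and 0001.png.
    """
    imgs, masks = sorted(imgs), sorted(masks)
    if len(imgs) != len(masks):
        raise ValueError(f"{len(imgs)} images but {len(masks)} masks")
    pairs = []
    for img, mask in zip(imgs, masks):
        if os.path.splitext(img)[0] != os.path.splitext(mask)[0]:
            raise ValueError(f"mismatched pair: {img} vs {mask}")
        pairs.append((img, mask))
    return pairs


def check_pairs(root):
    """Run pair_files on the ori/ and mask/ folders under root."""
    return pair_files(os.listdir(os.path.join(root, "ori")),
                      os.listdir(os.path.join(root, "mask")))
```

Calling check_pairs("./data") from the rcnntest folder either returns the list of pairs or raises on the first mismatch.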

2. Training the model

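The references/detection/ directory of the official torchvision repository contains ready-made helper functions for training and evaluation. engine.py, utils.py and transforms.py are used directly. engine.py in turn imports coco_utils.py and coco_eval.py (which need pycocotools), so copy those as well. Put them all in the root of the rcnntest folder.


git clone https://github.com/pytorch/vision.git
cd vision

cp references/detection/utils.py ../
cp references/detection/transforms.py ../
cp references/detection/engine.py ../
cp references/detection/coco_utils.py ../
cp references/detection/coco_eval.py ../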

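If the clone is too slow, open the repository in a browser and copy the files directly.
Then open engine.py and comment out the torch.cuda.synchronize() call (line 87 in the version used here); otherwise training will fail later on a CPU-only install.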
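The line number of that call differs between torchvision versions, so matching on the text is less fragile than counting lines. The following is a minimal sketch of such a patch. The function names are hypothetical, and it replaces the call with pass rather than a bare comment so that the file stays valid even if the call is the only statement in its block.

```python
def comment_out_cuda_sync(source):
    """Neutralise every line that calls torch.cuda.synchronize().

    The call is replaced by `pass` followed by the original text as a
    comment, which keeps the indentation and the enclosing block valid.
    """
    patched = []
    for line in source.splitlines(keepends=True):
        stripped = line.lstrip()
        if stripped.startswith("torch.cuda.synchronize("):
            indent = line[: len(line) - len(stripped)]
            line = indent + "pass  # " + stripped
        patched.append(line)
    return "".join(patched)


def patch_file(path="engine.py"):
    """Rewrite the file at path in place (assumed to be the copied engine.py)."""
    with open(path, encoding="utf-8") as f:
        source = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(comment_out_cuda_sync(source))
```

Running patch_file() twice is harmless, because an already patched line starts with pass and is left alone.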

Create train.py with the following content:

import utils
import transforms as T
from engine import train_one_epoch, evaluate

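import sys
# On machines with ROS Kinetic, its Python 2.7 path shadows the Python 3 cv2;
# drop it only if it is actually present, so the script also runs without ROS.
ros_path = '/opt/ros/kinetic/lib/python2.7/dist-packages'
if ros_path in sys.path:
    sys.path.remove(ros_path)
import cv2  # not used below; safe to delete if OpenCV is not installed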

import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor
 
import os
import torch
import numpy as np
import torch.utils.data
from PIL import Image
 
 
class MyDataset(torch.utils.data.Dataset):
    def __init__(self, root, transforms=None):
        self.root = root
        self.transforms = transforms
        # load all image files, sorting them to ensure that they are aligned
        self.imgs = list(sorted(os.listdir(os.path.join(root, "ori"))))
        self.masks = list(sorted(os.listdir(os.path.join(root, "mask"))))
 
    def __getitem__(self, idx):
        # load images and masks
        img_path = os.path.join(self.root, "ori", self.imgs[idx])
        mask_path = os.path.join(self.root, "mask", self.masks[idx])
        img = Image.open(img_path).convert("RGB")
        # note that we haven't converted the mask to RGB,
        # because each color corresponds to a different instance with 0 being background
        mask = Image.open(mask_path)
 
        mask = np.array(mask)
        # instances are encoded as different colors
        obj_ids = np.unique(mask)
        # first id is the background, so remove it
        obj_ids = obj_ids[1:]
 
        # split the color-encoded mask into a set of binary masks
        masks = mask == obj_ids[:, None, None]
 
        # get bounding box coordinates for each mask
        num_objs = len(obj_ids)
        boxes = []
        for i in range(num_objs):
            pos = np.where(masks[i])
            xmin = np.min(pos[1])
            xmax = np.max(pos[1])
            ymin = np.min(pos[0])
            ymax = np.max(pos[0])
            boxes.append([xmin, ymin, xmax, ymax])
 
        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        # there is only one class
        labels = torch.ones((num_objs,), dtype=torch.int64)
        # print((masks+0).dtype)
        masks = torch.as_tensor(masks+0, dtype=torch.uint8)
 
        image_id = torch.tensor([idx])
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        # suppose all instances are not crowd
        iscrowd = torch.zeros((num_objs,), dtype=torch.int64)
 
        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["masks"] = masks
        target["image_id"] = image_id
        target["area"] = area
        target["iscrowd"] = iscrowd
 
        if self.transforms is not None:
            img, target = self.transforms(img, target)
 
        return img, target
 
    def __len__(self):
        return len(self.imgs)

def get_instance_segmentation_model(num_classes):
    # load an instance segmentation model pre-trained on COCO
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
 
    # get the number of input features for the classifier
    in_features = model.roi_heads.box_predictor.cls_score.in_features
 
    # replace the pre-trained head with a new one
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
 
    # now get the number of input features for the mask classifier
    in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    hidden_layer = 256
 
    # and replace the mask predictor with a new one
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask,
                                                       hidden_layer,
                                                       num_classes)
 
    return model


def get_transform(train):
    transforms = []
    # converts the image, a PIL image, into a PyTorch Tensor
    transforms.append(T.ToTensor())
    if train:
        # during training, randomly flip the training images
        # and ground-truth for data augmentation
        transforms.append(T.RandomHorizontalFlip(0.5))
 
    return T.Compose(transforms)

# use our own dataset and the transformations defined above
dataset = MyDataset('./data/', get_transform(train=True))
dataset_test = MyDataset('./data/', get_transform(train=False))
 
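# split the dataset into train and test sets; the last 10 shuffled samples
# form the test set, so the dataset must contain more than 10 images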
torch.manual_seed(1)
indices = torch.randperm(len(dataset)).tolist()
dataset = torch.utils.data.Subset(dataset, indices[:-10])
dataset_test = torch.utils.data.Subset(dataset_test, indices[-10:])
 
# define training and validation data loaders
data_loader = torch.utils.data.DataLoader(
    dataset, batch_size=1, shuffle=True, num_workers=0,
    collate_fn=utils.collate_fn)
 
data_loader_test = torch.utils.data.DataLoader(
    dataset_test, batch_size=1, shuffle=False, num_workers=0,
    collate_fn=utils.collate_fn)
 
# device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
device = torch.device('cpu')
 
# the dataset has two classes only - background and the single object class
num_classes = 2
 
# get the model using the helper function
model = get_instance_segmentation_model(num_classes)
# move model to the right device
model.to(device)
 
# construct an optimizer
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.005,
                            momentum=0.9, weight_decay=0.0005)
 
# the learning rate scheduler decreases the learning rate by 10x every 3 epochs
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                               step_size=3,
                                               gamma=0.1)
 
# training
num_epochs = 100
for epoch in range(num_epochs):
    # train for one epoch, printing every 10 iterations
    train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
 
    # update the learning rate
    lr_scheduler.step()
 
    # evaluate on the test dataset
    evaluate(model, data_loader_test, device=device)
    
    if (epoch+1) % 5==0:
        model_name = "./model_"+str(epoch+1)+".pth"
        torch.save(model, model_name)
        print("save model!!")

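A brief explanation of the pieces above:
The MyDataset class loads your own dataset; to use it, just change the data path to your own. Pay particular attention to the line below. The original demo has no +0. I added it because the mask array in my data is of bool type, which in my setup could not be converted to a tensor directly; adding 0 turns it into 0/1 integers.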

masks = torch.as_tensor(masks+0, dtype=torch.uint8)
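To make the mask handling in __getitem__ concrete, here is a dependency-free sketch of the same logic on a tiny hand-written mask: one binary 0/1 mask per instance (the role of the +0 above) and one bounding box per instance. It uses plain nested lists instead of NumPy arrays, and the function names are hypothetical.

```python
def binary_mask(mask, instance_id):
    """0/1 mask for one instance; int(True) == 1 plays the role of `+ 0`."""
    return [[int(value == instance_id) for value in row] for row in mask]


def boxes_from_instance_mask(mask):
    """Map each instance id (> 0) to its [xmin, ymin, xmax, ymax] box.

    mask is a 2D list of ints where 0 is background and every other
    value marks the pixels of one instance.
    """
    boxes = {}
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value == 0:
                continue
            if value not in boxes:
                boxes[value] = [x, y, x, y]
            else:
                box = boxes[value]
                box[0] = min(box[0], x)
                box[1] = min(box[1], y)
                box[2] = max(box[2], x)
                box[3] = max(box[3], y)
    return boxes


mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 2],
    [0, 0, 0, 2],
]
print(boxes_from_instance_mask(mask))  # instance 2 is one pixel wide
print(binary_mask(mask, 2))
```

Instance 2 in this example gets xmin equal to xmax, so the area computed in train.py would be 0. Boxes like this may be rejected by torchvision's detection models, so very thin or single-pixel instances are worth filtering out of the annotations.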

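Once the dataset is ready, I recommend loading it with the MyDataset class on its own first to see whether any errors appear. For how to do this, see "Reference blog 1" in the main references at the end of this post.
The get_instance_segmentation_model function loads the pretrained Mask R-CNN model. maskrcnn_resnet50_fpn is used here; change it as needed.
For single-process data loading, set num_workers to 0.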

data_loader = torch.utils.data.DataLoader(
    dataset, batch_size=1, shuffle=True, num_workers=0,
    collate_fn=utils.collate_fn)

With that in place, run the following from the rcnntest root directory:

python3 train.py

Training then starts.
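One thing worth checking before a long CPU run is how the learning rate evolves. train.py combines StepLR(step_size=3, gamma=0.1) with 100 epochs and a base rate of 0.005. The sketch below reproduces that schedule with plain arithmetic, assuming lr_scheduler.step() is called once per epoch as in the script. The function name step_lr is hypothetical.

```python
def step_lr(base_lr, epoch, step_size=3, gamma=0.1):
    """Learning rate in effect during `epoch` (0-based) under a StepLR schedule."""
    return base_lr * gamma ** (epoch // step_size)


for epoch in (0, 3, 6, 9, 12, 99):
    print(f"epoch {epoch:3d}: lr = {step_lr(0.005, epoch):.3g}")
```

The rate drops by a factor of 10 every 3 epochs, so after roughly a dozen epochs the updates are already very small. On a CPU-only machine it may be more practical to train for fewer epochs or to use a larger step_size; that is a suggestion to experiment with, not something the original run measured.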

3. Testing the model

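Now check how the model does on the test images; a single picture is enough. The snippet assumes the imports, device and dataset_test from train.py are in scope.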

model = torch.load('./model_10.pth')
# move model to the right device
model.to(device)

#  pick one image from the test set
img, _ = dataset_test[2]
 
# put the model in evaluation mode
model.eval()

with torch.no_grad():
    prediction = model([img.to(device)])
    # print(prediction)

image = Image.fromarray(img.mul(255).permute(1, 2, 0).byte().numpy())
image_mask = Image.fromarray(prediction[0]['masks'][0, 0].mul(255).byte().cpu().numpy())

# show the predicted mask of the first detection using the public PIL API
image_mask.show()

Test results: the predicted mask looks reasonably good.
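The test snippet above displays only the first mask. The model returns one dict per image with boxes, labels, scores and masks, so a simple way to decide which detections are worth displaying is to filter by score. The sketch below works on a plain list of floats; the function name and the 0.5 threshold are assumptions for illustration, not values taken from the experiment above.

```python
def confident_indices(scores, threshold=0.5):
    """Indices of the detections whose score is at least `threshold`."""
    return [i for i, score in enumerate(scores) if score >= threshold]


# Hypothetical scores for three detections in one image.
print(confident_indices([0.98, 0.71, 0.12]))
```

With a real prediction, the scores can be obtained as prediction[0]['scores'].tolist(), and each returned index i selects the mask prediction[0]['masks'][i, 0].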

Main references for this post:
Reference blog 1