Preface: a course experiment required implementing Mask R-CNN with PyTorch, so I recently ran the PyTorch version of Mask R-CNN again. The official tutorial is already quite detailed, but while it supports CPU inference, it does not cover CPU training, and all I have is a CPU-only laptop with an Intel integrated GPU and no NVIDIA card. So here is a write-up of my Mask R-CNN training process on CPU.
Environment:
Ubuntu 16.04
torch == 1.5.0+cpu
torchvision == 0.6.0+cpu
Note that torchvision >= 0.3.0 is required (the detection models were first shipped in that release), and the torch and torchvision versions must match each other; here both are the CPU builds. For how to pick a matching pair, see: https://pytorch.org/
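For reference, the CPU wheels above can usually be installed with a command like the following (my assumption, based on PyTorch's archived install instructions for 1.5.0; check the site above for your platform):
pip3 install torch==1.5.0+cpu torchvision==0.6.0+cpu -f https://download.pytorch.org/whl/torch_stable.html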
The concrete setup steps are covered in another blog post of mine:
[YoloV3-pytorch] Part One: Training YoloV3 on your own dataset with PyTorch -- preparing the dataset and config files and downloading the pretrained weights
1. Setting up the data layout
Create a new folder named rcnntest. Under it, create a data folder, and under data create two folders named mask and ori.
The mask folder holds the mask images produced by labelme annotation, and the ori folder holds the original RGB images; the resulting layout is sketched below.
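Putting it together, the project tree looks like this (train.py and the three helper scripts are added in the next section):

rcnntest/
├── data/
│   ├── ori/        # original RGB images
│   └── mask/       # instance masks exported from labelme
├── engine.py       # copied from torchvision references (section 2)
├── utils.py
├── transforms.py
└── train.py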
2. Model training
PyTorch's official references/detection/ directory provides ready-made helpers for model training and evaluation; we need engine.py, utils.py, and transforms.py, so copy them straight into the rcnntest root:
git clone https://github.com/pytorch/vision.git
cd vision
cp references/detection/utils.py ../
cp references/detection/transforms.py ../
cp references/detection/engine.py ../
If the clone is too slow, you can open the files on GitHub and copy their contents by hand.
Then open engine.py and comment out the torch.cuda.synchronize() call at line 87, otherwise training will error out later on a CPU-only machine.
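Rather than deleting the line outright, you can also guard it so the same file keeps working on GPU machines; a minimal sketch of that tweak (my variant, not part of the official file):
if torch.cuda.is_available():
    torch.cuda.synchronize()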
Create a new train.py as follows:
import utils
import transforms as T
from engine import train_one_epoch, evaluate
import sys
# drop the ROS Kinetic python2.7 path so the python3 cv2 module is imported
sys.path.remove('/opt/ros/kinetic/lib/python2.7/dist-packages')
import cv2
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor
import os
import torch
import numpy as np
import torch.utils.data
from PIL import Image
class MyDataset(torch.utils.data.Dataset):
    def __init__(self, root, transforms=None):
        self.root = root
        self.transforms = transforms
        # load all image files, sorting them to ensure that they are aligned
        self.imgs = list(sorted(os.listdir(os.path.join(root, "ori"))))
        self.masks = list(sorted(os.listdir(os.path.join(root, "mask"))))

    def __getitem__(self, idx):
        # load images and masks
        img_path = os.path.join(self.root, "ori", self.imgs[idx])
        mask_path = os.path.join(self.root, "mask", self.masks[idx])
        img = Image.open(img_path).convert("RGB")
        # note that we haven't converted the mask to RGB,
        # because each color corresponds to a different instance with 0 being background
        mask = Image.open(mask_path)
        mask = np.array(mask)
        # instances are encoded as different colors
        obj_ids = np.unique(mask)
        # first id is the background, so remove it
        obj_ids = obj_ids[1:]
        # split the color-encoded mask into a set of binary masks
        masks = mask == obj_ids[:, None, None]
        # get bounding box coordinates for each mask
        num_objs = len(obj_ids)
        boxes = []
        for i in range(num_objs):
            pos = np.where(masks[i])
            xmin = np.min(pos[1])
            xmax = np.max(pos[1])
            ymin = np.min(pos[0])
            ymax = np.max(pos[0])
            boxes.append([xmin, ymin, xmax, ymax])
        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        # there is only one class
        labels = torch.ones((num_objs,), dtype=torch.int64)
        # +0 turns the boolean mask into 0/1 values before the tensor conversion
        masks = torch.as_tensor(masks + 0, dtype=torch.uint8)
        image_id = torch.tensor([idx])
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        # suppose all instances are not crowd
        iscrowd = torch.zeros((num_objs,), dtype=torch.int64)
        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["masks"] = masks
        target["image_id"] = image_id
        target["area"] = area
        target["iscrowd"] = iscrowd
        if self.transforms is not None:
            img, target = self.transforms(img, target)
        return img, target

    def __len__(self):
        return len(self.imgs)
def get_instance_segmentation_model(num_classes):
    # load an instance segmentation model pre-trained on COCO
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
    # get the number of input features for the classifier
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # replace the pre-trained head with a new one
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    # now get the number of input features for the mask classifier
    in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    hidden_layer = 256
    # and replace the mask predictor with a new one
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask,
                                                       hidden_layer,
                                                       num_classes)
    return model
def get_transform(train):
    transforms = []
    # converts the image, a PIL image, into a PyTorch Tensor
    transforms.append(T.ToTensor())
    if train:
        # during training, randomly flip the training images
        # and ground-truth for data augmentation
        transforms.append(T.RandomHorizontalFlip(0.5))
    return T.Compose(transforms)
# load our own dataset with the transformations defined above
dataset = MyDataset('./data/', get_transform(train=True))
dataset_test = MyDataset('./data/', get_transform(train=False))
# split the dataset into a train and a test set
torch.manual_seed(1)
indices = torch.randperm(len(dataset)).tolist()
dataset = torch.utils.data.Subset(dataset, indices[:-10])
dataset_test = torch.utils.data.Subset(dataset_test, indices[-10:])
# define training and validation data loaders
data_loader = torch.utils.data.DataLoader(
    dataset, batch_size=1, shuffle=True, num_workers=0,
    collate_fn=utils.collate_fn)
data_loader_test = torch.utils.data.DataLoader(
    dataset_test, batch_size=1, shuffle=False, num_workers=0,
    collate_fn=utils.collate_fn)
# device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
device = torch.device('cpu')
# the dataset has two classes only - background and person
num_classes = 2
# get the model using the helper function
model = get_instance_segmentation_model(num_classes)
# move model to the right device
model.to(device)
# construct an optimizer
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.005,
                            momentum=0.9, weight_decay=0.0005)
# the learning rate scheduler decreases the learning rate by 10x every 3 epochs
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                               step_size=3,
                                               gamma=0.1)
# training
num_epochs = 100
for epoch in range(num_epochs):
    # train for one epoch, printing every 10 iterations
    train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
    # update the learning rate
    lr_scheduler.step()
    # evaluate on the test dataset
    evaluate(model, data_loader_test, device=device)
    # save a checkpoint every 5 epochs
    if (epoch + 1) % 5 == 0:
        model_name = "./model_" + str(epoch + 1) + ".pth"
        torch.save(model, model_name)
        print("save model!!")
A quick rundown of what the pieces above do:
The MyDataset class loads your own dataset; when using it, just change the path to your data directory. Pay special attention to the line below. The original demo has no +0; I added it because the mask images here are boolean, booleans cannot be converted to a tensor, and +0 turns them into 0/1 values first.
masks = torch.as_tensor(masks+0, dtype=torch.uint8)
After preparing the dataset, I suggest loading it with MyDataset directly first to check for errors; for the concrete method, see "reference blog 1" in the main references below.
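A minimal sketch of such a sanity check, assuming the MyDataset and get_transform definitions above are in scope (the comments show the shapes I would expect):
dataset = MyDataset('./data/', get_transform(train=False))
img, target = dataset[0]
print(img.shape)              # torch.Size([3, H, W])
print(target["boxes"].shape)  # torch.Size([num_objs, 4])
print(target["masks"].shape)  # torch.Size([num_objs, H, W])
print(target["labels"])       # all ones: a single foreground class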
The get_instance_segmentation_model function loads a Mask R-CNN pretrained on COCO; maskrcnn_resnet50_fpn is used here, and you can change it yourself (see the sketch below).
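torchvision does not ship other prebuilt Mask R-CNN variants in this version, but you can assemble one around a different backbone by hand. A rough sketch with a ResNet-18 FPN backbone (my assumption about how the torchvision 0.6 detection API fits together, not from the original post):
from torchvision.models.detection import MaskRCNN
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

# FPN-wrapped ResNet-18 with ImageNet weights, then a 2-class Mask R-CNN on top
backbone = resnet_fpn_backbone('resnet18', pretrained=True)
model = MaskRCNN(backbone, num_classes=2)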
For single-threaded data loading, set num_workers to 0:
data_loader = torch.utils.data.DataLoader(
    dataset, batch_size=1, shuffle=True, num_workers=0,
    collate_fn=utils.collate_fn)
With all that in place, run the following from the project root and training starts:
python3 train.py
3. Model testing
Let's check how the model performs on the test images; a single picture is enough.
# load a saved checkpoint (train.py saved the whole model object, so torch.load restores it directly)
model = torch.load('./model_10.pth')
# move model to the right device
model.to(device)
# pick one image from the test set
img, _ = dataset_test[2]
# put the model in evaluation mode
model.eval()
with torch.no_grad():
    prediction = model([img.to(device)])
# print(prediction)
image = Image.fromarray(img.mul(255).permute(1, 2, 0).byte().numpy())
image_mask = Image.fromarray(prediction[0]['masks'][0, 0].mul(255).byte().cpu().numpy())
image_mask.show()
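This displays only the first predicted mask. As a small follow-up (my own addition, not from the original post), here is a sketch that keeps only confident detections and saves the merged mask to disk; both 0.5 thresholds are assumed defaults:
scores = prediction[0]['scores'].cpu().numpy()
masks = prediction[0]['masks'][:, 0].cpu().numpy()  # (N, H, W), floats in [0, 1]
keep = scores > 0.5                                 # assumed confidence threshold
merged = (masks[keep] > 0.5).any(axis=0).astype(np.uint8) * 255
Image.fromarray(merged).save('merged_mask.png')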
Test result:
The result looks decent.