27 Image Segmentation

Latest recommended article published on 2024-04-05 15:27:45

G5Lorenzo

Category: PyTorch

Article link: https://blog.csdn.net/qq_36825778/article/details/104234209

I. Image Segmentation

1.1 What is image segmentation?

[Figure]
Image segmentation: assigning a class label to every pixel of an image.

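1.2 Types of image segmentation

[Figure]
Types of image segmentation:

Superpixel segmentation: a small number of superpixels replaces a large number of pixels; often used for image preprocessing
- Superpixel: a group of many pixels that share the same properties, such as each white block in the top-left image
Semantic segmentation: per-pixel classification; cannot tell individual objects apart
Instance segmentation: segments individual objects; object detection at the pixel level
- Only the objects of interest are segmented, such as the people in the image
Panoptic segmentation: semantic segmentation combined with instance segmentation
- Every individual object is distinguished
- Every pixel is classified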
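To make the difference between these label formats concrete, here is a minimal sketch on a made-up 4x6 scene containing two people. The arrays and the choice of `1` as the "person" class id are assumptions for illustration only, not the output of any model.

```python
import numpy as np

# Semantic segmentation: one class id per pixel (0 = background, 1 = person).
# The two people share the same id, so they cannot be told apart.
semantic = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
])

# Instance segmentation: each object of interest gets its own id
# (0 = not part of any object of interest).
instance = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 2, 0],
    [0, 1, 0, 0, 2, 0],
    [0, 0, 0, 0, 0, 0],
])

# Panoptic segmentation: every pixel carries both a class id and an instance id.
panoptic = np.stack([semantic, instance], axis=-1)

semantic_ids = np.unique(semantic).tolist()
instance_ids = [i for i in np.unique(instance).tolist() if i != 0]
print("classes present:", semantic_ids)
print("object instances:", instance_ids)
print("panoptic shape:", panoptic.shape)
```

The semantic map only knows that "person" pixels exist, while the instance map separates person 1 from person 2; the panoptic array keeps both pieces of information for every pixel.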

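II. Implementing Image Segmentation

2.1 How does a model segment an image?

[Figure]
Image classification: the output is a one-dimensional vector in which each component corresponds to one class.

Image segmentation: the output is a three-dimensional tensor. Each point of the two-dimensional image plane holds a vector along the third dimension, and each component of that vector corresponds to one class.

[Figure]
The computer receives an image as a 3-D tensor, and the output is also a 3-D tensor.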
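The shape difference described above can be sketched with random scores standing in for real model outputs. The sizes (21 classes, a 4x5 image) and the random numbers are assumptions chosen only to keep the example small.

```python
import numpy as np

num_classes, height, width = 21, 4, 5
rng = np.random.default_rng(0)

# Classification: one score per class for the whole image -> shape (21,)
cls_scores = rng.normal(size=num_classes)
predicted_class = int(cls_scores.argmax())

# Segmentation: one score per class for every pixel -> shape (21, 4, 5)
seg_scores = rng.normal(size=(num_classes, height, width))

# Taking argmax over the class axis turns the 3-D score tensor
# into a 2-D map of class indices, one per pixel.
label_map = seg_scores.argmax(axis=0)

print("classification scores:", cls_scores.shape)
print("segmentation scores:", seg_scores.shape)
print("label map:", label_map.shape)
```

This is the same reduction that `output.argmax(0)` performs in the demo script of section 2.3.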

The classes follow the PASCAL VOC dataset: 21 classes in total (20 object categories plus background), listed below:

classes = ['__background__',
           'aeroplane', 'bicycle', 'bird', 'boat',
           'bottle', 'bus', 'car', 'cat', 'chair',
           'cow', 'diningtable', 'dog', 'horse',
           'motorbike', 'person', 'pottedplant',
           'sheep', 'sofa', 'train', 'tvmonitor']
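As a small illustration of how this list is used, the sketch below turns a map of predicted class indices back into class names and pixel counts. The 3x3 `label_map` is a hand-written stand-in for a real prediction.

```python
import numpy as np

classes = ['__background__',
           'aeroplane', 'bicycle', 'bird', 'boat',
           'bottle', 'bus', 'car', 'cat', 'chair',
           'cow', 'diningtable', 'dog', 'horse',
           'motorbike', 'person', 'pottedplant',
           'sheep', 'sofa', 'train', 'tvmonitor']

# Hypothetical prediction: index 12 is 'dog', index 15 is 'person'
label_map = np.array([
    [0, 0, 15],
    [0, 12, 15],
    [0, 12, 12],
])

# Count how many pixels were assigned to each class that actually appears
ids, counts = np.unique(label_map, return_counts=True)
present = {classes[i]: int(n) for i, n in zip(ids, counts)}
print(present)
```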

2.2 torch.hub

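PyTorch Hub: PyTorch's model repository, which provides a large number of models that developers can call directly.

torch.hub.load('pytorch/vision', 'deeplabv3_resnet101', pretrained=True)
model = torch.hub.load(github, model, *args, **kwargs)

Purpose: load a model, optionally with pretrained weights

Main parameters:

github: str, the repository name in the form <repo_owner/repo_name[:tag_name]>, e.g. pytorch/vision (recent PyTorch releases name this parameter repo_or_dir)
model: str, the model name
pretrained: whether to load the pretrained weights (forwarded to the model through **kwargs)

torch.hub.list(github, force_reload=False)

Purpose: list the models provided by the repository given in the github argument

torch.hub.help(github, model, force_reload=False)

Purpose: show the docstring of the given model, which describes the parameters it accepts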
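The two lookup functions can be combined into a small helper, sketched below. The helper name `describe_entrypoints` and its `keyword` filter are made up for this example; only `torch.hub.list` and `torch.hub.help` come from PyTorch. Calling the helper needs network access, because the repository is fetched from GitHub on first use.

```python
def describe_entrypoints(repo="pytorch/vision", keyword="deeplab"):
    """Return {model name: docstring} for hub models whose name contains keyword.

    Hypothetical helper for illustration, not part of torch.hub.
    """
    # Imported here so that defining the helper does not require torch
    import torch

    # torch.hub.list returns the names of all models the repository exposes
    names = [name for name in torch.hub.list(repo) if keyword in name]

    # torch.hub.help returns the docstring of one model entry point
    return {name: torch.hub.help(repo, name) for name in names}


if __name__ == "__main__":
    print("call describe_entrypoints() to query", "pytorch/vision")
```

A typical use would be `for name, doc in describe_entrypoints().items(): print(name, doc)`, which should include `deeplabv3_resnet101` among the results if the repository still exposes it under that name.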

2.3 代码示例

# -*- coding: utf-8 -*-
"""
# @file name  : seg_demo.py
# @author     : TingsongYu https://github.com/TingsongYu
# @date       : 2019-11-22
# @brief      : call DeepLab-V3 through torch.hub for image segmentation
"""

import os
import time
import torch.nn as nn
import torch
import numpy as np
import torchvision.transforms as transforms
from PIL import Image
from matplotlib import pyplot as plt

BASE_DIR = os.path.dirname(os.path.abspath(__file__))
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

if __name__ == "__main__":

    path_img = os.path.join(BASE_DIR, "demo_img1.png")
    # path_img = os.path.join(BASE_DIR, "demo_img2.png")
    # path_img = os.path.join(BASE_DIR, "demo_img3.png")

    # config
    preprocess = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    # 1. load data & model
    input_image = Image.open(path_img).convert("RGB")
    model = torch.hub.load('pytorch/vision', 'deeplabv3_resnet101', pretrained=True)
    model.eval()

    # 2. preprocess
    input_tensor = preprocess(input_image)
    input_bchw = input_tensor.unsqueeze(0)

    # 3. to device
    if torch.cuda.is_available():
        input_bchw = input_bchw.to(device)
        model.to(device)

    # 4. forward
    with torch.no_grad():
        tic = time.time()
        print("input img tensor shape:{}".format(input_bchw.shape))
        output_4d = model(input_bchw)['out']
        output = output_4d[0]
        print("pass: {:.3f}s use: {}".format(time.time() - tic, device))
        print("output img tensor shape:{}".format(output.shape))
    output_predictions = output.argmax(0)

    # 5. visualization
    palette = torch.tensor([2 ** 25 - 1, 2 ** 15 - 1, 2 ** 21 - 1])
    colors = torch.as_tensor([i for i in range(21)])[:, None] * palette
    colors = (colors % 255).numpy().astype("uint8")

    # plot the semantic segmentation predictions of 21 classes in each color
    r = Image.fromarray(output_predictions.byte().cpu().numpy()).resize(input_image.size)
    r.putpalette(colors)
    plt.subplot(121).imshow(r)
    plt.subplot(122).imshow(input_image)
    plt.show()

    # appendix
    classes = ['__background__',
               'aeroplane', 'bicycle', 'bird', 'boat',
               'bottle', 'bus', 'car', 'cat', 'chair',
               'cow', 'diningtable', 'dog', 'horse',
               'motorbike', 'person', 'pottedplant',
               'sheep', 'sofa', 'train', 'tvmonitor']
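The visualization step builds its color table with a short arithmetic trick: each class index is multiplied by three large constants and reduced modulo 255, which gives every class a distinct RGB triple. The sketch below reproduces that computation with NumPy instead of torch (an assumption made so that it runs without PyTorch installed) and prints a few entries.

```python
import numpy as np

# Same three constants as in the demo script
palette = np.array([2 ** 25 - 1, 2 ** 15 - 1, 2 ** 21 - 1])

# One row per class index 0..20, one column per RGB channel
class_ids = np.arange(21)[:, None]
colors = ((class_ids * palette) % 255).astype("uint8")

print(colors.shape)   # one RGB triple per class
print(colors[0])      # background
print(colors[15])     # index 15 is 'person' in the class list
```

Class 0 always maps to black because its index is zero, so the background stays dark and every other class gets its own color on top of it.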