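Types of image segmentation
- Image Segmentation
- Image Semantic Segmentation: classify every pixel
- Image Instance Segmentation: predict a mask for the object inside each box
- Image Panoptic Segmentation: classify the background pixels and predict a mask inside each box
- Video Object Segmentation: a mask of the target is usually given; find the mask of that specific target
- Video Instance Segmentation: find each object's mask from the boxes produced by object detection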
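To make the difference between the image-level tasks concrete, here is a minimal NumPy sketch on a hypothetical 3x4 toy scene with two "person" objects. The class ids, the array names, and the two-channel panoptic encoding are assumptions made for illustration only; real datasets use their own encodings.

```python
import numpy as np

# Hypothetical toy scene: class 0 = background, class 1 = person.
# Semantic segmentation: one class id per pixel, objects of the same
# class are not told apart.
semantic = np.array([
    [0, 1, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
])

# Instance ids: 0 = no object, 1 and 2 = the two separate persons.
instance_ids = np.array([
    [0, 1, 0, 2],
    [0, 1, 0, 2],
    [0, 0, 0, 0],
])

# Instance segmentation: one boolean mask per object.
instance_masks = [instance_ids == i for i in (1, 2)]

# Panoptic segmentation (illustrative encoding): every pixel carries
# both a class id and an instance id.
panoptic = np.stack([semantic, instance_ids], axis=-1)

print(len(instance_masks), panoptic.shape)
```

The union of the instance masks covers exactly the pixels that the semantic map labels as "person", which is the link between the two views.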
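Application scenarios of image segmentation
- Portrait segmentation (hair, face, and background segmentation)
- Autonomous driving (pedestrian, vehicle, and lane segmentation)
- Medical segmentation (pathology, CT, MRI)
- Industrial quality inspection, sorting robots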
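Semantic segmentation algorithm
Goal: pixel-level classification
Basic pipeline:
- Input: image (RGB)
- Algorithm: deep learning model
- Output: classification result (a single-channel map with the same size as the input)
- Training process:
  - Input: image + label
  - Forward: out = model(image)
  - Compute the loss: loss = loss_func(out, label)
  - Backward: loss.backward()
  - Update the weights: optimizer.minimize(loss)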
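The training steps above can be sketched without any framework. The example below is an illustrative assumption, not the course's model: the "model" is a single 1x1 convolution (a linear classifier applied to each pixel), the loss is pixel-wise cross-entropy, and the gradient and weight update that loss.backward() and optimizer.minimize(loss) perform in Paddle are written out by hand. The toy labels, learning rate, and step count are made up for the demonstration.

```python
import numpy as np

def softmax(logits, axis=1):
    # Subtract the max for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def forward(weight, image):
    # Stand-in model: 1x1 convolution, i.e. a linear classifier per pixel.
    # image: (N, C, H, W), weight: (K, C) -> logits: (N, K, H, W)
    return np.einsum('kc,nchw->nkhw', weight, image)

def loss_and_grad(weight, image, label):
    # Pixel-wise cross-entropy and its gradient with respect to weight.
    n, _, h, w = image.shape
    probs = softmax(forward(weight, image))                      # (N, K, H, W)
    onehot = np.eye(weight.shape[0])[label].transpose(0, 3, 1, 2)
    loss = -np.mean(np.sum(onehot * np.log(probs + 1e-12), axis=1))
    grad_logits = (probs - onehot) / (n * h * w)
    grad_weight = np.einsum('nkhw,nchw->kc', grad_logits, image)
    return loss, grad_weight

rng = np.random.default_rng(0)
num_classes = 3
image = rng.random((1, 3, 8, 8)).astype(np.float32)   # input: image (NCHW)
label = image.argmax(axis=1)                           # toy label: brightest channel

weight = np.zeros((num_classes, 3))
losses = []
for step in range(500):
    loss, grad = loss_and_grad(weight, image, label)   # forward + loss + backward
    weight -= 1.0 * grad                               # update weights (plain SGD)
    losses.append(loss)

# Output: a single-channel class map with the same H and W as the input.
pred = forward(weight, image).argmax(axis=1)
print(pred.shape, losses[0] > losses[-1])
```

The final argmax over the class axis is what turns the per-class scores into the single-channel result described in the pipeline.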
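Semantic segmentation performance metrics
- Evaluation metrics for segmentation networks:
  - mIoU: mean Intersection-over-Union, the intersection-over-union (IoU) computed for each class and averaged over the classes
  - mAcc: mean Accuracy, the "classification" accuracy at corresponding positions of Pred and GT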
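As a rough illustration of the two metrics, the sketch below builds a confusion matrix from a predicted map and a ground-truth map and derives both values from it. The function names are hypothetical. The accuracy shown is the plain per-pixel reading of the definition above; some toolkits instead average the accuracy per class, which is not shown here.

```python
import numpy as np

def confusion_matrix(pred, gt, num_classes):
    # Rows index the ground-truth class, columns the predicted class.
    idx = gt.reshape(-1) * num_classes + pred.reshape(-1)
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def mean_iou(cm):
    inter = np.diag(cm)                                # correctly labeled pixels per class
    union = cm.sum(axis=0) + cm.sum(axis=1) - inter    # pred + gt - intersection
    valid = union > 0                                  # skip classes absent from both maps
    return (inter[valid] / union[valid]).mean()

def pixel_accuracy(cm):
    # Fraction of positions where Pred and GT agree.
    return np.diag(cm).sum() / cm.sum()

gt = np.array([[0, 0, 1, 1],
               [0, 0, 1, 1]])
pred = np.array([[0, 0, 1, 1],
                 [0, 1, 1, 1]])

cm = confusion_matrix(pred, gt, num_classes=2)
miou = mean_iou(cm)        # class 0: 3/4, class 1: 4/5
acc = pixel_accuracy(cm)   # 7 of 8 pixels match
print(cm, miou, acc)
```

In this toy case one pixel of class 0 is predicted as class 1, which lowers the IoU of both classes: it shrinks the intersection of class 0 and enlarges the union of class 1.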
Environment setup
- Use Baidu AI Studio
- Install Paddle 1.8
Basic framework code
Paddle dynamic graph (dygraph)
basic_model.py
Input: 1×3×8×8 (NCHW)
OP: Pool2D(pool_size=4, pool_stride=4)
Output: 1×59×8×8
import paddle
import paddle.fluid as fluid
from paddle.fluid.dygraph import Conv2D
from paddle.fluid.dygraph import to_variable
from paddle.fluid.dygraph import Pool2D
import numpy as np

np.set_printoptions(precision=2)


class BasicModel(fluid.dygraph.Layer):
    def __init__(self, num_classes=59):
        super(BasicModel, self).__init__()
        # Pooling with size 2 and stride 2 halves the height and width
        self.pool = Pool2D(pool_size=2, pool_stride=2)
        # 1x1 convolution maps the 3 input channels to num_classes channels
        self.conv = Conv2D(num_channels=3, num_filters=num_classes, filter_size=1)

    def forward(self, inputs):
        x = self.pool(inputs)
        # Resize back to the spatial size (H, W) of the input
        x = fluid.layers.interpolate(x, out_shape=inputs.shape[2:])
        x = self.conv(x)
        return x


def main():
    place = paddle.fluid.CPUPlace()
    with fluid.dygraph.guard(place):
        model = BasicModel(num_classes=59)
        model.eval()
        input_data = np.random.rand(1, 3, 8, 8).astype(np.float32)  # NCHW
        print('Input data shape: ', input_data.shape)
        input_data = to_variable(input_data)
        print("input_data is ", input_data)
        output_data = model(input_data)
        output_data = output_data.numpy()
        print('Output data shape: ', output_data.shape)


if __name__ == "__main__":
    main()
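For readers without Paddle 1.8 at hand, the shape flow of BasicModel can be traced with a NumPy-only sketch. This is an approximation under stated assumptions, not the Paddle implementation: the helper names are hypothetical, the pooling is max pooling, the resize uses nearest-neighbour sampling for simplicity, and the 1x1 convolution weights are random.

```python
import numpy as np

def max_pool_2x2(x):
    # (N, C, H, W) -> (N, C, H/2, W/2); assumes H and W are even.
    n, c, h, w = x.shape
    return x.reshape(n, c, h // 2, 2, w // 2, 2).max(axis=(3, 5))

def upsample_nearest(x, out_hw):
    # Nearest-neighbour resize to out_hw = (H_out, W_out).
    _, _, h, w = x.shape
    rows = np.arange(out_hw[0]) * h // out_hw[0]
    cols = np.arange(out_hw[1]) * w // out_hw[1]
    return x[:, :, rows[:, None], cols[None, :]]

def conv_1x1(x, weight, bias):
    # weight: (K, C), bias: (K,) -> output: (N, K, H, W)
    return np.einsum('kc,nchw->nkhw', weight, x) + bias[None, :, None, None]

rng = np.random.default_rng(0)
x = rng.random((1, 3, 8, 8)).astype(np.float32)            # NCHW input
weight = rng.standard_normal((59, 3)).astype(np.float32)   # 59 classes, 3 channels
bias = np.zeros(59, dtype=np.float32)

pooled = max_pool_2x2(x)                         # halves H and W
restored = upsample_nearest(pooled, x.shape[2:]) # back to the input H and W
logits = conv_1x1(restored, weight, bias)        # one score map per class

print(pooled.shape, restored.shape, logits.shape)
# (1, 3, 4, 4) (1, 3, 8, 8) (1, 59, 8, 8)
```

The channel count only changes in the last step, which is why the output keeps the input's height and width while gaining one channel per class.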
Paddle data loading
basic_dataloader.py
import os
import random
import numpy as np
import cv2
import paddle.fluid as fluid


class Transform(object):
    def __init__(self, size=256):
        self.size = size

    def __call__(self, input, label):
        # Bilinear interpolation for the image; nearest-neighbour for the
        # label so that no new (invalid) class ids are created
        input = cv2.resize(input, (self.size, self.size), interpolation=cv2.INTER_LINEAR)
        label = cv2.resize(label, (self.size, self.size), interpolation=cv2.INTER_NEAREST)
        return input, label


class BasicDataLoader():
    def __init__(self,
                 image_folder,
                 image_list_file,
                 transform=None,
                 shuffle=True):
        self.image_folder = image_folder
        self.image_list_file = image_list_file
        self.transform = transform
        self.shuffle = shuffle
        self.data_list = self.read_list()

    def read_list(self):
        # Each line of the list file holds: <image path> <label path>
        data_list = []
        with open(self.image_list_file) as infile:
            for line in infile:
                data_path = os.path.join(self.image_folder, line.split()[0])
                label_path = os.path.join(self.image_folder, line.split()[1])
                data_list.append((data_path, label_path))
        # Shuffle only when requested (the flag was previously ignored)
        if self.shuffle:
            random.shuffle(data_list)
        return data_list

    def preprocess(self, data, label):
        h, w, c = data.shape      # image is HWC
        h_gt, w_gt = label.shape  # label is a single-channel HW map
        if self.transform:
            data, label = self.transform(data, label)
        # Add a trailing channel axis so the label becomes HW1
        label = label[:, :, np.newaxis]
        return data, label

    def __len__(self):
        return len(self.data_list)

    def __call__(self):
        for data_path, label_path in self.data_list:
            data = cv2.imread(data_path, cv2.IMREAD_COLOR)
            label = cv2.imread(label_path, cv2.IMREAD_GRAYSCALE)
            print(data.shape