FCN paper and source code walkthrough (Part 1)
FCN paper: Fully Convolutional Networks for Semantic Segmentation
Based on reference code from GitHub.
Abstract:
A pioneering work that set the state of the art in segmentation
Our key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet [20], the VGG net [31], and GoogLeNet [32]) into fully convolutional networks and transfer their learned representations by fine-tuning [3] to the segmentation task. We then define a skip architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations.
I. Data preprocessing
1. dataset.py
Contents: the LabelProcessor and Dataset classes
1.1 LabelProcessor
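Purpose: encode the image labels. Background: hash functions.
def __init__(self, file_path):
    # colormap: list of [r, g, b] triples, one per class
    self.colormap = self.read_color_map(file_path)
    # cm2lbl: lookup table from packed RGB key to class index
    self.cm2lbl = self.encode_label_pix(self.colormap)
read_color_map: reads the RGB value of each label from the class file and returns them in the form [[128, 128, 128], [], …]. The class file looks like this:
name,r,g,b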
Sky,128, 128, 128
Building,128, 0, 0
Pole,192, 192, 128
Road,128, 64, 128
Sidewalk,0,0,192
Tree,128,128,0
SignSymbol,192,128,128
Fence,64,64,128
Car,64,0,128
Pedestrian,64,64,0
Bicyclist,0,128,192
unlabelled,0,0,0
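A minimal sketch of how a file like this could be parsed into the colormap list. It uses the standard-library csv module and an in-memory string so that it runs on its own; the post's imports suggest the original reads the file with pandas, so treat this as an illustration rather than the author's code. The name read_color_map_sketch and the shortened sample are hypothetical.

```python
import csv
import io

# Hypothetical in-memory copy of a few rows of the class file shown above.
SAMPLE = """name,r,g,b
Sky,128, 128, 128
Building,128, 0, 0
Pole,192, 192, 128
unlabelled,0,0,0
"""

def read_color_map_sketch(fileobj):
    """Return [[r, g, b], ...] in file order; the row position is the class index."""
    reader = csv.DictReader(fileobj, skipinitialspace=True)
    return [[int(row["r"]), int(row["g"]), int(row["b"])] for row in reader]

colormap = read_color_map_sketch(io.StringIO(SAMPLE))
print(colormap)
```

Because the class index is simply the row position, the order of rows in the file determines which integer each class receives.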
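encode_label_pix: encodes the labels and returns a hash table, so that the class index of any color can be looked up quickly. Each color is packed into a single integer key:
cm2lbl[(cm[0] * 256 + cm[1]) * 256 + cm[2]] = i
Worked example: Sky is (128, 128, 128), so its key is (128 * 256 + 128) * 256 + 128 = 8421504, and cm2lbl[8421504] = 0, the Sky class.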
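The packing formula can be checked with a few lines of plain Python. This sketch stores the table in a dict rather than an indexable array, and the helper names pack_rgb and unpack_rgb are made up for illustration. It also shows why no two colors can collide: the key can be unpacked back into its three channels.

```python
def pack_rgb(r, g, b):
    """Pack three 8-bit channels into one integer key in [0, 256**3)."""
    return (r * 256 + g) * 256 + b

def unpack_rgb(key):
    """Invert pack_rgb, which shows the mapping is one-to-one."""
    return key // 65536, (key // 256) % 256, key % 256

# First three classes from the class file above: Sky, Building, Pole.
colormap = [[128, 128, 128], [128, 0, 0], [192, 192, 128]]

# Dict stand-in for cm2lbl: packed key -> class index.
cm2lbl = {pack_rgb(*cm): i for i, cm in enumerate(colormap)}

sky_key = pack_rgb(128, 128, 128)
print(sky_key, cm2lbl[sky_key])  # 8421504 0
```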
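encode_label_img: looks up every pixel of a label image in the table.
def encode_label_img(self, img):
    # int32 so that the packed key (up to 256**3 - 1) cannot overflow
    data = np.array(img, dtype='int32')
    idx = (data[:, :, 0] * 256 + data[:, :, 1]) * 256 + data[:, :, 2]
    return np.array(self.cm2lbl[idx], dtype='int64')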
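To see the lookup run end to end, here is a self-contained NumPy sketch on a tiny 2x2 label image. The table is allocated as a uint8 array of length 256**3 to keep memory small, which is an assumption; the post does not show how the original allocates cm2lbl. Colors missing from the colormap fall back to index 0 in this sketch.

```python
import numpy as np

colormap = [[128, 128, 128], [128, 0, 0], [192, 192, 128]]  # Sky, Building, Pole

# Lookup table indexed by the packed RGB key (about 16 MB as uint8).
cm2lbl = np.zeros(256 ** 3, dtype=np.uint8)
for i, cm in enumerate(colormap):
    cm2lbl[(cm[0] * 256 + cm[1]) * 256 + cm[2]] = i

def encode_label_img(img):
    """Map an (H, W, 3) RGB label image to an (H, W) array of class indices."""
    data = np.array(img, dtype='int32')
    idx = (data[:, :, 0] * 256 + data[:, :, 1]) * 256 + data[:, :, 2]
    return np.array(cm2lbl[idx], dtype='int64')

# A 2x2 label image: Sky, Building on the first row; Pole, Sky on the second.
label_rgb = np.array([[[128, 128, 128], [128, 0, 0]],
                      [[192, 192, 128], [128, 128, 128]]], dtype=np.uint8)
encoded = encode_label_img(label_rgb)
print(encoded)
```

Indexing the table with the whole idx array encodes every pixel in one vectorized step, with no Python loop over pixels.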
Imports:
import pandas as pd
import os
import torch as t
import numpy as np
import torchvision.transforms.functional as ff
from torch.utils.data import Dataset
from PIL import Image
import torchvision.transforms as transforms
import cfg
1.2 The Dataset class
def __init__(self, file_path=[], crop_size=None):
    # 1. Read in the image and label folder paths
    if len(file_path) != 2:
        raise ValueError("Both the image folder path and the label folder path are required")
    self.img_path = file_path[0]
    self.label_path = file_path[1]
    # 2. Collect the file names of the images and labels
    self.imgs = self.read_file(self.img_path)
    self.labels = self.read_file(self.label_path)
    # 3. Store the settings for the data-processing functions
    self.crop_size = crop_size
def __getitem__(self, index):
    img = self.imgs[index]
    label = self.labels[index]
    # Open both files; the label is converted to RGB just to be safe
    img = Image.open(img)
    label = Image.open(label).convert('RGB')
    img, label = self.center_crop(img, label, self.crop_size)
    img, label = self.img_transform(img, label)
    # print('Size of the processed image and label:', img.shape, label.shape)
    sample = {'img': img, 'label': label}
    return sample
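center_crop is called above but not shown in this post. A minimal NumPy sketch of the idea, assuming crop_size is a (height, width) pair no larger than the image: the same window is cut out of both the image and the label so that every pixel keeps its annotation. The imports suggest the original may rely on torchvision for this; the version below is only an illustration and may round differently.

```python
import numpy as np

def center_crop_pair(img, label, crop_size):
    """Cut the same centered (height, width) window out of img and label."""
    crop_h, crop_w = crop_size
    h, w = img.shape[:2]
    if label.shape[:2] != (h, w):
        raise ValueError("image and label must have the same height and width")
    if crop_h > h or crop_w > w:
        raise ValueError("crop_size must not exceed the image size")
    top = (h - crop_h) // 2
    left = (w - crop_w) // 2
    window = (slice(top, top + crop_h), slice(left, left + crop_w))
    return img[window], label[window]

img = np.arange(6 * 8 * 3).reshape(6, 8, 3)   # fake 6x8 RGB image
label = np.arange(6 * 8).reshape(6, 8)        # fake 6x8 label map
img_c, label_c = center_crop_pair(img, label, (4, 4))
print(img_c.shape, label_c.shape)  # (4, 4, 3) (4, 4)
```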
Function read_file(self, path): returns the (absolute or relative) paths of the files under path.
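One plausible shape for read_file, sketched with the standard library. Sorting is the important part: imgs and labels are paired by index in __getitem__, so both lists need the same deterministic order (this assumes an image and its label share a file name). The temporary folder and file names are made up for the demonstration.

```python
import os
import tempfile

def read_file(path):
    """Return the sorted paths of all entries directly under path."""
    return sorted(os.path.join(path, name) for name in os.listdir(path))

# Demonstration on a throwaway folder with hypothetical file names.
with tempfile.TemporaryDirectory() as root:
    for name in ("b.png", "a.png", "c.png"):
        open(os.path.join(root, name), "w").close()
    paths = read_file(root)
    names = [os.path.basename(p) for p in paths]

print(names)  # ['a.png', 'b.png', 'c.png']
```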
Function img_transform(self, img, label):
"""Apply some numerical processing to the image and the label.""" It normalizes the data and converts it to PyTorch tensors.
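A NumPy sketch of the numerical steps on the image side, assuming the usual recipe of scaling to [0, 1], normalizing each channel, and moving the channel axis first. The mean and std values below are the widely used ImageNet statistics and are an assumption, since the post does not state which values the original uses. The resulting array could then be passed to torch.from_numpy to obtain a tensor.

```python
import numpy as np

# Assumed ImageNet channel statistics (not given in the post).
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize_image(img):
    """Convert an (H, W, 3) uint8 image to a normalized (3, H, W) float32 array."""
    x = np.asarray(img, dtype=np.float32) / 255.0      # scale to [0, 1]
    x = (x - MEAN) / STD                               # per-channel normalization
    return np.ascontiguousarray(x.transpose(2, 0, 1))  # HWC -> CHW

img = np.full((4, 5, 3), 255, dtype=np.uint8)  # an all-white test image
out = normalize_image(img)
print(out.shape, out.dtype)  # (3, 4, 5) float32
```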
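Debugging:
Given the paths TRAIN_ROOT and TRAIN_LABEL (and the corresponding paths for val and test) together with crop_size, instantiate the class.
Model construction is covered in the next post.
If you spot any mistakes, please point them out!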