FCN Paper and Source Code Walkthrough (Part 1)

学习视觉记录

Article link: https://blog.csdn.net/weixin_43918682/article/details/120311826

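This article walks through FCN (Fully Convolutional Networks), the pioneering work on semantic segmentation. It covers how classic classification networks are converted into fully convolutional networks suited to dense prediction and then fine-tuned via transfer learning. The main topics are data preprocessing, including the implementation of the LabelProcessor and Dataset classes, and how models such as AlexNet, the VGG net, and GoogLeNet are turned into fully convolutional networks.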

FCN paper: Fully Convolutional Networks for Semantic Segmentation
The code follows a reference implementation on GitHub.

Abstract:
A pioneering work that reached state-of-the-art segmentation.
Our key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet [20], the VGG net [31], and GoogLeNet [32]) into fully convolutional networks and transfer their learned representations by fine-tuning [3] to the segmentation task. We then define a skip architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations.

I. Data Preprocessing

1. dataset.py

Contents: the LabelProcessor and Dataset classes
1.1 LabelProcessor
Purpose: encode the image labels. Background knowledge: hash functions.

    def __init__(self, file_path):

        self.colormap = self.read_color_map(file_path)

        self.cm2lbl = self.encode_label_pix(self.colormap)

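read_color_map: reads the RGB value of each label class and returns them in the form [[128 128 128], [...], ...]. The class file has the following content:
name,r,g,b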
Sky,128, 128, 128
Building,128, 0, 0
Pole,192, 192, 128
Road,128, 64, 128
Sidewalk,0,0,192
Tree,128,128,0
SignSymbol,192,128,128
Fence,64,64,128
Car,64,0,128
Pedestrian,64,64,0
Bicyclist,0,128,192
unlabelled,0,0,0
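
The body of read_color_map is not shown in this post, so the following is only a minimal sketch of what it might look like. It assumes the class file is a CSV with a `name,r,g,b` header, as listed above. The original imports pandas, which suggests `pd.read_csv`; this sketch swaps in the standard-library `csv` module and takes an open file object instead of a path so that it runs without any file on disk.

```python
import csv
import io


def read_color_map(csv_file):
    """Return one [r, g, b] list per class, in file order.

    Hypothetical sketch: the original method takes a file path and
    probably uses pandas; this version takes an open file object.
    """
    # skipinitialspace tolerates rows such as "Sky,128, 128, 128"
    reader = csv.reader(csv_file, skipinitialspace=True)
    next(reader)  # skip the "name,r,g,b" header row
    # Keep only the three colour columns and skip empty rows
    return [[int(v) for v in row[1:4]] for row in reader if row]


# A three-row excerpt of the class file shown above
sample = io.StringIO(
    "name,r,g,b\n"
    "Sky,128, 128, 128\n"
    "Building,128, 0, 0\n"
    "Pole,192, 192, 128\n"
)
colormap = read_color_map(sample)
print(colormap)
```

The position of each row in the returned list becomes the class index, so the order of rows in the file matters.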

encode_label_pix: encodes the labels and returns a hash table, so that the class of a colour can be looked up quickly.

cm2lbl[(cm[0] * 256 + cm[1]) * 256 + cm[2]] = i

Worked example: for Sky (128, 128, 128), (128 × 256 + 128) × 256 + 128 = 8421504, and cm2lbl[8421504] = 0, which is the Sky class.
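
The encoding can be checked with a short sketch. Only the assignment line comes from the post; the surrounding function body, the table size of 256³ entries, and the `uint8` dtype (chosen here to keep the table at about 16 MB) are assumptions.

```python
import numpy as np

# First three classes from the class file: Sky, Building, Pole
colormap = [[128, 128, 128], [128, 0, 0], [192, 192, 128]]


def encode_label_pix(colormap):
    """Build a lookup table that maps a packed RGB key to a class index."""
    # One slot per possible 24-bit colour (assumed size and dtype)
    cm2lbl = np.zeros(256 ** 3, dtype=np.uint8)
    for i, cm in enumerate(colormap):
        # Pack (r, g, b) into one integer in base 256
        cm2lbl[(cm[0] * 256 + cm[1]) * 256 + cm[2]] = i
    return cm2lbl


cm2lbl = encode_label_pix(colormap)
sky_key = (128 * 256 + 128) * 256 + 128
building_key = (128 * 256 + 0) * 256 + 0
print(sky_key, cm2lbl[sky_key])            # key and class index for Sky
print(building_key, cm2lbl[building_key])  # key and class index for Building
```

One consequence of initialising the table with zeros is that any colour missing from the class file also maps to index 0, the same index as the first class.
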
encode_label_img: looks up every pixel of a label image in the table.

    def encode_label_img(self, img):

        data = np.array(img, dtype='int32')
        idx = (data[:, :, 0] * 256 + data[:, :, 1]) * 256 + data[:, :, 2]
        return np.array(self.cm2lbl[idx], dtype='int64')
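
To see the lookup in action, here is a small sketch that applies the same three lines to a 2×2 label image. It is written as a free function rather than a method, and the table is rebuilt inline with an assumed `uint8` dtype so the example is self-contained.

```python
import numpy as np

# Lookup table for the first three classes: Sky, Building, Pole
colormap = [[128, 128, 128], [128, 0, 0], [192, 192, 128]]
cm2lbl = np.zeros(256 ** 3, dtype=np.uint8)
for i, (r, g, b) in enumerate(colormap):
    cm2lbl[(r * 256 + g) * 256 + b] = i


def encode_label_img(img):
    """Convert an H x W x 3 RGB label image into an H x W class-index map."""
    # Cast to int32 first: uint8 arithmetic would overflow at "* 256"
    data = np.array(img, dtype='int32')
    idx = (data[:, :, 0] * 256 + data[:, :, 1]) * 256 + data[:, :, 2]
    # Indexing with an integer array looks up all pixels in one step
    return np.array(cm2lbl[idx], dtype='int64')


# A hypothetical 2 x 2 label image: Sky, Building / Pole, Building
label_rgb = np.array(
    [[[128, 128, 128], [128, 0, 0]],
     [[192, 192, 128], [128, 0, 0]]],
    dtype=np.uint8,
)
label_ids = encode_label_img(label_rgb)
print(label_ids)
```

The result has one class index per pixel and dtype `int64`, which is the integer type PyTorch loss functions such as cross-entropy expect for targets.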

Imports:

import pandas as pd
import os
import torch as t
import numpy as np
import torchvision.transforms.functional as ff
from torch.utils.data import Dataset
from PIL import Image
import torchvision.transforms as transforms
import cfg

1.2 The Dataset class

    def __init__(self, file_path=[], crop_size=None):
        # 1 Read in the image and label folder paths
        if len(file_path) != 2:
            raise ValueError("Both the image folder path and the label folder path are required")
        self.img_path = file_path[0]
        self.label_path = file_path[1]
        # 2 Collect the file names of the images and labels
        self.imgs = self.read_file(self.img_path)
        self.labels = self.read_file(self.label_path)
        # 3 Store the settings for the data-processing functions
        self.crop_size = crop_size
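
The read_file helper called above is not shown in this post. A minimal sketch, assuming it simply returns the sorted full paths of every file in a folder, could look like this. Sorting matters because images and labels are paired by index, which only works if both folders list their files in the same order.

```python
import os
import tempfile


def read_file(path):
    """Return the full path of every entry in `path`, sorted by name.

    Hypothetical sketch: the original helper is not shown in the post.
    """
    names = sorted(os.listdir(path))
    return [os.path.join(path, name) for name in names]


# Demo on a temporary folder with two empty placeholder files
with tempfile.TemporaryDirectory() as folder:
    for name in ("b.png", "a.png"):
        open(os.path.join(folder, name), "w").close()
    names_found = [os.path.basename(p) for p in read_file(folder)]

print(names_found)
```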

    def __getitem__(self, index):
        img = self.imgs[index]
        label = self.labels[index]
        # Convert the format, just to be safe
        img = Image.open(img)
        label = Image.open(label).convert('RGB')

        img, label = self.center_crop(img, label, self.crop_size)

        img, label = self.img_transform(img, label)
        # print('Image and label sizes after processing:', img.shape, label.shape)
        sample = {'img': img, 'label': label}

        return sample
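
The center_crop method used in `__getitem__` is not shown in this post, and given the imports it most likely relies on torchvision. The sketch below only illustrates the idea with NumPy slicing: the image and its label must be cropped with exactly the same window, otherwise the pixels and their classes no longer line up. The function name `center_crop_pair` and the array inputs are assumptions, and the crop is assumed to be no larger than the image.

```python
import numpy as np


def center_crop_pair(img, label, crop_size):
    """Cut the same centered (height, width) window out of image and label."""
    crop_h, crop_w = crop_size
    h, w = img.shape[:2]
    top = (h - crop_h) // 2
    left = (w - crop_w) // 2
    # One shared window guarantees that pixels and labels stay aligned
    window = (slice(top, top + crop_h), slice(left, left + crop_w))
    return img[window], label[window]


# Hypothetical 6 x 8 RGB image and its 6 x 8 label map
img = np.arange(6 * 8 * 3).reshape(6, 8, 3)
label = np.arange(6 * 8).reshape(6, 8)
img_c, label_c = center_crop_pair(img, label, (4, 4))
print(img_c.shape, label_c.shape)
```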