图像分割—— fully convolutional network（FNC）理解和代码分析

最新推荐文章于 2024-03-27 10:29:02 发布

涑月听枫

最新推荐文章于 2024-03-27 10:29:02 发布

阅读量1.6k

点赞数 1

文章标签：深度学习 tensorflow

本文链接：https://blog.csdn.net/qq_37614597/article/details/105470588

版权

背景：
FNC是深度神经网络第一次应用于图像分割。FNC通过down sample和up sample实现自动获取图像的feature map并对object进行像素分割，对于实现end-to-end 的model具有里程碑意义。

想法和方法：
传统CNN在图像领域已被证明性能优异，但是传统CNN由于fully connect 的存在，最后意在输出一个class possibility vector。作者通过去除fully connect，并将conv layer得到的feature map进行up sample从而达到了输出 分割图像 的目的。
正如论文中所说，CNN是对图像进行图像级的分类，FCN是对图像进行像素级的分类：中间部分（down sample部分的输出），FCN得到的是一张label后的特征图。之后的up sample，就是将这张label好的特征图进行还原。
网络全程使用Convolution layer组成，去除了传统CNN的fully connection layer。输入图像，通过conv layer对图像down sample后，通过deconv 对图像up sample，在这个过程中通过 lateral connection 将前后对应的feature map 进行连接（combines semantic information (from deep, coarse layers)and appearance information (from shallow, fine layers) in order to produce accurate and detailed segmentations）既兼顾了local和global信息，最终输出为分割后的图像。

优点：
由于去掉了fully connection layer，所以输入可以为任意size，相对的输入size等于输入的size。

不足：
1、不够快
2、不能整合全局上下文信息（global context information）
3、不能很好的拓展到3D领域

改进：
针对（2）问题，ParseNet提出用每一层的平均特征（ average feature for a layer）去增强每一个位置的特征改进的关键点
这里对于feature添加了一个额外的global pooling，产生了一个global feature（a vector），之后对global feature进行norm and uppool，从而得到了一个粗糙的新特征图（global feature）。最后，将新的特征图与最初的特征图进行combined。

代码分析：

个人认为思路比较明确的一份代码

下面展示一些 主要代码。

// fnc网络结构部分
def inference(image, keep_prob):
    """
    Semantic segmentation network definition
    定义分割网络
    :param image: input image. Should have values in range 0-255
    #输入的是正常图像0-255
    :param keep_prob:
    :return:
    """
    print("setting up vgg initialized conv layers ...")
    model_data = utils.get_model_data(FLAGS.model_dir, MODEL_URL)
    #通过自定义的函数加载vgg-19网络
    
    mean = model_data['normalization'][0][0][0]
    mean_pixel = np.mean(mean, axis=(0, 1))
    #模型数据归一化
    weights = np.squeeze(model_data['layers'])

    processed_image = utils.process_image(image, mean_pixel)
    #= image + mean_pixel（为什么要+mean_pixel）
    with tf.variable_scope("inference"):
        image_net = vgg_net(weights, processed_image)
        
        conv_final_layer = image_net["conv5_3"]
        pool5 = utils.max_pool_2x2(conv_final_layer)
        #2x2的最大池化==图像减小2倍

        W6 = utils.weight_variable([7, 7, 512, 4096], name="W6")
        b6 = utils.bias_variable([4096], name="b6")
        conv6 = utils.conv2d_basic(pool5, W6, b6)
        relu6 = tf.nn.relu(conv6, name="relu6")
        #卷积（7x7的filter将512维映射到4096维）+bias+relu激活
        if FLAGS.debug:
            utils.add_activation_summary(relu6)
        relu_dropout6 = tf.nn.dropout(relu6, keep_prob=keep_prob)
        #为了防止或减轻过拟合，随机扔掉一些神经元

        W7 = utils.weight_variable([1, 1, 4096, 4096], name="W7")
        b7 = utils.bias_variable([4096], name="b7")
        conv7 = utils.conv2d_basic(relu_dropout6, W7, b7)
        relu7 = tf.nn.relu(conv7, name="relu7")
        if FLAGS.debug:
            utils.add_activation_summary(relu7)
        relu_dropout7 = tf.nn.dropout(relu7, keep_prob=keep_prob)
        #卷积+bias+relu激活
        
        W8 = utils.weight_variable([1, 1, 4096, NUM_OF_CLASSESS], name="W8")
        b8 = utils.bias_variable([NUM_OF_CLASSESS], name="b8")
        conv8 = utils.conv2d_basic(relu_dropout7, W8, b8)
        # annotation_pred1 = tf.argmax(conv8, dimension=3, name="prediction1")
        #卷积（将4096维映射到21类）+bias+relu激活
        
        # now to upscale to actual image size
        #将前面的热量图映射回去
        
        deconv_shape1 = image_net["pool4"].get_shape()
        #通过前面定义的vgg网络，得到第4层pool的shape
        W_t1 = utils.weight_variable([4, 4, deconv_shape1[3].value, NUM_OF_CLASSESS], name="W_t1")
        b_t1 = utils.bias_variable([deconv_shape1[3].value], name="b_t1")
        conv_t1 = utils.conv2d_transpose_strided(conv8, W_t1, b_t1, output_shape=tf.shape(image_net["pool4"]))
        #deconv
        fuse_1 = tf.add(conv_t1, image_net["pool4"], name="fuse_1")
        #skip connect
        
        deconv_shape2 = image_net["pool3"].get_shape()
        W_t2 = utils.weight_variable([4, 4, deconv_shape2[3].value, deconv_shape1[3].value], name="W_t2")
        b_t2 = utils.bias_variable([deconv_shape2[3].value], name="b_t2")
        conv_t2 = utils.conv2d_transpose_strided(fuse_1, W_t2, b_t2, output_shape=tf.shape(image_net["pool3"]))
        fuse_2 = tf.add(conv_t2, image_net["pool3"], name="fuse_2")

        shape = tf.shape(image)
        deconv_shape3 = tf.stack([shape[0], shape[1], shape[2], NUM_OF_CLASSESS])
        W_t3 = utils.weight_variable([16, 16, NUM_OF_CLASSESS, deconv_shape2[3].value], name="W_t3")
        b_t3 = utils.bias_variable([NUM_OF_CLASSESS], name="b_t3")
        conv_t3 = utils.conv2d_transpose_strided(fuse_2, W_t3, b_t3, output_shape=deconv_shape3, stride=8)
        #输出图像
        annotation_pred = tf.argmax(conv_t3, dimension=3, name="prediction")
        #输出类别预测（最大的值所在的轴）
        
    #在第3维增加一个维度，内容为1    
    return tf.expand_dims(annotation_pred, dim=3), conv_t3

这部分代码，主要是通过加载一个vgg-19网络，将图片输入vgg网络中，并在第四个pool层将内容输出（丢弃了后面的fn层）。后面的第6，7，8层则是用卷积来代替了原本的fn层，对图片深层的信息进一步挖掘。之后的9，10层，则是将第8层的输出deconv + lateral connection前面对应的4（第9层输出），3（第10层输出）层。最后第11层，将第10层的输出deconv成原图片size。

自定义的vgg-19加载函数。

// weights：出入的网络权重，从官网上下载得到
//image：要经过vgg网络特征提取的图片
def vgg_net(weights, image):
    layers = (
        'conv1_1', 'relu1_1', 'conv1_2', 'relu1_2', 'pool1',

        'conv2_1', 'relu2_1', 'conv2_2', 'relu2_2', 'pool2',

        'conv3_1', 'relu3_1', 'conv3_2', 'relu3_2', 'conv3_3',
        'relu3_3', 'conv3_4', 'relu3_4', 'pool3',

        'conv4_1', 'relu4_1', 'conv4_2', 'relu4_2', 'conv4_3',
        'relu4_3', 'conv4_4', 'relu4_4', 'pool4',

        'conv5_1', 'relu5_1', 'conv5_2', 'relu5_2', 'conv5_3',
        'relu5_3', 'conv5_4', 'relu5_4'
    )
    #vgg-19的结构
    net = {}
    current = image
    for i, name in enumerate(layers):
        kind = name[:4]
        if kind == 'conv':
            kernels, bias = weights[i][0][0][0][0]
            # matconvnet: weights are [width, height, in_channels, out_channels]
            # tensorflow: weights are [height, width, in_channels, out_channels]
            kernels = utils.get_variable(np.transpose(kernels, (1, 0, 2, 3)), name=name + "_w")
            bias = utils.get_variable(bias.reshape(-1), name=name + "_b")
            current = utils.conv2d_basic(current, kernels, bias)
        elif kind == 'relu':
            current = tf.nn.relu(current, name=name)
            if FLAGS.debug:
                utils.add_activation_summary(current)
        elif kind == 'pool':
            current = utils.avg_pool_2x2(current)
        net[name] = current

    return net

主函数。

def main(argv=None):
    keep_probability = tf.placeholder(tf.float32, name="keep_probabilty")
    image = tf.placeholder(tf.float32, shape=[None, IMAGE_SIZE, IMAGE_SIZE, 3], name="input_image")
    #初始化图片数据
    annotation = tf.placeholder(tf.int32, shape=[None, IMAGE_SIZE, IMAGE_SIZE, 1], name="annotation")
    #初始化标注

    pred_annotation, logits = inference(image, keep_probability)
    #fnc网络图像处理
    
    tf.summary.image("input_image", image, max_outputs=2)
    tf.summary.image("ground_truth", tf.cast(annotation, tf.uint8), max_outputs=2)
    tf.summary.image("pred_annotation", tf.cast(pred_annotation, tf.uint8), max_outputs=2)
    loss = tf.reduce_mean((tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,
                                                                          labels=tf.squeeze(annotation, squeeze_dims=[3]),
                                                                          #去掉第三维数据
                                                                          name="entropy")))
    #计算的分类logits与实际的annotation进行比较，计算损失
    loss_summary = tf.summary.scalar("entropy", loss)

    trainable_var = tf.trainable_variables()
    if FLAGS.debug:
        for var in trainable_var:
            utils.add_to_regularization_and_summary(var)
    train_op = train(loss, trainable_var)
    #计算损失梯度，梯度下降优化
    print("Setting up summary op...")
    summary_op = tf.summary.merge_all()

    print("Setting up image reader...")
    train_records, valid_records = scene_parsing.read_dataset(FLAGS.data_dir)
    print(len(train_records))
    print(len(valid_records))

    print("Setting up dataset reader")
    image_options = {'resize': True, 'resize_size': IMAGE_SIZE}
    if FLAGS.mode == 'train':
        train_dataset_reader = dataset.BatchDatset(train_records, image_options)
    validation_dataset_reader = dataset.BatchDatset(valid_records, image_options)

    sess = tf.Session()

    print("Setting up Saver...")
    saver = tf.train.Saver()

    # create two summary writers to show training loss and validation loss in the same graph
    # need to create two folders 'train' and 'validation' inside FLAGS.logs_dir
    train_writer = tf.summary.FileWriter(FLAGS.logs_dir + '/train', sess.graph)
    validation_writer = tf.summary.FileWriter(FLAGS.logs_dir + '/validation')

    sess.run(tf.global_variables_initializer())
    ckpt = tf.train.get_checkpoint_state(FLAGS.logs_dir)
    if ckpt and ckpt.model_checkpoint_path:
        saver.restore(sess, ckpt.model_checkpoint_path)
        print("Model restored...")

    if FLAGS.mode == "train":
        for itr in xrange(MAX_ITERATION):
            train_images, train_annotations = train_dataset_reader.next_batch(FLAGS.batch_size)
            feed_dict = {image: train_images, annotation: train_annotations, keep_probability: 0.85}

            sess.run(train_op, feed_dict=feed_dict)

            if itr % 10 == 0:
                #10次记录一下记录，输出一下loss
                train_loss, summary_str = sess.run([loss, loss_summary], feed_dict=feed_dict)
                print("Step: %d, Train_loss:%g" % (itr, train_loss))
                train_writer.add_summary(summary_str, itr)

            if itr % 500 == 0:
                #500次记录一下效果图，保存一下sess
                valid_images, valid_annotations = validation_dataset_reader.next_batch(FLAGS.batch_size)
                valid_loss, summary_sva = sess.run([loss, loss_summary], feed_dict={image: valid_images, annotation: valid_annotations,
                                                       keep_probability: 1.0})
                print("%s ---> Validation_loss: %g" % (datetime.datetime.now(), valid_loss))

                # add validation loss to TensorBoard
                validation_writer.add_summary(summary_sva, itr)
                saver.save(sess, FLAGS.logs_dir + "model.ckpt", itr)

    elif FLAGS.mode == "visualize":
        valid_images, valid_annotations = validation_dataset_reader.get_random_batch(FLAGS.batch_size)
        pred = sess.run(pred_annotation, feed_dict={image: valid_images, annotation: valid_annotations,
                                                    keep_probability: 1.0})
        valid_annotations = np.squeeze(valid_annotations, axis=3)
        pred = np.squeeze(pred, axis=3)

        for itr in range(FLAGS.batch_size):
            utils.save_image(valid_images[itr].astype(np.uint8), FLAGS.logs_dir, name="inp_" + str(5+itr))
            utils.save_image(valid_annotations[itr].astype(np.uint8), FLAGS.logs_dir, name="gt_" + str(5+itr))
            utils.save_image(pred[itr].astype(np.uint8), FLAGS.logs_dir, name="pred_" + str(5+itr))
            print("Saved image: %d" % itr)

这里的损失函数，使用的tf.nn.sparse_softmax_cross_entropy_with_logits()函数
是对预测的每个像素的分类与已知像素的分类进行求损失

训练函数。

def train(loss_val, var_list):
    optimizer = tf.train.AdamOptimizer(FLAGS.learning_rate)
    grads = optimizer.compute_gradients(loss_val, var_list=var_list)
    #对损失计算梯度，并将其存放在var_list中
    if FLAGS.debug:
        # print(len(var_list))
        for grad, var in grads:
            utils.add_gradient_summary(grad, var)
    return optimizer.apply_gradients(grads)