Artificial Neural Network Lab Project: Paper Reproduction

Lab Course Project

  • form a team yourselves
  • choose a good 2018/2019 paper
  • implement the model/method therein
  • evaluate the model/method as in the paper
  • submit a report and code

Lab Project

Project Address: https://github.com/Chocoboeater/Image_manipulation_detection


Lab Project Report

Chosen Paper: Learning Rich Features for Image Manipulation Detection

Paper Authors: Peng Zhou, Xintong Han, Vlad I. Morariu, Larry S. Davis

Abstract

The paper I chose proposes a two-stream Faster R-CNN network, trained end-to-end, that detects the tampered regions in a manipulated image. The two streams are an RGB stream and a noise stream. Understanding the structure of the two-stream Faster R-CNN is the key to this experiment. In the experiment, I trained the two-stream Faster R-CNN as well as single-stream Faster R-CNN models for the RGB stream and the noise stream separately, and compared their training processes and test results. The results I obtained were not as good as the authors claim; one possible reason is that my evaluation method is not accurate enough.

1. Introduction

With the advances of image editing techniques and user-friendly editing software, low-cost tampered or manipulated image generation processes have become widely available. Among tampering techniques, splicing, copy-move, and removal are the most common manipulations. Image splicing copies regions from an authentic image and pastes them to other images, copy-move copies and pastes regions within the same image, and removal eliminates regions from an authentic image followed by inpainting.

Image manipulation detection is different from traditional semantic object detection because it pays more attention to tampering artifacts than to image content, which suggests that richer features need to be learned.

The authors propose a two-stream Faster R-CNN network and train it end-to-end to detect the tampered regions given a manipulated image. One of the two streams is an RGB stream whose purpose is to extract features from the RGB image input to find tampering artifacts such as strong contrast differences and unnatural tampered boundaries. The other is a noise stream that leverages the noise features extracted from a steganalysis rich model (SRM) filter layer to discover the noise inconsistency between authentic and tampered regions. The authors then fuse features from the two streams through a bilinear pooling [2,3] layer to further incorporate the spatial co-occurrence of these two modalities.
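To make the noise stream's front end concrete, the sketch below implements the SRM filter layer as a fixed (non-trainable) convolution in PyTorch. The three 5×5 kernels and their divisors (4, 12, 2) are the ones shown in the paper; replicating each kernel across the three input channels is an assumption of this sketch, not necessarily the exact channel wiring in the authors' code.

```python
import torch
import torch.nn.functional as F

# The three SRM high-pass kernels from the paper, each divided by its
# quantization factor (4, 12, 2). They respond to local noise residuals
# rather than image content.
srm = torch.tensor([
    [[ 0,  0,  0,  0,  0],
     [ 0, -1,  2, -1,  0],
     [ 0,  2, -4,  2,  0],
     [ 0, -1,  2, -1,  0],
     [ 0,  0,  0,  0,  0]],
    [[-1,  2, -2,  2, -1],
     [ 2, -6,  8, -6,  2],
     [-2,  8,-12,  8, -2],
     [ 2, -6,  8, -6,  2],
     [-1,  2, -2,  2, -1]],
    [[ 0,  0,  0,  0,  0],
     [ 0,  0,  0,  0,  0],
     [ 0,  1, -2,  1,  0],
     [ 0,  0,  0,  0,  0],
     [ 0,  0,  0,  0,  0]],
], dtype=torch.float32)
srm = srm / torch.tensor([4.0, 12.0, 2.0]).view(3, 1, 1)

# One output channel per kernel; each kernel is replicated across the
# three RGB input channels (an implementation choice of this sketch),
# so the layer maps an (N, 3, H, W) image to an (N, 3, H, W) noise map.
weight = srm.unsqueeze(1).repeat(1, 3, 1, 1)  # shape (3, 3, 5, 5)

def srm_layer(rgb):
    """Fixed SRM filtering of an RGB batch; no gradient flows to weight."""
    return F.conv2d(rgb, weight, padding=2)
```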

To test the performance of the model, I fixed problematic code in an existing two-stream Faster R-CNN implementation and adjusted the network structure to obtain test results. I trained the RGB-N net, the RGB net, and the noise net using the COCO dataset and a VGG16 pretrained model, compared their training processes, and tested them on a few images. The RGB net trains fastest, and its test results are as good as those of the RGB-N net, so my test results do not reflect the advantages of the model described by the authors. However, given that my models were not trained to convergence and my test procedure is not the standard evaluation protocol the authors used, this result has only limited reference value.


Figure 1. Illustration of two-stream Faster R-CNN network

2. Problem formulation

Study the structure of the two-stream Faster R-CNN network and evaluate its performance.

3. Method

The authors employ a multi-task framework that simultaneously performs manipulation classification and bounding box regression. RGB images are fed to the RGB stream (the top stream in Figure 1), and SRM images to the noise stream (the bottom stream in Figure 1). The authors fuse the two streams through bilinear pooling [2,3] before a fully connected layer for manipulation classification. The RPN uses the RGB stream's features to localize tampered regions.
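As a rough illustration of the fusion step, the sketch below computes full bilinear pooling of two RoI feature maps in PyTorch, with the signed square root and L2 normalization described in [2]. Note that the paper uses the compact approximation [3] to keep the output dimensionality manageable; the full form is shown here only for clarity.

```python
import torch
import torch.nn.functional as F

def bilinear_pool(f_rgb, f_noise):
    """Full bilinear pooling of two (N, C, H, W) RoI feature maps.

    Sum-pools the outer product of the two streams' features over all
    spatial locations, then applies signed square root and L2
    normalization. Returns an (N, C*C) fused descriptor.
    """
    n, c, h, w = f_rgb.shape
    a = f_rgb.reshape(n, c, h * w)
    b = f_noise.reshape(n, c, h * w)
    x = torch.bmm(a, b.transpose(1, 2))     # (N, C, C): sum_l a_l b_l^T
    x = x.reshape(n, c * c)
    x = torch.sign(x) * torch.sqrt(torch.abs(x) + 1e-12)  # signed sqrt
    return F.normalize(x)                   # L2 normalization
```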

3.1. Faster R-CNN

Figure 2. Faster R-CNN is a single, unified network for object detection. The RPN module serves as the ‘attention’ of this unified network.

Faster R-CNN can be divided into four main components:

  1. Conv layers. As a CNN-based object detection method, Faster R-CNN first extracts feature maps from the image using a stack of basic conv + relu + pooling layers. These feature maps are shared by the subsequent RPN and fully connected layers.

  2. Region Proposal Networks. The RPN generates region proposals: it classifies anchors as positive or negative via softmax, then applies bounding box regression to refine the anchors into accurate proposals.

  3. RoI pooling. This layer collects the input feature maps and proposals, combines this information to extract fixed-size proposal feature maps, and sends them to the subsequent fully connected layers to determine the target category (see the sketch after this list).

  4. Classification. The proposal feature maps are used to compute each proposal's category, and bounding box regression is applied again to obtain the final precise location of the detection box.
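To make stages 1 and 3 concrete, here is a minimal PyTorch sketch using torchvision's VGG16 backbone and `roi_pool`. The image size and proposal boxes are made-up placeholders, and a real detector would insert the RPN (stage 2) between the two stages.

```python
import torch
import torchvision

# Stage 1: conv layers — VGG16 feature extractor (conv + relu + pooling).
# Dropping the final pooling layer gives an output stride of 16.
# (Pass a weights enum instead of None for pretrained parameters.)
backbone = torchvision.models.vgg16(weights=None).features[:-1]
backbone.eval()

image = torch.randn(1, 3, 600, 800)           # dummy input image
with torch.no_grad():
    feat = backbone(image)                     # (1, 512, 37, 50)

# Stage 2 would be the RPN producing proposals; here we fake two boxes
# in image coordinates (x1, y1, x2, y2) as placeholders.
proposals = [torch.tensor([[100., 100., 300., 260.],
                           [400., 200., 560., 480.]])]

# Stage 3: RoI pooling crops each proposal from the shared feature map
# into a fixed 7x7 grid; spatial_scale maps image coords to feature coords.
rois = torchvision.ops.roi_pool(feat, proposals, output_size=(7, 7),
                                spatial_scale=1.0 / 16)
print(rois.shape)  # torch.Size([2, 512, 7, 7]) — fed to the FC head (stage 4)
```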

3.2. RGB Stream

The RGB stream is a single Faster R-CNN network used both for bounding box regression and for manipulation classification. In my reproduction, a VGG16 network learns features from the input RGB image, and the output features of its last convolutional layer are used for manipulation classification. The RPN in the RGB stream uses these features to propose RoIs for bounding box regression.

Formally, the loss for the RPN network is defined as

$$L_{\text{RPN}}(g_i, f_i) = \frac{1}{N_{\text{cls}}} \sum_i L_{\text{cls}}(g_i, g_i^\star) + \lambda \frac{1}{N_{\text{reg}}} \sum_i g_i^\star L_{\text{reg}}(f_i, f_i^\star)$$

where $g_i$ is the predicted probability that anchor $i$ contains a potentially manipulated region, $g_i^\star$ is its ground-truth label, $f_i$ and $f_i^\star$ are the predicted and ground-truth bounding box coordinates, $L_{\text{cls}}$ is cross-entropy loss, $L_{\text{reg}}$ is smooth $L_1$ loss, $N_{\text{cls}}$ and $N_{\text{reg}}$ are normalization factors, and $\lambda$ balances the two terms.
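A minimal PyTorch sketch of this loss follows, assuming the standard Faster R-CNN conventions. The $\lambda$ value and the normalizers here are illustrative stand-ins: in the real network, $N_{\text{cls}}$ is the mini-batch size of sampled anchors and $N_{\text{reg}}$ is the number of anchor locations in the feature map.

```python
import torch
import torch.nn.functional as F

def rpn_loss(g_pred, g_star, f_pred, f_star, lam=10.0):
    """Sketch of the RPN loss above.

    g_pred: (N,) predicted probability that each sampled anchor covers
            a (potentially manipulated) region
    g_star: (N,) ground-truth labels in {0., 1.} (float tensor)
    f_pred: (N, 4) predicted box offsets; f_star: (N, 4) targets
    lam:    the balancing weight lambda (value is an assumption here)
    """
    # Classification term: cross entropy averaged over the sampled
    # anchors, standing in for the 1/N_cls normalization.
    l_cls = F.binary_cross_entropy(g_pred, g_star)

    # Regression term: smooth L1, counted only for positive anchors
    # (the g_i* factor), normalized by a stand-in for N_reg.
    pos = g_star.unsqueeze(1)               # zeros out negative anchors
    l_reg = (pos * F.smooth_l1_loss(f_pred, f_star, reduction="none")).sum()
    n_reg = max(g_star.numel(), 1)
    return l_cls + lam * l_reg / n_reg
```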
