Artificial Neural Network Lab Project: Paper Reproduction

Lab Course Project

  • form a team yourselves
  • choose a good 2018/2019 paper
  • implement the model/method therein
  • evaluate the model/method as in the paper
  • submit a report and code

Lab Project

Project Address: https://github.com/Chocoboeater/Image_manipulation_detection


Lab Project Report

Chosen Paper: Learning Rich Features for Image Manipulation Detection

Paper Authors: Peng Zhou, Xintong Han, Vlad I. Morariu, Larry S. Davis

Abstract

The paper I chose proposes a two-stream Faster R-CNN network, trained end-to-end, that detects the tampered regions in a manipulated image. The two streams are an RGB stream and a noise stream. Understanding the structure of the two-stream Faster R-CNN is the key to this experiment. In the experiment, I trained the two-stream Faster R-CNN as well as single-stream Faster R-CNN models for the RGB stream and the noise stream separately, and compared their training processes and test results. The results I obtained were not as good as the authors claim; one possible reason is that my evaluation method is not accurate enough.

1. Introduction

With the advances of image editing techniques and user-friendly editing software, low-cost tampered or manipulated image generation processes have become widely available. Among tampering techniques, splicing, copy-move, and removal are the most common manipulations. Image splicing copies regions from an authentic image and pastes them to other images, copy-move copies and pastes regions within the same image, and removal eliminates regions from an authentic image followed by inpainting.

Image manipulation detection is different from traditional semantic object detection because it pays more attention to tampering artifacts than to image content, which suggests that richer features need to be learned.

The authors propose a two-stream Faster R-CNN network and train it end-to-end to detect the tampered regions given a manipulated image. One of the two streams is an RGB stream whose purpose is to extract features from the RGB image input to find tampering artifacts such as strong contrast differences and unnatural tampered boundaries. The other is a noise stream that leverages the noise features extracted from a steganalysis rich model (SRM) filter layer to discover the noise inconsistency between authentic and tampered regions. The authors then fuse features from the two streams through a bilinear pooling [2,3] layer to further incorporate the spatial co-occurrence of these two modalities.
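To make the noise stream's front end concrete, the sketch below implements the SRM filter layer as a fixed (non-trainable) convolution in PyTorch. The three 5×5 kernels and their divisors (4, 12, 2) are the ones shown in the paper; replicating each kernel across the three input channels is an assumption of this sketch, not necessarily the exact channel wiring in the authors' code.

```python
import torch
import torch.nn.functional as F

# The three SRM high-pass kernels from the paper, each divided by its
# quantization factor (4, 12, 2). They respond to local noise residuals
# rather than image content.
srm = torch.tensor([
    [[ 0,  0,  0,  0,  0],
     [ 0, -1,  2, -1,  0],
     [ 0,  2, -4,  2,  0],
     [ 0, -1,  2, -1,  0],
     [ 0,  0,  0,  0,  0]],
    [[-1,  2, -2,  2, -1],
     [ 2, -6,  8, -6,  2],
     [-2,  8,-12,  8, -2],
     [ 2, -6,  8, -6,  2],
     [-1,  2, -2,  2, -1]],
    [[ 0,  0,  0,  0,  0],
     [ 0,  0,  0,  0,  0],
     [ 0,  1, -2,  1,  0],
     [ 0,  0,  0,  0,  0],
     [ 0,  0,  0,  0,  0]],
], dtype=torch.float32)
srm = srm / torch.tensor([4.0, 12.0, 2.0]).view(3, 1, 1)

# One output channel per kernel; each kernel is replicated across the
# three RGB input channels (an implementation choice of this sketch),
# so the layer maps an (N, 3, H, W) image to an (N, 3, H, W) noise map.
weight = srm.unsqueeze(1).repeat(1, 3, 1, 1)  # shape (3, 3, 5, 5)

def srm_layer(rgb):
    """Fixed SRM filtering of an RGB batch; no gradient flows to weight."""
    return F.conv2d(rgb, weight, padding=2)
```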

To test the performance of the model, I fixed problematic code in an existing two-stream Faster R-CNN implementation and adjusted the network structure to obtain test results. I trained the RGB-N net, the RGB net, and the noise net using the COCO dataset and a VGG16 pretrained model, compared their training processes, and tested them on a few images. The RGB net trains fastest, and its test results are as good as those of the RGB-N net, so my test results do not reflect the advantages of the model described by the authors. However, given that my models were not trained to convergence and my test procedure is not the standard evaluation protocol the authors used, this result has only limited reference value.


Figure 1. Illustration of two-stream Faster R-CNN network

2. Problem formulation

Study the structure of the two-stream Faster R-CNN network and evaluate its performance.

3. Method

The authors employ a multi-task framework that simultaneously performs manipulation classification and bounding box regression. RGB images are fed to the RGB stream (the top stream in Figure 1), and SRM images to the noise stream (the bottom stream in Figure 1). The authors fuse the two streams through bilinear pooling [2,3] before a fully connected layer for manipulation classification. The RPN uses the RGB stream's features to localize tampered regions.
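As a rough illustration of the fusion step, the sketch below computes full bilinear pooling of two RoI feature maps in PyTorch, with the signed square root and L2 normalization described in [2]. Note that the paper uses the compact approximation [3] to keep the output dimensionality manageable; the full form is shown here only for clarity.

```python
import torch
import torch.nn.functional as F

def bilinear_pool(f_rgb, f_noise):
    """Full bilinear pooling of two (N, C, H, W) RoI feature maps.

    Sum-pools the outer product of the two streams' features over all
    spatial locations, then applies signed square root and L2
    normalization. Returns an (N, C*C) fused descriptor.
    """
    n, c, h, w = f_rgb.shape
    a = f_rgb.reshape(n, c, h * w)
    b = f_noise.reshape(n, c, h * w)
    x = torch.bmm(a, b.transpose(1, 2))     # (N, C, C): sum_l a_l b_l^T
    x = x.reshape(n, c * c)
    x = torch.sign(x) * torch.sqrt(torch.abs(x) + 1e-12)  # signed sqrt
    return F.normalize(x)                   # L2 normalization
```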

3.1. Faster R-CNN

Figure 2. Faster R-CNN is a single, unified network for object detection. The RPN module serves as the ‘attention’ of this unified network.

Faster R-CNN can be divided into four main components:

  1. Conv layers. As a CNN-based object detection method, Faster R-CNN first extracts feature maps from the image using a stack of basic conv + relu + pooling layers. These feature maps are shared by the subsequent RPN and fully connected layers.

  2. Region Proposal Networks. The RPN generates region proposals: it classifies anchors as positive or negative via softmax, then applies bounding box regression to refine the anchors into accurate proposals.

  3. RoI pooling. This layer collects the input feature maps and proposals, combines this information to extract fixed-size proposal feature maps, and sends them to the subsequent fully connected layers to determine the target category (see the sketch after this list).

  4. Classification. The proposal feature maps are used to compute each proposal's category, and bounding box regression is applied again to obtain the final precise location of the detection box.
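To make stages 1 and 3 concrete, here is a minimal PyTorch sketch using torchvision's VGG16 backbone and `roi_pool`. The image size and proposal boxes are made-up placeholders, and a real detector would insert the RPN (stage 2) between the two stages.

```python
import torch
import torchvision

# Stage 1: conv layers — VGG16 feature extractor (conv + relu + pooling).
# Dropping the final pooling layer gives an output stride of 16.
# (Pass a weights enum instead of None for pretrained parameters.)
backbone = torchvision.models.vgg16(weights=None).features[:-1]
backbone.eval()

image = torch.randn(1, 3, 600, 800)           # dummy input image
with torch.no_grad():
    feat = backbone(image)                     # (1, 512, 37, 50)

# Stage 2 would be the RPN producing proposals; here we fake two boxes
# in image coordinates (x1, y1, x2, y2) as placeholders.
proposals = [torch.tensor([[100., 100., 300., 260.],
                           [400., 200., 560., 480.]])]

# Stage 3: RoI pooling crops each proposal from the shared feature map
# into a fixed 7x7 grid; spatial_scale maps image coords to feature coords.
rois = torchvision.ops.roi_pool(feat, proposals, output_size=(7, 7),
                                spatial_scale=1.0 / 16)
print(rois.shape)  # torch.Size([2, 512, 7, 7]) — fed to the FC head (stage 4)
```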

3.2. RGB Stream

The RGB stream is a single Faster R-CNN network used both for bounding box regression and for manipulation classification. In my reproduction, a VGG16 network learns features from the input RGB image, and the output features of its last convolutional layer are used for manipulation classification. The RPN in the RGB stream uses these features to propose RoIs for bounding box regression.

Formally, the loss for the RPN network is defined as

$$L_{\text{RPN}}(g_i, f_i) = \frac{1}{N_{\text{cls}}} \sum_i L_{\text{cls}}(g_i, g_i^\star) + \lambda \frac{1}{N_{\text{reg}}} \sum_i g_i^\star L_{\text{reg}}(f_i, f_i^\star)$$

where $g_i$ is the predicted probability that anchor $i$ contains a potentially manipulated region, $g_i^\star$ is its ground-truth label, $f_i$ and $f_i^\star$ are the predicted and ground-truth bounding box coordinates, $L_{\text{cls}}$ is cross-entropy loss, $L_{\text{reg}}$ is smooth $L_1$ loss, $N_{\text{cls}}$ and $N_{\text{reg}}$ are normalization factors, and $\lambda$ balances the two terms.
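A minimal PyTorch sketch of this loss follows, assuming the standard Faster R-CNN conventions. The $\lambda$ value and the normalizers here are illustrative stand-ins: in the real network, $N_{\text{cls}}$ is the mini-batch size of sampled anchors and $N_{\text{reg}}$ is the number of anchor locations in the feature map.

```python
import torch
import torch.nn.functional as F

def rpn_loss(g_pred, g_star, f_pred, f_star, lam=10.0):
    """Sketch of the RPN loss above.

    g_pred: (N,) predicted probability that each sampled anchor covers
            a (potentially manipulated) region
    g_star: (N,) ground-truth labels in {0., 1.} (float tensor)
    f_pred: (N, 4) predicted box offsets; f_star: (N, 4) targets
    lam:    the balancing weight lambda (value is an assumption here)
    """
    # Classification term: cross entropy averaged over the sampled
    # anchors, standing in for the 1/N_cls normalization.
    l_cls = F.binary_cross_entropy(g_pred, g_star)

    # Regression term: smooth L1, counted only for positive anchors
    # (the g_i* factor), normalized by a stand-in for N_reg.
    pos = g_star.unsqueeze(1)               # zeros out negative anchors
    l_reg = (pos * F.smooth_l1_loss(f_pred, f_star, reduction="none")).sum()
    n_reg = max(g_star.numel(), 1)
    return l_cls + lam * l_reg / n_reg
```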
