DSLR-Quality Photos on Mobile Devices with Deep Convolutional Networks
ICCV 2017
http://people.ee.ethz.ch/~ihnatova/
https://github.com/aiff22/DPED
This paper uses a CNN to enhance photos taken by mobile phone cameras.
How is the training-data problem solved?
Three smartphones and one DSLR camera photograph the same scenes simultaneously, which yields a new dataset:
A new large-scale dataset of over 6K photos taken synchronously by a DSLR camera and 3 low-end cameras of smartphones in a wide variety of conditions.
2 DSLR Photo Enhancement Dataset (DPED)
Over three weeks, 22K photos were collected: 4,549 from a Sony smartphone, 5,727 from an iPhone, and 6,015 each from a BlackBerry smartphone and a Canon DSLR. The photos were taken during the day in a wide variety of places and under different illumination and weather conditions, with every camera in automatic mode using its default settings.
Matching algorithm
Although the photos are taken simultaneously, the four cameras have different viewpoints and positions, so the images need to be aligned.
SIFT keypoints and RANSAC are used here to estimate the homography mapping each phone image onto the corresponding DSLR image.
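As a minimal sketch of this alignment step, the homography can be estimated from keypoint correspondences with a DLT fit inside a RANSAC loop. The function names and parameters below are illustrative, not from the DPED code (which in practice would use an OpenCV pipeline on SIFT matches):

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate a 3x3 homography H (dst ~ H @ src) from >= 4 point pairs via DLT."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the null vector of A, i.e. the last right-singular vector.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def project(H, pts):
    """Apply homography H to an N x 2 array of points."""
    p = np.c_[pts, np.ones(len(pts))] @ H.T
    return p[:, :2] / p[:, 2:3]

def ransac_homography(src, dst, n_iter=200, thresh=2.0, seed=0):
    """RANSAC: fit on random 4-point samples, keep the model with most inliers."""
    rng = np.random.default_rng(seed)
    best_H, best_inliers = None, np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(len(src), 4, replace=False)
        H = homography_dlt(src[idx], dst[idx])
        err = np.linalg.norm(project(H, src) - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_H, best_inliers = H, inliers
    if best_inliers.sum() >= 4:
        # Refit on all inliers for a more stable estimate.
        best_H = homography_dlt(src[best_inliers], dst[best_inliers])
    return best_H, best_inliers
```

RANSAC matters here because SIFT matching between a phone and a DSLR photo inevitably produces wrong correspondences, which would ruin a plain least-squares fit.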
Training a CNN on the aligned high-resolution images is infeasible,
so 100×100 pixel patches are used instead.
100 original images were reserved for testing
The resulting training and test splits are:
139K, 160K and 162K training and 2.4-4.3K test patches for BlackBerry-Canon, iPhone-Canon and Sony-Canon pairs
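A minimal sketch of cutting an aligned (phone, DSLR) image pair into corresponding 100×100 patches; this shows the idea only, and the paper's exact patch-selection procedure may differ:

```python
import numpy as np

def extract_patches(phone_img, dslr_img, size=100, stride=100):
    """Cut an aligned image pair into corresponding size x size patch pairs.

    With stride == size the patches are non-overlapping; a smaller stride
    would yield overlapping patches and more training data.
    """
    assert phone_img.shape == dslr_img.shape, "pair must be aligned and same size"
    h, w = phone_img.shape[:2]
    pairs = []
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            pairs.append((phone_img[y:y + size, x:x + size],
                          dslr_img[y:y + size, x:x + size]))
    return pairs
```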
3 Method
A CNN is then designed to take a low-quality phone photo as input and produce an image that looks as if it were taken by a DSLR camera.
3.1. Loss function
The main difficulty of the image enhancement task is that input and target photos cannot be matched densely (pixel to pixel): different optics and sensors cause specific local non-linear distortions and aberrations, leading to a non-constant shift of pixels between each image pair even after precise alignment. Hence, the standard per-pixel losses, besides being doubtful as a perceptual quality metric, are not applicable in this case.
The loss function is built on the following assumption:
the overall perceptual image quality can be decomposed into three independent parts: i) color quality, ii) texture quality and iii) content quality.
3.1.1 Color loss
For color differences, both images are first blurred with a Gaussian kernel and the Euclidean distance between the blurred images is computed. In the CNN this corresponds to one additional convolutional layer with a fixed Gaussian kernel, followed by the mean squared error (MSE) function.
The color loss is therefore defined as the MSE between the Gaussian-blurred versions of the enhanced and target images.
The idea behind this loss is to evaluate the difference in brightness, contrast and major colors between the images while eliminating texture and content comparison. The crucial property of this loss is its invariance to small distortions: it forces the enhanced image to have the same color distribution as the target one, while being tolerant to small mismatches.
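The shift-tolerance can be seen in a small NumPy sketch (function names and the blur parameters are illustrative; the paper implements the blur as a fixed convolutional layer):

```python
import numpy as np

def gaussian_kernel1d(sigma=3.0, radius=9):
    """Normalized 1-D Gaussian kernel."""
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def blur(img, sigma=3.0, radius=9):
    """Separable Gaussian blur applied per channel (H x W x C array)."""
    k = gaussian_kernel1d(sigma, radius)
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 0, img)
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, out)
    return out

def color_loss(enhanced, target, sigma=3.0):
    """MSE between Gaussian-blurred images: compares brightness, contrast and
    major colors while being tolerant to small texture/alignment mismatches."""
    diff = blur(enhanced, sigma) - blur(target, sigma)
    return float(np.mean(diff ** 2))
```

Because blurring removes high-frequency detail, an image shifted by a pixel or two incurs a far smaller color loss than it would under a plain per-pixel MSE.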
3.1.2 Texture loss
Here a generative adversarial network (GAN) is used to learn a suitable texture quality metric: a discriminator is trained to distinguish enhanced patches from real DSLR patches, and the generator is penalized when it fails to fool it.
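A sketch of the generator-side texture term, assuming the standard adversarial form; per the paper the discriminator observes grayscale images so that texture is judged independently of color. The function names are illustrative:

```python
import numpy as np

def to_gray(batch):
    """Convert an N x H x W x 3 batch to grayscale (ITU-R BT.601 weights).
    The DPED discriminator looks at grayscale inputs, decoupling texture
    quality from color quality (which the color loss handles)."""
    return batch @ np.array([0.299, 0.587, 0.114])

def texture_loss(d_prob_enhanced):
    """Adversarial texture loss for the generator: -sum log D(enhanced),
    where D outputs the probability that a patch is a real DSLR photo.
    The loss is near zero when the discriminator is fully fooled."""
    eps = 1e-8  # numerical safety for log(0)
    return float(-np.sum(np.log(d_prob_enhanced + eps)))
```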
3.1.3 Content loss
Following references [9, 12], the content loss is defined based on the activation maps produced by the ReLU layers of the pre-trained VGG-19 network.
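Structurally, the content loss is just a Euclidean distance computed in feature space rather than pixel space. In the sketch below, `feat_fn` is a stand-in for the chosen pre-trained VGG-19 ReLU activation map (loading an actual VGG-19 is omitted here):

```python
import numpy as np

def content_loss(feat_fn, enhanced, target):
    """Mean squared distance between feature maps of a fixed pre-trained
    network. `feat_fn` stands in for a VGG-19 ReLU activation extractor;
    matching in feature space preserves semantics without requiring
    pixel-exact alignment."""
    fe, ft = feat_fn(enhanced), feat_fn(target)
    return float(np.mean((fe - ft) ** 2))
```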
3.1.4 Total variation loss
Finally, a total variation (TV) loss is added to enforce spatial smoothness of the produced images.
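One common form of the TV term sums squared differences between neighboring pixels (the paper's exact norm and normalization may differ slightly):

```python
import numpy as np

def tv_loss(img):
    """Total variation penalty: squared differences between vertically and
    horizontally adjacent pixels, normalized by image size. Penalizes
    high-frequency noise while leaving smooth gradients nearly untouched."""
    dy = img[1:, :] - img[:-1, :]
    dx = img[:, 1:] - img[:, :-1]
    return float((np.sum(dy ** 2) + np.sum(dx ** 2)) / img.size)
```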
3.1.5 Total loss
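The total loss is a weighted sum of the four terms. The default coefficients below (0.4 for texture, 0.1 for color, 400 for TV) are the values I believe the paper reports; treat them as an assumption and verify against the paper or the released code:

```python
def total_loss(l_content, l_texture, l_color, l_tv,
               w_texture=0.4, w_color=0.1, w_tv=400.0):
    """Weighted sum of the four loss terms used to train the generator.
    Default weights are the values reported in the DPED paper (assumption:
    verify against https://github.com/aiff22/DPED before reuse)."""
    return l_content + w_texture * l_texture + w_color * l_color + w_tv * l_tv
```

The large TV weight compensates for that term's small magnitude; content loss carries unit weight as the dominant term.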
The overall network architecture is shown in the figure below:
4 Experiments