Paper notes: EnlightenGAN: Deep Light Enhancement without Paired Supervision

EnlightenGAN: Deep Light Enhancement without Paired Supervision

Source: IEEE Transactions on Image Processing (TIP); arXiv preprint 2019

Background:

The assumption that paired training images are available becomes problematic when it comes to enhancing images from less controlled scenarios, such as dehazing, deraining, or low-light enhancement:

  1. It is very difficult, or even impractical, to simultaneously capture corrupted and ground-truth images of the same visual scene (e.g., low-light and normal-light image pairs at the same time).
  2. Synthesizing corrupted images from clean images can sometimes help, but such synthesized results are usually not photo-realistic enough, leading to various artifacts when the trained model is applied to real-world low-light images.
  3. Specifically for low-light enhancement, there may be no unique or well-defined high-light ground truth for a given low-light image. For example, any photo taken from dawn to dusk could be viewed as a high-light version of a photo taken at midnight of the same scene.

Strengths:

  • EnlightenGAN is the first work that successfully introduces unpaired training to low-light image enhancement. Such a training strategy removes the dependency on paired training data and enables us to train with larger varieties of images from different domains. It also avoids overfitting any specific data generation protocol or imaging device that previous works [15], [5], [16] implicitly rely on, hence leading to notably improved real-world generalization.
  • EnlightenGAN gains remarkable performance by imposing
    • a global-local discriminator structure that handles spatially-varying light conditions in the input image; (the global-local idea is a common design)
    • the idea of self-regularization, implemented by both the self feature preserving loss and the self-regularized attention mechanism. The self-regularization is critical to our model success, because of the unpaired setting where no strong form of external supervision is available.
  • EnlightenGAN is compared with several state-of-the-art methods via comprehensive experiments. The results are measured in terms of visual quality, no-referenced image quality assessment, and a human subjective survey. All results consistently endorse the superiority of EnlightenGAN. Moreover, in contrast to existing paired-trained enhancement approaches, EnlightenGAN proves particularly easy and flexible to adapt to enhancing real-world low-light images from different domains.

Limitations:

Not mentioned.

Methodology:

(Figure: the overall architecture of EnlightenGAN — an attention-guided U-Net generator trained against a global and a local discriminator.)

Global-Local Discriminators

To enhance local regions adaptively, in addition to improving the light globally, we propose a novel global-local discriminator structure; both discriminators use PatchGAN for real/fake discrimination.

In addition to the image-level global discriminator, we add a local discriminator by taking randomly cropped local patches from both output and real normal-light images, and learning to distinguish whether they are real (from real images) or fake (from enhanced outputs). Such a global-local structure ensures that all local patches of an enhanced image look like realistic normal-light ones, which proves to be critical in avoiding local over- or under-exposure, as our experiments will reveal later.
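
As a rough illustration of this structure, the sketch below builds two tiny PatchGAN-style critics in PyTorch and feeds one the whole image and the other randomly cropped patches. The layer widths, patch size, and patch count here are illustrative assumptions, not the paper's exact configuration.

```python
# A rough sketch of the global-local discriminator idea in PyTorch (assumed
# framework). Layer widths, patch size, and patch count are illustrative.
import torch
import torch.nn as nn

def random_crops(img: torch.Tensor, num_patches: int = 5, size: int = 64) -> torch.Tensor:
    """Randomly crop `num_patches` square patches from every image in the batch."""
    _, _, h, w = img.shape
    patches = []
    for _ in range(num_patches):
        top = torch.randint(0, h - size + 1, (1,)).item()
        left = torch.randint(0, w - size + 1, (1,)).item()
        patches.append(img[:, :, top:top + size, left:left + size])
    return torch.cat(patches, dim=0)  # (batch * num_patches, C, size, size)

def make_patchgan(in_ch: int = 3) -> nn.Sequential:
    """A tiny PatchGAN-style critic that outputs a spatial map of real/fake scores."""
    return nn.Sequential(
        nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(128, 1, 4, stride=1, padding=1),
    )

D_global = make_patchgan()  # judges the whole enhanced / real normal-light image
D_local = make_patchgan()   # judges randomly cropped local patches only

fake = torch.rand(2, 3, 400, 600)   # stand-in for enhanced outputs
real = torch.rand(2, 3, 400, 600)   # stand-in for real normal-light images
global_scores = D_global(fake)                 # image-level realism scores
local_scores = D_local(random_crops(fake))     # patch-level realism scores
```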

The standard form of the relativistic discriminator is:

$$D_{Ra}(x_r, x_f) = \sigma\big(C(x_r) - \mathbb{E}_{x_f \sim P_{fake}}[C(x_f)]\big), \qquad D_{Ra}(x_f, x_r) = \sigma\big(C(x_f) - \mathbb{E}_{x_r \sim P_{real}}[C(x_r)]\big)$$

where $C$ denotes the discriminator network, $x_r$ and $x_f$ are sampled from the real and fake distributions, and $\sigma$ is the sigmoid function.

We slightly modify the relativistic discriminator, replacing the sigmoid function with the least-squares GAN (LSGAN) [39] loss.

Finally, the loss functions for the global discriminator $D$ and the generator $G$ are:

$$\mathcal{L}_{D}^{Global} = \mathbb{E}_{x_r \sim P_{real}}\big[(D_{Ra}(x_r, x_f) - 1)^2\big] + \mathbb{E}_{x_f \sim P_{fake}}\big[D_{Ra}(x_f, x_r)^2\big]$$

$$\mathcal{L}_{G}^{Global} = \mathbb{E}_{x_f \sim P_{fake}}\big[(D_{Ra}(x_f, x_r) - 1)^2\big] + \mathbb{E}_{x_r \sim P_{real}}\big[D_{Ra}(x_r, x_f)^2\big]$$

For the local discriminator, the randomly cropped patches are judged with the original (non-relativistic) LSGAN loss:

$$\mathcal{L}_{D}^{Local} = \mathbb{E}_{x_r \sim P_{real\text{-}patches}}\big[(D(x_r) - 1)^2\big] + \mathbb{E}_{x_f \sim P_{fake\text{-}patches}}\big[(D(x_f) - 0)^2\big]$$

$$\mathcal{L}_{G}^{Local} = \mathbb{E}_{x_f \sim P_{fake\text{-}patches}}\big[(D(x_f) - 1)^2\big]$$
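
A minimal PyTorch sketch of these four terms, assuming `c_real` and `c_fake` are the raw critic outputs $C(x_r)$ and $C(x_f)$ (and their patch-level counterparts for the local discriminator), with the expectations taken as batch means:

```python
# A sketch of the adversarial losses written above; c_real / c_fake are the raw
# critic outputs C(x_r) and C(x_f), and the expectations become batch means.
import torch

def global_d_loss(c_real: torch.Tensor, c_fake: torch.Tensor) -> torch.Tensor:
    d_rf = c_real - c_fake.mean()          # D_Ra(x_r, x_f) without the sigmoid
    d_fr = c_fake - c_real.mean()          # D_Ra(x_f, x_r) without the sigmoid
    return ((d_rf - 1) ** 2).mean() + (d_fr ** 2).mean()

def global_g_loss(c_real: torch.Tensor, c_fake: torch.Tensor) -> torch.Tensor:
    d_rf = c_real - c_fake.mean()
    d_fr = c_fake - c_real.mean()
    return ((d_fr - 1) ** 2).mean() + (d_rf ** 2).mean()

def local_d_loss(c_real_patch: torch.Tensor, c_fake_patch: torch.Tensor) -> torch.Tensor:
    # Plain (non-relativistic) LSGAN loss on randomly cropped patches
    return ((c_real_patch - 1) ** 2).mean() + (c_fake_patch ** 2).mean()

def local_g_loss(c_fake_patch: torch.Tensor) -> torch.Tensor:
    return ((c_fake_patch - 1) ** 2).mean()
```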

Self Feature Preserving Loss

We call it self feature preserving loss to stress its self-regularization utility to preserve the image content features to itself, before and after the enhancement.

$$\mathcal{L}_{SFP}(I^L) = \frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \big(\phi_{i,j}(I^L)_{x,y} - \phi_{i,j}(G(I^L))_{x,y}\big)^2$$

where $I^L$ is the input low-light image, $G(I^L)$ its enhanced output, $\phi_{i,j}$ the feature map extracted from a VGG-16 model pre-trained on ImageNet ($i$ indexing the max-pooling stage and $j$ the convolutional layer after it), and $W_{i,j}$, $H_{i,j}$ the dimensions of that feature map. A local counterpart $\mathcal{L}_{SFP}^{Local}$ is defined in the same way on the randomly cropped patches used by the local discriminator.
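
A minimal sketch of such a VGG-based self feature preserving loss, using torchvision's pre-trained VGG-16; the cut-off index into `vgg16.features` chosen here is an illustrative assumption, not necessarily the layer used in the paper:

```python
# A sketch of a VGG-based self feature preserving loss. The cut-off index into
# vgg16.features is an illustrative assumption.
import torch
import torch.nn as nn
from torchvision.models import vgg16

class SelfFeaturePreservingLoss(nn.Module):
    def __init__(self, cutoff: int = 26):           # feature layer: assumed, not from the paper
        super().__init__()
        features = vgg16(weights="IMAGENET1K_V1").features[:cutoff].eval()
        for p in features.parameters():
            p.requires_grad_(False)                  # VGG stays fixed; it only measures content
        self.features = features
        self.mse = nn.MSELoss()

    def forward(self, low_light: torch.Tensor, enhanced: torch.Tensor) -> torch.Tensor:
        # Distance between the VGG features of the input and of its own enhanced output
        return self.mse(self.features(low_light), self.features(enhanced))
```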

Overall loss function:

$$\mathcal{L} = \mathcal{L}_{SFP}^{Global} + \mathcal{L}_{SFP}^{Local} + \mathcal{L}_{G}^{Global} + \mathcal{L}_{G}^{Local}$$

U-Net Generator Guided with Self-Regularized Attention

We further propose an easy-to-use attention mechanism for the U-Net generator.

Intuitively, in a low-light image of spatially varying light condition, we always want to enhance the dark regions more than bright regions, so that the output image has neither over- nor under-exposure.

We take the illumination channel $I$ of the input RGB image, normalize it to $[0, 1]$, and then use $1 - I$ (element-wise difference) as our self-regularized attention map.

We then resize the attention map to fit each feature map and multiply it with all intermediate feature maps as well as the output image. We emphasize that our attention map is also a form of self-regularization rather than being learned with supervision. Despite its simplicity, the attention guidance is shown to improve the visual quality consistently.
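
A small sketch of this attention mechanism; approximating the illumination channel by the per-pixel maximum over the R, G, B channels is an assumption made here for illustration:

```python
# A sketch of the self-regularized attention map. Approximating the illumination
# channel by the per-pixel max over R, G, B is an assumption for illustration.
import torch
import torch.nn.functional as F

def attention_map(rgb: torch.Tensor) -> torch.Tensor:
    """rgb: (B, 3, H, W) in [0, 1]; returns a (B, 1, H, W) attention map in [0, 1]."""
    illum = rgb.max(dim=1, keepdim=True).values        # illumination channel I
    lo = illum.amin(dim=(2, 3), keepdim=True)
    hi = illum.amax(dim=(2, 3), keepdim=True)
    illum = (illum - lo) / (hi - lo + 1e-8)             # normalize to [0, 1]
    return 1.0 - illum                                  # dark regions get high attention

def apply_attention(feat: torch.Tensor, attn: torch.Tensor) -> torch.Tensor:
    """Resize the attention map to a feature map's spatial size and multiply element-wise."""
    attn_resized = F.interpolate(attn, size=feat.shape[-2:],
                                 mode="bilinear", align_corners=False)
    return feat * attn_resized
```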

Experiment

Dataset and Implementation Details

We collect a larger-scale unpaired training set that covers diverse image qualities and contents.

We assemble a mixture of 914 low-light and 1016 normal-light images, without the need to keep any pairs.

All these photos are converted to PNG format and resized to 600 × 400 pixels. For the testing images, we choose the standard ones used in previous works.
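
A minimal sketch of this conversion step with Pillow; the directory names are placeholders:

```python
# Convert the collected photos to PNG and resize them to 600 x 400, as described
# above; directory names are placeholders.
from pathlib import Path
from PIL import Image

src_dir, dst_dir = Path("raw_photos"), Path("training_set")
dst_dir.mkdir(parents=True, exist_ok=True)

for path in src_dir.iterdir():
    if path.suffix.lower() not in {".jpg", ".jpeg", ".png", ".bmp"}:
        continue
    img = Image.open(path).convert("RGB")
    img = img.resize((600, 400), Image.BICUBIC)      # width x height = 600 x 400
    img.save(dst_dir / (path.stem + ".png"))         # store as PNG
```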

The whole training process takes 3 hours on 3 Nvidia 1080Ti GPUs (11 GB of memory each).

Ablation Study

(Figure: visual results of the ablation study.)

Comparison with State-of-the-Arts
1) Visual Quality Comparison:

(Figure: visual quality comparison with state-of-the-art methods.)

2) No-Referenced Image Quality Assessment:

3) Human Subjective Evaluation:

(Figure: results of the human subjective evaluation.)

Adaptation on Real-World Images

Domain adaptation is an indispensable factor for real-world generalizable image enhancement. The unpaired training strategy of EnlightenGAN allows us to directly learn to enhance real-world low-light images from various domains, where there is no paired normal-light training data, or even no normal-light data from the same domain, available.

The low-light images from the BDD-100k driving dataset suffer from severe artifacts and high ISO noise. We then compare two versions of EnlightenGAN trained with different data:

  1. the pre-trained EnlightenGAN model as described in Sec. IV-A, without any adaptation for BDD-100k;

  2. EnlightenGAN-N: a domain-adapted version of EnlightenGAN, which uses low-light images from the BDD-100k dataset as its low-light training set, while the normal-light images are still the high-quality ones from our unpaired dataset in Sec. IV-A.

For comparison, we also include a traditional method, adaptive histogram equalization (AHE), a pre-trained LIME model, and an unsupervised approach, CycleGAN.

Results:

(Figure: visual comparison on BDD-100k low-light images.)

LIME: suffers from severe noise amplification and over-exposure artifacts.

AHE: does not enhance the brightness enough.

CycleGAN: generates very low-quality results due to its training instability.

EnlightenGAN: also produces noticeable artifacts on this unseen image domain.

EnlightenGAN-N: produces the most visually pleasing results, striking an impressive balance between brightness and artifact/noise suppression.

Pre-Processing for Improving Classification

On the testing set of the extremely dark (ExDark) dataset, we apply the pre-trained EnlightenGAN as a pre-processing step and then pass the enhanced images through an ImageNet-pretrained ResNet-50 classifier. The high-level task performance serves as a fixed, semantic-aware metric for the enhancement results.

In the low-light testing set, using EnlightenGAN as pre-processing improves the classification accuracy from 22.02% (top-1) and 39.46% (top-5) to 23.94% (top-1) and 40.92% (top-5). This supplies side evidence that EnlightenGAN preserves semantic details, in addition to producing visually pleasing results. We also conduct the experiment with LIME and AHE: LIME improves the accuracy to 23.32% (top-1) and 40.60% (top-5), while AHE obtains 23.04% (top-1) and 40.37% (top-5).
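
A sketch of how such a semantic-aware evaluation could be run: enhance first, then classify with an ImageNet-pretrained ResNet-50 and count top-1/top-5 hits. Here `enhance` stands in for the pre-trained EnlightenGAN generator, and the labels are assumed to already be mapped to ImageNet class indices.

```python
# Enhance, then classify with an ImageNet-pretrained ResNet-50; `enhance` is a
# placeholder for the pre-trained generator, labels assumed to be ImageNet indices.
import torch
from torchvision.models import resnet50

classifier = resnet50(weights="IMAGENET1K_V1").eval()

@torch.no_grad()
def topk_hits(images: torch.Tensor, labels: torch.Tensor, k: int) -> int:
    """Count how many images have the true label among the top-k predictions."""
    logits = classifier(images)                      # (B, 1000) ImageNet logits
    topk = logits.topk(k, dim=1).indices             # (B, k) predicted class indices
    return (topk == labels.unsqueeze(1)).any(dim=1).sum().item()

# enhanced = enhance(low_light_batch)                # EnlightenGAN as pre-processing
# top1 = topk_hits(enhanced, labels, k=1)
# top5 = topk_hits(enhanced, labels, k=5)
```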

Personal notes:

Read during the 2021 winter break.
