Exposure Correction for Cameras: an Overview

1.1 Introduction

One of the main problems affecting image quality, leading to unpleasing pictures, comes from improper exposure to light. Thanks to the sophisticated features included in today's cameras (i.e., automatic gain control algorithms), failures are not very likely to happen. Digital consumer devices make use of ad-hoc strategies and heuristics to derive the exposure settings. Usually these techniques are completely blind with respect to the specific content of the scene involved. Some techniques are fully automatic, as in the case of average/auto-exposure metering or the more complex ones represented by matrix/intelligent metering. Others give the photographer some degree of control over the exposure choice, leaving room for personal taste or enabling specific needs to be met.
Although many methods for adjusting exposure exist, and some of them are sophisticated, it is not rare to acquire images with suboptimal or incorrect exposure. This is especially true for handheld devices (e.g., mobile phones), where several factors lead to badly exposed pictures: poor optics, absence of flash, not to mention difficult lighting conditions of the input scene, and so forth.

There is no exact definition of what a correct exposure is. Abstracting somewhat, an optimal exposure can be defined as one that reproduces the most important regions (according to contextual or perceptual criteria) with gray levels or brightness more or less in the middle of the possible range. In any case, if the dynamic range of the scene is significantly "high", the details involved cannot all be captured. The main purpose of this chapter is to provide an effective overview of the techniques involved:

  • exposure settings of imaging devices during the acquisition phase (i.e., the pre-capture phase) [1];
  • content-dependent enhancement strategies applied as post-processing [2];
  • advanced solutions based on multiple-picture acquisition of the same scene with different exposure times, allowing the radiance map of the real world to be reproduced [3].

The remainder of the chapter is organized as follows. The next section discusses in detail both classic and advanced methods related to the pre-capture phase (i.e., continuously reading the sensor and analyzing its output to determine a set of parameters strictly related to the final image quality [1]). The role of the exposure settings is analyzed, considering some case studies where, by exploiting a few assumptions about the dynamic range of the real scene, effective strategies can be derived. Section 1.3 describes the work in [2], where post-processing techniques achieve an effective enhancement by analyzing some content-related features of the image.

Finally, Section 1.4 briefly reviews advanced methods devoted to extending acquisition capabilities through multiple-image acquisition (i.e., bracketing). In particular, it details the main techniques able to effectively reproduce the significant portions of a real scene once a reliable HDR (high dynamic range) representation has been computed [3].

1.2 Metering Techniques

With the introduction of computing technology, in-camera metering has become better and better, but limitations still exist. For example, taking a picture of a snowy landscape, or of a black locomotive, without overriding the metering computed by the camera is very difficult. The most important role of the exposure duration is to ensure that the acquired image falls in a good region of the sensor's sensitivity range. In many devices, the selected exposure value is the main processing step that adjusts the overall image intensity the consumer will see. Many of the earliest digital cameras used a separate metering system to set the exposure duration, rather than using data acquired from the sensor chip; integrating the exposure metering function into the main sensor (through-the-lens, or TTL, metering) lowers the system cost. The imaging community uses a measure called exposure value (EV) to specify the relationship between the f-number F and the exposure duration T (1.1):
$EV = \log_2\!\left(\frac{F^2}{T}\right) = 2\log_2(F) - \log_2(T)$   (1.1)
The exposure value (1.1) decreases as the exposure time increases, and grows as the f-number grows. Most automatic exposure algorithms work as follows:

  1. take a picture using a predetermined exposure value $EV_{pre}$;
  2. convert the RGB values to a brightness picture, B;
  3. derive a single value $B_{pre}$ from the brightness picture (e.g., a center-weighted mean, a median, or a more sophisticated weighting such as matrix metering);
  4. based on a linearity assumption and on equation (1.1), the optimal exposure value $EV_{opt}$ should be the one allowing a correct exposure. A picture taken at $EV_{opt}$ should give a value close to a predefined ideal value $B_{opt}$, hence (1.2):
     $EV_{opt} = EV_{pre} + \log_2(B_{pre}) - \log_2(B_{opt})$
     The ideal value $B_{opt}$ is usually chosen empirically for each algorithm. Different algorithms mainly differ in how the single value $B_{pre}$ is derived from the picture.
     Note: the f-number, or aperture value, measures the size of the hole the light passes through at the back of the lens, relative to the focal length. The smaller the f-number, the more light passes through the lens.
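As a toy illustration, the metering loop above can be sketched in a few lines; the plain mean used for $B_{pre}$ and the target value below are stand-ins for a real camera's weighting scheme:

```python
import numpy as np

def luminance(rgb):
    """Rec. 601 luma from an RGB image with values in [0, 255]."""
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

def auto_exposure_update(rgb_pre, ev_pre, b_opt=118.0):
    """One iteration of the metering loop, Eq. (1.2).

    b_opt is an assumed empirical target brightness; B_pre is taken as the
    plain mean, standing in for center-weighted or matrix metering.
    """
    b_pre = luminance(rgb_pre).mean()
    return ev_pre + np.log2(b_pre) - np.log2(b_opt)

# A frame metered one stop too bright (mean brightness ~236 vs. target 118):
# the update raises EV by one stop, shortening the exposure.
frame = np.full((4, 4, 3), 236.0)
ev_opt = auto_exposure_update(frame, ev_pre=10.0)
```

Since the frame's mean brightness is exactly twice the target, the correction is exactly +1 stop.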

1.2.1 Classic Approaches

The metering system of a digital camera measures the amount of light in the scene and computes the best-matching exposure value, based on the metering modes explained below. Automatic exposure is a standard feature of all digital cameras: once the metering mode is chosen, the user simply points the camera and presses the shutter. The metering mode defines which information of the scene is used to compute the exposure value and how it is determined. Cameras generally allow the user to select among spot, center-weighted average, or multi-zone metering modes.

Spot Metering

Spot metering allows the user to meter the subject at the center of the frame (or, on some cameras, at the selected autofocus (AF) point). Only a small area of the whole frame (1–5% of the viewfinder area) is metered, while the rest of the frame is ignored. In this case, Bpre in (1.2) is the mean value of the central area (see Fig. 1.1(a)). This is typically the effective center of the scene, but some cameras allow the user to select a different off-center point, or to recompose by moving the camera after metering. A few models support a multi-spot mode that averages several spot meter readings of a scene; these and other cameras also support metering of highlight and shadow areas. Spot metering is very accurate, is not influenced by other areas of the frame, and is commonly used for very high contrast scenes. For example (see Fig. 1.1(b)), if the subject's back is lit by the rising sun and the face is much darker than the bright halo around the subject's back and hairline (the subject is "backlit"), spot metering allows the photographer to measure the light reflected from the subject's face and expose properly for it, instead of for the brighter light around the hairline; the back and the area around the hairline will then be overexposed. Spot metering is the method on which the Zone System relies (see the note below).

Partial Metering

This mode meters a larger area than spot metering (around 10–15% of the entire frame), and is generally used when very bright or very dark areas at the edges of the frame would unduly influence the metering. Like spot metering, some cameras can use a variable point for the reading (usually the AF point), while others have a fixed point at the center of the viewfinder. In Fig. 1.1(d), an example of partial metering on a backlit scene is shown; this method allows a more globally balanced exposure.

Note: the Zone System is a photographic technique for determining optimal film exposure and development, formulated by Ansel Adams and Fred Archer in 1941. The Zone System provides photographers with a systematic method for precisely defining the relationship between the way they visualize the photographic subject and the final result. Although it originated with black-and-white sheet film, the Zone System is also applicable to roll film, both black-and-white and color, negative and reversal, and to digital photography.

Center-Weighted Average Metering

This is probably the most common metering method, available on virtually all digital cameras; it is also the default on digital cameras that do not offer a choice of metering modes. In this system, as shown in Fig. 1.1(e), the meter concentrates 60% to 80% of the sensitivity in the central part of the viewfinder, and the balance is then "feathered" out toward the edges. Some cameras allow the user to adjust the weighting between the central portion and the periphery. One advantage of this method is that it is less influenced by small areas of strongly varying brightness at the edges of the viewfinder;
and since many subjects are located in the central part of the frame, consistent results are obtained. Unfortunately, if a backlight is present in the scene, the central part will be darker than the rest of the scene (Fig. 1.1(f)), producing an unpleasantly underexposed foreground.

Average Metering

In this mode the camera uses the light information coming from the entire scene and averages it for the final exposure setting, giving no extra weight to any particular portion of the metered area. This metering technique has been superseded by center-weighted metering and survives only in older cameras.

1.2.2 Advanced Approaches

Matrix or Multi-Zone Metering

This mode is also called matrix, evaluative, honeycomb, segment metering, or ESP (electro-selective pattern) metering on some cameras. It was first introduced by the Nikon FA, where it was called Automatic Multi-Pattern metering. On many cameras it is the default/standard metering setting. The camera measures the light intensity in several points of the scene and then combines the results to find the best exposure. How they are combined varies from camera to camera. The actual number of zones used varies widely, from a few to over a thousand; however, performance should not be judged only on the number or layout of zones. As shown in Fig. 1.2, the layout can change from one manufacturer to another, and even within the same company different multi-zone metering layouts can be used for a variety of reasons (e.g., the dimensions of the final pixel matrix).

Many manufacturers are not open about the exact calculations used to determine the exposure. A number of factors are taken into consideration, including: AF point, distance to the subject, areas in focus or out of focus, colors/hues of the scene, and backlighting. Multi-zone metering tends to bias its exposure toward the AF point being used (while also taking into account the other areas of the frame), thus ensuring that the point of interest is properly exposed (this also aims to avoid the need for exposure compensation in most situations). A database of thousands of exposures is pre-stored in the camera,


Figure 1.2: Examples of the different multi-zone metering patterns used by several camera manufacturers.


Figure 1.3: Bayer数据子采样生成。

and the processor can use a selective pattern to determine what is being photographed. Some cameras allow the user to link (or unlink) autofocus and metering, making it possible to lock the exposure once AF confirmation is achieved: AEL (automatic exposure lock). With manual focus, and on many compact cameras, the AF point is not used as part of the exposure computation; in such cases, the metering commonly defaults to a central point in the viewfinder, using a pattern based on that area. Some users experience problems with wide-angle shots under high contrast, because the zones are large and vary considerably in brightness; it is important to understand that, even in this case, the focus point strongly influences the overall exposure.

1.3 Content-Dependent Exposure Correction

As described in Section 1.2, an optimal exposure can be defined as one able to reproduce the most important regions with a gray level or brightness (according to contextual or perceptual criteria) more or less in the middle of the possible range. After the acquisition phase, typical post-processing techniques attempt an effective enhancement using global approaches: histogram specification, histogram equalization and gamma correction [4], used to improve the global contrast appearance, only stretch the global distribution of the intensities. More adaptive criteria are needed to overcome such drawbacks. The exposure correction technique [2] described in this section has been designed mainly for mobile sensor applications, a setting that has become common now that the latest mobile devices are used for video calls. Detecting skin features in the captured image allows regions of interest (e.g., faces) to be selected, properly enhanced and/or tracked. If no skin is present in the scene, the algorithm automatically switches to tracking other features of visually relevant areas (e.g., contrast and focus). This implementation differs from the algorithm described in [5] in that the whole processing can also be performed directly on Bayer-pattern images [6], and a simpler statistical approach is used to identify the information-bearing regions. The algorithm is defined as follows:

  1. Luminance extraction. If the algorithm is applied to Bayer data, the input is approximated by a sub-sampled (quarter-size) image instead of the three full color planes (Fig. 1.3).

  2. A value is fixed for each region using a suitable feature extraction technique. This operation searches for the visually relevant regions (for contrast and focus the regions are block-based, while for skin recognition a region is associated with each pixel).

  3. Once the "visually important" pixels have been identified (e.g., pixels belonging to skin features), a global tone correction technique is applied, using as its main parameter the mean gray level of the relevant regions.

1.3.1 Feature Extraction: Contrast and Focus

Figure 1.4: Feature extraction pipeline (for focus and contrast, with N = 25). The visual relevance of each luminance block (b) of the input image (a) is assessed through a relevance measure (c), yielding the list of relevant blocks (d).

In order to identify the image regions carrying more information, the luminance plane is subdivided into N blocks of equal dimensions (in our experiments we used N = 64 for VGA images). For each block, statistical measures of "contrast" and "focus" are computed, under the assumption that well-focused or high-contrast blocks are more relevant than the others. Contrast refers to the range of tones present in the image: high contrast yields perceptually more important areas within a block.

Focus characterizes the sharpness, or the edges, of a block and can be used to identify regions where high frequency components (i.e., details) are present. If these measures were simply computed on a highly underexposed image, the better-exposed regions would always show higher contrast and more edges than the darker ones. In order to perform a visual analysis that reveals the most important features regardless of the lighting conditions, a new "visibility image" is built by pushing the mean gray level of the input green Bayer-pattern plane (or of the Y channel for color images) to 128. The push operation is performed with the same function used to adjust the exposure level, which will be described later. The contrast measure is computed by simply building a histogram for each block and then computing its deviation (1.4) from the mean value (1.5); a high deviation value indicates good contrast, and vice versa. To remove irrelevant peaks, the histogram is lightly smoothed by replacing each entry with its mean over a radius-2 neighborhood, so the original histogram entries I[i] are replaced by:

$\tilde{I}[i] = \frac{1}{5}\sum_{j=i-2}^{i+2} I[j]$   (1.3)

The histogram deviation D is computed as:

$D = \frac{\sum_{i=0}^{255} \tilde{I}[i]\,\lvert i - M \rvert}{\sum_{i=0}^{255} \tilde{I}[i]}$   (1.4)

where M is the mean value:

$M = \frac{\sum_{i=0}^{255} i\,\tilde{I}[i]}{\sum_{i=0}^{255} \tilde{I}[i]}$   (1.5)

The focus measure is computed by convolving each block with a simple 3×3 Laplacian filter. In order to discard irrelevant high-frequency pixels (mostly due to noise), the output of the convolution is thresholded at each pixel. The mean focus value of each block is computed as:

$F = \frac{1}{N}\sum_{x,y} \mathrm{thresh}\big(\lvert (B \ast \mathrm{Lap})(x,y) \rvert\big)$

where N is the number of pixels and the thresh() operator discards values below a fixed noise threshold. Once the values F and D have been computed for all blocks, the relevant regions are selected using a linear combination of the two values. The feature extraction pipeline is illustrated in Fig. 1.4.
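The two block measures can be sketched as follows; the reading of (1.4)–(1.5) as a gray-level spread and the numeric noise threshold are assumptions, since the chapter only fixes their general form:

```python
import numpy as np

def contrast_measure(block):
    """Histogram deviation D: spread of the (radius-2 smoothed) gray-level
    histogram around its mean gray level M - one plausible reading of
    Eqs. (1.4)-(1.5)."""
    hist, _ = np.histogram(block, bins=256, range=(0, 256))
    h = np.convolve(hist, np.ones(5) / 5.0, mode="same")  # radius-2 smoothing
    levels = np.arange(256)
    m = (levels * h).sum() / h.sum()
    return (h * np.abs(levels - m)).sum() / h.sum()

def focus_measure(block, noise_thresh=8.0):
    """Mean thresholded magnitude of a 3x3 Laplacian response; the threshold
    value is an assumption (the text only says it is fixed to reject noise)."""
    b = block.astype(float)
    lap = (np.roll(b, 1, 0) + np.roll(b, -1, 0)
           + np.roll(b, 1, 1) + np.roll(b, -1, 1) - 4.0 * b)
    resp = np.abs(lap)
    resp[resp < noise_thresh] = 0.0
    return resp.mean()
```

Blocks with a wide tonal spread score a high D, and blocks with strong (supra-threshold) edges score a high F; a flat block scores low on both.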

1.3.2 Feature Extraction: Skin Recognition

As before, a "visibility image" is obtained by forcing the mean gray level of the luminance channel to about 128. Most existing skin-color detection approaches threshold some measure of the likelihood of skin color for each pixel, processing the pixels independently. Human skin colors form a special category of colors, distinct from the colors of most other natural objects, and it has been found that they cluster in various color spaces ([7], [8]). Skin color differences among people are mostly due to differences in intensity, so these variations can be reduced by using only the chrominance components. Yang et al. [9] have demonstrated that the distribution of human skin color can be represented by a two-dimensional Gaussian function on the chrominance plane. The center of this distribution is determined by the mean vector $\vec{\mu}$ and its shape by the covariance matrix Σ; both can be estimated from an appropriate training data set. The conditional probability $p(\vec{x}\mid s)$ of a block belonging to the skin-color class s, given its chrominance vector $\vec{x}$, is then expressed as:

$p(\vec{x}\mid s) = \frac{1}{2\pi\,\lvert\Sigma\rvert^{1/2}}\,\exp\!\left(-\tfrac{1}{2}\,d^2(\vec{x})\right)$   (1.7)
Figure 1.5: Skin recognition example on an RGB image: (a) original image compressed in JPEG format; (b) output of the simplest thresholding approach; (c) output of probabilistic thresholding.

Figure 1.6: Skin recognition example on a Bayer-pattern image: (a) original image in Bayer data; (b) skin identified with the probabilistic approach; (c) thresholded skin values (skin locus) on the r–g bidimensional histogram.

where $d(\vec{x})$ is the so-called Mahalanobis distance from the vector $\vec{x}$ to the mean vector $\vec{\mu}$, defined as:

$d^2(\vec{x}) = (\vec{x}-\vec{\mu})^{T}\,\Sigma^{-1}\,(\vec{x}-\vec{\mu})$

The value $d(\vec{x})$ determines the probability that a given block belongs to the skin-color class: the larger the distance $d(\vec{x})$, the lower the probability that the block belongs to the class s. The distribution parameters were obtained experimentally from a large data set of images acquired under different conditions and at different resolutions, using the CMOS-VGA sensor of the "STV6500 - E01" evaluation kit equipped with the "502 VGA sensor" [10]. Since a large number of color spaces, distance measures and 2D distributions can be used, many skin recognition algorithms are possible. The skin-color algorithm is independent of the exposure correction, so we present two alternative techniques for identifying skin regions (illustrated in Fig. 1.5):

  1. Using the input YCbCr image and the conditional probability (1.7), each pixel is classified as belonging to a skin region or not. A new image with normalized gray values is then derived, in which the skin regions are properly highlighted (Fig. 1.5(c)); the higher the gray value, the greater the computed probability of a reliable identification.

  2. Processing the input RGB image, the 2D chrominance distribution histogram (r, g) is computed, where r = R/(R+G+B) and g = G/(R+G+B). The chrominance values representing skin cluster in a specific area of the (r, g) plane, called the "skin locus" (Fig. 1.6(c)), as defined in [11]. Pixels whose chrominance values belong to the skin locus are selected to correct the exposure. For Bayer data, the skin recognition algorithms work on the RGB image created by sub-sampling the original picture, as depicted in Fig. 1.3.
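A minimal sketch of the probabilistic classifier of item 1, evaluated here in normalized (r, g) chromaticity; the mean and covariance below are made-up stand-ins for values learned from a training set, as the chapter prescribes:

```python
import numpy as np

# Illustrative skin model in normalized (r, g) chromaticity. MU and SIGMA
# are assumed example parameters, not values from the chapter.
MU = np.array([0.42, 0.31])
SIGMA = np.array([[0.0020, -0.0005],
                  [-0.0005, 0.0010]])
SIGMA_INV = np.linalg.inv(SIGMA)

def skin_probability(rgb):
    """p(x|s): 2D Gaussian of the Mahalanobis distance in chromaticity."""
    total = rgb.sum(axis=-1, keepdims=True) + 1e-9
    x = (rgb / total)[..., :2]            # (r, g) components
    diff = x - MU
    d2 = np.einsum("...i,ij,...j->...", diff, SIGMA_INV, diff)
    norm = 2.0 * np.pi * np.sqrt(np.linalg.det(SIGMA))
    return np.exp(-0.5 * d2) / norm

skin_pixel = np.array([[200.0, 148.0, 128.0]])   # chromaticity near MU
sky_pixel = np.array([[80.0, 120.0, 220.0]])     # far from the skin cluster
```

A pixel near the cluster center gets a high density, while a blue-dominant pixel is effectively rejected; a threshold on this value yields the binary skin map.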

1.3.3 Exposure Correction

Once the visually relevant regions have been identified, the exposure correction is carried out using the mean gray value of those regions as a reference point. A simulated camera response curve is used for this purpose. This function can be expressed by a simple parametric closed-form representation, where q represents the quantity of "light" and the final pixel value is obtained through the parameters A and C, which control the shape of the curve; q should be expressed in base-2 logarithmic units (usually called "stops"). These parameters can be estimated for the specific image acquisition device, or chosen experimentally, as better specified below (see Section 1.4). The offset from the ideal exposure is computed using the f curve and the mean gray level avg of the visually relevant regions, where Trg is the desired target gray level. Trg should be around 128, but its value may be slightly lower, especially when dealing with Bayer-pattern data, to which some post-processing is usually applied. The luminance value Y(x, y) of each pixel (x, y) is then modified accordingly. Note that all pixels are corrected; in practice, this step is implemented as a LUT (look-up table) transform.
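The chapter leaves the closed form of f to the implementer. A minimal sketch, assuming a sigmoid-shaped response $f(q) = 255\,(1+e^{-Aq})^{-C}$ (the shape and the values of A and C are assumptions here), builds the correction LUT by shifting the curve in the log-exposure ("stops") domain:

```python
import numpy as np

A, C = 0.85, 1.0                      # assumed shape parameters
Q = np.linspace(-8.0, 8.0, 4097)      # light quantity q, in stops

def f(q):
    """Assumed sigmoid-shaped simulated camera response, saturating at 255."""
    return 255.0 / (1.0 + np.exp(-A * q)) ** C

def f_inv(y):
    """Numeric inverse of f via interpolation on a dense table."""
    return np.interp(y, f(Q), Q)

def exposure_lut(avg, trg=128.0):
    """256-entry LUT shifting every gray level by the offset (in stops) that
    maps the relevant-region mean `avg` onto the target `trg`."""
    offset = f_inv(trg) - f_inv(avg)
    y = np.arange(256, dtype=float)
    return np.clip(f(f_inv(y) + offset), 0.0, 255.0)

lut = exposure_lut(avg=64.0)          # relevant regions one stop under target
```

Applying `lut` to the whole luminance plane pushes the relevant-region mean onto Trg while the sigmoid protects the extremes from clipping.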

1.3.4 Exposure Correction Results

The described technique has been tested on a large database of images acquired at different resolutions, with different acquisition devices, in both Bayer and RGB format. In the Bayer case, the algorithm was plugged into a real-time framework, using the CMOS-VGA sensor of the "STV6500 - E01" evaluation kit with the "502 VGA sensor" [10]. Examples of skin detection with real-time processing are reported in Fig. 1.7. In the RGB case, the algorithm can be implemented as a post-processing step. Examples of skin-based and contrast/focus-based exposure correction are shown in Fig. 1.8 and Fig. 1.9, respectively. The results show how the feature analysis capability of the proposed algorithm allows a contrast enhancement that takes into account some strong characteristics of the input image. Further details and experiments can be found in [2].

1.4 Bracketing and Advanced Applications

Even if some kind of post-processing to recover or enhance badly exposed images is possible, there are cases where this strategy is not feasible or leads to poor results. The problem comes from the fact that badly captured data can be enhanced, but if data is not present at all there is nothing to improve. Today, despite the enormous advances of digital photography, with huge resolutions available even in mass-market oriented products, nearly all digital cameras still deal with a limited dynamic range and inadequate data representation, which makes critical lighting situations, of which the real world offers plenty, difficult to handle.

Figure 1.7: Exposure correction results by real-time and post-processing: (a) original Bayer input image; (b) Bayer skin detected in real time; (c) color-interpolated image from the Bayer input; (d) RGB skin detected in post-processing; (e) exposure-corrected image obtained from the RGB image.

Figure 1.8: Exposure correction results by post-processing: (a) original color input image; (b) contrast and focus visually significant blocks detected; (c) exposure-corrected image obtained from the RGB image.

Figure 1.9: Exposure correction results: in the first row, the original images (a) and (b), acquired by a Nokia 7650 VGA sensor and compressed in JPEG format, and (c) a picture acquired with the CCD sensor (4.1 megapixels) of an Olympus E-10 camera; in the second row, the corrected outputs.

This is where multiple-exposure capture comes in, as a useful alternative to overcome the actual technological limits. Even if the idea of combining multiply exposed data has received great attention only recently, the approach itself is quite old. In the early sixties, before the advent of digital image processing, Charles Wyckoff [12] was able to capture high dynamic range images by using photographic emulsion layers of different sensitivity. The information coming from each layer was printed on paper using a different color, thus obtaining a pseudo-color image description.

1.4.1 Sensor Versus the Real World

Table 1.1: Typical world luminance levels.

Dynamic range refers to the ratio between the highest and lowest sensed levels of light. For example, a scene where the quantity of light ranges from 1000 cd/m² to 0.01 cd/m² has a dynamic range of 1000/0.01 = 100,000. The simultaneous presence of very different luminance levels in real-world scenes poses great challenges for image capturing devices, whose available dynamic range is usually not capable of coping with that coming from the outside world. High dynamic range scenes are not uncommon: imagine a room with a sunlit window, environments presenting opaque and specular objects, and so on. Table 1.1 shows typical luminance values for different scenes, spanning a very wide range from starlight to sunlight. On the other side, the dynamic range (DR) of a digital still camera (DSC) is defined as the ratio between the maximum charge that the sensor can collect (full well capacity, FWC) and the minimum charge that is just above the sensor noise (noise floor, NF). This quantity is usually expressed in logarithmic units.
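For instance, the sensor DR can be expressed in both decibel and stop units; the FWC and noise-floor figures below are assumed examples, not values from the chapter:

```python
import math

def sensor_dynamic_range(fwc_electrons, noise_floor_electrons):
    """Sensor DR as a plain ratio, in dB (20*log10) and in stops (log2)."""
    ratio = fwc_electrons / noise_floor_electrons
    return ratio, 20.0 * math.log10(ratio), math.log2(ratio)

# Assumed example: 20,000 e- full well capacity over a 10 e- noise floor.
ratio, db, stops = sensor_dynamic_range(20000.0, 10.0)
```

A 2000:1 ratio corresponds to about 66 dB, or roughly 11 stops, still far below many of the scene ranges in Table 1.1.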

This dynamic range, which is seldom of the same order of magnitude as that of real-world scenes, is further affected by errors coming from the analogue-to-digital conversion (ADC) of the sensed light values. Once the light values are captured, they are quantized to produce digital codes that, for common 8-bit data, usually fall in the [0 : 255] range. This means that a sampled, coarse representation of the continuously varying light values is produced.

Limited dynamic range and quantization thus irremediably lead to loss of information and to inadequate data representation. This process is synthetically shown in Fig. 1.10, where the dynamic range of a scene

Figure 1.10: Due to the limited camera dynamic range, only a portion of the scene, depending on the exposure settings, can be captured and digitized.

is converted into the digital data of a DSC: only part of the original range is captured, and the remaining part is lost. The portion of the dynamic range where the loss occurs depends on the employed exposure settings. Low exposure settings, by preventing information loss due to saturation of highlights, allow capturing highlight values, but lower values will easily be overridden by sensor noise. On the other side, high exposure settings allow a good representation of low light values, but the upper portion of the scene will be saturated. Once again, a graphical representation gives a good explanation of the different scenarios.

Fig. 1.11 illustrates a high exposure capture. Only the portion of the scene under the green area is sensed, with a very fine quantization (for simplicity, only 8 quantization levels are assumed, shown with dotted lines); the remaining portion of the scene is lost due to saturation, which happens at the luminance level corresponding to the end of the green area.

Figure 1.11: Information loss for high exposure. A limited dynamic range is captured due to saturation. The captured data is finely quantized.

Fig. 1.12 shows a low exposure capture. This time saturation, which happens at the light level corresponding to the end of the red area, is less severe due to the low exposure settings, and apparently the whole scene is captured (the red area). Unfortunately, due to the very widely spanned sampling intervals, the quality of the captured data is degraded by quantization noise and errors.

Bringing together data captured with different exposure settings allows covering a wider range and revealing more details than would have been possible with a single shot. The process usually consists of different steps:

  1. camera response function estimation;
  2. high dynamic range construction;
  3. tone mapping to display or print medium.

1.4.2 Camera Response Function

In order to properly compose a high dynamic range image using information coming from multiple low dynamic range (LDR) images, the camera response function must be known. This function describes the way the camera reacts to changes in exposure, producing the digital measurements. The camera exposure X, which is the quantity of light accumulated by the sensor in a given time, can be defined as follows:

$X = I \cdot t$

where I is the irradiance and t the integration time.
When a pixel value Z is produced, it is known that it comes from some scene radiance I, sensed for a given time t and mapped into the digital domain through some function f. Even if most CCD and CMOS sensors are designed to produce electric charges that are strictly proportional to the incoming amount of light (up to the near-saturation point, where values are likely to fluctuate), the final mapping is seldom linear. Nonlinearities can come from the ADC stage, sensor noise, gamma mapping and specific processing introduced by the manufacturer. In fact, DSCs often have a built-in nonlinear mapping to mimic a film-like response, which usually produces more appealing images when viewed on low dynamic range displays. The full pipeline, from the scene to the final pixel values, is shown in Fig. 1.13, where prominent nonlinearities can be introduced in the final, generally unknown, processing.
Figure 1.13: The full pipeline from scene to final digital image. The main problem behind assembling the high dynamic range from multiple exposures lies in recovering the function synthesizing the full process.

The most obvious solution for estimating the camera response function is to use a picture of uniformly lit patches, such as the Macbeth Chart [13], and establish the relationship between the known light values and the recorded digital pixel codes. However, this process requires an expensive, controlled environment and equipment, which is why several chartless techniques have been investigated. One of the most flexible algorithms has been described in [14]; it only requires an estimation of the exposure ratios between the input images. Of course, the exposure ratios are readily available given the exposure times, which almost all photo-cameras report. Given N digitized LDR pictures representing the same scene and acquired with timings tj : j = 1, …, N, the exposure ratios Rj,j+1 can easily be described as

$R_{j,j+1} = \frac{t_j}{t_{j+1}}$

Thus the following equation relates the i-th pixel of the j-th image, Zij, to the underlying unknown radiance value Ii:

$Z_{ij} = f(I_i\, t_j)$   (1.14)
which is the aforementioned camera response function. The principle of high dynamic range compositing is the estimation, for each pixel, of the radiance value behind it, in order to obtain a better and more faithful description of the scene that originated the images. This means that we are interested in finding the inverse of Eq. 1.14, i.e., a mapping from pixel values to radiance values:

$g(Z_{ij}) = f^{-1}(Z_{ij}) = I_i\, t_j$   (1.16)

The nature of the function g is unknown; the only assumption is that it must be monotonically increasing. That is why a polynomial function of order K is assumed:

$g(Z) = \sum_{k=0}^{K} c_k Z^k$   (1.17)

The problem thus becomes the estimation of the order K and of the coefficients ck appearing in Eq. 1.17. If the ratios between successive image pairs (j, j + 1) are known, the following relation holds:

$\sum_{k=0}^{K} c_k Z_{i,j}^k = R_{j,j+1} \sum_{k=0}^{K} c_k Z_{i,j+1}^k$   (1.18)

Using Eq. 1.18, the parameters are estimated by minimizing the following objective function:

$E = \sum_{j=1}^{N-1}\sum_{i=1}^{P}\left[\sum_{k=0}^{K} c_k Z_{i,j}^k - R_{j,j+1}\sum_{k=0}^{K} c_k Z_{i,j+1}^k\right]^2$   (1.19)
where N is the number of images and P the number of pixels. The system can easily be solved using the least squares method. The condition g(1) = 1 is enforced to fix the scale of the solution, and different orders K are tested; the K value that best minimizes the system is retained.
To limit the number of equations to be considered, not all pixels of the images should be used, and some kind of selection is advised, respecting the following rules:

  1. pixels should be well spatially distributed;
  2. pixels should sample the input range;
  3. pixels should be picked from low variance (homogeneous) areas.
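A least-squares sketch of the estimation (Eqs. 1.17–1.19), with the pixel-selection rules left out for brevity; the g(1) = 1 condition is enforced here through a heavily weighted extra equation, a simple stand-in for an exact constraint:

```python
import numpy as np

def fit_response(Z, t, K=3):
    """Chartless response-curve fit.

    Z: (N, P) array of normalized pixel values in [0, 1] for N exposures of
    the same P sample positions; t: the N exposure times. Solves the linear
    least-squares system for the polynomial coefficients c_0..c_K."""
    N, P = Z.shape
    rows, rhs = [], []
    for j in range(N - 1):
        R = t[j] / t[j + 1]                 # exposure ratio R_{j,j+1}
        # g(Z_j) - R * g(Z_{j+1}) = 0 for every sampled pixel (Eq. 1.18)
        vand_j = np.vander(Z[j], K + 1, increasing=True)
        vand_j1 = np.vander(Z[j + 1], K + 1, increasing=True)
        rows.append(vand_j - R * vand_j1)
        rhs.append(np.zeros(P))
    w = 1e6                                 # weight enforcing g(1) = 1
    rows.append(w * np.ones((1, K + 1)))
    rhs.append(np.array([w]))
    c, *_ = np.linalg.lstsq(np.vstack(rows), np.concatenate(rhs), rcond=None)
    return c

# Synthetic check: a linear sensor, so the recovered g should be g(Z) = Z.
I = np.linspace(0.05, 0.45, 20)             # sample radiances
t = np.array([2.0, 1.0])                    # two exposures one stop apart
Z = np.array([np.clip(I * t[0], 0, 1), np.clip(I * t[1], 0, 1)])
c = fit_response(Z, t, K=3)
g = lambda z: np.polyval(c[::-1], z)
```

With synthetic linear data the fit recovers the identity curve, which is a convenient sanity check before feeding real bracketed captures.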

A different approach to feeding the linear system in (1.19) consists of replacing pixel value correspondences with comparagram pairs. Comparagrams have been well described in [15] and provide an easy way to represent how the pixels of one image map to those of the same scene captured with a different exposure. This mapping is usually called the brightness transfer function (BTF).
It’s worth noting that if direct access to raw data is available, and known to be linear, the response curve
estimation step could be avoided, since in this case the function equals a simple straight line normalized in
the range [0, …, 1]. Fig. 1.14 shows 10 images captured at different exposure settings, form 1
1600 sec to 14
sec, while Figure 1.15 shows the recovered response curve on both linear (left) and logarithmic units.

1.4.3 High Dynamic Range Image Construction

Once the response function, estimated or a priori known, is at hand, the high dynamic range image can be assembled; it is usually referred to as a radiance map, and is composed of floating point values having greater range and tonal resolution than usual low dynamic range (LDR) data. The principle is that each pixel in each image provides a more or less accurate estimate of the radiance value of the scene at that specific position. For example, very low pixel values coming from low exposure images are usually noisy, and thus not reliable, but the same pixels are likely to be well exposed in images acquired with higher exposure settings.
Given N images with exposure times tj : j = 1, …, N, and considering Eq. 1.16, the sequence of estimates

$\left\langle \frac{g(Z_{i,1})}{t_1}, \frac{g(Z_{i,2})}{t_2}, \ldots, \frac{g(Z_{i,N})}{t_N} \right\rangle$

for a pixel in position i is obtained. The different estimates should be assembled by means of a weighted average taking into account the reliability of the pixel itself. Of course, the weighting should completely discard pixels that appear saturated, and assign a very low weight to pixels whose value is below some noise floor, since they are unable to provide a decent estimation.

One possible weighting function could be a hat- or Gaussian-shaped function centered around mid-gray pixel values, which are far from both noise and saturation. As a rule of thumb, for each pixel there should be at least one image providing a useful value (i.e., one that is neither saturated nor excessively noisy). Given the weighting function w(Z), the radiance estimate for a given position i is given by:

$\hat{I}_i = \frac{\sum_{j=1}^{N} w(Z_{i,j})\, g(Z_{i,j})/t_j}{\sum_{j=1}^{N} w(Z_{i,j})}$

Figure 1.14: A sequence of 10 images, captured at ISO 50, f/6.3, with exposures ranging from 1/1600 sec to 1/4 sec.
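The weighted-average assembly can be sketched as follows; the hat-shaped weighting is one of the options named above, and a linear response g(Z) = Z is assumed for brevity:

```python
import numpy as np

def hat_weight(z):
    """Hat function peaking at mid-gray: distrusts noisy shadows and
    saturated highlights (z in [0, 1])."""
    return 1.0 - np.abs(2.0 * z - 1.0)

def assemble_radiance(images, times):
    """Weighted-average radiance map from N registered LDR frames in [0, 1].
    A linear response g(Z) = Z is assumed; with a calibrated curve, g would
    be applied to each frame first."""
    num = np.zeros_like(images[0])
    den = np.zeros_like(images[0])
    for z, t in zip(images, times):
        w = hat_weight(z)
        num += w * z / t
        den += w
    return num / np.maximum(den, 1e-9)

times = [1.0, 0.25]
scene = np.array([[0.2, 1.6]])                      # true radiances
frames = [np.clip(scene * t, 0, 1) for t in times]  # simulated captures
radiance = assemble_radiance(frames, times)
```

Note how the second pixel saturates in the long exposure (weight zero), yet its radiance is still recovered from the short one.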

1.4.4 Scene Versus Display Medium

Once the high dynamic range image has been assembled, what is usually required is a final rendering on the display medium, such as a CRT display or a printer. The human eye is capable of seeing a huge range of luminance intensities, thanks to its capability of adapting to different values. Unfortunately, this is not the way most image rendering systems work: they are usually not capable of dealing with the full dynamic range contained in images that approximate real world scenes. Indeed, most CRT displays have a useful dynamic range on the order of roughly 1:100. High dynamic range reproduction devices will certainly become available in the near future, but for the moment they are far from mass-market consumers.
Simply stated, tone mapping is the problem of converting an image containing a large range of numbers, usually expressed in floating point precision, into a meaningful number of discrete gray levels (usually in the range 0, …, 255) that can be used by any imaging device. We can thus formulate the topic as the following quantization problem (1.21):

$Q(val) = \lfloor (N-1)\cdot F(val) + 0.5 \rfloor$
$F : [L_{w_{min}} : L_{w_{max}}] \to [0 : 1]$

where $[L_{w_{min}} : L_{w_{max}}]$ is the input range, N is the number of allowed quantization levels, and F is the tone mapping function. A simple linear scaling usually leads to the loss of a large amount of information in the reproduced image. Fig. 1.16 shows the result obtained by linearly scaling a high dynamic range image, built from the sequence of Fig. 1.14 using the techniques described above. As can be seen, only a portion of the scene is visible, so better alternatives for F are needed.
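The linear scaling just discussed takes only a few lines, and already exhibits the problem: widely separated dark values collapse onto the same code.

```python
import numpy as np

def linear_tonemap(radiance, n_levels=256):
    """Naive linear F plus the quantization of (1.21): map the input range
    onto [0, 1] and round to n_levels discrete codes."""
    lw_min, lw_max = radiance.min(), radiance.max()
    F = (radiance - lw_min) / (lw_max - lw_min)
    return np.floor((n_levels - 1) * F + 0.5).astype(np.uint8)

hdr = np.array([0.00011, 0.01, 1.0, 32.0])   # a range like the example map
codes = linear_tonemap(hdr)
```

Here the two darkest radiances, two orders of magnitude apart, both map to code 0, while the whole visible detail is squeezed into a handful of levels.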

Two different categories of tone mapping exist:

  1. Tone Reproduction Curve (TRC): the same function is applied to all pixels;
  2. Tone Reproduction Operator (TRO): the function acts differently depending on the value of the specific pixel and of its neighbors.

In what follows, several such techniques will be briefly described and applied to the input HDR image assembled from the sequence in Figure 1.14. The recorded input was in the range 0.00011 : 32.

Histogram Adjustment (TRC)

Figure 1.16: An HDR image built from the sequence of Figure 1.14, linearly scaled in the [0, …, 1] range and quantized to 8 bits.

The algorithm described in [16], by G. Ward et al., is based on ideas coming from image enhancement techniques, specifically histogram equalization. While histogram equalization is usually employed to expand the contrast of images, in this case it is adapted to map the high dynamic range of the input image within that of the display medium, while preserving the sensation of contrast. The process starts by computing a downsampled version of the image, with a resolution that equals 1 degree of visual angle. The luminance values of this so-called fovea image are then converted into the brightness domain, which can be approximated by computing logarithmic values. For the logarithmically valued image a histogram is built, where values between the minimum and maximum bounds Lwmin and Lwmax (of the input radiance map) are equally distributed on the logarithmic scale. Usually around 100 histogram bins, each having a size of $\Delta b = \frac{\log(L_{w_{max}}) - \log(L_{w_{min}})}{100}$, provide sufficient resolution. The cumulative distribution function, normalized by the total number of pixels T, is defined as:

$P(b) = \frac{1}{T}\sum_{b_i < b} f(b_i)$

where f(bi) is the frequency count for bin i. The derivative of this function can be expressed as

$\frac{dP(b)}{db} = \frac{f(b)}{T\,\Delta b}$

Applying histogram equalization to the input, the result is an image where all brightness values have equal probability. The equalization formula, which provides a way to map luminance values to display values, can be expressed as:

$\log(L_d(x,y)) = \log(L_{d_{min}}) + \left[\log(L_{d_{max}}) - \log(L_{d_{min}})\right] P\big(\log(L_w(x,y))\big)$

where Ldmin and Ldmax stand for the minimum and maximum display values. This means that the equalized brightness is fit into the available display dynamic range. Unfortunately, naive equalization tends to over-exaggerate the contrast in correspondence of highly populated bins (histogram peaks), leading to undesirable effects. To prevent this, a ceiling procedure is applied to the histogram, imposing that the contrast should never exceed that obtained by a linear mapping. The ceiling can be written in terms of the derivative of the mapping (which is indicative of contrast):

$\frac{dL_d}{dL_w} \le \frac{L_d}{L_w}$

By putting together Eq. 1.23 and Eq. 1.24, the final histogram constraint is obtained:

$f(b_i) \le \frac{T\,\Delta b}{\log(L_{d_{max}}) - \log(L_{d_{min}})}$   (1.26)

Thus, in order to prevent excessive contrast, histogram values are repeatedly clipped to satisfy (1.26). The operator has been further refined by the authors to include a more sophisticated ceiling procedure and the simulation of color and contrast sensitivity, according to some features of the human visual system (HVS). Figure 1.17(a) shows the example radiance map, tone-mapped to the display using Ward's operator in its basic implementation; Figure 1.17(b) plots the resulting mapping.
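A compact sketch of the basic histogram adjustment; the display range, bin count and iteration count below are assumptions, and the fovea downsampling step is omitted:

```python
import numpy as np

def histogram_adjustment(lum, l_dmin=1.0, l_dmax=100.0, bins=100, iters=20):
    """Ward-style histogram adjustment: log-domain equalization with the
    ceiling (1.26) enforced by repeatedly clipping the histogram."""
    b = np.log(lum)
    hist, edges = np.histogram(b, bins=bins)
    db = edges[1] - edges[0]
    log_range = np.log(l_dmax) - np.log(l_dmin)
    for _ in range(iters):                    # repeatedly cut the peaks
        T = hist.sum()
        ceiling = T * db / log_range
        hist = np.minimum(hist, ceiling)
    cdf = np.cumsum(hist) / hist.sum()        # normalized P(b)
    P = np.interp(b, edges[1:], cdf)
    log_ld = np.log(l_dmin) + log_range * P   # equalization mapping
    return np.exp(log_ld)

hdr = np.exp(np.random.default_rng(0).normal(0.0, 3.0, (64, 64)))
ld = histogram_adjustment(hdr)
```

Whatever the input range, the output luminances land inside the assumed display range [1, 100].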

Chiu’s local operator (TRO)

One of the first works performing spatially varying local processing, for visualization of high dynamic range
image data, was described in [17] whose importance is mainly due to historical reasons. The idea is to apply
at each pixel position a specific scaling, s (x, y). The output image, to be rendered on the medium, is then
obtained by multiplying the input by the scaling function:

Bilateral Filtering (TRO)


The scaling function s(x, y) is defined as

$s(x,y) = \frac{1}{k\,L_{blur}(x,y)}$

where Lblur(x, y) is a Gaussian-filtered version of Lw(x, y) and k a user parameter. A value of k = 2 brings values near the local average (given by the low-pass filtered data) to 0.5. This does not leave much room for the mapping of brighter pixels, hence a value of k = 8 is suggested. The main problem with this technique, which is the main concern of all local operators, is the appearance of so-called halo artifacts, which easily manifest themselves around areas of relevant brightness transition. This is due to the fact that the local average for pixels around transition zones carries unwanted "information" unrelated to the luminance value of the specific pixel. For example, for a bright pixel near a very dark region, the local average Lblur(x, y) will be very low due to the influence of the dark pixels, leading to a poor scaling of the bright pixel and thus to the appearance of a bright halo. On the other side, a dark halo is likely to appear for dark pixels near bright regions, where the scaling will be excessive. Fig. 1.18 shows the result of Chiu's algorithm (with k = 8 and a Gaussian filter with a width of 15 pixels), where some of the halo artifacts are highlighted by red squares.

Photographic Tone Reproduction (TRO)

Reinhard et al. [19] have developed an operator based on some simple photographic principles, such as automatic exposure and dodging-and-burning, where the former provides a global mapping of the image and the

Figure 1.19: Image mapped with bilateral filtering.

latter exploits some local features. The global part of the operator relies on the concept of the scene's key, a measure of how dark or bright the image is overall. This quantity is approximated with the log-average value L̄w of the image luminances. According to photographic principles, where the key is usually printed (or displayed) to have 18% of the reflectance of the medium, an initial global mapping is performed using the following equation:

$L_m(x,y) = \frac{0.18}{\bar{L}_w}\,L_w(x,y)$   (1.32)

In this way a kind of automatic exposure setting is provided for the scene, even if it is done ex post facto, since the scene has already been captured by the camera (but since radiance maps provide a floating-point description of the initial scene, this allows us to perform such virtualizations of the photographic process). Even if in Eq. 1.32 the scene's key value is linearly mapped to the value 0.18, different values could be used depending on the specific image content (e.g., a nightlife picture should be scaled to a very low value). No matter what the dynamic range of the initial scene is, the luminance values exposed by means of Eq. 1.32 are forced to fit inside the medium's dynamic range (here supposed to vary within [0, …, 1]) using a compressive function, which is particularly effective on very high luminance values:

$L_d(x,y) = \frac{L_m(x,y)}{1 + L_m(x,y)}$   (1.33)

This function scales the input values differently according to their magnitude: small values, usually ≪ 1, are left almost unchanged, while very high values, usually ≫ 1, are scaled by a very large amount (approximately the factor 1/Lm(x, y)).
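The global part of the operator, as just described, is only a few lines:

```python
import numpy as np

def reinhard_global(lw, key=0.18):
    """Global photographic operator: map the log-average luminance onto the
    chosen key (1.32), then compress with Ld = Lm / (1 + Lm) (1.33)."""
    log_avg = np.exp(np.mean(np.log(lw + 1e-9)))   # scene's key, L-bar_w
    lm = (key / log_avg) * lw
    return lm / (1.0 + lm)

hdr = np.array([0.001, 0.18, 5.0, 500.0])
ld = reinhard_global(hdr)
```

The output stays in [0, 1) regardless of the input range, and the mapping remains strictly monotone, as expected from (1.33).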
The scaling function is further refined to include some local processing, similar to dodging-and-burning procedures, where a dark value in a bright surround is heavily compressed (burned), and a bright pixel in a dark surround is only mildly compressed (dodged). To exploit these local properties, filtered versions of Lm at different scales s = 1, …, S are produced as

$L^{s}_{blur}(x,y) = (L_m \circ R_s)(x,y)$

and in Eq. 1.33 the quantity Lm(x, y) in the denominator is replaced by $L^{s_{max}}_{blur}(x,y)$, where ◦ denotes the convolution operator and the Rs are Gaussian kernels having different pixel widths w(s), for s = 1, …, S. Here wmin and wmax are the minimum and maximum allowed pixel widths, fixed respectively to 1 and 43; thus the smallest scale for a pixel in position (x, y) equals the pixel itself. To avoid halo artifacts, for each pixel the largest scale smax such that |Vsmax(x, y)| < ε is selected, where Vs is the difference between two successive scales, computed as:

According to the authors, a value of Φ = 8 is used. In practice, Eq. 1.37 searches for the largest surround of the pixel in position (x, y) whose value is reasonably similar to that of the pixel itself. This avoids the appearance of severe halo artifacts, similar to those seen with the application of Chiu's algorithm. Fig. 1.20 shows the result of the algorithm of Reinhard et al. on our example radiance map, where the parameters S = 8, Φ = 8 have been used.

Figure 1.20: Photographic Tone Reproduction mapping.

Gradient Compression (TRO)

The last technique belonging to the family of TROs that we are going to describe was developed by Fattal et al. [20], and it is far more sophisticated than those described so far. Even if the output images can sometimes have an unnatural appearance, in most cases the results look very appealing. This algorithm does not operate directly in the spatial domain: it computes the gradient field of the input image and, after manipulating it, reconstructs the image having the new gradients by means of Poisson integration. This derives from the observation that an image exhibiting a high dynamic range is characterized by gradients of large magnitude around zones of brightness transition. Hence attenuating those gradients seems a viable way to build an LDR depiction of the scene, suitable for viewing on a common display. Similarly to the pipeline of the algorithm based on bilateral filtering, gradient compression works on logarithmic data, so just before producing the output image the result undergoes exponentiation. Indicating with l(x, y) the data in the logarithmic domain, the gradient field ∇l(x, y) is computed as follows:

$\nabla l(x,y) = \big(l(x+1,y) - l(x,y),\; l(x,y+1) - l(x,y)\big)$

Attenuation of the gradient field is obtained by multiplication with a proper scaling function Φ(x, y):

$G(x,y) = \nabla l(x,y)\,\Phi(x,y)$   (1.39)

The attenuated gradient field G(x, y) is then inverted by solving the Poisson equation

$\nabla^2 \tilde{l} = \mathrm{div}\, G$

Since edges (and hence gradients) exist at multiple resolution levels, a Gaussian pyramid representation ⟨l0, l1, …, ld⟩ is built and a gradient field is computed for each level. An attenuation function is then computed at each level and propagated upward in a bottom-up fashion; the attenuation function at the top level is the one effectively used in (1.39). The attenuation function at each level s is computed as follows:

$\varphi_s(x,y) = \frac{\alpha}{\lVert \nabla l_s(x,y)\rVert}\left(\frac{\lVert \nabla l_s(x,y)\rVert}{\alpha}\right)^{\beta}$   (1.41)

The α parameter in (1.41) determines which gradient magnitudes are left unchanged, while the exponent β compresses the magnitudes larger than α. The suggested values are α = 0.1 · (average gradient magnitude) and β = 0.9. Since an attenuation function is computed for each resolution level s, the propagation to full resolution is accomplished by upscaling the attenuation function from one level to the next finer one and accumulating the values, obtaining the full-resolution attenuation function Φ(x, y) that will effectively be used (the authors claim that by applying the attenuation only at full resolution, halo artifacts are mostly invisible). This can be expressed by the following equations:

$\Phi_d(x,y) = \varphi_d(x,y)$
$\Phi_s(x,y) = L\big(\Phi_{s+1}(x,y)\big)\,\varphi_s(x,y)$
$\Phi(x,y) = \Phi_0(x,y)$

where d is the minimum resolution level and L is the bilinear upsampling operator.

Figure 1.21: Gradient Compression mapping.

Fig. 1.21 shows the result of applying the gradient compression operator to our example HDR image. The operator is computationally more complex than the others described, but it can be seen that, in terms of highlight and lowlight visibility, the mapped image looks more impressive than the previous renderings.
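A one-dimensional, single-scale sketch of the gradient-domain idea, with cumulative summation standing in for the Poisson solver (the pyramid and the 2D integration are what the full method adds):

```python
import numpy as np

def attenuation(mag, alpha, beta=0.9):
    """Fattal-style attenuation factor: gradients of magnitude alpha pass
    unchanged; larger ones are compressed (beta < 1), smaller ones are
    slightly boosted."""
    return (mag / alpha) ** (beta - 1.0)

def compress_1d(lum, beta=0.9):
    """1D gradient-domain compression of log data, reintegrated by
    cumulative summation (the 1D stand-in for Poisson integration)."""
    l = np.log(lum)
    grad = np.diff(l)                          # forward differences
    alpha = 0.1 * np.abs(grad).mean()
    g = grad * attenuation(np.abs(grad) + 1e-12, alpha, beta)
    l_new = np.concatenate(([l[0]], l[0] + np.cumsum(g)))
    return np.exp(l_new)

signal = np.array([0.01, 0.012, 10.0, 11.0, 900.0])   # huge log-domain jumps
out = compress_1d(signal)
```

The big log-domain jumps shrink while the ordering of values is preserved, so the overall dynamic range of the output is strictly smaller than the input's.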

1.5 Conclusion

The problem of finding the proper exposure settings for image acquisition is of course related to the dynamic range of the real scene. In many cases, useful results can be obtained by implementing ad-hoc metering strategies. Alternatively, some tone correction methods can be applied to enhance the overall contrast of the most salient regions of the picture. The limited dynamic range of imaging sensors does not allow the dynamics of the real world to be recovered; in such cases, a good final rendering can only be achieved by using "bracketing", i.e., acquiring several pictures of the same scene with different exposure times.
In this work we have briefly reviewed automatic digital exposure correction methods, attempting to report the specific peculiarities of each solution. For completeness, we mention the recent work of Raskar et al. [21], who proposed a novel strategy that "flutters" the camera's shutter open and closed during capture, chopping the exposure time according to a binary pseudo-random sequence. In this way, high spatial frequency details can be recovered, especially in the presence of constant-speed motion. In particular, a robust deconvolution process is achieved simply by considering the so-called coded exposure, which makes the problem well-posed. We argue that Raskar's technique could also be used for multiple-image acquisition, precisely to limit the overall number of images required to reconstruct a reliable HDR map.
