Learning low-level vision (1999)

William T. Freeman and Egon C. Pasztor
MERL, a Mitsubishi Electric Res. Lab
201 Broadway, Cambridge, MA 02139
freeman, pasztor@merl.com

Abstract

We show a learning-based method for low-level vision problems, that is, estimating scenes from images. We generate a synthetic world of scenes and their corresponding rendered images. We model that world with a Markov network, learning the network parameters from the examples. Bayesian belief propagation allows us to efficiently find a local maximum of the posterior probability for the scene, given the image. We call this approach VISTA, Vision by Image/Scene TrAining.

We apply VISTA to the "super-resolution" problem (estimating high-frequency details from a low-resolution image), showing good results. For the motion estimation problem, we show figure/ground discrimination, solution of the aperture problem, and filling-in arising from application of the same probabilistic machinery.

1 Introduction

We seek machinery for learning low-level vision problems, such as motion analysis, inferring shape and albedo from a photograph, or extrapolating image detail. For these problems, given image data, we want to estimate an underlying scene. The scene quantities to be estimated might be projected object velocities, surface shapes and reflectance patterns, or missing high-frequency details.

Low-level vision problems are typically underconstrained, so Bayesian [3,23,37] and regularization techniques [31] are fundamental. There has been much work and progress (for example, [23,25,15]), but difficulties remain in working with complex, real images. Typically, prior probabilities or constraints are made up, rather than learned. A general machinery for a learning-based solution to low-level vision problems would have many applications.

A recent research theme has been to learn the statistics of natural images. Researchers have related those statistics to properties of the human visual system [28,2,36], or have used statistical methods with biologically plausible image representations to analyse and synthesize realistic image textures [14,8,41,36]. These methods may help us understand the early stages of representation and processing, but unfortunately, they don't address how a visual system might interpret images, i.e., estimate the underlying scene.

We want to combine the two research themes of scene estimation and statistical learning. We study the statistical properties of a synthetically generated, labeled world of images with scenes, to learn how to infer scenes from images. Our prior probabilities can then be rich ones, learned from the training data.

Several researchers have applied related learning approaches to low-level vision problems, but restricted themselves to linear models [21,16], too weak for many applications. Our approach is similar in spirit to relaxation labelling [33, 22], but our Bayesian propagation algorithm is more efficient and we utilize large sets of labelled training data.

We interpret images by modeling the relationship between local regions of images and scenes, and between neighboring local scene regions. The former allows initial scene estimates; the latter allows the estimates to propagate. We train from image/scene pairs and apply the Bayesian machinery of graphical models [29, 5, 20]. We were inspired by the work of Weiss [38], who pointed out the speed advantage of Bayesian methods over conventional relaxation methods for propagating local measurement information. For a related approach, but with heuristically derived propagation rules, see [34].

We call our approach VISTA, Vision by Image/Scene TrAining. It is a general machinery that may apply to various problems. We illustrate it for estimating missing image details, and estimating motion.

2 Markov network

For given image data, $y$, we seek to estimate the underlying scene, $x$ (we omit the vector symbols for notational simplicity). We first calculate the posterior probability, $P(x|y) = cP(x, y)$. For this analysis, we ignore the normalization, $c = 1/P(y)$, a constant over $x$. Under two common loss functions [3], the best scene estimate, $\hat{x}$, is the mean (minimum mean squared error, MMSE) or the mode (maximum a posteriori, MAP) of the posterior probability.
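To make the two estimators concrete, here is a minimal Python sketch on a discrete toy posterior. The candidate scene values and the unnormalized probabilities are invented for illustration and do not come from the paper; the function names are hypothetical.

```python
# Hypothetical candidate scene values and the joint P(x, y) evaluated at one
# observed y. The numbers are made up; they are left unnormalized on purpose.
scene_values = [0.0, 1.0, 2.0, 3.0]
joint = [0.05, 0.10, 0.40, 0.25]

def mmse_estimate(values, unnormalized):
    """Posterior mean: normalize by c = 1 / sum(P(x, y)), then average."""
    total = sum(unnormalized)
    return sum(v * p for v, p in zip(values, unnormalized)) / total

def map_estimate(values, unnormalized):
    """Posterior mode: the constant c does not change which value is largest."""
    best = max(range(len(values)), key=lambda i: unnormalized[i])
    return values[best]

x_mmse = mmse_estimate(scene_values, joint)
x_map = map_estimate(scene_values, joint)
```

The mean needs the normalization restored, while the mode does not, which is why dropping $c$ is harmless for MAP estimation.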

In general, $\hat{x}$ can be difficult to compute [23] without approximations. We make the Markov assumption: we divide both the image and scene into patches, and assign one node of a Markov network [13, 29, 20] to each patch. Given the variables at intervening nodes, two nodes of a Markov network are statistically independent. We connect each scene patch to its corresponding image patch, and to its nearest neighbors, Fig. 1. Solving a Markov network involves a learning phase, where the parameters of the network connections are learned from training data, and an inference phase, when the scene corresponding to particular image data is estimated.
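As an illustration of the network layout, the sketch below builds the scene-node neighbor lists for a grid of patches. It assumes 4-connected nearest neighbors and row-major node numbering; both are choices made for this example, since the text only says "nearest neighbors". The function name `grid_neighbors` is hypothetical.

```python
def grid_neighbors(rows, cols):
    """Map each scene node (row-major index) to its 4-connected neighbor nodes."""
    neighbors = {}
    for r in range(rows):
        for c in range(cols):
            adjacent = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if rr in range(rows) and cc in range(cols):
                    adjacent.append(rr * cols + cc)
            neighbors[r * cols + c] = adjacent
    return neighbors

# Each scene node j is additionally tied to exactly one image node y_j,
# so only the scene-to-scene links need to be listed here.
layout = grid_neighbors(3, 3)
```

Corner nodes end up with two scene neighbors and the center node with four.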

For networks without loops, the Markov assumption leads to simple "message-passing" rules for computing the MAP and MMSE estimates [29, 39, 20]. Writing those estimates for $x_j$ by marginalizing (MMSE) or taking the argmax (MAP) over the other variables gives:
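Written out, and assuming the equation numbers (1) and (2) that the later text refers to, these estimates would take roughly the following form. This is a reconstruction from the surrounding definitions rather than a verbatim copy of the displayed equations. The normalization $c$ is dropped, as in the posterior above, and can be restored for the mean by normalizing the marginal over $x_j$.

```latex
\hat{x}_{j,\mathrm{MMSE}} = \int_{x_j} x_j \, dx_j \int_{x_i,\; i \neq j} P(x, y) \, dx_i \tag{1}

\hat{x}_{j,\mathrm{MAP}} = \arg\max_{x_j} \; \max_{x_i,\; i \neq j} P(x, y) \tag{2}
```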

For a Markov random field, the joint probability over the scenes $x$ and images $y$ can be written as [4,13,12]:
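A pairwise factorization consistent with the definitions that follow (compatibility functions $\Psi$ between neighboring scene nodes and $\Phi$ between each scene node and its image node) would read as below. The numbering as Eq. (3) is inferred from later references, and the notation is an assumption.

```latex
P(x, y) = \prod_{\text{neighboring } i, j} \Psi(x_i, x_j) \prod_{k} \Phi(x_k, y_k) \tag{3}
```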

where we have introduced pairwise compatibility functions, $\Psi$ and $\Phi$, described below. The factorized structure of Eq. (3) allows the integrals and argmax operations of Eqs. (1) and (2) to pass through to the compatibility function factors with the appropriate arguments. For a network without loops, the resulting expression can be computed using repeated, local computations [29,39,20], summarized below: the MMSE estimate at node $j$ is
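In the notation of the following sentences, where $L_{kj}$ denotes the message from neighboring scene node $k$ to node $j$, the estimate would take a form like the one below. It is reconstructed from the text's description of Eq. (4), so the exact typography is an assumption, and the result is understood up to normalization.

```latex
\hat{x}_{j,\mathrm{MMSE}} = \int_{x_j} x_j \, \Phi(x_j, y_j) \prod_{k} L_{kj} \, dx_j \tag{4}
```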

where $k$ runs over all scene node neighbors of node $j$. We calculate $L_{kj}$ from:
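A message update matching the description that follows (an integral over $x_k$, and messages $\tilde{L}_{lk}$ taken from the previous iteration) could be written as below. It is a reconstruction under those stated assumptions, numbered (5) to match later references. Excluding $l = j$ from the product is the usual belief propagation convention and is assumed here.

```latex
L_{kj} = \int_{x_k} \Psi(x_k, x_j) \, \Phi(x_k, y_k) \prod_{l \neq j} \tilde{L}_{lk} \, dx_k \tag{5}
```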

where $\tilde{L}_{lk}$ is $L_{lk}$ from the previous iteration. The initial $\tilde{L}_{lk}$'s are 1. After at most one iteration per $x_i$ of Eq. (1), Eqs. (4) and (5) give Eq. (1). The MAP estimate equation, Eq. (2), yields analogous formulae, with the integral of Eq. (5) replaced by $\max_{x_k}$, and $\int_{x_j} x_j$ of Eq. (4) replaced by $\arg\max_{x_j}$. For linear topologies, these propagation rules are equivalent to well-known Bayesian inference methods, such as the Kalman filter and the forward-backward algorithm for Hidden Markov Models [29, 26, 38, 20, 11].
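The propagation rules can be sketched for discrete candidate values, where the integrals become sums. The Python example below assumes a three-node chain with two candidate values per node and made-up tables `PSI` and `PHI`; all function names are hypothetical. It compares the message-passing estimate against direct marginalization of the joint.

```python
from itertools import product

# Assumed toy setup: chain x0 - x1 - x2, two candidate scene values per node.
VALUES = [0.0, 1.0]
NEIGHBORS = {0: [1], 1: [0, 2], 2: [1]}
PSI = [[0.9, 0.1], [0.1, 0.9]]               # symmetric Psi(x_k = a, x_j = b)
PHI = [[0.7, 0.3], [0.4, 0.6], [0.2, 0.8]]   # PHI[j][a] = Phi(x_j = a, y_j)
S = len(VALUES)

def propagate(num_iters):
    """Discrete analogue of the message update: sums replace integrals."""
    # msgs[(k, j)][b] is L_kj evaluated at x_j = b; all messages start at 1.
    msgs = {(k, j): [1.0] * S for k in NEIGHBORS for j in NEIGHBORS[k]}
    for _ in range(num_iters):
        new = {}
        for (k, j) in msgs:
            out = []
            for b in range(S):
                total = 0.0
                for a in range(S):
                    incoming = 1.0
                    for l in NEIGHBORS[k]:
                        if l != j:
                            incoming *= msgs[(l, k)][a]  # previous-iteration messages
                    total += PSI[a][b] * PHI[k][a] * incoming
                out.append(total)
            new[(k, j)] = out
        msgs = new
    return msgs

def mmse(j, msgs):
    """Mean of x_j under Phi(x_j, y_j) times all incoming messages, normalized."""
    belief = []
    for b in range(S):
        weight = PHI[j][b]
        for k in NEIGHBORS[j]:
            weight *= msgs[(k, j)][b]
        belief.append(weight)
    return sum(v * w for v, w in zip(VALUES, belief)) / sum(belief)

def mmse_brute_force(j):
    """Direct marginalization of the pairwise joint, for comparison only."""
    num = den = 0.0
    for s in product(range(S), repeat=3):
        p = PSI[s[0]][s[1]] * PSI[s[1]][s[2]]
        for i in range(3):
            p *= PHI[i][s[i]]
        num += VALUES[s[j]] * p
        den += p
    return num / den

messages = propagate(num_iters=2)   # two passes suffice for a 3-node chain
estimates = [mmse(j, messages) for j in range(3)]
```

Because the chain has no loops, the two computations agree up to floating-point rounding.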

Finding the posterior probability distribution for a grid-structured Markov network with loops is computationally expensive, and a variety of approximations have been proposed [13,12,20]. Strong empirical results in "Turbo codes" [24,27] and recent theoretical work [39, 40] provide support for a very simple approximation: applying the propagation rules derived above even in a network with loops. Table 1 summarizes results from [40]: (1) for Gaussian processes, if the MMSE propagation scheme converges, it converges to the true posterior means; (2) even for non-Gaussian processes, if the MAP propagation scheme converges, it finds at least a local maximum of the true posterior probability.
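As an illustration of running the same rules on a network with a loop, the sketch below applies the MAP (max-product) variant to four scene nodes connected in a cycle. The tables are invented, and messages are rescaled each iteration to avoid underflow, which is an implementation choice rather than something stated in the text. The result is checked against exhaustive search, which is feasible only because the example is tiny.

```python
from itertools import product

# Assumed toy problem: 4 scene nodes in a single loop 0-1-2-3-0, 2 states each.
NEIGHBORS = {0: [3, 1], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
PSI = [[0.8, 0.2], [0.2, 0.8]]                           # symmetric Psi(x_k, x_j)
PHI = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]   # PHI[j][a] = Phi(x_j = a, y_j)

def max_product(num_iters):
    """Run the MAP propagation rule on the loop; return per-node argmax states."""
    msgs = {(k, j): [1.0, 1.0] for k in NEIGHBORS for j in NEIGHBORS[k]}
    for _ in range(num_iters):
        new = {}
        for (k, j) in msgs:
            out = []
            for b in range(2):
                best = 0.0
                for a in range(2):
                    incoming = 1.0
                    for l in NEIGHBORS[k]:
                        if l != j:
                            incoming *= msgs[(l, k)][a]
                    best = max(best, PSI[a][b] * PHI[k][a] * incoming)
                out.append(best)
            scale = sum(out)
            new[(k, j)] = [v / scale for v in out]   # rescaling keeps the argmax
        msgs = new
    states = []
    for j in range(4):
        belief = []
        for b in range(2):
            weight = PHI[j][b]
            for k in NEIGHBORS[j]:
                weight *= msgs[(k, j)][b]
            belief.append(weight)
        states.append(belief.index(max(belief)))
    return states

def joint(states):
    """Unnormalized joint probability of one full assignment (each edge once)."""
    p = 1.0
    for j in range(4):
        p *= PHI[j][states[j]] * PSI[states[j]][states[(j + 1) % 4]]
    return p

map_states = max_product(num_iters=20)
exhaustive = list(max(product(range(2), repeat=4), key=joint))
```

Agreement with exhaustive search is a property of this toy example, not a general guarantee for networks with loops.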

2.1 Learning the compatibility functions

One can measure the marginal probabilities relating local scenes, $x_i$, and images, $y_i$, as well as neighboring local scenes, $x_i$ and $x_j$. Iterated Proportional Fitting (e.g., [18]) is a scheme to iteratively modify the compatibility functions until the empirically measured marginal statistics agree with those predicted by the model, Eq. (3). For the problems presented here, we found good results by using the marginal statistics measured from the training data, without modifications by iterated proportional fitting. Based on a factorization described in [10, 9], for a message from scene nodes $j$ to $k$, we used $\Psi(x_j, x_k) = P(x_j, x_k)/P(x_k)$ and $\Phi(x_j, y_j) = P(y_j|x_j)$. We fit the probabilities with mixtures of Gaussians.
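The marginal statistics can be illustrated with discrete labels, where they reduce to normalized co-occurrence counts. This is a simplification made for the example: the text fits mixtures of Gaussians to continuous patch data, and the training pairs and the function name `conditional_table` below are invented.

```python
from collections import Counter

def conditional_table(pairs):
    """Estimate P(a | b) = P(a, b) / P(b) from a list of observed (a, b) pairs."""
    joint = Counter(pairs)
    marginal_b = Counter(b for _, b in pairs)
    return {(a, b): count / marginal_b[b] for (a, b), count in joint.items()}

# Invented discrete training data standing in for measured patch statistics.
scene_neighbor_pairs = [(0, 0), (0, 0), (0, 0), (1, 0),
                        (0, 1), (1, 1), (1, 1), (1, 1)]   # (x_j, x_k)
image_scene_pairs = [(0, 0), (0, 0), (1, 0), (1, 1),
                     (1, 1), (0, 1), (1, 1), (0, 0)]      # (y_j, x_j)

# Psi(x_j, x_k) = P(x_j, x_k) / P(x_k), keyed as (x_j, x_k).
psi = conditional_table(scene_neighbor_pairs)
# Phi(x_j, y_j) = P(y_j | x_j), keyed as (y_j, x_j).
phi = conditional_table(image_scene_pairs)
```

Each table column (fixed conditioning value) sums to one, as expected of a conditional probability.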

An alternate method, which we find gives comparable results (not shown here), is to use scene and image patches which spatially overlap their neighbors. We assume a Gaussian noise penalty on the multiple observations of the same pixels in the overlap region, yielding $\Psi(x_k, x_j) = \exp\left(-|d_k - d_j|^2 / (2\sigma^2)\right)$, where $d_k$ and $d_j$ are the corresponding values of the scenes described at nodes $k$ and $j$ in their region of common support, and $\sigma$ is a penalty parameter.
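A minimal sketch of such an overlap penalty follows. It assumes the Gaussian form $\exp(-|d_k - d_j|^2 / (2\sigma^2))$ and that each patch's overlap pixels are supplied as a flat list; the function name and the sample values are hypothetical.

```python
import math

def overlap_compatibility(d_k, d_j, sigma):
    """Gaussian penalty on disagreement between two patches in their shared pixels."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(d_k, d_j))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))

# Identical overlap regions are fully compatible; disagreement lowers the score.
agree = overlap_compatibility([1.0, 2.0], [1.0, 2.0], sigma=0.5)
disagree = overlap_compatibility([1.0, 2.0], [1.5, 2.0], sigma=0.5)
```

A smaller `sigma` penalizes disagreement more sharply, so neighboring candidates must match more closely in the overlap to remain compatible.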

2.2 Probability Representation

Inspired by the success of [17, 8], we use a sample-based representation for inference. We describe the posterior probability as a set of weights on scenes observed in the training set. Given an image to analyze, for each node we collect a set of 10 or 20 "scene candidates" from the training data which have image data closely matching the local observation. We evaluate the posterior probability only at those scene values. The propagation algorithms, Eqs. (5) and (4), then are discrete matrix calculations. This simplification focuses the computation on only those scenes which render to the observed image data.
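Candidate selection can be sketched as a nearest-neighbor lookup over the training pairs. The example assumes squared Euclidean distance between flattened image patches as the matching criterion, which the text does not specify; the training data, the string scene labels standing in for scene patches, and the function name are all invented.

```python
def select_candidates(observed_patch, training_pairs, num_candidates):
    """Return the scenes whose training image patch is closest to the observation."""
    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    ranked = sorted(training_pairs,
                    key=lambda pair: squared_distance(pair[0], observed_patch))
    return [scene for _, scene in ranked[:num_candidates]]

# Invented training set of (image_patch, scene) pairs.
training_pairs = [
    ([0.0, 0.0], "flat"),
    ([1.0, 0.0], "edge_left"),
    ([0.0, 1.0], "edge_right"),
    ([1.0, 1.0], "bright"),
]
candidates = select_candidates([0.9, 0.2], training_pairs, num_candidates=2)
```

With a fixed number of candidates per node, each message becomes a short vector and each pairwise compatibility a small matrix, which is what makes the propagation a discrete matrix calculation.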

3 Super-resolution

For the super-resolution problem, the input image is a low-resolution image. The scene to be estimated is a higher resolution image. A good solution to this problem would allow pixel-based images to be handled in a relatively resolution-independent manner. Applications could include enlargement of digital or film photographs, upconversion of video from NTSC format to HDTV, or image compression.

At first, the task may seem impossible: the high-resolution data is not there. However, we can see edges in the low-resolution image that we know should remain sharp at the next resolution level. Furthermore, based on the successes of recent texture synthesis methods [14,8,41,36], we might expect to handle textured areas well, too.

Others [35] have used a Bayesian method, making up the prior probability. In contrast, the Markov network learns the relationship between sharp and blurred images from large amounts of training data, and achieves better results. Among the non-Bayesian methods, fractal image representation [32] (Fig. 8c) only gathers training data from the one image, while selecting the nearest neighbor from training data
