java emd, academic basics: EMD (earth mover's distance) | 学步园

Earth mover's distance

In computer science, the earth mover's distance (EMD) is a measure of the distance between two probability distributions over a region D. In mathematics, this is known as the Wasserstein metric. Informally, if the distributions are interpreted as two different ways of piling up a certain amount of dirt over the region D, the EMD is the minimum cost of turning one pile into the other, where the cost is assumed to be the amount of dirt moved times the distance by which it is moved.
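For discrete distributions, the informal dirt-moving description can be written as a small optimization problem. The following is a sketch under assumptions that are not in the original text: P places mass p_i at points x_i, Q places mass q_j at points y_j, both have the same total mass, d is the ground distance on D, and f_{ij} (a name chosen here for illustration) is the amount of dirt moved from x_i to y_j.

```latex
\mathrm{EMD}(P, Q) \;=\; \min_{f_{ij} \ge 0} \; \sum_{i} \sum_{j} f_{ij}\, d(x_i, y_j)
\quad \text{subject to} \quad
\sum_{j} f_{ij} = p_i \ \ \text{for all } i, \qquad
\sum_{i} f_{ij} = q_j \ \ \text{for all } j .
```

Each term f_{ij} d(x_i, y_j) is "amount of dirt moved times the distance by which it is moved"; the two constraints say that all of the first pile is used and that exactly the second pile is produced.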

The above definition is valid only if the two distributions have the same integral (informally, if the two piles have the same amount of dirt), as in normalized histograms or probability density functions. In that case, the EMD is equivalent to the 1st Mallows distance or 1st Wasserstein distance between the two distributions.
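As a small illustration of the equal-mass case, consider two samples of the same size on the real line, where every point carries mass 1/n. Under that assumption (which is ours, not the article's), the 1st Wasserstein distance can be obtained by sorting both samples and pairing them in order. The sketch below compares this with a brute-force search over all one-to-one matchings; the function names are hypothetical.

```python
from itertools import permutations


def w1_sorted(xs, ys):
    """1st Wasserstein (Mallows) distance between two equal-size samples on the line.

    Assumption: each point carries mass 1/n, so both "piles" have total mass 1.
    """
    if len(xs) != len(ys):
        raise ValueError("samples must have the same size")
    n = len(xs)
    # Pair the k-th smallest point of xs with the k-th smallest point of ys.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / n


def w1_brute_force(xs, ys):
    """Same quantity by trying every one-to-one matching (only feasible for tiny n)."""
    n = len(xs)
    return min(
        sum(abs(a - b) for a, b in zip(xs, perm)) / n
        for perm in permutations(ys)
    )


xs = [0.0, 1.0, 3.0]
ys = [5.0, 2.0, 2.0]
print(w1_sorted(xs, ys))
print(w1_brute_force(xs, ys))
```

Both calls print the same value for this input. The brute-force version is only there as a cross-check; its cost grows factorially with the sample size.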

Extensions

Some applications may require the comparison of distributions with different total masses. One approach is to allow for a partial match, where dirt from the more massive distribution is rearranged to make the less massive one, and any leftover "dirt" is discarded at no cost. Under this approach, the EMD is no longer a true distance between distributions. Another approach is to allow for mass to be created or destroyed, on a global and/or local level, as an alternative to transportation, but with a cost penalty. In that case one must specify a real parameter σ, the ratio between the cost of creating or destroying one unit of "dirt" and the cost of transporting it by a unit distance. This is equivalent to minimizing the sum of the earth moving cost plus σ times the L1 distance between the rearranged pile and the second distribution.
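One possible way to write the two extensions for discrete distributions is sketched below. The notation is an assumption made here for illustration: p_i and q_j are the masses of the two distributions at points x_i and y_j, d is the ground distance, f_{ij} is the amount of dirt moved from x_i to y_j, and R is the pile obtained by rearranging the first distribution, with mass r_k in bin k.

```latex
% Partial match: move only as much dirt as the lighter pile holds.
\min_{f_{ij} \ge 0} \; \sum_{i} \sum_{j} f_{ij}\, d(x_i, y_j)
\quad \text{subject to} \quad
\sum_{j} f_{ij} \le p_i, \qquad
\sum_{i} f_{ij} \le q_j, \qquad
\sum_{i} \sum_{j} f_{ij} = \min\Big( \sum_{i} p_i, \; \sum_{j} q_j \Big).

% Creation/destruction penalty with parameter \sigma.
\mathrm{EMD}_{\sigma}(P, Q) \;=\; \min_{R} \Big[ \mathrm{EMD}(P, R) + \sigma \, \lVert R - Q \rVert_{1} \Big],
\qquad
\lVert R - Q \rVert_{1} = \sum_{k} \lvert r_k - q_k \rvert .
```

In the second expression R ranges over piles with the same total mass as P, so EMD(P, R) is the ordinary equal-mass earth moving cost and the σ term pays for the dirt that still has to be created or destroyed.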

Computing the EMD

If the domain D is discrete, the EMD can be computed by solving an instance of the transportation problem, which can be solved by the so-called Hungarian algorithm. In particular, if D is a one-dimensional array of "bins", the EMD can be efficiently computed by scanning the array and keeping track of how much dirt needs to be transported between consecutive bins.
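The one-dimensional scan can be sketched in a few lines of Python. This is a minimal sketch, assuming both histograms cover the same bins, have equal total mass, and use equally spaced bins; the function name emd_1d and the bin_width parameter are hypothetical and not from the original.

```python
def emd_1d(p, q, bin_width=1.0):
    """EMD between two histograms over the same one-dimensional array of bins.

    Assumptions: equal total mass and equally spaced bins (bin_width apart).
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("histograms must have the same total mass")
    carry = 0.0  # dirt that must cross the boundary into the next bin
    cost = 0.0
    for p_k, q_k in zip(p, q):
        carry += p_k - q_k                # surplus (+) or deficit (-) so far
        cost += abs(carry) * bin_width    # moving it one bin over costs this much
    return cost


print(emd_1d([1.0, 0.0, 0.0], [0.0, 0.0, 1.0]))
print(emd_1d([0.5, 0.5, 0.0], [0.0, 0.5, 0.5]))
```

The running value carry is the difference of the two cumulative sums, so the scan takes time linear in the number of bins. After the last bin carry is zero (up to rounding), which is why adding its absolute value there does not change the result.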


Translated by 罗方炜
