搬土距离(The Earth Mover's Distance,EMD)最早由Y. Rubner在1999年的文章《A Metric for Distributions with Applications to Image Databases》中提出,它是归一化的从一个分布变为另一个分布的最小代价,因此可用于表征两个分布之间的距离。
例如,对于图像而言,它可以看做是由色调、饱和度、亮度三个分量组成,每个分量的直方图就是一个分布。不同的图像对应的直方图不同,因此图像之间的距离可以用直方图的距离表征,这时就可以用EMD进行计算。
EMD需要求解运输问题,其运算复杂度较高,平均而言至少是二次方级别。但是它作为距离函数,有一个非常好的特点是存在下界——两个分布的质心之间的距离,因此在粗略计算时,可以考虑用分布质心之间的距离代替EMD。
The Earth Mover's Distance
The Earth Mover's Distance (EMD) is a method to evaluate dissimilarity between two multi-dimensional distributions in some feature space where a distance measure between single features, which we call the ground distance is given. The EMD ``lifts'' this distance from individual features to full distributions.
Intuitively, given two distributions, one can be seen as a mass of earth properly spread in space, the other as a collection of holes in that same space. Then, the EMD measures the least amount of work needed to fill the holes with earth. Here, a unit of work corresponds to transporting a unit of earth by a unit of ground distance.
A distribution can be represented by a set of clusters where each cluster is represented by its mean (or mode), and by the fraction of the distribution that belongs to that cluster. We call such a representation the signature of the distribution. The two signatures can have different sizes, for example, simple distributi