Wasserstein Distance
Optimal transport
1. Notations
Consider two probability measures $\mu$ and $\nu$ defined on measure spaces $X$ and $Y$. In most applications $X$ and $Y$ are subsets of $\mathbb{R}^d$, and $\mu$ and $\nu$ have density functions, which we denote by $I_0$ and $I_1$: $d\mu(x)=I_0(x)\,dx$ and $d\nu(y)=I_1(y)\,dy$. These densities originally represented the height of a pile of soil/sand and the depth of an excavation.
2. Monge’s formulation
Monge’s optimal transportation problem is to find a measurable map $f: X\rightarrow Y$ that pushes $\mu$ onto $\nu$ and minimizes the following objective function:
$$M(\mu,\nu)=\inf_{f\in MP}\int_X c(x,f(x))\,d\mu(x)$$
Here $c: X\times Y\rightarrow\mathbb{R}^+$ is the cost function, and $MP=\{f:X\rightarrow Y \mid f_\#\mu=\nu\}$ is the set of admissible maps. The notation $f_\#\mu$ denotes the pushforward of the measure $\mu$ by $f$. The condition $f_\#\mu=\nu$ means that $\int_{f^{-1}(A)} d\mu(x)=\int_{A} d\nu(y)$ for every measurable $A\subset Y$.
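With densities, the pushforward constraint becomes a condition on $I_0$ and $I_1$. This sketch assumes that $f$ is a smooth bijection from $X$ onto $Y$ with a smooth inverse, which is an assumption not stated above. Under it, the change-of-variables formula gives, for every measurable $A\subset Y$,

$$\nu(A)=\int_A I_1(y)\,dy=\int_{f^{-1}(A)} I_1(f(x))\,\lvert\det Df(x)\rvert\,dx, \qquad \mu\big(f^{-1}(A)\big)=\int_{f^{-1}(A)} I_0(x)\,dx.$$

Requiring these two quantities to agree for all $A$ gives the pointwise condition, which holds for almost every $x$:

$$\lvert\det Df(x)\rvert\,I_1(f(x))=I_0(x).$$

The Jacobian determinant makes this condition nonlinear in $f$.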
Simply put, the Monge formulation looks among all maps that rearrange $\mu$ into $\nu$ for one that minimizes the total transport cost.
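Consider uniform discrete measures with distinct atoms, $\mu=\frac1N\sum_i\delta_{x_i}$ and $\nu=\frac1N\sum_j\delta_{y_j}$. Here every admissible map must send the $N$ atoms of $\mu$ one-to-one onto the atoms of $\nu$, so the Monge problem reduces to a search over permutations.

The sketch below brute-forces that search in 1D for the squared cost $c(x,y)=\lvert x-y\rvert^2$. It compares the result with the sorted matching, which is a known 1D fact for convex costs of $x-y$. The function names are hypothetical illustrations, and brute force is only practical for very small $N$.

```python
import itertools
import numpy as np

def monge_cost_bruteforce(x, y, cost):
    """Minimum over permutations sigma of (1/N) * sum_i cost(x_i, y_sigma(i)).

    With uniform weights 1/N on distinct atoms, each admissible Monge map
    (f#mu = nu) is exactly a permutation of the atoms."""
    n = len(x)
    best_cost, best_perm = np.inf, None
    for perm in itertools.permutations(range(n)):
        total = sum(cost(x[i], y[perm[i]]) for i in range(n)) / n
        if total < best_cost:
            best_cost, best_perm = total, perm
    return best_cost, best_perm

def monge_cost_sorted(x, y, cost):
    """1D shortcut: match the k-th smallest x with the k-th smallest y."""
    xs, ys = np.sort(x), np.sort(y)
    return sum(cost(a, b) for a, b in zip(xs, ys)) / len(xs)

def sq_cost(a, b):
    """Squared cost c(x, y) = |x - y|^2."""
    return (a - b) ** 2

x = np.array([0.0, 2.0, 5.0, 1.0])
y = np.array([4.0, 1.5, 3.0, 6.0])
bf_cost, bf_perm = monge_cost_bruteforce(x, y, sq_cost)
print(bf_cost, monge_cost_sorted(x, y, sq_cost))
```

For these points, both approaches give $2.8125$.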
Drawbacks:
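- The problem is nonlinear with respect to the map $f$; in particular, the constraint $f_\#\mu=\nu$ is not a linear condition on $f$.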
- For certain measures, Monge’s formulation of the optimal transport problem is ill-posed: no transport map rearranges one measure into the other. For instance, let $\mu$ be a Dirac mass $\delta_{x_0}$ while $\nu$ is not. A map cannot split the mass at $x_0$, so every $f$ gives $f_\#\mu=\delta_{f(x_0)}\neq\nu$.
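For a discrete measure $\mu=\sum_i w_i\delta_{x_i}$, the pushforward simply moves each weight $w_i$ to the point $f(x_i)$. The hypothetical helper `pushforward` below is an illustration, not a library function. It applies this rule to $\mu=\delta_0$ and $\nu=\tfrac12\delta_{-1}+\tfrac12\delta_{1}$ for a few candidate maps.

Trying a few maps only illustrates the argument above; it does not prove it. Still, every image is a single Dirac mass, so none of them equals $\nu$.

```python
from collections import defaultdict

def pushforward(atoms, weights, f):
    """Pushforward f#mu of the discrete measure mu = sum_i w_i * delta_{x_i}.

    Since (f#mu)(A) = mu(f^{-1}(A)), weight w_i moves to f(x_i); weights of
    atoms that land on the same point are added together."""
    out = defaultdict(float)
    for x, w in zip(atoms, weights):
        out[f(x)] += w
    return dict(out)

# mu = delta_0 (a Dirac mass); nu = 0.5*delta_{-1} + 0.5*delta_{+1}
mu_atoms, mu_weights = [0.0], [1.0]
nu = {-1.0: 0.5, 1.0: 0.5}

candidate_maps = (lambda x: x, lambda x: x - 1.0, lambda x: x + 1.0)
for f in candidate_maps:
    image = pushforward(mu_atoms, mu_weights, f)
    # Each image is a one-atom dict, i.e. a Dirac mass, so the check prints False.
    print(image, image == nu)
```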
3. Kantorovich’s formulation
Kantorovich’s formulation alleviates this problem by optimizing over transport plans instead of transport maps. A transport plan is a probability measure $\gamma\in P(X\times Y)$ with marginals $\mu$ and $\nu$. That is, $\gamma(A\times Y)=\mu(A)$ and $\gamma(X\times B)=\nu(B)$ for all measurable $A\subset X$ and $B\subset Y$. The quantity $\gamma(A\times B)$ tells us how much mass in set $A$ is being moved to set $B$. Let $\Gamma(\mu,\nu)$ be the set of all such plans. Kantorovich’s formulation can then be written as
$$K(\mu,\nu)=\min_{\gamma\in\Gamma(\mu,\nu)}\int_{X\times Y} c(x,y)\,d\gamma(x,y)$$
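For discrete measures, a plan is a nonnegative matrix. Its entry $\gamma_{ij}=\gamma(\{x_i\}\times\{y_j\})$ is the mass moved from $x_i$ to $y_j$, and the marginal conditions become row sums equal to the weights of $\mu$ and column sums equal to the weights of $\nu$.

The sketch below uses a hypothetical checker `is_transport_plan` to revisit the Dirac example. Splitting the unit mass at $0$ equally between $-1$ and $1$ gives a valid plan, even though no transport map exists.

```python
import numpy as np

# mu = delta_0 (one atom); nu = 0.5*delta_{-1} + 0.5*delta_{+1} (two atoms)
mu = np.array([1.0])
nu = np.array([0.5, 0.5])

# gamma[i, j] = mass moved from x_i to y_j. Unlike a map, a plan may split
# the mass sitting at x_0 = 0 between the two target atoms.
gamma = np.array([[0.5, 0.5]])

def is_transport_plan(gamma, mu, nu, tol=1e-12):
    """Check that gamma >= 0, its row sums equal mu and its column sums equal nu."""
    return bool(
        np.all(gamma >= 0)
        and np.allclose(gamma.sum(axis=1), mu, atol=tol)
        and np.allclose(gamma.sum(axis=0), nu, atol=tol)
    )

print(is_transport_plan(gamma, mu, nu))  # True
```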
Note that, unlike in the Monge problem, both the objective function and the constraints of Kantorovich’s formulation are linear with respect to $\gamma$. Kantorovich’s formulation is therefore a linear program and, in particular, a convex optimization problem.
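Because the problem is linear in $\gamma$, discrete measures lead to a finite linear program. The sketch below assumes discrete measures $\mu=\sum_{i=1}^M p_i\delta_{x_i}$ and $\nu=\sum_{j=1}^N q_j\delta_{y_j}$ with cost matrix $C_{ij}=c(x_i,y_j)$. It minimizes $\sum_{ij}C_{ij}\gamma_{ij}$ subject to row sums $p$, column sums $q$ and $\gamma\ge 0$.

It calls SciPy's `scipy.optimize.linprog` with the HiGHS solver. The wrapper `kantorovich_lp` is a hypothetical name, and a dedicated optimal transport solver would scale better than this dense formulation.

```python
import numpy as np
from scipy.optimize import linprog

def kantorovich_lp(p, q, C):
    """Solve min sum_ij C_ij * gamma_ij  s.t.  gamma @ 1 = p, gamma.T @ 1 = q, gamma >= 0.

    p: (M,) weights of mu, q: (N,) weights of nu, C: (M, N) costs c(x_i, y_j).
    gamma is flattened row-major, so gamma_ij is variable number i*N + j."""
    M, N = C.shape
    A_eq = np.zeros((M + N, M * N))
    for i in range(M):
        A_eq[i, i * N:(i + 1) * N] = 1.0   # row i:    sum_j gamma_ij = p_i
    for j in range(N):
        A_eq[M + j, j::N] = 1.0            # column j: sum_i gamma_ij = q_j
    b_eq = np.concatenate([p, q])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.fun, res.x.reshape(M, N)

# Dirac example: mu = delta_0, nu = 0.5*delta_{-1} + 0.5*delta_{+1}, c = |x - y|^2
x, y = np.array([0.0]), np.array([-1.0, 1.0])
C = np.subtract.outer(x, y) ** 2           # C_ij = (x_i - y_j)^2
cost, plan = kantorovich_lp(np.array([1.0]), np.array([0.5, 0.5]), C)
print(cost, plan)                          # the optimal plan splits the mass
```

In the Dirac example, the constraints force $\gamma=[[0.5,\,0.5]]$, which costs $1$. This is the mass splitting that no Monge map can achieve.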
The Kantorovich problem is especially interesting in a discrete setting, that is, for probability measures of the form $\mu=\sum_{i=1}^{M} p_{i}\,\delta_{x_{i}}$