Prerequisite: Computational Optimal Transport
Kantorovich’s optimal transport problem
For histograms $\mathbf{a} \in \Sigma_n$, $\mathbf{b} \in \Sigma_m$ (with $\Sigma_n$ the probability simplex) and a cost matrix $\mathbf{C} \in \mathbb{R}^{n\times m}$, the optimal transport cost is
$$\mathrm{L}_{\mathbf{C}}(\mathbf{a}, \mathbf{b}) \stackrel{\text{def.}}{=} \min_{\mathbf{P} \in \mathbf{U}(\mathbf{a}, \mathbf{b})}\langle\mathbf{C}, \mathbf{P}\rangle = \min_{\mathbf{P} \in \mathbf{U}(\mathbf{a}, \mathbf{b})} \sum_{i,j} \mathbf{C}_{i,j}\mathbf{P}_{i,j},$$
where the set of admissible couplings is
$$\mathbf{U}(\mathbf{a}, \mathbf{b}) \stackrel{\text{def.}}{=}\left\{\mathbf{P} \in \mathbb{R}_{+}^{n \times m}: \mathbf{P} \mathbf{1}_m=\mathbf{a} \quad \text{and} \quad \mathbf{P}^{\mathrm{T}} \mathbf{1}_n=\mathbf{b}\right\}.$$
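The following is a minimal NumPy sketch of the primal objects, assuming dense arrays. `C` is the cost matrix and `P` is a candidate coupling. The helper names `transport_cost` and `is_coupling` are hypothetical. The independent coupling $\mathbf{a}\mathbf{b}^{\mathrm{T}}$ always lies in $\mathbf{U}(\mathbf{a},\mathbf{b})$, so its cost is an upper bound on $\mathrm{L}_{\mathbf{C}}(\mathbf{a},\mathbf{b})$. In general it is not the optimum.

```python
import numpy as np

def transport_cost(C: np.ndarray, P: np.ndarray) -> float:
    """Frobenius inner product <C, P> = sum_ij C_ij * P_ij."""
    return float(np.sum(C * P))

def is_coupling(P: np.ndarray, a: np.ndarray, b: np.ndarray, tol: float = 1e-9) -> bool:
    """True if P >= 0, P 1_m = a (row sums) and P^T 1_n = b (column sums)."""
    nonneg = np.all(P >= -tol)
    rows_ok = np.allclose(P.sum(axis=1), a, atol=tol)
    cols_ok = np.allclose(P.sum(axis=0), b, atol=tol)
    return bool(nonneg and rows_ok and cols_ok)

# Toy data: n = 2 source bins, m = 3 target bins.
a = np.array([0.5, 0.5])
b = np.array([0.2, 0.3, 0.5])
C = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0]])

P_indep = np.outer(a, b)              # a b^T is always in U(a, b)
print(is_coupling(P_indep, a, b))     # True
print(transport_cost(C, P_indep))     # an upper bound on L_C(a, b)
```

Finding the minimizing $\mathbf{P}$ requires a linear-programming solver. This snippet only evaluates and checks candidates.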
Dual:
$$\mathrm{L}_{\mathbf{C}}(\mathbf{a}, \mathbf{b})=\max_{(\mathbf{f}, \mathbf{g}) \in \mathbf{R}(\mathbf{a}, \mathbf{b})}\langle\mathbf{f}, \mathbf{a}\rangle+\langle\mathbf{g}, \mathbf{b}\rangle$$
$$\mathbf{R}(\mathbf{a}, \mathbf{b}) \stackrel{\text{def.}}{=}\left\{(\mathbf{f}, \mathbf{g}) \in \mathbb{R}^n \times \mathbb{R}^m: \mathbf{f} \oplus \mathbf{g} \leq \mathbf{C}\right\}$$
Here $\mathbf{f} \oplus \mathbf{g} \in \mathbb{R}^{n\times m}$ has entries $(\mathbf{f} \oplus \mathbf{g})_{i,j} = f_i + g_j$, and the inequality holds entrywise.
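Here is a minimal sketch of the dual, assuming NumPy and the toy `a`, `b`, `C` below. For any `f`, setting $g_j = \min_i (\mathbf{C}_{i,j} - f_i)$ (the C-transform of `f`) makes $(\mathbf{f},\mathbf{g})$ feasible. Weak duality then says its dual value never exceeds $\langle\mathbf{C},\mathbf{P}\rangle$ for any feasible coupling $\mathbf{P}$. The function names are hypothetical.

```python
import numpy as np

def c_transform(C: np.ndarray, f: np.ndarray) -> np.ndarray:
    """g_j = min_i (C_ij - f_i): the largest g with f_i + g_j <= C_ij for all i, j."""
    return np.min(C - f[:, None], axis=0)

def dual_value(f: np.ndarray, g: np.ndarray, a: np.ndarray, b: np.ndarray) -> float:
    """Dual objective <f, a> + <g, b>."""
    return float(f @ a + g @ b)

a = np.array([0.5, 0.5])
b = np.array([0.2, 0.3, 0.5])
C = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0]])

f = np.array([0.3, -0.1])       # arbitrary potential on the source side
g = c_transform(C, f)

# Feasibility: (f ⊕ g)_ij = f_i + g_j <= C_ij for every entry.
assert np.all(f[:, None] + g[None, :] <= C + 1e-12)

# Weak duality against the feasible (generally suboptimal) coupling a b^T.
primal_upper = float(np.sum(C * np.outer(a, b)))
print(dual_value(f, g, a, b), "<=", primal_upper)
```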
Wasserstein distance
Consider $n=m$ and $p\ge 1$, and let $\mathbf{C} = \mathbf{D}^p\in\mathbb{R}^{n\times n}$ (the $p$-th power taken entrywise), where $\mathbf{D}$ is a distance matrix, i.e. it satisfies:
(1) $\mathbf{D}\in\mathbb{R}_+^{n\times n}$ is symmetric;
(2) $\mathbf{D}_{i,j}=0$ if and only if $i=j$;
(3) $\forall i,j,k,\ \mathbf{D}_{i,k}\le \mathbf{D}_{i,j}+\mathbf{D}_{j,k}$.
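This minimal NumPy sketch checks conditions (1) to (3) for a given matrix and forms $\mathbf{C} = \mathbf{D}^p$ entrywise. It assumes the points lie on a line, so that $\mathbf{D}_{i,j} = |x_i - x_j|$. `is_distance_matrix` is a hypothetical helper. The example also shows that the entrywise power $\mathbf{D}^2$ of a distance matrix need not satisfy (3) itself.

```python
import numpy as np

def is_distance_matrix(D: np.ndarray, tol: float = 1e-12) -> bool:
    """Check conditions (1)-(3) on a square matrix D."""
    if D.ndim != 2 or D.shape[0] != D.shape[1]:
        return False
    n = D.shape[0]
    # (1) nonnegative entries and symmetry
    if np.any(D < 0) or not np.allclose(D, D.T):
        return False
    # (2) D_ii = 0, and D_ij > 0 whenever i != j
    off_diag = ~np.eye(n, dtype=bool)
    if np.any(np.abs(np.diag(D)) > tol) or np.any(D[off_diag] <= tol):
        return False
    # (3) D[i, k] <= D[i, j] + D[j, k] for all i, j, k, checked via broadcasting
    via_j = D[:, :, None] + D[None, :, :]   # via_j[i, j, k] = D[i, j] + D[j, k]
    return bool(np.all(D[:, None, :] <= via_j + tol))

x = np.array([0.0, 1.0, 3.0])
D = np.abs(x[:, None] - x[None, :])   # D_ij = |x_i - x_j|
p = 2
C = D ** p                            # entrywise power, as in C = D^p

print(is_distance_matrix(D))  # True
print(is_distance_matrix(C))  # False: 9 > 1 + 4 breaks the triangle inequality
```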
Then
$$\mathrm{W}_p(\mathbf{a}, \mathbf{b}) \stackrel{\text{def.}}{=} \mathrm{L}_{\mathbf{D}^p}(\mathbf{a}, \mathbf{b})^{1/p}$$
is called the $p$-Wasserstein distance.
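One can show that $\mathrm{W}_p$ is itself a distance on the probability simplex $\Sigma_n$. It is symmetric, $\mathrm{W}_p(\mathbf{a},\mathbf{b}) = 0$ if and only if $\mathbf{a}=\mathbf{b}$, and it satisfies the triangle inequality.
Proof: to be added.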
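The metric property can be spot-checked numerically in the special case where the $n$ support points lie on the real line and $\mathbf{D}_{i,j} = |x_i - x_j|$. In 1D the monotone (quantile) coupling is optimal for $p \ge 1$. This gives $\mathrm{W}_p(\mathbf{a},\mathbf{b})^p = \int_0^1 |F_a^{-1}(t) - F_b^{-1}(t)|^p\,dt$, whose integrand is piecewise constant and can be summed exactly. This swaps in a 1D closed form rather than a general LP solver, and the check below is a numerical illustration, not a proof. `wasserstein_1d` is a hypothetical name.

```python
import numpy as np

def wasserstein_1d(x: np.ndarray, a: np.ndarray, b: np.ndarray, p: float = 1.0) -> float:
    """W_p between histograms a and b on 1D support points x, with D_ij = |x_i - x_j|."""
    order = np.argsort(x)
    x, a, b = x[order], a[order], b[order]
    Fa, Fb = np.cumsum(a), np.cumsum(b)   # CDFs evaluated at the sorted points
    # Both quantile functions are constant between consecutive breakpoints in [0, 1].
    t = np.unique(np.clip(np.concatenate(([0.0, 1.0], Fa, Fb)), 0.0, 1.0))
    mid = 0.5 * (t[:-1] + t[1:])
    last = len(x) - 1
    # F^{-1}(s) = x[first i with F[i] >= s]; the clip guards against round-off near s = 1.
    qa = x[np.minimum(np.searchsorted(Fa, mid, side="left"), last)]
    qb = x[np.minimum(np.searchsorted(Fb, mid, side="left"), last)]
    return float(np.sum(np.diff(t) * np.abs(qa - qb) ** p) ** (1.0 / p))

pts = np.array([0.0, 1.0, 2.0])
print(wasserstein_1d(pts, np.array([0.5, 0.5, 0.0]), np.array([0.0, 0.5, 0.5])))  # 1.0

# Spot-check the metric axioms on random histograms.
rng = np.random.default_rng(0)
support = np.array([0.0, 1.0, 2.5, 4.0])
for p in (1.0, 2.0):
    for _ in range(200):
        u, v, w = rng.dirichlet(np.ones(len(support)), size=3)
        d_uv = wasserstein_1d(support, u, v, p)
        assert abs(d_uv - wasserstein_1d(support, v, u, p)) < 1e-9   # symmetry
        assert wasserstein_1d(support, u, u, p) == 0.0                # W_p(a, a) = 0
        assert d_uv <= wasserstein_1d(support, u, w, p) + wasserstein_1d(support, w, v, p) + 1e-9
```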
Dual: since $\mathrm{W}_p(\mathbf{a},\mathbf{b})^p = \mathrm{L}_{\mathbf{D}^p}(\mathbf{a},\mathbf{b})$, applying the dual with $\mathbf{C} = \mathbf{D}^p$ gives
$$\mathrm{W}_p(\mathbf{a}, \mathbf{b})^p=\max_{(\mathbf{f}, \mathbf{g}) \in \mathbf{R}(\mathbf{a}, \mathbf{b})}\langle\mathbf{f}, \mathbf{a}\rangle+\langle\mathbf{g}, \mathbf{b}\rangle$$
$$\mathbf{R}(\mathbf{a}, \mathbf{b}) \stackrel{\text{def.}}{=}\left\{(\mathbf{f}, \mathbf{g}) \in \mathbb{R}^n \times \mathbb{R}^n: \mathbf{f} \oplus \mathbf{g} \leq \mathbf{D}^p\right\}$$
Since $\mathbf{D}_{i,i} = 0$, the diagonal entries of the constraint give
$$\mathbf{f} \oplus \mathbf{g} \leq \mathbf{D}^p \;\Rightarrow\; f_i + g_i \le \mathbf{D}_{i,i}^p = 0, \quad i = 1,\dots,n.$$
Therefore, using $g_i \le -f_i$ and $b_i \ge 0$,
$$\langle\mathbf{f}, \mathbf{a}\rangle+\langle\mathbf{g}, \mathbf{b}\rangle=\sum_{i=1}^{n}\left(f_ia_i + g_ib_i\right)\le \sum_{i=1}^n\left(f_ia_i -f_i b_i\right) = \langle\mathbf{f}, \mathbf{a}-\mathbf{b}\rangle.$$
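The two steps above can be checked numerically. This is a minimal sketch, assuming NumPy, points on a line for $\mathbf{D}$, random histograms, and an arbitrary random `f`. `g` is built as the largest feasible partner, $g_j = \min_i(\mathbf{D}^p_{i,j} - f_i)$. It illustrates the inequality and does not prove it.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 2.0
x = np.array([0.0, 1.0, 2.5, 4.0])
D = np.abs(x[:, None] - x[None, :])     # distance matrix with zero diagonal
Cp = D ** p                             # entrywise power D^p

a = rng.dirichlet(np.ones(len(x)))
b = rng.dirichlet(np.ones(len(x)))
f = rng.normal(size=len(x))
g = np.min(Cp - f[:, None], axis=0)     # g_j = min_i (D^p_ij - f_i), so f ⊕ g <= D^p

assert np.all(f[:, None] + g[None, :] <= Cp + 1e-12)   # (f, g) is dual-feasible
assert np.all(f + g <= 1e-12)                          # diagonal: f_i + g_i <= D_ii^p = 0

lhs = f @ a + g @ b                                    # <f, a> + <g, b>
rhs = f @ (a - b)                                      # sum_i (f_i a_i - f_i b_i)
print(lhs, "<=", rhs)
```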
Earth Mover’s Distance (EMD)
Consider two probability distributions $P_r$ and $P_\theta$.
The EMD between them is
$$\operatorname{EMD}(P_r, P_\theta)=\inf_{\gamma \in \Pi} \sum_{x, y}\|x-y\|\,\gamma(x, y)=\inf_{\gamma \in \Pi} \mathbb{E}_{(x, y) \sim \gamma}\|x-y\|,$$
where $\Pi = \Pi(P_r, P_\theta)$ is the set of joint distributions $\gamma(x, y)$ whose marginals are $P_r$ and $P_\theta$. This is the 1-Wasserstein distance with ground cost $\|x-y\|$.
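Take two empirical distributions with the same number $n$ of equally weighted samples. By Birkhoff's theorem on doubly stochastic matrices, an optimal $\gamma$ can then be taken to be a permutation, with each sample of $P_r$ sent to exactly one sample of $P_\theta$. The brute-force sketch below relies on that fact and assumes a Euclidean ground cost. It is exponential in $n$ and only meant for tiny toy inputs. `emd_uniform` is a hypothetical name.

```python
import itertools
import math

def emd_uniform(xs, ys) -> float:
    """EMD between uniform empirical distributions on xs and ys (same length, Euclidean cost)."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally many samples on both sides")
    n = len(xs)
    best = math.inf
    for perm in itertools.permutations(range(n)):
        # Move mass 1/n from xs[i] to ys[perm[i]] for every i.
        cost = sum(math.dist(xs[i], ys[perm[i]]) for i in range(n)) / n
        best = min(best, cost)
    return best

print(emd_uniform([(0.0, 0.0), (1.0, 0.0)], [(0.0, 1.0), (1.0, 1.0)]))  # 1.0
```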
Its dual (the Kantorovich-Rubinstein duality) is
$$\operatorname{EMD}(P_r, P_\theta)=\sup_{\|f\|_L \leq 1} \mathbb{E}_{x \sim P_r} f(x)-\mathbb{E}_{x \sim P_\theta} f(x).$$
Here $\|f\|_L$ is the Lipschitz constant of $f$, i.e. the smallest $L$ such that
$$\left|f(\mathbf{x})-f(\mathbf{y})\right|\le L\|\mathbf{x}-\mathbf{y}\| \quad \text{for all } \mathbf{x}, \mathbf{y}.$$
The constraint $\|f\|_L \le 1$ therefore restricts the supremum to 1-Lipschitz functions.
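Here is a small numerical illustration of the duality and the 1-Lipschitz constraint. It assumes 1D samples with uniform weights: $P_r$ is uniform on $\{0, 2\}$ and $P_\theta$ is uniform on $\{1, 3\}$. Every 1-Lipschitz `f` gives a lower bound on the EMD. For this pair, the choice $f(x) = -x$ attains the primal value 1, which the code computes by brute force over matchings. The function names are hypothetical.

```python
import itertools
import math

xs = [0.0, 2.0]   # samples of P_r, each with weight 1/2
ys = [1.0, 3.0]   # samples of P_theta, each with weight 1/2

def primal_emd_1d(xs, ys) -> float:
    """Brute-force EMD for equal-size, uniformly weighted 1D samples (tiny n only)."""
    n = len(xs)
    return min(sum(abs(xs[i] - ys[s[i]]) for i in range(n)) / n
               for s in itertools.permutations(range(n)))

def dual_objective(f, xs, ys) -> float:
    """E_{x~P_r} f(x) - E_{x~P_theta} f(x) for uniform empirical distributions."""
    return sum(map(f, xs)) / len(xs) - sum(map(f, ys)) / len(ys)

# Each candidate is 1-Lipschitz: |f(x) - f(y)| <= |x - y|.
candidates = {
    "f(x) = -x": lambda t: -t,
    "f(x) = sin(x)": math.sin,      # |sin'(x)| = |cos(x)| <= 1
    "f(x) = 0": lambda t: 0.0,
}

emd = primal_emd_1d(xs, ys)
for name, f in candidates.items():
    print(f"{name}: dual value {dual_objective(f, xs, ys):.4f} <= EMD {emd:.4f}")
```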