Reading Notes on *Distributionally Robust Learning*, Chapter 2

2 The Wasserstein Metric

2.1 Basics

  • Consider the Linear Programming (LP) problem (a numerical sketch follows this list):
    $$\begin{aligned} W_{\mathbf{S}, 1}(\mathbb{P}, \mathbb{Q})=\min_{\pi} \; & \sum_{i=1}^{m} \sum_{j=1}^{n} \pi(i, j)\, s(i, j) \\ \text{ s.t. } \; & \sum_{i=1}^{m} \pi(i, j)=q_{j}, \quad j \in \llbracket n \rrbracket \\ & \sum_{j=1}^{n} \pi(i, j)=p_{i}, \quad i \in \llbracket m \rrbracket \\ & \pi(i, j) \geq 0, \quad \forall i, j \end{aligned}$$
    The optimal objective value is the order-1 Wasserstein distance between the distributions $\mathbb{P}$ and $\mathbb{Q}$.
  • Similarly, by defining a cost matrix $\mathbf{S}^{t}=\left((s(i, j))^{t}\right)$, we obtain the order-$t$ Wasserstein distance
    $$W_{\mathbf{S}, t}(\mathbb{P}, \mathbb{Q})=\left(W_{\mathbf{S}^{t}, 1}(\mathbb{P}, \mathbb{Q})\right)^{1 / t}$$
  • The above LP formulation is equivalent to the well-known transportation problem (Bertsimas and Tsitsiklis, 1997)
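A minimal sketch of this LP in code, using `scipy.optimize.linprog`; the distributions `p`, `q`, the cost matrix `S`, and the helper name `wasserstein_lp` are toy choices of mine, not from the book:

```python
import numpy as np
from scipy.optimize import linprog

def wasserstein_lp(p, q, S, t=1):
    """Order-t Wasserstein distance between discrete distributions p and q.

    Solves min_pi sum_ij pi(i,j) s(i,j)^t subject to the marginal
    constraints, then takes the t-th root of the optimal value.
    """
    m, n = S.shape
    c = (S ** t).reshape(-1)  # objective coefficients for pi, row-major
    # sum_j pi(i,j) = p_i for each source i
    A_rows = np.kron(np.eye(m), np.ones((1, n)))
    # sum_i pi(i,j) = q_j for each destination j
    A_cols = np.kron(np.ones((1, m)), np.eye(n))
    res = linprog(c,
                  A_eq=np.vstack([A_rows, A_cols]),
                  b_eq=np.concatenate([p, q]),
                  bounds=(0, None),      # pi(i,j) >= 0
                  method="highs")
    return res.fun ** (1.0 / t)

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.6])
S = np.abs(np.arange(3)[:, None] - np.arange(2)[None, :]).astype(float)  # s(i,j) = |i-j|
print(wasserstein_lp(p, q, S))        # order-1 distance
print(wasserstein_lp(p, q, S, t=2))   # order-2 distance
```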

2.2 A Distance Metric

  • In this section we establish that the Wasserstein distance $W_{\mathbf{S}, t}(\mathbb{P}, \mathbb{Q})$ is a distance metric, assuming that the underlying cost $s(i, j)$ is a proper distance metric.
  • $W_{\mathbf{S}, 1}(\mathbb{P}, \mathbb{Q})$, viewed as a function of the vectors $\mathbf{p}$ and $\mathbf{q}$ corresponding to $\mathbb{P}$ and $\mathbb{Q}$, is a convex function; see the argument sketched below.
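Why convexity holds (a short argument, not spelled out in the notes, that anticipates the dual of Section 2.3): the dual represents $W_{\mathbf{S}, 1}$ as a pointwise maximum of functions that are linear in $(\mathbf{p}, \mathbf{q})$, and a pointwise maximum of linear functions is convex:
    $$W_{\mathbf{S}, 1}(\mathbb{P}, \mathbb{Q}) = \max_{(\mathbf{f}, \mathbf{g})\,:\, f_{j}+g_{i} \leq s(i, j)} \; \mathbf{g}^{\top} \mathbf{p} + \mathbf{f}^{\top} \mathbf{q}.$$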

2.3 The Dual Problem

  • The dual of the LP in Section 2.1 (a numerical check follows this list):
    $$\begin{aligned} W_{\mathbf{S}, 1}(\mathbb{P}, \mathbb{Q})=\max_{\mathbf{f}, \mathbf{g}} \; & \sum_{i=1}^{m} g_{i} p_{i}+\sum_{j=1}^{n} f_{j} q_{j} \\ \text{ s.t. } \; & f_{j}+g_{i} \leq s(i, j), \quad i \in \llbracket m \rrbracket,\ j \in \llbracket n \rrbracket \end{aligned}$$
  • Interpretation: the dual variables act as prices in the transportation problem; $g_{i}$ is the price for picking up a unit of mass at source $i$, $f_{j}$ is the price for delivering it at destination $j$, and feasibility requires that the total price of any route never exceed its transport cost $s(i, j)$.
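As referenced above, a minimal numerical check (toy data of my own, mirroring the primal sketch in Section 2.1): solve the dual LP with `scipy.optimize.linprog` and confirm that its optimal value matches the primal optimum, as LP strong duality guarantees.

```python
import numpy as np
from scipy.optimize import linprog

m, n = 3, 2
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.6])
S = np.abs(np.arange(m)[:, None] - np.arange(n)[None, :]).astype(float)

# Variables x = (g_1..g_m, f_1..f_n); maximize g.p + f.q, i.e. minimize its
# negation, subject to g_i + f_j <= s(i, j) for every pair (i, j).
c = -np.concatenate([p, q])
A_ub = np.zeros((m * n, m + n))
for i in range(m):
    for j in range(n):
        A_ub[i * n + j, i] = 1.0        # coefficient of g_i
        A_ub[i * n + j, m + j] = 1.0    # coefficient of f_j
res = linprog(c, A_ub=A_ub, b_ub=S.reshape(-1),
              bounds=(None, None),       # f, g are free variables
              method="highs")
print(-res.fun)  # equals W_{S,1}(P, Q) from the primal LP by strong duality
```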

2.3.1 Arbitrary Measures and Kantorovich Duality

  • Primal: the minimization is over couplings $\pi$ of $(\mathbb{P}, \mathbb{Q})$, i.e., joint measures on $\mathcal{Z}_{1} \times \mathcal{Z}_{2}$ whose marginals are $\mathbb{P}$ and $\mathbb{Q}$:
    $$W_{s, 1}(\mathbb{P}, \mathbb{Q})=\min_{\pi} \int_{\mathcal{Z}_{1} \times \mathcal{Z}_{2}} s\left(\mathbf{z}_{1}, \mathbf{z}_{2}\right)\, \mathrm{d} \pi\left(\mathbf{z}_{1}, \mathbf{z}_{2}\right)$$
    $$W_{s, t}(\mathbb{P}, \mathbb{Q})=\left(W_{s^{t}, 1}(\mathbb{P}, \mathbb{Q})\right)^{1 / t}, \qquad s^{t}\left(\mathbf{z}_{1}, \mathbf{z}_{2}\right)=\left(s\left(\mathbf{z}_{1}, \mathbf{z}_{2}\right)\right)^{t}$$
  • Dual (Kantorovich duality; a worked Dirac example follows this list):
    $$\begin{aligned} W_{s, 1}(\mathbb{P}, \mathbb{Q})=\sup_{f, g} \; & \int_{\mathcal{Z}_{1}} g\left(\mathbf{z}_{1}\right)\, \mathrm{d} \mathbb{P}\left(\mathbf{z}_{1}\right)+\int_{\mathcal{Z}_{2}} f\left(\mathbf{z}_{2}\right)\, \mathrm{d} \mathbb{Q}\left(\mathbf{z}_{2}\right) \\ \text{ s.t. } \; & f\left(\mathbf{z}_{2}\right)+g\left(\mathbf{z}_{1}\right) \leq s\left(\mathbf{z}_{1}, \mathbf{z}_{2}\right), \quad \mathbf{z}_{1} \in \mathcal{Z}_{1},\ \mathbf{z}_{2} \in \mathcal{Z}_{2} \end{aligned}$$
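A small worked example of my own (not in the notes): for Dirac measures $\mathbb{P}=\delta_{\mathbf{a}}$ and $\mathbb{Q}=\delta_{\mathbf{b}}$, the only feasible coupling is $\pi=\delta_{(\mathbf{a}, \mathbf{b})}$, so the primal gives
    $$W_{s, 1}\left(\delta_{\mathbf{a}}, \delta_{\mathbf{b}}\right)=\int_{\mathcal{Z}_{1} \times \mathcal{Z}_{2}} s\left(\mathbf{z}_{1}, \mathbf{z}_{2}\right)\, \mathrm{d} \delta_{(\mathbf{a}, \mathbf{b})}\left(\mathbf{z}_{1}, \mathbf{z}_{2}\right)=s(\mathbf{a}, \mathbf{b}),$$
    and when $s$ is a metric the dual pair $g\left(\mathbf{z}_{1}\right)=s\left(\mathbf{z}_{1}, \mathbf{b}\right)$, $f\left(\mathbf{z}_{2}\right)=-s\left(\mathbf{z}_{2}, \mathbf{b}\right)$ is feasible by the triangle inequality and attains the same value $g(\mathbf{a})+f(\mathbf{b})=s(\mathbf{a}, \mathbf{b})$.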

2.4 Some Special Cases

2.5 The Transport Cost Function

  • We discuss several scenarios for what may be known about the data, and the transport cost function appropriate to each (a sketch of norm-induced costs follows this list):
    • sparse data
    • dense data
    • group sparsity
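A minimal sketch of how these scenarios might translate into norm-induced transport costs $s(\mathbf{z}_{1}, \mathbf{z}_{2})=\left\|\mathbf{z}_{1}-\mathbf{z}_{2}\right\|$; the mapping of scenario to norm here is my reading of the keywords above, not the book's prescription:

```python
import numpy as np

def transport_cost(z1, z2, norm="l2", groups=None):
    """Norm-induced transport cost s(z1, z2) = ||z1 - z2||."""
    d = z1 - z2
    if norm == "l1":       # sparse differences: ell_1 norm
        return float(np.abs(d).sum())
    if norm == "l2":       # dense differences: ell_2 norm
        return float(np.linalg.norm(d))
    if norm == "group":    # group sparsity: ell_{2,1}, sum of per-group ell_2 norms
        return float(sum(np.linalg.norm(d[g]) for g in groups))
    raise ValueError(f"unknown norm: {norm}")

z1 = np.array([1.0, 0.0, 2.0, 0.0])
z2 = np.zeros(4)
print(transport_cost(z1, z2, "l1"))
print(transport_cost(z1, z2, "l2"))
print(transport_cost(z1, z2, "group", groups=[[0, 1], [2, 3]]))
```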

2.5.1 Transport Cost Function via Metric Learning

2.6 Robustness of the Wasserstein Ambiguity Set

  • Theorem 2.6.1. Suppose we are given two probability distributions $\mathbb{P}$ and $\mathbb{P}_{out}$, and the mixture distribution $\mathbb{P}_{mix}$ is a convex combination of the two: $\mathbb{P}_{mix}=q \mathbb{P}_{out}+(1-q) \mathbb{P}$. Then, for any cost function $s$,
    $$\frac{W_{s, 1}\left(\mathbb{P}_{\text{out}}, \mathbb{P}_{\text{mix}}\right)}{W_{s, 1}\left(\mathbb{P}, \mathbb{P}_{\text{mix}}\right)}=\frac{1-q}{q}$$
  • We claim that when $q$ is small, if the Wasserstein ball radius $\epsilon$ is chosen judiciously, the true distribution $\mathbb{P}$ will be included in the $\epsilon$-Wasserstein ball $\Omega$ while the outlying distribution $\mathbb{P}_{out}$ will be excluded; the theorem's ratio is checked numerically in the sketch below.
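As referenced above, a quick numerical check of the theorem's ratio on made-up discrete distributions, reusing the primal LP from Section 2.1 (all data here is my own toy example):

```python
import numpy as np
from scipy.optimize import linprog

def w1(p, q, S):
    """Order-1 Wasserstein distance via the primal LP of Section 2.1."""
    m, n = S.shape
    A_eq = np.vstack([np.kron(np.eye(m), np.ones((1, n))),
                      np.kron(np.ones((1, m)), np.eye(n))])
    res = linprog(S.reshape(-1), A_eq=A_eq, b_eq=np.concatenate([p, q]),
                  bounds=(0, None), method="highs")
    return res.fun

S = np.abs(np.arange(4)[:, None] - np.arange(4)[None, :]).astype(float)  # s(i,j) = |i-j|
p_true = np.array([0.7, 0.2, 0.1, 0.0])   # P
p_out = np.array([0.0, 0.1, 0.2, 0.7])    # P_out
for q in (0.05, 0.2, 0.5):
    p_mix = q * p_out + (1 - q) * p_true
    ratio = w1(p_out, p_mix, S) / w1(p_true, p_mix, S)
    print(q, ratio, (1 - q) / q)  # the last two columns should agree per the theorem
```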

2.7 Setting the Radius of the Wasserstein Ball

  • In the next two subsections we discuss two practical radius selection approaches

2.7.1 Measure Concentration

2.7.2 Robust Wasserstein Profile Inference
