Contents
2 The Wasserstein Metric
2.1 Basics
- Consider the Linear Programming (LP) problem:

$$
\begin{aligned}
W_{\mathbf{S}, 1}(\mathbb{P}, \mathbb{Q}) = \min_{\pi}\ & \sum_{i=1}^{m} \sum_{j=1}^{n} \pi(i, j)\, s(i, j) \\
\text{s.t. } & \sum_{i=1}^{m} \pi(i, j) = q_{j}, \quad j \in \llbracket n \rrbracket \\
& \sum_{j=1}^{n} \pi(i, j) = p_{i}, \quad i \in \llbracket m \rrbracket \\
& \pi(i, j) \ge 0, \quad \forall i, j
\end{aligned}
$$

  The optimal objective value is the order-1 Wasserstein distance between the distributions $\mathbb{P}$ and $\mathbb{Q}$.
- Similarly, by defining the cost matrix $\mathbf{S}^{t} = \left((s(i, j))^{t}\right)$, we obtain the order-$t$ Wasserstein distance

$$
W_{\mathbf{S}, t}(\mathbb{P}, \mathbb{Q}) = \left(W_{\mathbf{S}^{t}, 1}(\mathbb{P}, \mathbb{Q})\right)^{1/t}
$$

- The above LP formulation is equivalent to the well-known transportation problem (Bertsimas and Tsitsiklis, 1997).
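The transportation LP above can be solved directly with an off-the-shelf LP solver. A minimal sketch using `scipy.optimize.linprog` (the helper name `wasserstein_lp` and the three-point example are my own, not from the text):

```python
import numpy as np
from scipy.optimize import linprog

def wasserstein_lp(p, q, S):
    """Order-1 Wasserstein distance between discrete distributions p (m,)
    and q (n,) under cost matrix S (m, n), via the transportation LP."""
    m, n = S.shape
    c = S.reshape(-1)                      # objective: sum_ij pi(i,j) s(i,j)
    # Equality constraints: row sums of pi equal p, column sums equal q.
    A_eq = np.zeros((m + n, m * n))
    for i in range(m):
        A_eq[i, i * n:(i + 1) * n] = 1.0   # sum_j pi(i,j) = p_i
    for j in range(n):
        A_eq[m + j, j::n] = 1.0            # sum_i pi(i,j) = q_j
    b_eq = np.concatenate([p, q])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.fun

# Example: supports {0, 1, 2}, cost s(i, j) = |i - j|.
p = np.array([0.5, 0.5, 0.0])
q = np.array([0.0, 0.5, 0.5])
S = np.abs(np.subtract.outer(np.arange(3), np.arange(3))).astype(float)
print(wasserstein_lp(p, q, S))  # 1.0: each half-unit of mass moves one step
```

For the order-$t$ distance, the same routine applies with `S**t` as the cost matrix, followed by a $1/t$ power of the result.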
2.2 A Distance Metric
- In this section we establish that the Wasserstein distance $W_{\mathbf{S}, t}(\mathbb{P}, \mathbb{Q})$ is a distance metric, assuming that the underlying cost $s(i, j)$ is itself a proper distance metric.
- $W_{\mathbf{S}, 1}(\mathbb{P}, \mathbb{Q})$, viewed as a function of the vectors $\mathbf{p}$ and $\mathbf{q}$ corresponding to $\mathbb{P}$ and $\mathbb{Q}$, is a convex function.
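The convexity claim can be checked numerically at any particular point: the distance at a mixture of two $(\mathbf{p}, \mathbf{q})$ pairs is at most the mixture of the distances. A small sketch on a one-dimensional support with cost $|x - y|$ (the specific distributions are my own illustration, not from the text):

```python
import numpy as np
from scipy.stats import wasserstein_distance

support = np.arange(4.0)  # points {0, 1, 2, 3}, cost |x - y|
p1, q1 = np.array([0.7, 0.1, 0.1, 0.1]), np.array([0.1, 0.1, 0.1, 0.7])
p2, q2 = np.array([0.25, 0.25, 0.25, 0.25]), np.array([0.1, 0.4, 0.4, 0.1])
lam = 0.3

# W at the convex combination of the two (p, q) pairs ...
lhs = wasserstein_distance(support, support,
                           lam * p1 + (1 - lam) * p2,
                           lam * q1 + (1 - lam) * q2)
# ... versus the convex combination of the two W values.
rhs = (lam * wasserstein_distance(support, support, p1, q1)
       + (1 - lam) * wasserstein_distance(support, support, p2, q2))
print(lhs <= rhs + 1e-12)  # True: consistent with convexity
```

This is a single spot check, not a proof; convexity follows because $W_{\mathbf{S},1}$ is the optimal value of an LP whose right-hand side is linear in $(\mathbf{p}, \mathbf{q})$.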
2.3 The Dual Problem
- The dual of the LP in Section 2.1:

$$
\begin{aligned}
W_{\mathbf{S}, 1}(\mathbb{P}, \mathbb{Q}) = \max_{\mathbf{f}, \mathbf{g}}\ & \sum_{i=1}^{m} g_{i} p_{i} + \sum_{j=1}^{n} f_{j} q_{j} \\
\text{s.t. } & f_{j} + g_{i} \le s(i, j), \quad i \in \llbracket m \rrbracket,\ j \in \llbracket n \rrbracket
\end{aligned}
$$

- Interpretation
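The dual is itself a small LP and can be solved the same way; by strong LP duality its optimum matches the primal. A sketch, again with `scipy.optimize.linprog` (the helper name and example are mine):

```python
import numpy as np
from scipy.optimize import linprog

def wasserstein_dual(p, q, S):
    """Solve max_{f,g} <g,p> + <f,q> s.t. g_i + f_j <= s(i,j),
    with variables ordered as (g_1..g_m, f_1..f_n), all free."""
    m, n = S.shape
    c = -np.concatenate([p, q])            # linprog minimizes, so negate
    A_ub = np.zeros((m * n, m + n))
    for i in range(m):
        for j in range(n):
            A_ub[i * n + j, i] = 1.0       # coefficient of g_i
            A_ub[i * n + j, m + j] = 1.0   # coefficient of f_j
    b_ub = S.reshape(-1)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(None, None), method="highs")
    return -res.fun

# Same example as in Section 2.1: supports {0, 1, 2}, cost |i - j|.
p = np.array([0.5, 0.5, 0.0])
q = np.array([0.0, 0.5, 0.5])
S = np.abs(np.subtract.outer(np.arange(3), np.arange(3))).astype(float)
print(wasserstein_dual(p, q, S))  # 1.0, matching the primal optimum
```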
2.3.1 Arbitrary Measures and Kantorovich Duality
- Primal:

$$
W_{s, 1}(\mathbb{P}, \mathbb{Q}) = \min_{\pi} \int_{\mathcal{Z}_{1} \times \mathcal{Z}_{2}} s(\mathbf{z}_{1}, \mathbf{z}_{2})\, \mathrm{d}\pi(\mathbf{z}_{1}, \mathbf{z}_{2})
$$

$$
W_{s, t}(\mathbb{P}, \mathbb{Q}) = \left(W_{s^{t}, 1}(\mathbb{P}, \mathbb{Q})\right)^{1/t}, \qquad s^{t}(\mathbf{z}_{1}, \mathbf{z}_{2}) = \left(s(\mathbf{z}_{1}, \mathbf{z}_{2})\right)^{t}
$$

- Dual:

$$
\begin{aligned}
W_{s, 1}(\mathbb{P}, \mathbb{Q}) = \sup_{f, g}\ & \int_{\mathcal{Z}_{1}} g(\mathbf{z}_{1})\, \mathrm{d}\mathbb{P}(\mathbf{z}_{1}) + \int_{\mathcal{Z}_{2}} f(\mathbf{z}_{2})\, \mathrm{d}\mathbb{Q}(\mathbf{z}_{2}) \\
\text{s.t. } & f(\mathbf{z}_{2}) + g(\mathbf{z}_{1}) \le s(\mathbf{z}_{1}, \mathbf{z}_{2}), \quad \mathbf{z}_{1} \in \mathcal{Z}_{1},\ \mathbf{z}_{2} \in \mathcal{Z}_{2}
\end{aligned}
$$
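In the one-dimensional case with cost $s(\mathbf{z}_1, \mathbf{z}_2) = |z_1 - z_2|$, the order-1 distance between arbitrary measures can be estimated directly from samples; `scipy.stats.wasserstein_distance` implements this special case. A sketch (the Gaussian example is my own illustration, not from the text):

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=5000)   # samples from P = N(0, 1)
y = rng.normal(2.0, 1.0, size=5000)   # samples from Q = N(2, 1)

# For equal-variance Gaussians, the optimal transport map is a translation,
# so the order-1 distance is the gap between the means, here 2.
print(wasserstein_distance(x, y))  # ≈ 2, up to sampling error
```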
2.4 Some Special Cases
2.5 The Transport Cost Function
- We discuss a number of scenarios for what may be known about the data, and the appropriate transport cost function implied in each case:
  - sparse
  - dense
  - group sparsity
2.5.1 Transport Cost Function via Metric Learning
2.6 Robustness of the Wasserstein Ambiguity Set
- Theorem 2.6.1. Suppose we are given two probability distributions $\mathbb{P}$ and $\mathbb{P}_{out}$, and the mixture distribution $\mathbb{P}_{mix}$ is a convex combination of the two: $\mathbb{P}_{mix} = q\mathbb{P}_{out} + (1 - q)\mathbb{P}$. Then, for any cost function $s$,

$$
\frac{W_{s, 1}(\mathbb{P}_{out}, \mathbb{P}_{mix})}{W_{s, 1}(\mathbb{P}, \mathbb{P}_{mix})} = \frac{1 - q}{q}
$$

- We claim that when $q$ is small, if the Wasserstein ball radius $\epsilon$ is chosen judiciously, the true distribution $\mathbb{P}$ will be included in the $\epsilon$-Wasserstein ball $\Omega$ while the outlying distribution $\mathbb{P}_{out}$ will be excluded.
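The ratio in Theorem 2.6.1 is easy to verify numerically in a toy case. A sketch with point masses and cost $|x - y|$ (the specific supports and $q$ are my own choices):

```python
import numpy as np
from scipy.stats import wasserstein_distance

q = 0.2
# P = delta at 0, P_out = delta at 10; P_mix = q*P_out + (1-q)*P on {0, 10}.
support = np.array([0.0, 10.0])
P     = np.array([1.0, 0.0])
P_out = np.array([0.0, 1.0])
P_mix = q * P_out + (1 - q) * P

num = wasserstein_distance(support, support, P_out, P_mix)  # move (1-q) mass 10 units
den = wasserstein_distance(support, support, P, P_mix)      # move q mass 10 units
print(num / den, (1 - q) / q)  # both 4.0
```

With $q = 0.2$ the outlier sits $(1-q)/q = 4$ times farther from the mixture than the true distribution does, which is what lets a well-chosen radius $\epsilon$ separate the two.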
2.7 Setting the Radius of the Wasserstein Ball
- In the next two subsections we discuss two practical approaches to selecting the radius of the Wasserstein ball.