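1. Principle
1.1 Original paper
Within deep learning, the Wasserstein distance was first widely used in generative models:
Wasserstein GAN: https://arxiv.org/abs/1701.07875
1.2 Comprehensive introduction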
https://zhuanlan.zhihu.com/p/25071913
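1.3 Advantages
The Wasserstein distance has a key advantage over the KL and JS divergences: even when two distributions do not overlap, it still reflects how far apart they are. In that case the KL divergence is infinite and the JS divergence saturates at the constant log 2, so neither says anything about how far apart the distributions are.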
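The claim above can be checked numerically with a small sketch. It is not taken from the paper or the linked article. The helper names `js_divergence`, `wasserstein_1d`, and `point_mass` are hypothetical, and the setup (two point masses on a shared 1-D grid with unit bin width) is an assumption chosen to keep the arithmetic simple.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    def kl(a, b):
        # Terms with a_i == 0 contribute nothing to the KL sum.
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def wasserstein_1d(p, q, bin_width=1.0):
    """Wasserstein-1 distance between two histograms on the same evenly spaced grid.

    In one dimension, W1 equals the integral of |CDF_p - CDF_q|.
    """
    cdf_p = cdf_q = 0.0
    total = 0.0
    for pi, qi in zip(p, q):
        cdf_p += pi
        cdf_q += qi
        total += abs(cdf_p - cdf_q) * bin_width
    return total


def point_mass(index, size=10):
    """Histogram with all of its mass in a single bin."""
    return [1.0 if i == index else 0.0 for i in range(size)]


p = point_mass(0)
for shift in (1, 3, 8):
    q = point_mass(shift)
    # JS stays at log(2) for every shift; W1 grows with the shift.
    print(shift, js_divergence(p, q), wasserstein_1d(p, q))
```

For every non-zero shift the two histograms are disjoint, so the JS divergence is stuck at log 2, while the Wasserstein distance equals the shift itself.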
2. Code
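As background for the `eps` argument of the class below, the entropy-regularized OT problem that Sinkhorn iterations solve is commonly written as follows. This is the standard textbook formulation, not something stated in the adapted code. The symbols \pi (transport plan), \mu and \nu (the weights of the two point clouds), and C (the pairwise cost matrix) are introduced here for illustration, and \varepsilon is assumed to correspond to `eps`.

```latex
\begin{aligned}
\pi_\varepsilon &= \operatorname*{arg\,min}_{\pi \in U(\mu,\nu)}
    \sum_{i=1}^{P_1} \sum_{j=1}^{P_2} \pi_{ij} C_{ij} - \varepsilon H(\pi), \\
H(\pi) &= -\sum_{i,j} \pi_{ij} \log \pi_{ij}, \\
U(\mu,\nu) &= \bigl\{ \pi \in \mathbb{R}_{\ge 0}^{P_1 \times P_2} :
    \pi \mathbf{1} = \mu,\ \pi^{\top} \mathbf{1} = \nu \bigr\}.
\end{aligned}
```

As \varepsilon approaches 0 the solution approaches the unregularized optimal transport plan. A larger \varepsilon gives a smoother plan and typically faster convergence, at the price of a looser approximation.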
import torch
import torch.nn as nn
# Adapted from https://github.com/gpeyre/SinkhornAutoDiff
class SinkhornDistance(nn.Module):
    r"""
    Given two empirical measures each with :math:`P_1` locations
    :math:`x\in\mathbb{R}^{D_1}` and :math:`P_2` locations :math:`y\in\mathbb{R}^{D_2}`,
    outputs an approximation of the regularized OT cost for point clouds.

    Args:
        eps (float): regularization coefficient
        max_iter (int): maximum number of Sinkhorn iterations
        reduction (string, optional): Specifies the reduction to apply to the output:
            'none' | 'mean' | 'sum'. 'none': no reduction will be applied,
            'mean': the sum of the output will be divided by the number of
            elements in the output, 'sum': the output will be summed. Default: 'none'

    Shape:
        - Input: :math:`(N, P_1, D_1)`, :math:`(N, P_2, D_2)`
        - Output: :math:`(N)` or :math:`()`, depending on `reduction`
    """

    def __init__(self, eps, max_iter, reduction='none'):
        super(SinkhornDistance, self).__init__()
        self.eps = eps
        self.max_iter = max_iter
        self.reduction = reduction

    def forward(self, x, y):
        # The Sinkhorn algorithm takes as input three variables :
        C = self._cost_matrix(x, y)