I've been reading deep-learning papers recently and noticed that they often mention the Lipschitz condition, so I'm writing this note as a record.
Lipschitz continuous
- In particular, a real-valued function $f : \mathbb{R} \to \mathbb{R}$ is called Lipschitz continuous if there exists a positive real constant $K$ such that, for all real $x_1$ and $x_2$, $|f(x_1) - f(x_2)| \le K\,|x_1 - x_2|$, where $K$ is called the Lipschitz constant.
- It can be shown that any function with a bounded derivative is Lipschitz continuous: by the mean value theorem, $|f(x_1) - f(x_2)| = |f'(\xi)|\,|x_1 - x_2| \le \big(\sup_x |f'(x)|\big)\,|x_1 - x_2|$.
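As a quick sanity check of this fact (the example function is my own, not from the original note): $\sin$ has derivative $\cos$ with $|\cos(x)| \le 1$, so it should be Lipschitz continuous with constant $K = 1$.

```python
import numpy as np

# Hypothetical sanity check: sin has |cos(x)| <= 1 everywhere,
# so it is Lipschitz continuous with constant K = 1.
rng = np.random.default_rng(0)
x1 = rng.uniform(-10.0, 10.0, size=1000)
x2 = rng.uniform(-10.0, 10.0, size=1000)
mask = np.abs(x1 - x2) > 1e-9  # skip pairs where x1 ~ x2 to avoid dividing by ~0

K = 1.0
ratios = np.abs(np.sin(x1[mask]) - np.sin(x2[mask])) / np.abs(x1[mask] - x2[mask])
print(ratios.max() <= K + 1e-9)  # True: empirical slopes never exceed K (up to rounding)
```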
An application of the Lipschitz constraint
Reference:
Spectral Norm Regularization for Improving the Generalizability of Deep Learning
- L2 regularization encourages the model to satisfy the Lipschitz condition, which reduces the model's sensitivity to input perturbations and improves its generalization. However, this is only a rather crude condition; a more precise quantity to control is the spectral norm.
- Spectral Norm $\|W\|_2$: the square root of the largest eigenvalue of $W^{T}W$, where $W$ is a weight matrix of the neural network (equivalently, the largest singular value of $W$).
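A small NumPy check of this definition (the example matrix is my own illustration, not from the note):

```python
import numpy as np

# Example weight matrix (hypothetical, for illustration only).
W = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Definition from the note: sqrt of the largest eigenvalue of W^T W.
largest_eig = np.linalg.eigvalsh(W.T @ W).max()  # eigvalsh: W^T W is symmetric
spectral_from_eig = np.sqrt(largest_eig)

# Cross-check: NumPy's matrix 2-norm is the largest singular value.
spectral_direct = np.linalg.norm(W, ord=2)

print(np.isclose(spectral_from_eig, spectral_direct))  # True
```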
- Spectral Norm Regularization: add the square of the spectral norm to the loss as an extra regularization term, in place of the simple L2 regularization term.
- Notes: Cauchy–Schwarz inequality; power iteration.
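The power iteration mentioned above can be sketched as follows: repeatedly applying $W$ and $W^{T}$ to a vector converges to the top singular vectors, so $u^{T} W v$ converges to the spectral norm. This is a minimal illustration (the function name, example matrix, and initial vector are my own, not from the paper):

```python
import numpy as np

def estimate_spectral_norm(W, v0, n_iters=100):
    """Estimate ||W||_2 by power iteration on W^T W."""
    v = v0 / np.linalg.norm(v0)
    for _ in range(n_iters):
        u = W @ v                 # left step
        u /= np.linalg.norm(u)
        v = W.T @ u               # right step
        v /= np.linalg.norm(v)
    # After convergence u, v approximate the top singular vectors,
    # so u^T W v approximates the largest singular value.
    return float(u @ W @ v)

# Hypothetical example matrix; any generic initial vector works.
W = np.array([[2.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
approx = estimate_spectral_norm(W, v0=np.array([1.0, 1.0]))
exact = np.linalg.norm(W, ord=2)
print(np.isclose(approx, exact))  # True
```

In practice this is much cheaper than a full SVD: each iteration costs only two matrix-vector products, which is why the spectral-norm regularizer can be evaluated at training time.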