Gradient Mapping
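In a constrained minimization problem, the gradient of the objective function cannot be used the same way as in the unconstrained case. For constrained problems we introduce an object that takes over the role of the gradient: the gradient mapping.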
Definition: Let $\gamma > 0$ and define

$$x_Q(\widetilde{x};\gamma)=\operatorname*{arg\,min}_{x\in Q}\left[f(\widetilde{x})+\langle \nabla f(\widetilde{x}),x-\widetilde{x}\rangle+\frac{\gamma}{2}\|x-\widetilde{x}\|^2\right],\qquad g_Q(\widetilde{x};\gamma)=\gamma\,\bigl(\widetilde{x}-x_Q(\widetilde{x};\gamma)\bigr).$$

We call $g_Q(\widetilde{x};\gamma)$ the gradient mapping of $f$ on $Q$.
For $Q\equiv \mathbb{R}^n$ we have

$$x_Q(\widetilde{x};\gamma)=\widetilde{x}-\frac{1}{\gamma}\nabla f(\widetilde{x}),\qquad g_Q(\widetilde{x};\gamma)=\nabla f(\widetilde{x}).$$

Hence $\frac{1}{\gamma}$ can be viewed as the step size of the gradient step $\widetilde{x}\rightarrow x_Q(\widetilde{x};\gamma)$.
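To make the definition concrete, here is a minimal NumPy sketch. It relies on the fact that minimizing the quadratic model over $Q$ is the same as projecting the plain gradient step $\widetilde{x}-\frac{1}{\gamma}\nabla f(\widetilde{x})$ onto $Q$, and it uses $g_Q=\gamma(\widetilde{x}-x_Q)$, the form that reduces to $\nabla f(\widetilde{x})$ when $Q=\mathbb{R}^n$. The helper name `gradient_mapping`, the test function $f(x)=\frac12\|x-c\|^2$, and the box $Q=[0,1]^2$ are illustrative assumptions, not part of the original text.

```python
import numpy as np

def gradient_mapping(grad_f, project, x_tilde, gamma):
    """Return (x_Q, g_Q) for the point x_tilde and parameter gamma > 0.

    grad_f  : callable returning the gradient of f at a point
    project : callable returning the Euclidean projection onto Q
              (assumed to be available in closed form)
    """
    # Minimizing the quadratic model over Q equals projecting the
    # unconstrained gradient step onto Q.
    x_q = project(x_tilde - grad_f(x_tilde) / gamma)
    g_q = gamma * (x_tilde - x_q)
    return x_q, g_q

# Illustrative problem: f(x) = 0.5 * ||x - c||^2, so grad f(x) = x - c.
c = np.array([2.0, -0.5])
grad_f = lambda x: x - c
x_tilde = np.array([0.5, 0.5])
gamma = 1.0

# Q = R^n: the projection is the identity, so g_Q is the ordinary gradient.
x_free, g_free = gradient_mapping(grad_f, lambda z: z, x_tilde, gamma)

# Q = [0, 1]^2: the projection clips each coordinate to the box.
x_box, g_box = gradient_mapping(
    grad_f, lambda z: np.clip(z, 0.0, 1.0), x_tilde, gamma
)

print("unconstrained:", x_free, g_free)
print("box:          ", x_box, g_box)
```

In the unconstrained call `g_free` coincides with `grad_f(x_tilde)`. In the box case the step is cut short at the boundary, so `g_box` differs from the gradient while still pointing from `x_box` back to `x_tilde`.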
Theorem 1: Let $f\in \mathfrak{F}_{\mu,L}^{1,1}(\mathbb{R}^n)$, $\gamma\geq L$, and $\widetilde{x}\in \mathbb{R}^n$. Then for any $x\in Q$:

$$f(x)\geq f(x_Q(\widetilde{x};\gamma))+\langle g_Q(\widetilde{x};\gamma),x-\widetilde{x}\rangle+\frac{1}{2\gamma}\|g_Q(\widetilde{x};\gamma)\|^2+\frac{\mu}{2}\|x-\widetilde{x}\|^2.$$
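The inequality of Theorem 1 can be checked numerically on a small instance. The sketch below is an illustration under stated assumptions, not a proof: it picks a diagonal quadratic $f(x)=\frac12 x^\top A x-b^\top x$ with $\mu=1$ and $L=4$, takes $Q=[0,1]^2$ so that the projection is a coordinate-wise clip, sets $\gamma=L$, and uses $g_Q=\gamma(\widetilde{x}-x_Q)$. All of these choices, and the variable names, are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# f(x) = 0.5 x^T A x - b^T x with eigenvalues 1 and 4, so mu = 1 and L = 4.
A = np.diag([1.0, 4.0])
b = np.array([3.0, -2.0])
mu, L = 1.0, 4.0

def f(x):
    return 0.5 * x @ A @ x - b @ x

def grad_f(x):
    return A @ x - b

def project(z):
    # Euclidean projection onto the box Q = [0, 1]^2.
    return np.clip(z, 0.0, 1.0)

gamma = L  # the theorem requires gamma >= L
worst_gap = np.inf
for _ in range(1000):
    x_tilde = rng.normal(scale=3.0, size=2)  # arbitrary point of R^n
    x = rng.uniform(0.0, 1.0, size=2)        # arbitrary point of Q
    x_q = project(x_tilde - grad_f(x_tilde) / gamma)
    g_q = gamma * (x_tilde - x_q)
    rhs = (f(x_q)
           + g_q @ (x - x_tilde)
           + g_q @ g_q / (2.0 * gamma)
           + 0.5 * mu * np.sum((x - x_tilde) ** 2))
    # Left side minus right side; the theorem says this is nonnegative.
    worst_gap = min(worst_gap, f(x) - rhs)

print("smallest value of lhs - rhs:", worst_gap)
```

Note that `x_tilde` is sampled from all of $\mathbb{R}^n$ while `x` is sampled from $Q$, matching the roles the two points play in the theorem.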
Proof: Write $x_Q=x_Q(\widetilde{x};\gamma)$, $g_Q=g_Q(\widetilde{x};\gamma)$, and

$$\phi(x)=f(\widetilde{x})+\langle \nabla f(\widetilde{x}),x-\widetilde{x}\rangle+\frac{\gamma}{2}\|x-\widetilde{x}\|^2.$$

Then $\nabla \phi(x)=\nabla f(\widetilde{x})+\gamma(x-\widetilde{x})$, and

$$\langle \nabla f(\widetilde{x})-g_Q, x-x_Q\rangle=\langle \nabla \phi(x_Q),x-x_Q\rangle \geq 0$$