First Order Methods in Optimization Ch3. Subgradients (Part II)

Chapter 3: Subgradients (Part II)

3. Directional Derivatives

3.1 Definition and Basic Properties

Let $f:\mathbb{E}\to(-\infty,\infty]$ be a proper function and let $\mathbf{x}\in\mathrm{int}(\mathrm{dom}(f))$. The directional derivative of $f$ at $\mathbf{x}$ in a given direction $\mathbf{d}\in\mathbb{E}$, if it exists, is defined as $$f'(\mathbf{x};\mathbf{d})=\lim_{\alpha\to0^+}\frac{f(\mathbf{x}+\alpha\mathbf{d})-f(\mathbf{x})}{\alpha}.$$ A convex function has a directional derivative in every direction at every point in the interior of its domain. This is Theorem 8 below.

Theorem 8. Let $f:\mathbb{E}\to(-\infty,\infty]$ be a proper convex function and let $\mathbf{x}\in\mathrm{int}(\mathrm{dom}(f))$. Then for every $\mathbf{d}\in\mathbb{E}$ the directional derivative $f'(\mathbf{x};\mathbf{d})$ exists.
Proof. Let $\mathbf{x}\in\mathrm{int}(\mathrm{dom}(f))$ and $\mathbf{d}\ne\mathbf{0}$ (for $\mathbf{d}=\mathbf{0}$ the difference quotient is identically zero, so the claim is trivial). The directional derivative, if it exists, is the limit $$\lim_{t\to0^+}\frac{g(t)-g(0)}{t},$$ where $g(t)=f(\mathbf{x}+t\mathbf{d})$. Defining $h(t)\equiv\frac{g(t)-g(0)}{t}$, this limit can be written equivalently as $$\lim_{t\to0^+}h(t).$$ Choose $\epsilon>0$ such that $\mathbf{x}+t\mathbf{d},\,\mathbf{x}-t\mathbf{d}\in\mathrm{int}(\mathrm{dom}(f))$ for all $t\in[0,\epsilon]$. Let $0<t_1<t_2\le\epsilon$. Then $$\mathbf{x}+t_1\mathbf{d}=\left(1-\frac{t_1}{t_2}\right)\mathbf{x}+\frac{t_1}{t_2}(\mathbf{x}+t_2\mathbf{d}),$$ so by the convexity of $f$, $$f(\mathbf{x}+t_1\mathbf{d})\le\left(1-\frac{t_1}{t_2}\right)f(\mathbf{x})+\frac{t_1}{t_2}f(\mathbf{x}+t_2\mathbf{d}).$$ Rearranging, this inequality becomes $$\frac{f(\mathbf{x}+t_1\mathbf{d})-f(\mathbf{x})}{t_1}\le\frac{f(\mathbf{x}+t_2\mathbf{d})-f(\mathbf{x})}{t_2},$$ that is, $h(t_1)\le h(t_2)$. Hence $h$ is nondecreasing on $(0,\epsilon]$. A nondecreasing function that is bounded below has a limit as $t\to0^+$, so it only remains to show that $h$ is bounded below on $(0,\epsilon]$. Indeed, take $0<t\le\epsilon$ and note that $$\mathbf{x}=\frac{\epsilon}{\epsilon+t}(\mathbf{x}+t\mathbf{d})+\frac{t}{\epsilon+t}(\mathbf{x}-\epsilon\mathbf{d}).$$ Using the convexity of $f$ once more, $$f(\mathbf{x})\le\frac{\epsilon}{\epsilon+t}f(\mathbf{x}+t\mathbf{d})+\frac{t}{\epsilon+t}f(\mathbf{x}-\epsilon\mathbf{d}),$$ which rearranges to $$h(t)=\frac{f(\mathbf{x}+t\mathbf{d})-f(\mathbf{x})}{t}\ge\frac{f(\mathbf{x})-f(\mathbf{x}-\epsilon\mathbf{d})}{\epsilon}.$$ This shows that $h$ is bounded below on $(0,\epsilon]$, which completes the proof.
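The monotonicity argument in the proof of Theorem 8 can be observed numerically. The sketch below is only an illustration: the test function $f(x)=|x|+x^2$, the point $x=0$, the direction $d=1$, the choice $\epsilon=1$, and the helper name `difference_quotient` are all assumptions made here, not taken from the text.

```python
def difference_quotient(f, x, d, t):
    """h(t) = (f(x + t*d) - f(x)) / t for scalar x, d and t > 0."""
    return (f(x + t * d) - f(x)) / t


def f(x):
    # Hypothetical convex test function with a kink at 0.
    return abs(x) + x * x


x0, d0, eps = 0.0, 1.0, 1.0

# Step sizes decreasing towards 0+ (powers of two keep the arithmetic exact).
ts = [2.0 ** (-k) for k in range(12)]
hs = [difference_quotient(f, x0, d0, t) for t in ts]

# Lower bound from the proof: (f(x) - f(x - eps*d)) / eps.
lower_bound = (f(x0) - f(x0 - eps * d0)) / eps

for t, h in zip(ts, hs):
    print(f"t = {t:.6f}   h(t) = {h:.6f}")
print("lower bound:", lower_bound)
```

For this particular function $h(t)=1+t$, so the quotients shrink monotonically towards $f'(0;1)=1$ as $t\to0^+$ and never drop below the bound $-2$.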

We now discuss some basic properties of the function $\mathbf{d}\mapsto f'(\mathbf{x};\mathbf{d})$. The next lemma shows that it is convex and positively homogeneous of degree one.

Lemma 2 (convexity and homogeneity of $\mathbf{d}\mapsto f'(\mathbf{x};\mathbf{d})$). Let $f:\mathbb{E}\to(-\infty,\infty]$ be a proper convex function and let $\mathbf{x}\in\mathrm{int}(\mathrm{dom}(f))$. Then
(i) the function $\mathbf{d}\mapsto f'(\mathbf{x};\mathbf{d})$ is convex;
(ii) for every $\lambda\ge0$ and $\mathbf{d}\in\mathbb{E}$, $f'(\mathbf{x};\lambda\mathbf{d})=\lambda f'(\mathbf{x};\mathbf{d})$.

Proof. (i) To show that $g(\mathbf{d})\equiv f'(\mathbf{x};\mathbf{d})$ is convex, take $\mathbf{d}_1,\mathbf{d}_2\in\mathbb{E}$ and $\lambda\in[0,1]$. Then $$\begin{aligned}f'(\mathbf{x};\lambda\mathbf{d}_1+(1-\lambda)\mathbf{d}_2)&=\lim_{\alpha\to0^+}\frac{f(\mathbf{x}+\alpha[\lambda\mathbf{d}_1+(1-\lambda)\mathbf{d}_2])-f(\mathbf{x})}{\alpha}\\&=\lim_{\alpha\to0^+}\frac{f(\lambda(\mathbf{x}+\alpha\mathbf{d}_1)+(1-\lambda)(\mathbf{x}+\alpha\mathbf{d}_2))-f(\mathbf{x})}{\alpha}\\&\le\lim_{\alpha\to0^+}\frac{\lambda f(\mathbf{x}+\alpha\mathbf{d}_1)+(1-\lambda)f(\mathbf{x}+\alpha\mathbf{d}_2)-f(\mathbf{x})}{\alpha}\\&=\lambda\lim_{\alpha\to0^+}\frac{f(\mathbf{x}+\alpha\mathbf{d}_1)-f(\mathbf{x})}{\alpha}+(1-\lambda)\lim_{\alpha\to0^+}\frac{f(\mathbf{x}+\alpha\mathbf{d}_2)-f(\mathbf{x})}{\alpha}\\&=\lambda f'(\mathbf{x};\mathbf{d}_1)+(1-\lambda)f'(\mathbf{x};\mathbf{d}_2),\end{aligned}$$ where the inequality follows from the convexity of $f$, and all the limits exist by Theorem 8.
(ii) If $\lambda=0$, the claim is trivial since $f'(\mathbf{x};\mathbf{0})=0$. Take $\lambda>0$. Then $$f'(\mathbf{x};\lambda\mathbf{d})=\lim_{\alpha\to0^+}\frac{f(\mathbf{x}+\alpha\lambda\mathbf{d})-f(\mathbf{x})}{\alpha}=\lambda\lim_{\alpha\to0^+}\frac{f(\mathbf{x}+\alpha\lambda\mathbf{d})-f(\mathbf{x})}{\alpha\lambda}=\lambda f'(\mathbf{x};\mathbf{d}).$$

The next result relates the directional derivative to function values.

Lemma 3. Let $f:\mathbb{E}\to(-\infty,\infty]$ be a proper convex function and let $\mathbf{x}\in\mathrm{int}(\mathrm{dom}(f))$. Then $$f(\mathbf{y})\ge f(\mathbf{x})+f'(\mathbf{x};\mathbf{y}-\mathbf{x}),\quad\forall\mathbf{y}\in\mathrm{dom}(f).$$ Proof. By the definition of the directional derivative and the convexity of $f$, $$\begin{aligned}f'(\mathbf{x};\mathbf{y}-\mathbf{x})&=\lim_{\alpha\to0^+}\frac{f(\mathbf{x}+\alpha(\mathbf{y}-\mathbf{x}))-f(\mathbf{x})}{\alpha}\\&=\lim_{\alpha\to0^+}\frac{f((1-\alpha)\mathbf{x}+\alpha\mathbf{y})-f(\mathbf{x})}{\alpha}\\&\le\lim_{\alpha\to0^+}\frac{-\alpha f(\mathbf{x})+\alpha f(\mathbf{y})}{\alpha}\\&=f(\mathbf{y})-f(\mathbf{x}).\end{aligned}$$
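Lemma 2 and Lemma 3 can be sanity-checked on a concrete function. The following sketch assumes the $\ell_1$ norm on $\mathbb{R}^2$ and the point $\mathbf{x}=(1,0)$, where the directional derivative has the closed form $f'(\mathbf{x};\mathbf{d})=d_1+|d_2|$. Both the example and the helper names are illustrative choices, not part of the text.

```python
def f(x):
    # l1 norm on R^2, used only as an illustrative convex function.
    return abs(x[0]) + abs(x[1])


X = (1.0, 0.0)


def dir_deriv(d):
    # Closed form of f'(X; d) at X = (1, 0): the first coordinate is locally
    # linear with slope 1, the second has a kink at 0 and contributes |d_2|.
    return d[0] + abs(d[1])


def quotient(d, t):
    # Difference quotient (f(X + t*d) - f(X)) / t from the definition.
    xt = (X[0] + t * d[0], X[1] + t * d[1])
    return (f(xt) - f(X)) / t


def combine(lam, d1, d2):
    # Convex combination lam*d1 + (1 - lam)*d2.
    return tuple(lam * a + (1.0 - lam) * b for a, b in zip(d1, d2))


d1, d2, lam = (2.0, -1.0), (-3.0, 4.0), 0.25

# Lemma 2(i): convexity of d -> f'(X; d) on one sample pair.
convex_ok = dir_deriv(combine(lam, d1, d2)) <= lam * dir_deriv(d1) + (1.0 - lam) * dir_deriv(d2)
# Lemma 2(ii): positive homogeneity with lambda = 3.
homogeneous_ok = dir_deriv((3.0 * d1[0], 3.0 * d1[1])) == 3.0 * dir_deriv(d1)
# Lemma 3: f(y) >= f(X) + f'(X; y - X) for a sample y.
y = (-2.0, 0.5)
lemma3_ok = f(y) >= f(X) + dir_deriv((y[0] - X[0], y[1] - X[1]))

print(convex_ok, homogeneous_ok, lemma3_ok)
```

A single sample pair is of course no proof; the point is only to see each inequality instantiated with numbers.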

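To compute the directional derivative of the maximum of finitely many functions, the following formula is available. It requires no convexity assumption.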

Theorem 9 (directional derivative of the maximum of finitely many functions). Let $f(\mathbf{x})=\max\{f_1(\mathbf{x}),f_2(\mathbf{x}),\ldots,f_m(\mathbf{x})\}$, where $f_1,f_2,\ldots,f_m:\mathbb{E}\to(-\infty,\infty]$ are proper functions. Let $\mathbf{x}\in\bigcap_{i=1}^m\mathrm{int}(\mathrm{dom}(f_i))$ and $\mathbf{d}\in\mathbb{E}$. Assume that $f_i'(\mathbf{x};\mathbf{d})$ exists for every $i\in\{1,2,\ldots,m\}$. Then $$f'(\mathbf{x};\mathbf{d})=\max_{i\in I(\mathbf{x})}f_i'(\mathbf{x};\mathbf{d}),$$ where $I(\mathbf{x})=\{i:f_i(\mathbf{x})=f(\mathbf{x})\}$.

Proof. For every $i\in\{1,2,\ldots,m\}$, $$\lim_{t\to0^+}f_i(\mathbf{x}+t\mathbf{d})=\lim_{t\to0^+}\left[t\,\frac{f_i(\mathbf{x}+t\mathbf{d})-f_i(\mathbf{x})}{t}+f_i(\mathbf{x})\right]=0\cdot f_i'(\mathbf{x};\mathbf{d})+f_i(\mathbf{x})=f_i(\mathbf{x}).$$ By the definition of $I(\mathbf{x})$, we have $f_i(\mathbf{x})>f_j(\mathbf{x})$ for all $i\in I(\mathbf{x})$, $j\notin I(\mathbf{x})$. Combined with the limit above, this implies that there exists $\epsilon>0$ such that $f_i(\mathbf{x}+t\mathbf{d})>f_j(\mathbf{x}+t\mathbf{d})$ for all $i\in I(\mathbf{x})$, $j\notin I(\mathbf{x})$, $t\in(0,\epsilon]$. Hence for all $t\in(0,\epsilon]$, $$f(\mathbf{x}+t\mathbf{d})=\max_{i=1,2,\ldots,m}f_i(\mathbf{x}+t\mathbf{d})=\max_{i\in I(\mathbf{x})}f_i(\mathbf{x}+t\mathbf{d}).$$ Consequently, for all $t\in(0,\epsilon]$, $$\frac{f(\mathbf{x}+t\mathbf{d})-f(\mathbf{x})}{t}=\frac{\max_{i\in I(\mathbf{x})}f_i(\mathbf{x}+t\mathbf{d})-f(\mathbf{x})}{t}=\max_{i\in I(\mathbf{x})}\frac{f_i(\mathbf{x}+t\mathbf{d})-f_i(\mathbf{x})}{t},$$ where the last equality holds because $f(\mathbf{x})=f_i(\mathbf{x})$ for all $i\in I(\mathbf{x})$.
Taking the limit $t\to0^+$, we finally obtain $$\begin{aligned}f'(\mathbf{x};\mathbf{d})&=\lim_{t\to0^+}\frac{f(\mathbf{x}+t\mathbf{d})-f(\mathbf{x})}{t}\\&=\lim_{t\to0^+}\max_{i\in I(\mathbf{x})}\frac{f_i(\mathbf{x}+t\mathbf{d})-f_i(\mathbf{x})}{t}\\&=\max_{i\in I(\mathbf{x})}\lim_{t\to0^+}\frac{f_i(\mathbf{x}+t\mathbf{d})-f_i(\mathbf{x})}{t}\\&=\max_{i\in I(\mathbf{x})}f_i'(\mathbf{x};\mathbf{d}).\end{aligned}$$ The limit and the maximum can be interchanged because the maximum of finitely many numbers is a continuous function of them and each of the individual limits exists.
Note that Theorem 9 requires all the directional derivatives $f_i'(\mathbf{x};\mathbf{d})$ to exist. By Theorem 8 this holds automatically when $f_1,f_2,\ldots,f_m$ are convex, which gives the following corollary.

Corollary 3 (directional derivative of the maximum of finitely many functions: the convex case). Let $f(\mathbf{x})=\max\{f_1(\mathbf{x}),f_2(\mathbf{x}),\ldots,f_m(\mathbf{x})\}$, where $f_1,f_2,\ldots,f_m:\mathbb{E}\to(-\infty,\infty]$ are proper convex functions. Let $\mathbf{x}\in\bigcap_{i=1}^m\mathrm{int}(\mathrm{dom}(f_i))$ and $\mathbf{d}\in\mathbb{E}$. Then $$f'(\mathbf{x};\mathbf{d})=\max_{i\in I(\mathbf{x})}f_i'(\mathbf{x};\mathbf{d}),$$ where $I(\mathbf{x})=\{i:f_i(\mathbf{x})=f(\mathbf{x})\}$.

3.2 The Max Formula

We now prove an extremely important and widely used result: the max formula. This formula is the bridge between subgradients and directional derivatives.

Theorem 10 (max formula). Let $f:\mathbb{E}\to(-\infty,\infty]$ be a proper convex function. Then for every $\mathbf{x}\in\mathrm{int}(\mathrm{dom}(f))$ and $\mathbf{d}\in\mathbb{E}$, $$f'(\mathbf{x};\mathbf{d})=\max\{\langle\mathbf{g},\mathbf{d}\rangle:\mathbf{g}\in\partial f(\mathbf{x})\}.$$ Proof. Let $\mathbf{x}\in\mathrm{int}(\mathrm{dom}(f))$ and $\mathbf{d}\in\mathbb{E}$. By the subgradient inequality, for every $\mathbf{g}\in\partial f(\mathbf{x})$ we have $$f'(\mathbf{x};\mathbf{d})=\lim_{\alpha\to0^+}\frac{1}{\alpha}\left(f(\mathbf{x}+\alpha\mathbf{d})-f(\mathbf{x})\right)\ge\lim_{\alpha\to0^+}\langle\mathbf{g},\mathbf{d}\rangle=\langle\mathbf{g},\mathbf{d}\rangle,$$ which gives one direction of the inequality: $$f'(\mathbf{x};\mathbf{d})\ge\max\{\langle\mathbf{g},\mathbf{d}\rangle:\mathbf{g}\in\partial f(\mathbf{x})\}.$$ To prove the reverse inequality, define the function $h(\mathbf{w})=f'(\mathbf{x};\mathbf{w})$. By Theorem 8 and Lemma 2(i), $h$ is a real-valued convex function, and therefore by Corollary 1 it is subdifferentiable on all of $\mathbb{E}$. Take $\tilde{\mathbf{g}}\in\partial h(\mathbf{d})$. Then for every $\mathbf{v}\in\mathbb{E}$ and $\alpha\ge0$, the positive homogeneity of $h$ (Lemma 2(ii)) gives $$\alpha f'(\mathbf{x};\mathbf{v})=f'(\mathbf{x};\alpha\mathbf{v})=h(\alpha\mathbf{v})\ge h(\mathbf{d})+\langle\tilde{\mathbf{g}},\alpha\mathbf{v}-\mathbf{d}\rangle=f'(\mathbf{x};\mathbf{d})+\langle\tilde{\mathbf{g}},\alpha\mathbf{v}-\mathbf{d}\rangle.$$ Rearranging, $$\alpha\left(f'(\mathbf{x};\mathbf{v})-\langle\tilde{\mathbf{g}},\mathbf{v}\rangle\right)\ge f'(\mathbf{x};\mathbf{d})-\langle\tilde{\mathbf{g}},\mathbf{d}\rangle.$$ Since this inequality holds for every $\alpha\ge0$, the coefficient of $\alpha$ on the left-hand side must be nonnegative (otherwise letting $\alpha\to\infty$ would violate it): $$f'(\mathbf{x};\mathbf{v})\ge\langle\tilde{\mathbf{g}},\mathbf{v}\rangle.$$ Then by Lemma 3, for every $\mathbf{y}\in\mathrm{dom}(f)$, $$f(\mathbf{y})\ge f(\mathbf{x})+f'(\mathbf{x};\mathbf{y}-\mathbf{x})\ge f(\mathbf{x})+\langle\tilde{\mathbf{g}},\mathbf{y}-\mathbf{x}\rangle,$$ so $\tilde{\mathbf{g}}\in\partial f(\mathbf{x})$. Finally, taking $\alpha=0$ in the rearranged inequality yields the reverse inequality $$f'(\mathbf{x};\mathbf{d})\le\langle\tilde{\mathbf{g}},\mathbf{d}\rangle\le\max\{\langle\mathbf{g},\mathbf{d}\rangle:\mathbf{g}\in\partial f(\mathbf{x})\}.$$ This completes the proof.

Remark 2: Using the support function, the max formula can be written more compactly as $$f'(\mathbf{x};\mathbf{d})=\sigma_{\partial f(\mathbf{x})}(\mathbf{d}).$$
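As a small numerical illustration of the max formula, the sketch below uses $f(\mathbf{x})=\Vert\mathbf{x}\Vert_1$ on $\mathbb{R}^3$. It relies on the standard description of the subdifferential of the $\ell_1$ norm (component $i$ equals $\mathrm{sgn}(x_i)$ if $x_i\ne0$ and ranges over $[-1,1]$ if $x_i=0$), which is assumed here and not derived in this section. The helper names are hypothetical. Since a linear function over such a box attains its maximum at a vertex, it is enough to enumerate the vertices.

```python
from itertools import product


def l1_norm(x):
    return sum(abs(xi) for xi in x)


def subdiff_vertices(x):
    # Vertices of the box-shaped subdifferential of the l1 norm at x
    # (assumed standard formula: sgn(x_i) if x_i != 0, else the interval [-1, 1]).
    choices = [(1.0,) if xi > 0 else (-1.0,) if xi < 0 else (-1.0, 1.0) for xi in x]
    return list(product(*choices))


def support_value(x, d):
    # max{<g, d> : g in subdifferential}, evaluated over the vertices only.
    return max(sum(gi * di for gi, di in zip(g, d)) for g in subdiff_vertices(x))


def quotient(x, d, t):
    # Difference quotient (f(x + t*d) - f(x)) / t for a small t > 0.
    xt = [xi + t * di for xi, di in zip(x, d)]
    return (l1_norm(xt) - l1_norm(x)) / t


x = (2.0, 0.0, -1.0)
d = (0.5, -3.0, 2.0)

print("support function value:", support_value(x, d))
print("difference quotient   :", quotient(x, d, 2.0 ** -10))
```

Both numbers agree with $d_1+|d_2|-d_3=1.5$, as the max formula predicts for this point and direction.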

3.3 Differentiability

Definition 4 (differentiability). Let $f:\mathbb{E}\to(-\infty,\infty]$ and $\mathbf{x}\in\mathrm{int}(\mathrm{dom}(f))$. The function $f$ is said to be differentiable at $\mathbf{x}$ if there exists $\mathbf{g}\in\mathbb{E}^*$ such that $$\lim_{\mathbf{h}\to\mathbf{0}}\frac{f(\mathbf{x}+\mathbf{h})-f(\mathbf{x})-\langle\mathbf{g},\mathbf{h}\rangle}{\Vert\mathbf{h}\Vert}=0.$$ The unique [1] vector $\mathbf{g}$ satisfying this limit is called the gradient [2] of $f$ at $\mathbf{x}$ and is denoted by $\nabla f(\mathbf{x})$.

Remark 3: The notion defined above is in fact Fréchet differentiability.

If $f$ is differentiable at $\mathbf{x}\in\mathrm{int}(\mathrm{dom}(f))$, the directional derivative has a simple representation.

Theorem 11 (directional derivative at a point of differentiability). Let $f:\mathbb{E}\to(-\infty,\infty]$ be a proper function that is differentiable at $\mathbf{x}\in\mathrm{int}(\mathrm{dom}(f))$. Then for every $\mathbf{d}\in\mathbb{E}$, $$f'(\mathbf{x};\mathbf{d})=\langle\nabla f(\mathbf{x}),\mathbf{d}\rangle.$$ Proof. The claim is trivial for $\mathbf{d}=\mathbf{0}$, so assume $\mathbf{d}\ne\mathbf{0}$. By the definition of the directional derivative, $$\begin{aligned}f'(\mathbf{x};\mathbf{d})&=\lim_{\alpha\to0^+}\frac{f(\mathbf{x}+\alpha\mathbf{d})-f(\mathbf{x})}{\alpha}\\&=\lim_{\alpha\to0^+}\left[\Vert\mathbf{d}\Vert\,\frac{f(\mathbf{x}+\alpha\mathbf{d})-f(\mathbf{x})-\langle\nabla f(\mathbf{x}),\alpha\mathbf{d}\rangle}{\Vert\alpha\mathbf{d}\Vert}+\frac{\langle\nabla f(\mathbf{x}),\alpha\mathbf{d}\rangle}{\alpha}\right]\\&=\langle\nabla f(\mathbf{x}),\mathbf{d}\rangle.\end{aligned}$$ The last equality follows from the differentiability of $f$ at $\mathbf{x}$: the first term in the brackets tends to zero because $\alpha\mathbf{d}\to\mathbf{0}$. This completes the proof.

Example 8 (directional derivative of the maximum of finitely many differentiable functions). Consider the function $f(\mathbf{x})=\max_{i=1,2,\ldots,m}f_i(\mathbf{x})$, where $f_i:\mathbb{E}\to(-\infty,\infty]$ are proper functions. Suppose that $f_1,f_2,\ldots,f_m$ are differentiable at a given point $\mathbf{x}\in\bigcap_{i=1}^m\mathrm{int}(\mathrm{dom}(f_i))$. Then by Theorem 11, $f_i'(\mathbf{x};\mathbf{d})=\langle\nabla f_i(\mathbf{x}),\mathbf{d}\rangle$ for every $\mathbf{d}\in\mathbb{E}$. By Theorem 9, $$f'(\mathbf{x};\mathbf{d})=\max_{i\in I(\mathbf{x})}f_i'(\mathbf{x};\mathbf{d})=\max_{i\in I(\mathbf{x})}\langle\nabla f_i(\mathbf{x}),\mathbf{d}\rangle,$$ where $I(\mathbf{x})=\{i:f_i(\mathbf{x})=f(\mathbf{x})\}$.
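The formula of Example 8 translates directly into a few lines of code. The sketch below is a minimal illustration under stated assumptions: the three functions on $\mathbb{R}^2$, their hand-written gradients, the function names, and the tolerance `tol` used to decide which indices are active in floating point are all choices made here, not part of the text.

```python
def f1(x): return x[0] + x[1]
def f2(x): return x[0] ** 2
def f3(x): return x[1] - 5.0

# Hand-computed gradients of the hypothetical pieces above.
def g1(x): return (1.0, 1.0)
def g2(x): return (2.0 * x[0], 0.0)
def g3(x): return (0.0, 1.0)


def max_directional_derivative(funcs, grads, x, d, tol=1e-12):
    """Return (max over active i of <grad f_i(x), d>, active index list)."""
    values = [f(x) for f in funcs]
    fmax = max(values)
    # Active set I(x): indices attaining the maximum (up to a tolerance).
    active = [i for i, v in enumerate(values) if fmax - v <= tol]
    slopes = [sum(gi * di for gi, di in zip(grads[i](x), d)) for i in active]
    return max(slopes), active


def quotient(funcs, x, d, t):
    # Difference quotient of the max function, for comparison.
    fx = max(f(x) for f in funcs)
    xt = tuple(xi + t * di for xi, di in zip(x, d))
    return (max(f(xt) for f in funcs) - fx) / t


funcs, grads = [f1, f2, f3], [g1, g2, g3]
x, d = (1.0, 0.0), (-1.0, 3.0)

value, active = max_directional_derivative(funcs, grads, x, d)
print("active set:", active, " formula:", value)
print("difference quotient:", quotient(funcs, x, d, 1e-6))
```

At $\mathbf{x}=(1,0)$ the first two functions are active and the third is not, so only their gradients enter the maximum.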

Example 9 (gradient of $\frac{1}{2}d_C^2(\cdot)$). Let $\mathbb{E}$ be a Euclidean space and let $C\subset\mathbb{E}$ be a nonempty closed convex set. Consider the function $\varphi_C:\mathbb{E}\to\mathbb{R}$ defined by $\varphi_C(\mathbf{x})=\frac{1}{2}d_C^2(\mathbf{x})=\frac{1}{2}\Vert\mathbf{x}-P_C(\mathbf{x})\Vert^2$, where $P_C$ is the orthogonal projection operator [3] onto $C$, defined by $$P_C(\mathbf{x})=\arg\min_{\mathbf{y}\in C}\Vert\mathbf{y}-\mathbf{x}\Vert.$$ We show that for every $\mathbf{x}\in\mathbb{E}$, $$\boxed{\nabla\varphi_C(\mathbf{x})=\mathbf{x}-P_C(\mathbf{x}).}$$ To this end, fix $\mathbf{x}\in\mathbb{E}$ and define the function $$g_{\mathbf{x}}(\mathbf{d})\equiv\varphi_C(\mathbf{x}+\mathbf{d})-\varphi_C(\mathbf{x})-\langle\mathbf{d},\mathbf{z}_{\mathbf{x}}\rangle,$$ where $\mathbf{z}_{\mathbf{x}}=\mathbf{x}-P_C(\mathbf{x})$. By the definition of the gradient, it suffices to show that $$\frac{g_{\mathbf{x}}(\mathbf{d})}{\Vert\mathbf{d}\Vert}\to0\quad\text{as }\mathbf{d}\to\mathbf{0}.$$ By the definition of the orthogonal projection, for every $\mathbf{d}\in\mathbb{E}$, $$\Vert\mathbf{x}+\mathbf{d}-P_C(\mathbf{x}+\mathbf{d})\Vert^2\le\Vert\mathbf{x}+\mathbf{d}-P_C(\mathbf{x})\Vert^2,$$ and therefore $$\begin{aligned}g_{\mathbf{x}}(\mathbf{d})&=\frac{1}{2}\Vert\mathbf{x}+\mathbf{d}-P_C(\mathbf{x}+\mathbf{d})\Vert^2-\frac{1}{2}\Vert\mathbf{x}-P_C(\mathbf{x})\Vert^2-\langle\mathbf{d},\mathbf{z}_{\mathbf{x}}\rangle\\&\le\frac{1}{2}\Vert\mathbf{x}+\mathbf{d}-P_C(\mathbf{x})\Vert^2-\frac{1}{2}\Vert\mathbf{x}-P_C(\mathbf{x})\Vert^2-\langle\mathbf{d},\mathbf{z}_{\mathbf{x}}\rangle\\&=\frac{1}{2}\Vert\mathbf{z}_{\mathbf{x}}+\mathbf{d}\Vert^2-\frac{1}{2}\Vert\mathbf{z}_{\mathbf{x}}\Vert^2-\langle\mathbf{d},\mathbf{z}_{\mathbf{x}}\rangle\\&=\frac{1}{2}\Vert\mathbf{d}\Vert^2.\end{aligned}$$ In particular, $$g_{\mathbf{x}}(-\mathbf{d})\le\frac{1}{2}\Vert\mathbf{d}\Vert^2.$$ One can show that $d_C$ is a convex function [4]. Since $d_C$ is also nonnegative and $t\mapsto\frac{1}{2}t^2$ is convex and nondecreasing on $[0,\infty)$, the composition $\varphi_C$ is convex, and hence so is $g_{\mathbf{x}}$, which differs from $\mathbf{d}\mapsto\varphi_C(\mathbf{x}+\mathbf{d})$ by an affine function. By Jensen's inequality, $$0=g_{\mathbf{x}}(\mathbf{0})=g_{\mathbf{x}}\left(\frac{\mathbf{d}+(-\mathbf{d})}{2}\right)\le\frac{1}{2}g_{\mathbf{x}}(\mathbf{d})+\frac{1}{2}g_{\mathbf{x}}(-\mathbf{d}).$$ Rearranging gives $$g_{\mathbf{x}}(\mathbf{d})\ge-g_{\mathbf{x}}(-\mathbf{d})\ge-\frac{1}{2}\Vert\mathbf{d}\Vert^2.$$ Combining this with the earlier upper bound, $$\left|g_{\mathbf{x}}(\mathbf{d})\right|\le\frac{1}{2}\Vert\mathbf{d}\Vert^2\quad\Rightarrow\quad\frac{g_{\mathbf{x}}(\mathbf{d})}{\Vert\mathbf{d}\Vert}\to0\text{ as }\mathbf{d}\to\mathbf{0}.$$ This completes the proof.
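The identity $\nabla\varphi_C(\mathbf{x})=\mathbf{x}-P_C(\mathbf{x})$ of Example 9 is easy to check numerically when the projection is available in closed form. The sketch below assumes $C=[0,1]^2$ with the dot product, for which the projection is coordinatewise clipping. The set, the function names, and the central-difference step `h` are illustrative assumptions.

```python
def project_box(x, lo=0.0, hi=1.0):
    # Orthogonal projection onto the box [lo, hi]^n: clip each coordinate.
    return [min(max(xi, lo), hi) for xi in x]


def phi(x):
    # phi_C(x) = 0.5 * ||x - P_C(x)||^2 for the box C.
    p = project_box(x)
    return 0.5 * sum((xi - pi) ** 2 for xi, pi in zip(x, p))


def grad_phi(x):
    # Gradient formula from Example 9: x - P_C(x).
    p = project_box(x)
    return [xi - pi for xi, pi in zip(x, p)]


def numeric_grad(fn, x, h=1e-6):
    # Central finite differences, one coordinate at a time.
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((fn(xp) - fn(xm)) / (2.0 * h))
    return g


x_out = [2.0, -0.5]   # a point outside C
x_in = [0.3, 0.7]     # a point inside C, where the gradient vanishes

print(grad_phi(x_out), numeric_grad(phi, x_out))
print(grad_phi(x_in), numeric_grad(phi, x_in))
```

Note that $\varphi_C$ is differentiable everywhere, including on the boundary of $C$, even though $d_C$ itself is not.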

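Remark 4: The gradient defined in Definition 4 depends on the inner product chosen on the space. This is different from the "classical" gradient. Below we give the form of the gradient under two different inner products. Let $\mathbb{E}=\mathbb{R}^n$.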
(i) The inner product is the dot product. Then $$(\nabla f(\mathbf{x}))_i=\langle\nabla f(\mathbf{x}),\mathbf{e}_i\rangle=f'(\mathbf{x};\mathbf{e}_i).$$ That is, the $i$-th component of $\nabla f(\mathbf{x})$ is $\frac{\partial f}{\partial x_i}(\mathbf{x})=f'(\mathbf{x};\mathbf{e}_i)$. Hence in this case $$\nabla f(\mathbf{x})=D_f(\mathbf{x}),\quad D_f(\mathbf{x})\equiv\begin{pmatrix}\frac{\partial f}{\partial x_1}(\mathbf{x}) & \frac{\partial f}{\partial x_2}(\mathbf{x}) & \cdots & \frac{\partial f}{\partial x_n}(\mathbf{x})\end{pmatrix}^T.$$ Note that the definition of the directional derivative does not depend on the choice of inner product, so applying Theorem 11 with the dot product gives $$f'(\mathbf{x};\mathbf{d})=D_f(\mathbf{x})^T\mathbf{d}=\sum_{i=1}^n\frac{\partial f}{\partial x_i}(\mathbf{x})d_i.$$
(ii) The inner product is defined by $$\langle\mathbf{x},\mathbf{y}\rangle=\mathbf{x}^T\mathbf{H}\mathbf{y},$$ where $\mathbf{H}$ is a given $n\times n$ positive definite matrix. Consider the $i$-th component of $\nabla f(\mathbf{x})$: $$\begin{aligned}(\nabla f(\mathbf{x}))_i&=\nabla f(\mathbf{x})^T\mathbf{e}_i\\&=\nabla f(\mathbf{x})^T\mathbf{H}(\mathbf{H}^{-1}\mathbf{e}_i)\\&=\langle\nabla f(\mathbf{x}),\mathbf{H}^{-1}\mathbf{e}_i\rangle\\&=f'(\mathbf{x};\mathbf{H}^{-1}\mathbf{e}_i)\\&=D_f(\mathbf{x})^T\mathbf{H}^{-1}\mathbf{e}_i.\end{aligned}$$ Since $\mathbf{H}^{-1}$ is symmetric, the last expression is the $i$-th component of $\mathbf{H}^{-1}D_f(\mathbf{x})$. The gradient in this case is therefore the classical gradient rescaled by $\mathbf{H}^{-1}$: $$\nabla f(\mathbf{x})=\mathbf{H}^{-1}D_f(\mathbf{x}).$$
If $\mathbb{E}=\mathbb{R}^{m\times n}$, analogous conclusions hold:
(i) The inner product is the dot product: $$\langle\mathbf{x},\mathbf{y}\rangle=\mathrm{Tr}(\mathbf{x}^T\mathbf{y}),\quad\forall\mathbf{x},\mathbf{y}\in\mathbb{R}^{m\times n}.$$ Given a proper function $f:\mathbb{R}^{m\times n}\to(-\infty,\infty]$ and $\mathbf{x}\in\mathrm{int}(\mathrm{dom}(f))$, if $f$ is differentiable at $\mathbf{x}$, then its gradient at $\mathbf{x}$ is $\nabla f(\mathbf{x})=D_f(\mathbf{x})$, where $D_f(\mathbf{x})$ is the $m\times n$ matrix $$D_f(\mathbf{x})=\left(\frac{\partial f}{\partial x_{ij}}(\mathbf{x})\right)_{i,j}.$$
(ii) The inner product is defined by $$\langle\mathbf{x},\mathbf{y}\rangle=\mathrm{Tr}(\mathbf{x}^T\mathbf{H}\mathbf{y}),$$ where $\mathbf{H}$ is an $m\times m$ positive definite matrix. Then $$\nabla f(\mathbf{x})=\mathbf{H}^{-1}D_f(\mathbf{x}).$$
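The dependence of the gradient on the inner product in Remark 4 can be made concrete with a tiny computation in $\mathbb{R}^2$. In the sketch below, the function $f(\mathbf{x})=x_1^2+3x_1x_2$, the matrix $\mathbf{H}$, and all helper names are hypothetical choices for illustration. It computes $\mathbf{H}^{-1}D_f(\mathbf{x})$ by solving a $2\times2$ system and checks that $\langle\nabla f(\mathbf{x}),\mathbf{d}\rangle_{\mathbf{H}}$ reproduces the inner-product-independent value $D_f(\mathbf{x})^T\mathbf{d}$.

```python
def classical_gradient(x):
    # D_f(x) for the hypothetical f(x) = x1^2 + 3*x1*x2.
    return (2.0 * x[0] + 3.0 * x[1], 3.0 * x[0])


def solve_2x2(H, b):
    # Solve H z = b by Cramer's rule (H is assumed invertible).
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    z1 = (b[0] * H[1][1] - H[0][1] * b[1]) / det
    z2 = (H[0][0] * b[1] - H[1][0] * b[0]) / det
    return (z1, z2)


def inner_H(H, u, v):
    # <u, v>_H = u^T H v.
    Hv = (H[0][0] * v[0] + H[0][1] * v[1], H[1][0] * v[0] + H[1][1] * v[1])
    return u[0] * Hv[0] + u[1] * Hv[1]


H = ((2.0, 1.0), (1.0, 3.0))   # symmetric positive definite
x = (1.0, 2.0)
d = (1.0, -1.0)

Df = classical_gradient(x)      # gradient under the dot product
grad_H = solve_2x2(H, Df)       # gradient under <.,.>_H, i.e. H^{-1} D_f(x)

dot_value = Df[0] * d[0] + Df[1] * d[1]
print("D_f(x)          :", Df)
print("H^{-1} D_f(x)   :", grad_H)
print("<grad_H, d>_H   :", inner_H(H, grad_H, d), " D_f(x)^T d:", dot_value)
```

The two gradient vectors differ, yet both represent the same directional derivative $f'(\mathbf{x};\mathbf{d})$ through their respective inner products.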
Differentiability and the subdifferential are closely related.
Theorem 12 (the subdifferential at a point of differentiability). Let $f:\mathbb{E}\to(-\infty,\infty]$ be a proper convex function and let $\mathbf{x}\in\mathrm{int}(\mathrm{dom}(f))$. Then $f$ is differentiable at $\mathbf{x}$ if and only if $\partial f(\mathbf{x})$ is a singleton, and in that case $\partial f(\mathbf{x})=\{\nabla f(\mathbf{x})\}$.

Proof. (Necessity) Suppose that $f$ is differentiable at $\mathbf{x}$. By Theorem 11, for every $\mathbf{d}\in\mathbb{E}$, $$f'(\mathbf{x};\mathbf{d})=\langle\nabla f(\mathbf{x}),\mathbf{d}\rangle.$$ Take any $\mathbf{g}\in\partial f(\mathbf{x})$. We show that $\mathbf{g}=\nabla f(\mathbf{x})$. By the max formula (Theorem 10), $$\langle\nabla f(\mathbf{x}),\mathbf{d}\rangle=f'(\mathbf{x};\mathbf{d})\ge\langle\mathbf{g},\mathbf{d}\rangle,$$ so $$\langle\mathbf{g}-\nabla f(\mathbf{x}),\mathbf{d}\rangle\le0,\quad\forall\mathbf{d}\in\mathbb{E}.$$ In particular this holds for all $\mathbf{d}$ with $\Vert\mathbf{d}\Vert\le1$, and taking the maximum over such $\mathbf{d}$ gives $\Vert\mathbf{g}-\nabla f(\mathbf{x})\Vert_*\le0$. Hence $\mathbf{g}=\nabla f(\mathbf{x})$. Moreover, $\partial f(\mathbf{x})$ is nonempty by Theorem 3. Therefore $\partial f(\mathbf{x})=\{\nabla f(\mathbf{x})\}$.

(Sufficiency) Suppose that the subdifferential of $f$ at $\mathbf{x}$ is a singleton, and let $\mathbf{g}$ be the unique subgradient of $f$ at $\mathbf{x}$. Define the auxiliary function $$h(\mathbf{u})=f(\mathbf{x}+\mathbf{u})-f(\mathbf{x})-\langle\mathbf{g},\mathbf{u}\rangle.$$ It suffices to show that $$\frac{h(\mathbf{u})}{\Vert\mathbf{u}\Vert}\to0\quad\text{as }\mathbf{u}\to\mathbf{0}.$$ It is easy to verify that $\mathbf{0}$ is the unique subgradient of $h$ at $\mathbf{0}$. First, $\mathbf{0}\in\partial h(\mathbf{0})$, because the subgradient inequality for $\mathbf{g}$ says exactly that $h(\mathbf{z})\ge0=h(\mathbf{0})$ for all $\mathbf{z}$. Second, for any $\mathbf{y}\in\partial h(\mathbf{0})$ we have $$f(\mathbf{x}+\mathbf{z})-f(\mathbf{x})-\langle\mathbf{g},\mathbf{z}\rangle=h(\mathbf{z})\ge h(\mathbf{0})+\langle\mathbf{y},\mathbf{z}\rangle=\langle\mathbf{y},\mathbf{z}\rangle,\quad\forall\mathbf{z}\in\mathbb{E},$$ and hence $$f(\mathbf{x}+\mathbf{z})\ge f(\mathbf{x})+\langle\mathbf{g}+\mathbf{y},\mathbf{z}\rangle,\quad\forall\mathbf{z}\in\mathbb{E}.$$ This shows that $\mathbf{g}+\mathbf{y}\in\partial f(\mathbf{x})$. But $\partial f(\mathbf{x})=\{\mathbf{g}\}$, so $\mathbf{y}=\mathbf{0}$.
Since $\mathbf{0}\in\mathrm{int}(\mathrm{dom}(h))$, the max formula (Theorem 10) gives, for every $\mathbf{d}\in\mathbb{E}$, $$h'(\mathbf{0};\mathbf{d})=\sigma_{\partial h(\mathbf{0})}(\mathbf{d})=0.$$ Thus for every $\mathbf{d}\in\mathbb{E}$, $$0=h'(\mathbf{0};\mathbf{d})=\lim_{\alpha\to0^+}\frac{h(\alpha\mathbf{d})-h(\mathbf{0})}{\alpha}=\lim_{\alpha\to0^+}\frac{h(\alpha\mathbf{d})}{\alpha}.$$ Note that this is not yet what we want to prove: here $\mathbf{d}$ is fixed, so the statement only covers convergence to the origin along a fixed ray. To prove the full limit we use the fact that $\mathbf{0}$ is an interior point of the domain of $h$.
Let $\{\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_k\}$ be an orthonormal basis of $\mathbb{E}$. Since $\mathbf{0}\in\mathrm{int}(\mathrm{dom}(h))$, there exists $\epsilon\in(0,1)$ such that $\epsilon\mathbf{v}_i,-\epsilon\mathbf{v}_i\in\mathrm{dom}(h)$ for $i=1,2,\ldots,k$. Because $\mathrm{dom}(h)$ is convex, the convex hull $D=\mathrm{conv}\left(\{\pm\epsilon\mathbf{v}_i\}_{i=1}^k\right)$ is contained in $\mathrm{dom}(h)$. Let $\Vert\cdot\Vert$ be the Euclidean norm on $\mathbb{E}$; since all norms on a finite-dimensional space are equivalent, proving the limit for this norm is enough. We claim that $B_{\Vert\cdot\Vert}[\mathbf{0},\gamma]\subset D$, where $\gamma=\frac{\epsilon}{k}$. Indeed, take any $\mathbf{w}\in B_{\Vert\cdot\Vert}[\mathbf{0},\gamma]$. Then $$\mathbf{w}=\sum_{i=1}^k\langle\mathbf{w},\mathbf{v}_i\rangle\mathbf{v}_i,\quad\Vert\mathbf{w}\Vert^2=\sum_{i=1}^k\langle\mathbf{w},\mathbf{v}_i\rangle^2.$$ Since $\Vert\mathbf{w}\Vert\le\gamma$, Parseval's identity gives $|\langle\mathbf{w},\mathbf{v}_i\rangle|\le\gamma$ for each $i$. Therefore $$\mathbf{w}=\sum_{i=1}^k\langle\mathbf{w},\mathbf{v}_i\rangle\mathbf{v}_i=\sum_{i=1}^k\frac{|\langle\mathbf{w},\mathbf{v}_i\rangle|}{\epsilon}\left[\mathrm{sgn}(\langle\mathbf{w},\mathbf{v}_i\rangle)\epsilon\mathbf{v}_i\right]+\left(1-\sum_{i=1}^k\frac{|\langle\mathbf{w},\mathbf{v}_i\rangle|}{\epsilon}\right)\cdot\mathbf{0}\in D.$$ Here $1-\sum_{i=1}^k\frac{|\langle\mathbf{w},\mathbf{v}_i\rangle|}{\epsilon}\ge0$ because $\sum_{i=1}^k|\langle\mathbf{w},\mathbf{v}_i\rangle|\le k\gamma=\epsilon$, and $\mathbf{0}\in D$, so the right-hand side is a convex combination of points of $D$. This proves $B_{\Vert\cdot\Vert}[\mathbf{0},\gamma]\subset D$.
Denote the $2k$ vectors $\{\pm\epsilon\mathbf{v}_i\}_{i=1}^k$ by $\mathbf{z}_1,\mathbf{z}_2,\ldots,\mathbf{z}_{2k}$. Take any $\mathbf{0}\ne\mathbf{u}\in B_{\Vert\cdot\Vert}[\mathbf{0},\gamma^2]$. We have $\gamma\frac{\mathbf{u}}{\Vert\mathbf{u}\Vert}\in B_{\Vert\cdot\Vert}[\mathbf{0},\gamma]\subset D$, so there exists $\boldsymbol{\lambda}\in\Delta_{2k}$ such that $$\gamma\frac{\mathbf{u}}{\Vert\mathbf{u}\Vert}=\sum_{i=1}^{2k}\lambda_i\mathbf{z}_i.$$ Hence, by the convexity of $h$, $$\begin{aligned}\frac{h(\mathbf{u})}{\Vert\mathbf{u}\Vert}&=\frac{h\left(\frac{\Vert\mathbf{u}\Vert}{\gamma}\gamma\frac{\mathbf{u}}{\Vert\mathbf{u}\Vert}\right)}{\Vert\mathbf{u}\Vert}=\frac{h\left(\sum_{i=1}^{2k}\lambda_i\frac{\Vert\mathbf{u}\Vert}{\gamma}\mathbf{z}_i\right)}{\Vert\mathbf{u}\Vert}\\&\le\sum_{i=1}^{2k}\lambda_i\frac{h\left(\Vert\mathbf{u}\Vert\frac{\mathbf{z}_i}{\gamma}\right)}{\Vert\mathbf{u}\Vert}\\&\le\max_{i=1,2,\ldots,2k}\left\{\frac{h\left(\Vert\mathbf{u}\Vert\frac{\mathbf{z}_i}{\gamma}\right)}{\Vert\mathbf{u}\Vert}\right\}.\end{aligned}$$ By the ray version of the limit established above, for each $i$, $$\lim_{\mathbf{u}\to\mathbf{0}}\frac{h\left(\Vert\mathbf{u}\Vert\frac{\mathbf{z}_i}{\gamma}\right)}{\Vert\mathbf{u}\Vert}=\lim_{\Vert\mathbf{u}\Vert\to0}\frac{h\left(\Vert\mathbf{u}\Vert\frac{\mathbf{z}_i}{\gamma}\right)}{\Vert\mathbf{u}\Vert}=\lim_{\alpha\to0^+}\frac{h\left(\alpha\frac{\mathbf{z}_i}{\gamma}\right)}{\alpha}=0,$$ so the upper bound tends to zero. On the other hand, $\mathbf{0}\in\partial h(\mathbf{0})$ and $h(\mathbf{0})=0$ give $h(\mathbf{u})\ge0$ for all $\mathbf{u}$. Squeezing between the two bounds, $$\frac{h(\mathbf{u})}{\Vert\mathbf{u}\Vert}\to0\quad\text{as }\mathbf{u}\to\mathbf{0}.$$ This completes the proof.

Example 10 (subdifferential of the $\ell_2$-norm). Let $f:\mathbb{R}^n\to\mathbb{R}$ be defined by $f(\mathbf{x})=\Vert\mathbf{x}\Vert_2$. The subdifferential of $f$ at $\mathbf{0}$ was already discussed in Example 1. When $\mathbf{x}\ne\mathbf{0}$, $f$ is differentiable at $\mathbf{x}$ with gradient $\frac{\mathbf{x}}{\Vert\mathbf{x}\Vert_2}$ [5]. Using Theorem 12, we can write down a complete description of the subdifferential of $f$: $$\boxed{\partial f(\mathbf{x})=\left\{\begin{array}{ll}\left\{\frac{\mathbf{x}}{\Vert\mathbf{x}\Vert_2}\right\}, & \mathbf{x}\ne\mathbf{0},\\B_{\Vert\cdot\Vert_2}[\mathbf{0},1], & \mathbf{x}=\mathbf{0}.\end{array}\right.}$$ In particular, for $n=1$ we obtain the subdifferential of the one-dimensional function $g(x)=|x|$: $$\partial g(x)=\left\{\begin{array}{ll}\{\mathrm{sgn}(x)\}, & x\ne0,\\ [-1,1], & x=0.\end{array}\right.$$
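The description of $\partial\Vert\cdot\Vert_2$ in Example 10 can be turned into a simple membership test and cross-checked against the subgradient inequality on a few sample points. The function names, the tolerance `tol`, and the sample points below are assumptions made for this sketch only.

```python
import math


def norm2(x):
    return math.sqrt(sum(xi * xi for xi in x))


def is_subgradient(g, x, tol=1e-12):
    """Membership test for the subdifferential of the l2 norm at x (Example 10)."""
    nx = norm2(x)
    if nx == 0.0:
        # At the origin the subdifferential is the closed unit ball.
        return norm2(g) <= 1.0 + tol
    # Elsewhere it is the singleton {x / ||x||_2}.
    return all(abs(gi - xi / nx) <= tol for gi, xi in zip(g, x))


def satisfies_subgradient_inequality(g, x, ys, tol=1e-12):
    # Checks f(y) >= f(x) + <g, y - x> on the finite sample ys only.
    fx = norm2(x)
    return all(
        norm2(y) >= fx + sum(gi * (yi - xi) for gi, yi, xi in zip(g, y, x)) - tol
        for y in ys
    )


samples = [(0.0, 0.0), (1.0, -2.0), (-3.0, -4.0), (6.0, 8.0), (1.0, 1.0)]

print(is_subgradient((0.6, 0.8), (3.0, 4.0)))   # gradient at a nonzero point
print(is_subgradient((0.6, 0.8), (0.0, 0.0)))   # inside the unit ball at the origin
print(is_subgradient((1.0, 1.0), (0.0, 0.0)))   # outside the unit ball
print(satisfies_subgradient_inequality((0.6, 0.8), (3.0, 4.0), samples))
```

Passing the sampled inequality is only a necessary condition for being a subgradient; the membership test encodes the exact characterization.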


  1. Uniqueness can be seen as follows. Suppose that $\mathbf{g}_1$ and $\mathbf{g}_2$ both satisfy the limit. Subtracting the two limits gives $\lim_{\mathbf{h}\to\mathbf{0}}\frac{\langle\mathbf{g}_1-\mathbf{g}_2,\mathbf{h}\rangle}{\Vert\mathbf{h}\Vert}=0$. Taking $\mathbf{h}=\epsilon\mathbf{e}_i$ and letting $\epsilon\to0^+$ yields $\left(\mathbf{g}_1-\mathbf{g}_2\right)_i=0$. Letting $i$ range over all indices gives $\mathbf{g}_1=\mathbf{g}_2$.

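  2. The "gradient" here is not quite the same as the "gradient" taught in mathematical analysis, although the two notions do overlap. The underlying reason for the difference is that differentiability is defined differently. This is explained in detail later in the text.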

  3. When the set $C$ is nonempty, closed, and convex, it is easy to verify that $P_C$ is well defined.

  4. Indeed, let $\mathbf{x}_1,\mathbf{x}_2\in\mathbb{E}$ and $\lambda\in[0,1]$. Since $C$ is convex, the point $\lambda P_C(\mathbf{x}_1)+(1-\lambda)P_C(\mathbf{x}_2)$ belongs to $C$, so $$\begin{aligned}d_C(\lambda\mathbf{x}_1+(1-\lambda)\mathbf{x}_2)&=\min_{\mathbf{y}\in C}\Vert\mathbf{y}-[\lambda\mathbf{x}_1+(1-\lambda)\mathbf{x}_2]\Vert\\&\le\Vert\lambda(\mathbf{x}_1-P_C(\mathbf{x}_1))+(1-\lambda)(\mathbf{x}_2-P_C(\mathbf{x}_2))\Vert\\&\le\lambda\Vert\mathbf{x}_1-P_C(\mathbf{x}_1)\Vert+(1-\lambda)\Vert\mathbf{x}_2-P_C(\mathbf{x}_2)\Vert\\&=\lambda d_C(\mathbf{x}_1)+(1-\lambda)d_C(\mathbf{x}_2),\end{aligned}$$ where the second inequality is the triangle inequality.

  5. The underlying space is assumed to be Euclidean.
