Equality-Constrained Problems
Problem form:
$$
\begin{array}{ll} \min f(\boldsymbol{x}), & \boldsymbol{x} \in \mathbf{R}^{n} \\ \text{s.t. } h_{i}(\boldsymbol{x})=0, & i=1,2, \cdots, l \end{array} \tag{1}
$$
Form the Lagrangian function of problem (1):
$$
L(\boldsymbol{x}, \boldsymbol{\lambda})=f(\boldsymbol{x})-\sum_{i=1}^{l} \lambda_{i} h_{i}(\boldsymbol{x}) \tag{2}
$$
where $\lambda = (\lambda_1,\lambda_2,\cdots,\lambda_l)^T$ is the multiplier vector.
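As a small illustration of the sign convention used here (the multiplier term is subtracted), the Lagrangian can be evaluated numerically. This is a minimal sketch: the helper name `lagrangian`, the objective $f(x)=x_1^2+x_2^2$, and the constraint $h_1(x)=x_1+x_2-1$ are assumptions chosen for the example, not part of the original text.

```python
import numpy as np

def lagrangian(f, hs, x, lam):
    """Evaluate L(x, lam) = f(x) - sum_i lam[i] * hs[i](x)."""
    return f(x) - sum(l_i * h_i(x) for l_i, h_i in zip(lam, hs))

# Hypothetical example problem: f(x) = x1^2 + x2^2, h1(x) = x1 + x2 - 1.
f = lambda x: x[0] ** 2 + x[1] ** 2
h1 = lambda x: x[0] + x[1] - 1.0

# At x = (1, 2) with lambda = (3,): f = 5, h1 = 2, so L = 5 - 3 * 2.
value = lagrangian(f, [h1], np.array([1.0, 2.0]), np.array([3.0]))
print(value)
```

With the opposite convention $L = f + \sum_i \lambda_i h_i$, the multipliers simply change sign; the stationary points in $x$ are the same.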
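KKT Conditions for Equality Constraints
The first-order necessary condition for problem (1) to attain a minimum is what is usually called the KKT (Karush-Kuhn-Tucker) condition: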
Theorem 1 Let $x^*$ be a local minimizer of problem (1). Suppose the functions $f(x)$ and $h_i(x)$ ($i=1,2,\cdots,l$) are continuously differentiable in some neighborhood of $x^*$, and the vectors $\nabla h_i(x^*)$ ($i=1,2,\cdots,l$) are linearly independent. Then there exists a multiplier vector $\lambda^* = (\lambda_1^*,\lambda_2^*,\cdots,\lambda_l^*)^T$ such that:
$$
\nabla_{\boldsymbol{x}} L\left(\boldsymbol{x}^{*}, \boldsymbol{\lambda}^{*}\right)=\mathbf{0}
$$
that is:
$$
\nabla f\left(\boldsymbol{x}^{*}\right)-\sum_{i=1}^{l} \lambda_{i}^{*} \nabla h_{i}\left(\boldsymbol{x}^{*}\right)=\mathbf{0}
$$
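To make the first-order condition concrete, here is a minimal sketch on an assumed toy problem (not from the original text): minimize $f(x)=x_1^2+x_2^2$ subject to $h(x)=x_1+x_2-1=0$. Because $f$ is quadratic and $h$ is linear, the stationarity condition $\nabla f(x)-\lambda\nabla h(x)=0$ together with $h(x)=0$ is a linear system in $(x_1, x_2, \lambda)$, which NumPy can solve directly. The function name `solve_kkt_example` is hypothetical.

```python
import numpy as np

def solve_kkt_example():
    """Solve the KKT system of: min x1^2 + x2^2  s.t.  x1 + x2 - 1 = 0."""
    # Unknowns are ordered as (x1, x2, lambda).
    # Rows 1-2: 2*x_j - lambda * 1 = 0   (stationarity, grad f - lambda * grad h)
    # Row 3:    x1 + x2 = 1              (feasibility, h(x) = 0)
    K = np.array([[2.0, 0.0, -1.0],
                  [0.0, 2.0, -1.0],
                  [1.0, 1.0, 0.0]])
    rhs = np.array([0.0, 0.0, 1.0])
    sol = np.linalg.solve(K, rhs)
    return sol[:2], sol[2]

x_star, lam_star = solve_kkt_example()

# Verify the stationarity condition at the computed point.
grad_f = 2.0 * x_star
grad_h = np.array([1.0, 1.0])
stationarity_residual = grad_f - lam_star * grad_h
print(x_star, lam_star, stationarity_residual)
```

The solution is $x^*=(0.5, 0.5)$ with $\lambda^*=1$. The linear independence assumption of Theorem 1 holds trivially here, since the single constraint gradient $(1,1)^T$ is nonzero.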
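The second-order necessary condition for problem (1) to attain a minimum uses the gradient and the Hessian matrix of the Lagrangian function (2) defined above, namely: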
$$
\begin{array}{l} \nabla L(\boldsymbol{x}, \boldsymbol{\lambda})=\left(\begin{array}{c} \nabla_{\boldsymbol{x}} L(\boldsymbol{x}, \boldsymbol{\lambda}) \\ \nabla_{\boldsymbol{\lambda}} L(\boldsymbol{x}, \boldsymbol{\lambda}) \end{array}\right)=\left(\begin{array}{c} \nabla f(\boldsymbol{x})-\sum_{i=1}^{l} \lambda_{i} \nabla h_{i}(\boldsymbol{x}) \\ -\boldsymbol{h}(\boldsymbol{x}) \end{array}\right) \\ \nabla_{\boldsymbol{x} \boldsymbol{x}}^{2} L(\boldsymbol{x}, \boldsymbol{\lambda})=\nabla^{2} f(\boldsymbol{x})-\sum_{i=1}^{l} \lambda_{i} \nabla^{2} h_{i}(\boldsymbol{x}) \end{array}
$$
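The two formulas above can be checked on a problem with a nonlinear constraint, where the term $\lambda_i \nabla^2 h_i$ actually contributes. The following is a sketch under an assumed example (not from the original text): $f(x)=x_1+x_2$ with $h(x)=x_1^2+x_2^2-2$, evaluated at the candidate point $x^*=(-1,-1)$, $\lambda^*=-1/2$. The function names are hypothetical.

```python
import numpy as np

def lagrangian_gradient(x, lam):
    """Full gradient of L(x, lam) = f(x) - lam * h(x), stacked as (grad_x L, grad_lam L)."""
    grad_f = np.array([1.0, 1.0])   # gradient of f(x) = x1 + x2
    grad_h = 2.0 * x                # gradient of h(x) = x1^2 + x2^2 - 2
    h = x @ x - 2.0
    return np.concatenate([grad_f - lam * grad_h, [-h]])

def lagrangian_hessian_xx(x, lam):
    """Hessian of L with respect to x: hess f - lam * hess h."""
    hess_f = np.zeros((2, 2))       # f is linear, so its Hessian vanishes
    hess_h = 2.0 * np.eye(2)
    return hess_f - lam * hess_h

x_star = np.array([-1.0, -1.0])
lam_star = -0.5
grad_L = lagrangian_gradient(x_star, lam_star)
hess_L = lagrangian_hessian_xx(x_star, lam_star)
print(grad_L)
print(hess_L)
```

At this point the full gradient of $L$ is zero, so both stationarity and feasibility hold, and $\nabla_{xx}^2 L$ equals the identity matrix. Note that the curvature here comes entirely from the constraint term, since $\nabla^2 f = 0$.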
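For the second-order sufficient condition, the objective function and the constraint functions must additionally be twice continuously differentiable.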
Theorem 2 Suppose the functions $f(x)$ and $h_i(x)$ ($i=1,2,\cdots,l$) are twice continuously differentiable, and there exists $(x^*,\lambda^*) \in R^n \times R^l$ such that $\nabla L\left(\boldsymbol{x}^{*}, \boldsymbol{\lambda}^{*}\right)=\mathbf{0}$. If, for every nonzero $d \in R^n$ satisfying $\nabla h_i(x^*)^T d=0$ ($i=1,2,\cdots,l$), we have $d^T \nabla_{\boldsymbol{x} \boldsymbol{x}}^{2} L\left(\boldsymbol{x}^{*}, \boldsymbol{\lambda}^{*}\right) d > 0$