Introduction to nonlinear optimization: Chapter 5 Exercises

5.1. Find without MATLAB the Cholesky factorization of the matrix
$$\mathbf{A}=\begin{pmatrix} 1 & 2 & 4 & 7 \\ 2 & 13 & 23 & 38 \\ 4 & 23 & 77 & 122 \\ 7 & 38 & 122 & 294 \end{pmatrix}$$
Solution:

$$\mathbf{L}=\begin{pmatrix} l_{11}&0&0&0\\ l_{21}&l_{22}&0&0\\ l_{31}&l_{32}&l_{33}&0\\ l_{41}&l_{42}&l_{43}&l_{44} \end{pmatrix}$$
$$l_{11}=\sqrt{a_{11}}=1$$
$$\begin{pmatrix} l_{21}\\ l_{31}\\ l_{41} \end{pmatrix}=\frac{1}{\sqrt{1}}\begin{pmatrix} 2\\ 4\\ 7 \end{pmatrix}=\begin{pmatrix} 2\\ 4\\ 7 \end{pmatrix}$$
$$\mathbf{L}'=\begin{pmatrix} l_{22}&0&0\\ l_{32}&l_{33}&0\\ l_{42}&l_{43}&l_{44} \end{pmatrix}$$
$$\mathbf{L}'\mathbf{L}'^T=\begin{pmatrix} 13&23&38\\ 23&77&122\\ 38&122&294 \end{pmatrix}-\frac{1}{1}\begin{pmatrix} 2\\ 4\\ 7 \end{pmatrix}\begin{pmatrix} 2\\ 4\\ 7 \end{pmatrix}^T=\begin{pmatrix} 9&15&24\\ 15&61&94\\ 24&94&245 \end{pmatrix}$$
$$l_{22}=\sqrt{9}=3$$
$$\begin{pmatrix} l_{32}\\ l_{42} \end{pmatrix}=\frac{1}{\sqrt{9}}\begin{pmatrix} 15\\ 24 \end{pmatrix}=\begin{pmatrix} 5\\ 8 \end{pmatrix}$$
$$\mathbf{L}''=\begin{pmatrix} l_{33}&0\\ l_{43}&l_{44} \end{pmatrix}$$
$$\mathbf{L}''\mathbf{L}''^T=\begin{pmatrix} 61&94\\ 94&245 \end{pmatrix}-\frac{1}{9}\begin{pmatrix} 15\\ 24 \end{pmatrix}\begin{pmatrix} 15\\ 24 \end{pmatrix}^T=\begin{pmatrix} 36&54\\ 54&181 \end{pmatrix}$$
$$l_{33}=\sqrt{36}=6$$
$$l_{43}=\frac{54}{\sqrt{36}}=9$$
$$l_{44}^2=181-\frac{54\cdot 54}{36}=100\Rightarrow l_{44}=10$$
Therefore
$$\mathbf{L}=\begin{pmatrix} 1&0&0&0\\ 2&3&0&0\\ 4&5&6&0\\ 7&8&9&10 \end{pmatrix}$$
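The result can be double-checked in MATLAB (a minimal verification sketch, not part of the required hand computation):

A = [1 2 4 7; 2 13 23 38; 4 23 77 122; 7 38 122 294];
L = chol(A,'lower');        % lower-triangular Cholesky factor
disp(L);                    % should reproduce the matrix above
disp(norm(L*L' - A));       % should be numerically zero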
5.2. Consider the Freudenstein and Roth test function
$$f(\mathbf{x})=f_1(\mathbf{x})^2+f_2(\mathbf{x})^2,\quad \mathbf{x}\in\mathbb{R}^2$$
where
$$\begin{aligned} f_1(\mathbf{x})&=-13+x_1+\left(\left(5-x_2\right)x_2-2\right)x_2 \\ f_2(\mathbf{x})&=-29+x_1+\left(\left(x_2+1\right)x_2-14\right)x_2 \end{aligned}$$

(i) Show that the function f f f has three stationary points. Find them and prove
that one is a global minimizer, one is a strict local minimum and the third is
a saddle point.
(ii) Use MATLAB to employ the following three methods on the problem of minimizing f:

  1. the gradient method with backtracking and parameters $(s,\alpha,\beta)=(1,0.5,0.5)$.
  2. the hybrid Newton's method with parameters $(s,\alpha,\beta)=(1,0.5,0.5)$.
  3. damped Gauss-Newton's method with a backtracking line search strategy with parameters $(s,\alpha,\beta)=(1,0.5,0.5)$.

All the algorithms should use the stopping criterion $\|\nabla f(\mathbf{x})\|\le 10^{-5}$. Each
algorithm should be employed four times on the following four starting
points: $(-50,7)^T,(20,7)^T,(20,-18)^T,(5,-10)^T$. For each of the four starting points, compare the number of iterations and the point to which each
method converged. If a method did not converge, explain why.
Solution:
(i)

$$\frac{\partial f_1(\mathbf{x})}{\partial x_1}=\frac{\partial f_2(\mathbf{x})}{\partial x_1}=1$$
$$\frac{\partial f_1(\mathbf{x})}{\partial x_2}=-3x_2^2+10x_2-2$$
$$\frac{\partial f_2(\mathbf{x})}{\partial x_2}=3x_2^2+2x_2-14$$
$$\frac{\partial^2 f_1(\mathbf{x})}{\partial x_2^2}=-6x_2+10$$
$$\frac{\partial^2 f_2(\mathbf{x})}{\partial x_2^2}=6x_2+2$$
$$\frac{\partial f(\mathbf{x})}{\partial x_1}=2(f_1(\mathbf{x})+f_2(\mathbf{x}))$$
$$\frac{\partial f(\mathbf{x})}{\partial x_2}=2\left(f_1(\mathbf{x})\frac{\partial f_1(\mathbf{x})}{\partial x_2}+f_2(\mathbf{x})\frac{\partial f_2(\mathbf{x})}{\partial x_2}\right)$$
$$\nabla f(\mathbf{x})=\mathbf{0}\Rightarrow \frac{\partial f(\mathbf{x})}{\partial x_1}=\frac{\partial f(\mathbf{x})}{\partial x_2}=0$$
The first equation gives $f_2(\mathbf{x})=-f_1(\mathbf{x})$. Substituting this into the second, and noting $\frac{\partial f_1}{\partial x_2}-\frac{\partial f_2}{\partial x_2}=-6x_2^2+8x_2+12=-2(3x_2^2-4x_2-6)$,
$$\begin{aligned} \frac{\partial f(\mathbf{x})}{\partial x_2}&=2\left(f_1(\mathbf{x})\frac{\partial f_1(\mathbf{x})}{\partial x_2}+f_2(\mathbf{x})\frac{\partial f_2(\mathbf{x})}{\partial x_2}\right)\\ &=2f_1(\mathbf{x})\left(\frac{\partial f_1(\mathbf{x})}{\partial x_2}-\frac{\partial f_2(\mathbf{x})}{\partial x_2}\right)\\ &=-4f_1(\mathbf{x})(3x_2^2-4x_2-6)\\ &=0 \end{aligned}$$
$$\Rightarrow f_1(\mathbf{x})=f_2(\mathbf{x})=0 \quad\text{or}\quad \frac{\partial f_1(\mathbf{x})}{\partial x_2}=\frac{\partial f_2(\mathbf{x})}{\partial x_2}$$
1) If $f_1(\mathbf{x})=f_2(\mathbf{x})=0$:
$$f_1(\mathbf{x})-f_2(\mathbf{x})=16+(-2x_2^2+4x_2+12)x_2=-2(x_2-4)((x_2+1)^2+1)=0$$
$$\Rightarrow \begin{cases} x_1=5\\ x_2=4 \end{cases}$$
Since $f(\mathbf{x})\ge f((5,4)^T)=0$ for all $\mathbf{x}$, this stationary point is clearly the global minimizer.
2) If $\frac{\partial f_1(\mathbf{x})}{\partial x_2}=\frac{\partial f_2(\mathbf{x})}{\partial x_2}$:
$$\Rightarrow 3x_2^2-4x_2-6=0\Rightarrow x_2=\frac{2\pm\sqrt{22}}{3}$$
$$\begin{aligned} f_1(\mathbf{x})+f_2(\mathbf{x})&=0\\ -42+2x_1+(5x_2-x_2^2-2+x_2^2+x_2-14)x_2&=0\\ -42+2x_1+(6x_2-16)x_2&=0\\ -21+x_1+3x_2^2-8x_2&=0\\ -21+x_1+(4x_2+6)-8x_2&=0 \qquad (\text{since } 3x_2^2=4x_2+6)\\ x_1&=4x_2+15=\frac{53\pm 4\sqrt{22}}{3} \end{aligned}$$
To classify the Case 2 points, compute the Hessian there. First,
$$\frac{\partial^2 f(\mathbf{x})}{\partial x_1^2}=2\left(\frac{\partial f_1(\mathbf{x})}{\partial x_1}+\frac{\partial f_2(\mathbf{x})}{\partial x_1}\right)=4$$
At the Case 2 points, $\frac{\partial f_1}{\partial x_2}=\frac{\partial f_2}{\partial x_2}$ and $f_2(\mathbf{x})=-f_1(\mathbf{x})$, so
$$\frac{\partial^2 f(\mathbf{x})}{\partial x_1\partial x_2}=\frac{\partial^2 f(\mathbf{x})}{\partial x_2\partial x_1}=2\left(\frac{\partial f_1(\mathbf{x})}{\partial x_2}+\frac{\partial f_2(\mathbf{x})}{\partial x_2}\right)=4\frac{\partial f_1(\mathbf{x})}{\partial x_2}=2(12x_2-16)=8(3x_2-4)$$
and
$$\begin{aligned} \frac{\partial^2 f(\mathbf{x})}{\partial x_2^2}&=2\left(\left(\frac{\partial f_1(\mathbf{x})}{\partial x_2}\right)^2+f_1(\mathbf{x})\frac{\partial^2 f_1(\mathbf{x})}{\partial x_2^2}+\left(\frac{\partial f_2(\mathbf{x})}{\partial x_2}\right)^2+f_2(\mathbf{x})\frac{\partial^2 f_2(\mathbf{x})}{\partial x_2^2}\right)\\ &=2\left(2\left(\frac{\partial f_1(\mathbf{x})}{\partial x_2}\right)^2+f_1(\mathbf{x})\left(\frac{\partial^2 f_1(\mathbf{x})}{\partial x_2^2}-\frac{\partial^2 f_2(\mathbf{x})}{\partial x_2^2}\right)\right)\\ &=2\left(2\left(\frac{\partial f_1(\mathbf{x})}{\partial x_2}\right)^2+f_1(\mathbf{x})(-12x_2+8)\right)\\ &=4\left(\left(\frac{\partial f_1(\mathbf{x})}{\partial x_2}\right)^2-2f_1(\mathbf{x})(3x_2-2)\right) \end{aligned}$$
Hence
$$\det\nabla^2 f(\mathbf{x})=\begin{vmatrix} 4 & 4\frac{\partial f_1(\mathbf{x})}{\partial x_2}\\ 4\frac{\partial f_1(\mathbf{x})}{\partial x_2} & 4\left(\left(\frac{\partial f_1(\mathbf{x})}{\partial x_2}\right)^2-2f_1(\mathbf{x})(3x_2-2)\right) \end{vmatrix}=-32f_1(\mathbf{x})(3x_2-2)$$
Evaluating $f_1$ at the Case 2 points (repeatedly substituting $x_2^2=\frac{4x_2+6}{3}$):
$$\begin{aligned} f_1(\mathbf{x})&=-13+x_1+((5-x_2)x_2-2)x_2\\ &=-13+x_1+(5x_2-x_2^2-2)x_2\\ &=-13+x_1+\left(5x_2-\frac{4x_2+6}{3}-2\right)x_2\\ &=-13+x_1+\frac{1}{3}(11x_2-12)x_2\\ &=-13+x_1+\frac{1}{3}\left(11\cdot\frac{4x_2+6}{3}-12x_2\right)\\ &=-13+x_1+\frac{1}{9}(8x_2+66)\\ &=-\frac{17}{3}+x_1+\frac{8}{9}x_2\\ &=\frac{4}{27}(85\pm 11\sqrt{22})>0 \end{aligned}$$
Since $3x_2-2=\pm\sqrt{22}$ at these points,
$$\det\nabla^2 f(\mathbf{x})=\mp 32\sqrt{22}\,f_1(\mathbf{x})$$

At $\begin{cases} x_1=\frac{53+4\sqrt{22}}{3}\\ x_2=\frac{2+\sqrt{22}}{3} \end{cases}$: $\det\nabla^2 f(\mathbf{x})<0$, so the Hessian has one positive and one negative eigenvalue; $\nabla^2 f(\mathbf{x})$ is indefinite and this point is a saddle point.
At $\begin{cases} x_1=\frac{53-4\sqrt{22}}{3}\\ x_2=\frac{2-\sqrt{22}}{3} \end{cases}$: $\det\nabla^2 f(\mathbf{x})>0$ and the $(1,1)$ entry equals $4>0$, so $\nabla^2 f(\mathbf{x})\succ 0$ and this point is a strict local minimizer.
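The three stationary points can also be checked numerically (a minimal sketch; it assumes the data block from part (ii) below is saved as data.m, as the driver script there already does):

data;   % defines gradient, hessian, and the three stationary points
pts = {global_min_x, local_min_x, saddle_x};
for i = 1:3
    x = pts{i};
    fprintf('||grad|| = %e, eig(hessian) = %s\n', ...
        norm(gradient(x)), mat2str(eig(hessian(x))', 4));
end

Expected output: all gradient norms near zero; both Hessian eigenvalues positive at the two minimizers and of mixed sign at the saddle point.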

(ii)
Gradient method with backtracking

function [x,fun_val]=gradient_method_backtracking(f,g,x0,s,alpha,...
    beta,epsilon)
% Gradient method with backtracking stepsize rule
%
% INPUT
%=======================================
% f ......... objective function
% g ......... gradient of the objective function
% x0......... initial point
% s ......... initial choice of stepsize
% alpha ..... tolerance parameter for the stepsize selection
% beta ...... the constant in which the stepsize is multiplied
% at each backtracking step (0<beta<1)
% epsilon ... tolerance parameter for stopping rule
% OUTPUT
%=======================================
% x ......... optimal solution (up to a tolerance)
%             of min f(x)
% fun_val ... optimal function value
x=x0;
grad=g(x);
fun_val=f(x);
iter=0;
while (norm(grad)>epsilon)
    iter=iter+1;
    t=s;
    while (fun_val-f(x-t*grad)<alpha*t*norm(grad)^2)
        t=beta*t;
    end
    x=x-t*grad;
    fun_val=f(x);
    grad=g(x);
    fprintf('iter_number = %3d norm_grad = %2.6f fun_val = %2.6f \n',...
        iter,norm(grad),fun_val);
end

Hybrid Newton's method

function x=newton_hybrid(f,g,h,x0,alpha,beta,epsilon)
% Hybrid Newton’s method
%
% INPUT
%=======================================
% f ......... objective function
% g ......... gradient of the objective function
% h ......... hessian of the objective function
% x0......... initial point
% alpha ..... tolerance parameter for the stepsize selection strategy
% beta ...... the proportion in which the stepsize is multiplied
% at each backtracking step (0<beta<1)
% epsilon ... tolerance parameter for stopping rule
% OUTPUT
%=======================================
% x ......... optimal solution (up to a tolerance)
%             of min f(x)
x=x0;
gval=g(x);
hval=h(x);
[L,p]=chol(hval,'lower');
if (p==0)
    d=L'\(L\gval);
else
    d=gval;
end
iter=0;
while ((norm(gval)>epsilon)&&(iter<10000))
    iter=iter+1;
    t=1;
    while(f(x-t*d)>f(x)-alpha*t*gval'*d)
        t=beta*t;
    end
    x=x-t*d;
    fprintf('iter= %2d f(x)=%10.10f\n',iter,f(x))
    gval=g(x);
    hval=h(x);
    [L,p]=chol(hval,'lower');
    if (p==0)
        d=L'\(L\gval);
    else
        d=gval;
    end
end
if (iter==10000)
    fprintf('did not converge\n')
end

Damped Gauss-Newton method

function [x,fun_val]=damped_Gauss_Newton(g,grad,J,F,x0,s,alpha,...
beta,epsilon)
% Damped Gauss-Newton method with backtracking stepsize rule
%
% INPUT
%=======================================
% g ......... objective function
% grad ...... gradient of the objective function
% J ......... Jacobian matrix
% F ......... vector-valued function
% x0......... initial point
% s ......... initial choice of stepsize
% alpha ..... tolerance parameter for the stepsize selection
% beta ...... the constant in which the stepsize is multiplied
%             at each backtracking step (0<beta<1)
% epsilon ... tolerance parameter for stopping rule
% OUTPUT
%=======================================
% x ......... optimal solution (up to a tolerance)
%             of min f(x)
% fun_val ... optimal function value
x=x0;
J_val=J(x);
F_val=F(x);
d=(J_val'*J_val)\(J_val'*F_val);
fun_val=g(x);
gval=grad(x);
iter=0;
while (norm(gval)>epsilon&&(iter<10000))
    iter=iter+1;
    t=s;
    while (fun_val-g(x-t*d)<alpha*t*gval'*d)
        t=beta*t;
    end
    x=x-t*d;
    J_val=J(x);
    F_val=F(x);
    d=(J_val'*J_val)\(J_val'*F_val);
    fun_val=g(x);
    gval=grad(x);
    fprintf('iter_number = %3d norm_grad = %2.6f fun_val = %2.6f \n',...
        iter,norm(gval),fun_val);
end
if (iter==10000)
    fprintf('did not converge\n')
end

Data script (saved as data.m)

clear;
s=1;
alpha=0.5;
beta=0.5;
epsilon=1e-5;

f1=@(x)-13+x(1)+((5-x(2))*x(2)-2)*x(2);
f1_1=@(x)1;
f1_1_1=@(x)0;
f1_2=@(x)-3*x(2)^2+10*x(2)-2;
f1_2_2=@(x)-6*x(2)+10;

f2=@(x)-29+x(1)+((x(2)+1)*x(2)-14)*x(2);
f2_1=@(x)1;
f2_1_1=@(x)0;
f2_2=@(x)3*x(2)^2+2*x(2)-14;
f2_2_2=@(x)6*x(2)+2;

f=@(x)f1(x)^2+f2(x)^2;
f_1=@(x)2*f1(x)+2*f2(x);
f_1_1=@(x)2*f1_1(x)+2*f2_1(x);
f_1_2=@(x)2*f1_2(x)+2*f2_2(x);
f_2=@(x)2*f1(x)*f1_2(x)+2*f2(x)*f2_2(x);
f_2_1=@(x)2*f1_2(x)+2*f2_2(x);
f_2_2=@(x)2*(f1_2(x)^2+f1(x)*f1_2_2(x)+f2_2(x)^2+f2(x)*f2_2_2(x));
gradient=@(x)[f_1(x);f_2(x)];
hessian=@(x)[f_1_1(x),f_1_2(x);f_2_1(x),f_2_2(x)];


F=@(x)[f1(x);f2(x)];
J=@(x)[f1_1(x),f1_2(x);f2_1(x),f2_2(x)];

global_min_x=[5;4];
local_min_x=[(53-4*sqrt(22))/3;(2-sqrt(22))/3];
saddle_x=[(53+4*sqrt(22))/3;(2+sqrt(22))/3];

x1=[-50;7];
x2=[20;7];
x3=[20;-18];
x4=[5;-10];

Driver script

data;
gradient_method_backtracking(f,gradient,x1,s,alpha,beta,epsilon);
gradient_method_backtracking(f,gradient,x2,s,alpha,beta,epsilon);
gradient_method_backtracking(f,gradient,x3,s,alpha,beta,epsilon);
gradient_method_backtracking(f,gradient,x4,s,alpha,beta,epsilon);

newton_hybrid(f,gradient,hessian,x1,alpha,beta,epsilon);
newton_hybrid(f,gradient,hessian,x2,alpha,beta,epsilon);
newton_hybrid(f,gradient,hessian,x3,alpha,beta,epsilon);
newton_hybrid(f,gradient,hessian,x4,alpha,beta,epsilon);

damped_Gauss_Newton(f,gradient,J,F,x1,s,alpha,beta,epsilon);
damped_Gauss_Newton(f,gradient,J,F,x2,s,alpha,beta,epsilon);
damped_Gauss_Newton(f,gradient,J,F,x3,s,alpha,beta,epsilon);
damped_Gauss_Newton(f,gradient,J,F,x4,s,alpha,beta,epsilon);

1. Gradient method with backtracking:
From $x_0=(-50,7)^T$: converged after 2252 iterations to the global minimizer $(5,4)^T$.
From $x_0=(20,7)^T$: converged after 2447 iterations to the global minimizer $(5,4)^T$.
From $x_0=(20,-18)^T$: converged after 2472 iterations to the strict local minimizer $\left(\frac{53-4\sqrt{22}}{3},\frac{2-\sqrt{22}}{3}\right)^T$.
From $x_0=(5,-10)^T$: converged after 2123 iterations to the global minimizer $(5,4)^T$.
2. Hybrid Newton's method:
From $x_0=(-50,7)^T$: converged after 8 iterations to the global minimizer $(5,4)^T$.
From $x_0=(20,7)^T$: converged after 8 iterations to the global minimizer $(5,4)^T$.
From $x_0=(20,-18)^T$: converged after 16 iterations to the strict local minimizer $\left(\frac{53-4\sqrt{22}}{3},\frac{2-\sqrt{22}}{3}\right)^T$.
From $x_0=(5,-10)^T$: converged after 13 iterations to the strict local minimizer $\left(\frac{53-4\sqrt{22}}{3},\frac{2-\sqrt{22}}{3}\right)^T$.
3. Damped Gauss-Newton method:
From $x_0=(-50,7)^T$: converged after 29 iterations to the global minimizer $(5,4)^T$.
From $x_0=(20,7)^T$: converged after 29 iterations to the global minimizer $(5,4)^T$.
From $x_0=(20,-18)^T$: did not converge.
From $x_0=(5,-10)^T$: did not converge.
In the two failing runs, the backtracking line search drives the stepsize $t$ toward zero, so the iterates essentially stop moving and the gradient norm never drops below the tolerance; the instrumented sketch below illustrates this.
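The stepsize collapse can be observed by rerunning the Gauss-Newton iteration with the accepted stepsize logged (a hypothetical diagnostic, not part of the original runs; the extra guard t > 1e-14 only prevents an endless inner loop):

data;
x = x3;                                   % the failing starting point (20,-18)^T
for iter = 1:30
    J_val = J(x); F_val = F(x);
    d = (J_val'*J_val)\(J_val'*F_val);    % Gauss-Newton direction
    gval = gradient(x);
    t = s;
    while (f(x)-f(x-t*d) < alpha*t*gval'*d) && (t > 1e-14)
        t = beta*t;                       % backtracking
    end
    fprintf('iter = %2d  t = %e  ||grad|| = %e\n', iter, t, norm(gval));
    x = x - t*d;
end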

5.3. Let $f$ be a twice continuously differentiable function satisfying $L\mathbf{I}\succeq\nabla^2 f(\mathbf{x})\succeq m\mathbf{I}$ for some $L>m>0$, and let $\mathbf{x}^*$ be the unique minimizer of $f$ over $\mathbb{R}^n$.
(i) Show that
$$f(\mathbf{x})-f(\mathbf{x}^*)\ge \frac{m}{2}\|\mathbf{x}-\mathbf{x}^*\|^2$$
for any $\mathbf{x}\in\mathbb{R}^n$.
(ii) Let $\{\mathbf{x}_k\}_{k\ge 0}$ be the sequence generated by damped Newton's method with constant stepsize $t_k=\frac{m}{L}$. Show that
$$f(\mathbf{x}_k)-f(\mathbf{x}_{k+1})\ge \frac{m}{2L}\nabla f(\mathbf{x}_k)^T\left(\nabla^2 f(\mathbf{x}_k)\right)^{-1}\nabla f(\mathbf{x}_k)$$
(iii) Show that $\mathbf{x}_k\to\mathbf{x}^*$ as $k\to\infty$.

Solution:
(i) By the quadratic Taylor theorem, there is a point $\boldsymbol{\xi}$ on the segment between $\mathbf{x}^*$ and $\mathbf{x}$ such that
$$f(\mathbf{x})-f(\mathbf{x}^*)=\nabla f(\mathbf{x}^*)^T(\mathbf{x}-\mathbf{x}^*)+\frac{1}{2}(\mathbf{x}-\mathbf{x}^*)^T\nabla^2 f(\boldsymbol{\xi})(\mathbf{x}-\mathbf{x}^*)\ge\frac{m}{2}\|\mathbf{x}-\mathbf{x}^*\|^2,$$
since $\nabla f(\mathbf{x}^*)=\mathbf{0}$ and $\nabla^2 f(\boldsymbol{\xi})\succeq m\mathbf{I}$.
(ii) Since $L\mathbf{I}\succeq\nabla^2 f(\mathbf{x})$, we have $\|\nabla^2 f(\mathbf{x})\|\le L$, so $f\in C_L^{1,1}$. The damped Newton step with $t_k=\frac{m}{L}$ is $\mathbf{x}_{k+1}=\mathbf{x}_k-\frac{m}{L}\left(\nabla^2 f(\mathbf{x}_k)\right)^{-1}\nabla f(\mathbf{x}_k)$, so by the descent lemma,
$$\begin{aligned} f(\mathbf{x}_k)-f(\mathbf{x}_{k+1}) &\ge \nabla f(\mathbf{x}_k)^T(\mathbf{x}_k-\mathbf{x}_{k+1})-\frac{L}{2}\|\mathbf{x}_k-\mathbf{x}_{k+1}\|^2\\ &=\frac{m}{L}\nabla f(\mathbf{x}_k)^T\left(\nabla^2 f(\mathbf{x}_k)\right)^{-1}\nabla f(\mathbf{x}_k)-\frac{L}{2}\cdot\frac{m^2}{L^2}\nabla f(\mathbf{x}_k)^T\left(\nabla^2 f(\mathbf{x}_k)\right)^{-2}\nabla f(\mathbf{x}_k)\\ &\ge \frac{m}{L}\nabla f(\mathbf{x}_k)^T\left(\nabla^2 f(\mathbf{x}_k)\right)^{-1}\nabla f(\mathbf{x}_k)-\frac{m}{2L}\nabla f(\mathbf{x}_k)^T\left(\nabla^2 f(\mathbf{x}_k)\right)^{-1}\nabla f(\mathbf{x}_k)\\ &=\frac{m}{2L}\nabla f(\mathbf{x}_k)^T\left(\nabla^2 f(\mathbf{x}_k)\right)^{-1}\nabla f(\mathbf{x}_k), \end{aligned}$$
where the second inequality uses $\left(\nabla^2 f(\mathbf{x}_k)\right)^{-1}\preceq\frac{1}{m}\mathbf{I}$, hence $\left(\nabla^2 f(\mathbf{x}_k)\right)^{-2}\preceq\frac{1}{m}\left(\nabla^2 f(\mathbf{x}_k)\right)^{-1}$.
(iii) Since $\nabla^2 f(\mathbf{x}_k)\preceq L\mathbf{I}$ implies $\left(\nabla^2 f(\mathbf{x}_k)\right)^{-1}\succeq\frac{1}{L}\mathbf{I}$, part (ii) gives
$$f(\mathbf{x}_k)-f(\mathbf{x}_{k+1})\ge\frac{m}{2L^2}\|\nabla f(\mathbf{x}_k)\|^2.$$
The sequence $\{f(\mathbf{x}_k)\}$ is nonincreasing and bounded below by $f(\mathbf{x}^*)$, hence convergent, so the left-hand side tends to $0$ and therefore $\|\nabla f(\mathbf{x}_k)\|\to 0$. Moreover, by the same Taylor argument as in (i) applied around $\mathbf{x}_k$,
$$f(\mathbf{x}^*)\ge f(\mathbf{x}_k)+\nabla f(\mathbf{x}_k)^T(\mathbf{x}^*-\mathbf{x}_k)+\frac{m}{2}\|\mathbf{x}^*-\mathbf{x}_k\|^2\ge f(\mathbf{x}_k)-\frac{1}{2m}\|\nabla f(\mathbf{x}_k)\|^2,$$
where the last step bounds $\nabla f(\mathbf{x}_k)^T\mathbf{d}+\frac{m}{2}\|\mathbf{d}\|^2$ below by its minimal value $-\frac{1}{2m}\|\nabla f(\mathbf{x}_k)\|^2$. Hence $f(\mathbf{x}_k)-f(\mathbf{x}^*)\le\frac{1}{2m}\|\nabla f(\mathbf{x}_k)\|^2\to 0$, and by (i),
$$\|\mathbf{x}_k-\mathbf{x}^*\|^2\le\frac{2}{m}\left(f(\mathbf{x}_k)-f(\mathbf{x}^*)\right)\to 0,$$
so $\mathbf{x}_k\to\mathbf{x}^*$ as $k\to\infty$.
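The inequalities of (i) and (ii) can be sanity-checked numerically; a minimal MATLAB sketch on a hypothetical strongly convex quadratic $f(\mathbf{x})=\frac{1}{2}\mathbf{x}^T\mathbf{Q}\mathbf{x}$ (so $\mathbf{x}^*=\mathbf{0}$, and $m$, $L$ are the extreme eigenvalues of $\mathbf{Q}$):

rng(0);
n = 5;
B = randn(n); Q = B'*B + eye(n);          % symmetric positive definite
ev = eig(Q); m = min(ev); L = max(ev);    % so m*I <= Q <= L*I
f = @(x) 0.5*x'*Q*x; g = @(x) Q*x;        % objective and gradient; Hessian is Q
x = randn(n,1);
for k = 1:20
    x_new = x - (m/L)*(Q\g(x));           % damped Newton step with t = m/L
    assert(f(x)-f(x_new) >= (m/(2*L))*g(x)'*(Q\g(x)) - 1e-12);  % part (ii)
    assert(f(x) >= (m/2)*norm(x)^2 - 1e-12);                    % part (i), f(x*)=0
    x = x_new;
end
fprintf('all inequalities hold; ||x_k - x*|| = %e\n', norm(x));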
