9.1 Unconstrained minimization problems
In this chapter, we discuss methods for solving the unconstrained optimization problem
$$\text{P9.1:}~ \text{minimize}~ f(x),$$
where $f : \mathbf{R}^n \to \mathbf{R}$ is convex and twice continuously differentiable (which implies that $\mathbf{dom}\, f$ is open). We denote the optimal value, $\inf_x f(x) = f(x^\star)$, as $p^\star$.
Since $f$ is differentiable and convex, a necessary and sufficient condition for a point $x^\star$ to be optimal is
$$\text{P9.2:}~ \nabla f(x^\star) = 0.$$
Thus, solving the unconstrained minimization problem (9.1) is the same as finding a solution of (9.2), which is a set of $n$ equations in the $n$ variables $x_1, \ldots, x_n$. In a few special cases, we can find a solution to the problem (9.1) by analytically solving the optimality equation (9.2), but usually the problem must be solved by an iterative algorithm. By this, we mean an algorithm that computes a sequence of points $x^{(0)}, x^{(1)}, \ldots \in \mathbf{dom}\, f$ with $f(x^{(k)}) \to p^\star$ as $k \to \infty$.
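As a hedged illustration of both routes, consider a strictly convex quadratic, one of the special cases where (9.2) can be solved analytically. The matrix `P`, the vector `q`, and the fixed-step gradient iteration below are assumptions chosen for this sketch; the text above does not prescribe any particular iterative method.

```python
import numpy as np

# Hypothetical test problem: f(x) = 0.5 x^T P x + q^T x with P symmetric positive definite.
P = np.array([[3.0, 1.0], [1.0, 2.0]])
q = np.array([-1.0, 4.0])

def f(x):
    return 0.5 * x @ P @ x + q @ x

def grad_f(x):
    # Gradient of the quadratic (valid because P is symmetric).
    return P @ x + q

# Analytic route: the optimality condition (9.2), P x + q = 0, is a linear system.
x_star = np.linalg.solve(P, -q)
p_star = f(x_star)

# Iterative route (an assumed scheme): gradient steps of fixed size 1/M,
# where M is the largest eigenvalue of P.
M = np.linalg.eigvalsh(P).max()
x = np.zeros(2)
values = []  # f(x^(0)), f(x^(1)), ...
for k in range(200):
    values.append(f(x))
    x = x - grad_f(x) / M
```

For this particular data the linear solve gives $x^\star = (1.2, -2.6)$ and $p^\star = -5.8$, and the recorded values $f(x^{(k)})$ decrease toward $p^\star$.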
9.1.2 Strong Convexity and implications
Lower bound on $\nabla^2 f(x)$ (Hessian Matrix)
We assume that the objective function is strongly convex on $\mathcal{S}$, which means there exists an $m > 0$ such that
$$\text{P9.7:}~ \nabla^2 f(x) \succeq m \mathbf{I}$$
for all $x \in \mathcal{S}$.
Strong convexity has some interesting consequences. For $x, y \in \mathcal{S}$, we have
$$f(y) = f(x) + \nabla f(x)^T (y-x) + \frac{1}{2}(y-x)^T \nabla^2 f(z) (y-x)$$
for some $z$ on the line segment $[x, y]$.
By the strong convexity assumption (9.7), the last term on the righthand side is at least $\frac{m}{2}\|y-x\|_2^2$, so we have the inequality
$$\text{P9.8:}~ f(y) \geq f(x) + \nabla f(x)^T (y-x) + \frac{m}{2}\|y-x\|_2^2$$
for all $x$ and $y$ in $\mathcal{S}$. When $m = 0$, we recover the basic inequality characterizing convexity; for $m > 0$, we obtain a better lower bound on $f(y)$ than follows from convexity alone.
Next, we show that the inequality (9.8) can be used to bound $f(x) - p^\star$, the suboptimality of the point $x$, in terms of $\|\nabla f(x)\|_2$. The righthand side of (9.8) is a convex quadratic function of $y$ (for fixed $x$). Setting the gradient with respect to $y$ equal to zero, we find that $\tilde{y} = x - \frac{1}{m}\nabla f(x)$ minimizes the righthand side. Therefore we have
$$\begin{aligned} f(y) & \geq f(x) + \nabla f(x)^T(y-x) + \frac{m}{2}\|y-x\|_2^2 \\ & \geq f(x) + \nabla f(x)^T(\tilde{y}-x) + \frac{m}{2}\|\tilde{y}-x\|_2^2 \\ & = f(x) - \frac{1}{2m}\|\nabla f(x)\|_2^2. \end{aligned}$$
Since this holds for any $y \in \mathcal{S}$, we have
$$\text{P9.9:}~ p^\star \geq f(x) - \frac{1}{2m}\|\nabla f(x)\|_2^2,$$
which can be rewritten as
$$f(x) - p^\star \leq \frac{1}{2m}\|\nabla f(x)\|_2^2.$$
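One practical reading of (9.9) is as a stopping rule: if $\|\nabla f(x)\|_2 \leq (2m\epsilon)^{1/2}$, then $f(x) - p^\star \leq \epsilon$. The sketch below tries this on an assumed quadratic test problem. The data `P` and `q`, the helper name `minimize_to_accuracy`, and the fixed-step gradient iteration are all hypothetical choices made for illustration; for a quadratic, $m$ and $M$ can be taken as the extreme eigenvalues of `P`.

```python
import numpy as np

# Hypothetical strongly convex quadratic: f(x) = 0.5 x^T P x + q^T x.
P = np.array([[3.0, 1.0], [1.0, 2.0]])
q = np.array([-1.0, 4.0])
eigs = np.linalg.eigvalsh(P)
m, M = eigs.min(), eigs.max()  # Hessian bounds m I <= P <= M I

def f(x):
    return 0.5 * x @ P @ x + q @ x

def grad_f(x):
    return P @ x + q

def minimize_to_accuracy(x0, eps, max_iter=10_000):
    """Take gradient steps until ||grad f(x)||_2 <= sqrt(2 m eps).

    By (9.9), meeting this tolerance certifies f(x) - p_star <= eps.
    """
    tol = np.sqrt(2.0 * m * eps)
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) <= tol:
            return x
        x = x - g / M  # assumed fixed step size 1/M
    raise RuntimeError("gradient tolerance not reached")

eps = 1e-6
x_hat = minimize_to_accuracy(np.zeros(2), eps)

# Reference optimal value from the linear system P x + q = 0 (only for checking).
p_star = f(np.linalg.solve(P, -q))
gap = f(x_hat) - p_star
```

The rule only needs the gradient norm and the constant $m$; `p_star` is computed here solely to confirm that the resulting `gap` does not exceed `eps`.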
We can also derive a bound on $\|x - x^\star\|_2$, the distance between $x$ and any optimal point $x^\star$, in terms of $\|\nabla f(x)\|_2$:
$$\text{P9.11:}~ \|x - x^\star\|_2 \leq \frac{2}{m}\|\nabla f(x)\|_2.$$
To see this, we apply (9.8) with $y = x^\star$ to obtain
$$\begin{aligned} p^\star = f(x^\star) & \geq f(x) + \nabla f(x)^T (x^\star - x) + \frac{m}{2}\|x^\star - x\|_2^2 \\ & \geq f(x) - \|\nabla f(x)\|_2 \|x^\star - x\|_2 + \frac{m}{2}\|x^\star - x\|_2^2, \end{aligned}$$
where we use the Cauchy-Schwarz inequality in the second inequality: $\nabla f(x)^T (x^\star - x) \geq -\|\nabla f(x)\|_2 \|x^\star - x\|_2$. Since $p^\star \leq f(x)$, we must have
$$-\|\nabla f(x)\|_2 \|x^\star - x\|_2 + \frac{m}{2}\|x^\star - x\|_2^2 \leq 0,$$
from which (9.11) follows.
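The final step can be spelled out as a short sketch. If $x = x^\star$, the bound (9.11) holds trivially. Otherwise $\|x^\star - x\|_2 > 0$, and dividing the last inequality by this quantity gives

$$\frac{m}{2}\|x^\star - x\|_2 \leq \|\nabla f(x)\|_2 \quad\Longrightarrow\quad \|x - x^\star\|_2 \leq \frac{2}{m}\|\nabla f(x)\|_2.$$

Taking $x$ to be a second optimal point, so that $\nabla f(x) = 0$, forces $\|x - x^\star\|_2 = 0$. Under the strong convexity assumption the optimal point is therefore unique.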
Upper bound on $\nabla^2 f(x)$ (Hessian Matrix)
The inequality (9.8) implies that the sublevel sets contained in $\mathcal{S}$ are bounded, so in particular, $\mathcal{S}$ is bounded. Therefore the maximum eigenvalue of $\nabla^2 f(x)$, which is a continuous function of $x$ on $\mathcal{S}$, is bounded above on $\mathcal{S}$, i.e., there exists a constant $M$ such that $\nabla^2 f(x) \preceq M \mathbf{I}$ for all $x \in \mathcal{S}$. This upper bound on the Hessian implies for any $x, y \in \mathcal{S}$,
$$f(y) \leq f(x) + \nabla f(x)^T (y-x) + \frac{M}{2}\|y-x\|_2^2,$$
which is analogous to
(9.8). Minimizing each side over $y$ yields
$$p^\star \leq f(x) - \frac{1}{2M}\|\nabla f(x)\|_2^2,$$
which can be rewritten as
$$f(x) - p^\star \geq \frac{1}{2M}\|\nabla f(x)\|_2^2,$$
the counterpart to (9.9).
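Taken together, the two Hessian bounds sandwich the suboptimality: $\frac{1}{2M}\|\nabla f(x)\|_2^2 \leq f(x) - p^\star \leq \frac{1}{2m}\|\nabla f(x)\|_2^2$. The following is a minimal numerical check of that sandwich, assuming a hypothetical quadratic objective whose Hessian `P` is constant, so that $m$ and $M$ are simply its smallest and largest eigenvalues. The sampling scheme and the tolerance are illustrative choices.

```python
import numpy as np

# Hypothetical quadratic: f(x) = 0.5 x^T P x + q^T x, with constant Hessian P.
P = np.array([[3.0, 1.0], [1.0, 2.0]])
q = np.array([-1.0, 4.0])
eigs = np.linalg.eigvalsh(P)
m, M = eigs.min(), eigs.max()

def f(x):
    return 0.5 * x @ P @ x + q @ x

def grad_f(x):
    return P @ x + q

p_star = f(np.linalg.solve(P, -q))

rng = np.random.default_rng(0)
checks = []
for _ in range(1000):
    x = rng.normal(scale=5.0, size=2)   # random trial point
    g2 = grad_f(x) @ grad_f(x)          # squared gradient norm
    gap = f(x) - p_star                 # suboptimality f(x) - p_star
    lower = g2 / (2.0 * M)              # bound from the upper Hessian bound
    upper = g2 / (2.0 * m)              # bound (9.9) from strong convexity
    slack = 1e-9 * max(1.0, upper)      # allowance for floating-point rounding
    checks.append(lower - slack <= gap <= upper + slack)

all_hold = all(checks)
```

How tight the sandwich is depends on the ratio $M/m$: the closer the two constants, the more precisely the gradient norm pins down the suboptimality.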