Review:
- Discrete-time dynamics/simulations
- Stability of discrete-time systems
- Forward/backward Euler methods
- RK4 method
- Zero/first order hold on controls
Lecture 3 Optimization Pt.1
Overview
- Notation
- Root finding
- Minimization
1. Notation
(1) Scalar function $f(x): \mathbb{R}^n \rightarrow \mathbb{R}$
$\frac{\partial f}{\partial x} = \left[\frac{\partial f}{\partial x_1} \cdots \frac{\partial f}{\partial x_n}\right] \in \mathbb{R}^{1\times n}$ is a row vector.
This is because $\frac{\partial f}{\partial x}$ is the linear operator mapping $\Delta x$ into $\Delta f$:
$$f\left(x + \Delta x\right) \approx f(x) + \frac{\partial f}{\partial x} \Delta x$$
(2) Vector function $g(y): \mathbb{R}^m \rightarrow \mathbb{R}^n$
Similarly, we have $\frac{\partial g}{\partial y}\in \mathbb{R}^{n\times m}$, because
$$g\left(y + \Delta y\right) \approx g(y) + \frac{\partial g}{\partial y} \Delta y.$$
(3) Chain rule
The above conventions make the chain rule work:
$$f\left(g(y+\Delta y)\right) \approx f\left(g(y)\right) + \left.\frac{\partial f}{\partial x} \right|_{g(y)} \cdot \left.\frac{\partial g}{\partial y} \right|_{y} \cdot \Delta y$$
For convenience, we define the gradient as
$$\nabla f \left(x\right) = \left(\frac{\partial f}{\partial x}\right)^T \in \mathbb{R}^{n\times 1}$$
(a column vector) and the Hessian as
$$\nabla ^2 f \left(x\right) = \frac{\partial}{\partial x} \left(\nabla f \left(x\right)\right) = \frac{\partial^2 f}{\partial x^2} \in \mathbb{R}^{n\times n}$$
(a symmetric matrix).
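As a quick numerical sanity check on these conventions (the functions `f`, `g`, `dfdx`, and `dgdy` below are toy examples of my own, not from the lecture), the product of the two Jacobians should predict the first-order change of the composition:

```python
import numpy as np

def f(x):                       # f: R^2 -> R
    return x[0] ** 2 + np.sin(x[1])

def g(y):                       # g: R^3 -> R^2
    return np.array([y[0] * y[1], y[1] + y[2] ** 2])

def dfdx(x):                    # ∂f/∂x: 1 x 2 row vector
    return np.array([[2.0 * x[0], np.cos(x[1])]])

def dgdy(y):                    # ∂g/∂y: 2 x 3 Jacobian
    return np.array([[y[1], y[0], 0.0],
                     [0.0, 1.0, 2.0 * y[2]]])

y = np.array([1.0, 2.0, 3.0])
dy = 1e-4 * np.array([1.0, -2.0, 0.5])

exact = f(g(y + dy)) - f(g(y))
linear = (dfdx(g(y)) @ dgdy(y) @ dy).item()
print(exact, linear)            # the two agree to first order in Δy
```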
2. Root finding
(1) Given $f\left(\mathrm{x}\right)$, find $\mathrm{x}^*$ such that $f\left(\mathrm{x}^*\right) = 0$.
Example: an equilibrium point of continuous-time dynamics.
(2) Given $f\left(\mathrm{x}\right)$, find $\mathrm{x}^*$ such that $f\left(\mathrm{x}^*\right) = \mathrm{x}^*$.
Example: an equilibrium point of discrete-time dynamics.
(3) Method 1: Fixed-point iteration
- Simplest solution method
- If the fixed point is stable, just iterate the dynamics until convergence
- Only works for systems with a single equilibrium
- Only works for stable fixed points, and convergence is slow (see the sketch below)
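A minimal sketch of fixed-point iteration on an assumed toy problem (the fixed point of $\cos$, not an example from the lecture):

```python
import math

# Iterate x <- cos(x); the fixed point x* = cos(x*) ≈ 0.739 is stable,
# so repeated application converges, but only at a linear rate.
x = 1.0
for _ in range(50):
    x = math.cos(x)
print(x)   # ≈ 0.739085
```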
(4) Method 2: Newton’s method
I. Fit a linear approximation to $f\left(\mathrm{x}\right)$:
$$f\left(\mathrm{x}+\Delta \mathrm{x}\right) \approx f\left(\mathrm{x}\right) + \left.\frac{\partial f}{\partial x}\right|_{\mathrm{x}} \Delta \mathrm{x}$$
II. Set the approximation to zero and solve for $\Delta \mathrm{x}$:
$$f\left(\mathrm{x}\right) + \left.\frac{\partial f}{\partial x}\right|_{\mathrm{x}} \Delta \mathrm{x} = 0 \Rightarrow \Delta \mathrm{x} = - \left.\frac{\partial f}{\partial x}\right|_{\mathrm{x}}^{-1} f\left(\mathrm{x}\right)$$
III. Apply the correction: $\mathrm{x} \leftarrow \mathrm{x} + \Delta \mathrm{x}$
IV. Repeat until convergence
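A minimal sketch of these four steps (the function name `newton_root` and the $x^2 - 2 = 0$ test problem are my own, not from the lecture):

```python
import numpy as np

def newton_root(f, dfdx, x0, tol=1e-12, max_iters=50):
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    for _ in range(max_iters):
        fx = np.atleast_1d(f(x))
        if np.linalg.norm(fx) < tol:        # IV. repeat until convergence
            break
        J = np.atleast_2d(dfdx(x))          # I.  linearize: Jacobian ∂f/∂x at x
        x = x + np.linalg.solve(J, -fx)     # II./III. Δx = -J⁻¹ f(x), apply correction
    return x

root = newton_root(lambda x: x**2 - 2, lambda x: 2 * x, x0=1.0)  # converges to sqrt(2)
```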
Example: Backward Euler
Both methods recover the damped motion, but Newton's method is much faster.
Plotting the error over the iterations shows very fast convergence with Newton's method:
the convergence rate of fixed-point iteration is linear, while Newton's method is quadratic.
3-element Vector{Float64}:
0.09793658173053843
3.7830087232931797e-6
5.2874553670659e-15
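As a complementary sketch (a toy damped-pendulum model chosen for illustration, not the lecture's Julia notebook), the backward Euler update $x_{k+1} = x_k + h\, f(x_{k+1})$ can be solved at every step with the Newton iteration above:

```python
import numpy as np

# Assumed damped-pendulum dynamics (illustrative parameters)
g, l, c = 9.81, 1.0, 0.1

def f(x):                                  # continuous-time dynamics x_dot = f(x)
    theta, omega = x
    return np.array([omega, -(g / l) * np.sin(theta) - c * omega])

def dfdx(x):                               # Jacobian of the dynamics
    theta, _ = x
    return np.array([[0.0, 1.0],
                     [-(g / l) * np.cos(theta), -c]])

def backward_euler_step(xk, h, tol=1e-12, max_iters=20):
    """Solve r(y) = y - xk - h*f(y) = 0 for x_{k+1} with Newton's method."""
    y = xk.copy()                          # initial guess: previous state
    for _ in range(max_iters):
        r = y - xk - h * f(y)              # residual of the implicit equation
        if np.linalg.norm(r) < tol:
            break
        J = np.eye(2) - h * dfdx(y)        # ∂r/∂y
        y = y + np.linalg.solve(J, -r)     # Newton correction Δy = -J⁻¹ r
    return y

# Simulate the damped motion: 200 steps of h = 0.05 s from θ = 90°
x = np.array([np.pi / 2, 0.0])
for _ in range(200):
    x = backward_euler_step(x, 0.05)
```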
(5) Takeaway message
- Quadratic convergence (Newton's method)
- Can achieve machine precision
- Most expensive part: solving the linear system (computing the Jacobian is usually not the bottleneck; factorizing/inverting it is, at $\mathcal{O}\left(n^3\right)$ for an $n\times n$ matrix)
- Can improve complexity by exploiting problem structure, e.g. a sparse Jacobian in many cases (more later; see the sketch below)
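For instance, here is a small sketch (a toy tridiagonal Jacobian, not a specific problem from the lecture) of how a sparse solve exploits structure that a dense factorization ignores:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Toy banded Jacobian: tridiagonal structure, as arises in chained systems.
n = 1000
J = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csc")
f_x = np.ones(n)

dx_dense = np.linalg.solve(J.toarray(), -f_x)   # dense factorization: O(n^3)
dx_sparse = spla.spsolve(J, -f_x)               # sparse solve exploits the band structure
assert np.allclose(dx_dense, dx_sparse)
```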
3. Minimization
$$\min_{\mathrm{x}} f\left(\mathrm{x}\right), \quad f\left(\mathrm{x}\right): \mathbb{R}^n \rightarrow \mathbb{R}$$
If $f$ is smooth, then $\left.\frac{\partial f}{\partial x}\right|_{\mathrm{x}^*} = 0$ at a local minimum.
Thus, we can transform the minimization problem into a root-finding problem: $\nabla f\left(\mathrm{x}\right) = 0$.
$\Rightarrow$ Apply Newton's method:
$$\begin{aligned} \nabla f\left(\mathrm{x}+\Delta x\right) &\approx \nabla f\left(\mathrm{x}\right) + \frac{\partial}{\partial x} \left(\nabla f\left(\mathrm{x}\right)\right) \Delta x \\ & = \nabla f\left(\mathrm{x}\right) + \nabla ^2 f\left(\mathrm{x}\right) \Delta x \\ & = 0 \end{aligned}$$
$$\Rightarrow \Delta x = - \left(\nabla ^2 f\left(\mathrm{x}\right)\right)^{-1} \nabla f\left(\mathrm{x}\right)$$
Update $\mathrm{x} \leftarrow \mathrm{x} + \Delta \mathrm{x}$, then repeat until convergence.
(1) Intuition
- Fit a quadratic approximation to $f\left(\mathrm{x}\right)$
- Exactly minimize quadratic approximation
(2) Example:
$$\min_{x} f\left(x\right) = x^4 + x^3 - x^2 - x$$
Start at $x = 1.0$ $\Rightarrow$ converges to the global minimum.
Start at $x = -1.5$ $\Rightarrow$ converges to a local minimum.
Start at $x = 0.0$ $\Rightarrow$ converges to a local maximum.
![Newton's method on f(x) = x^4 + x^3 - x^2 - x from different starting points](https://img-blog.csdnimg.cn/a03161e649ac4383ae90a07f12001444.png#pic_center)
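The following is a minimal sketch (my own, using the plain Newton update derived above; `grad`, `hess`, and `newton_min` are illustrative names) that reproduces this behavior numerically:

```python
# Pure Newton iteration on f(x) = x^4 + x^3 - x^2 - x.
# Newton converges to the nearest stationary point, which can be a
# minimum or a maximum depending on the starting guess.
grad = lambda x: 4 * x**3 + 3 * x**2 - 2 * x - 1   # ∇f(x)
hess = lambda x: 12 * x**2 + 6 * x - 2             # ∇²f(x)

def newton_min(x, iters=20):
    for _ in range(iters):
        x = x - grad(x) / hess(x)                  # Δx = -(∇²f)⁻¹ ∇f
    return x

for x0 in (1.0, -1.5, 0.0):
    print(x0, "->", newton_min(x0))
# 1.0  -> ≈ 0.64   (global minimum)
# -1.5 -> ≈ -1.0   (local minimum)
# 0.0  -> ≈ -0.39  (local maximum)
```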
(3) Takeaway message
- Newton's method is a local root-finding method: it converges to the stationary point nearest the initial guess, which may be a minimum, a maximum, or a saddle point.
4. Sufficient conditions
- First-order necessary condition: $\nabla f\left(\mathrm{x}\right) = 0$ (not sufficient for a minimum)
- Scalar case:
  $\Delta x = - \left(\nabla ^2 f\left(\mathrm{x}\right)\right)^{-1} \nabla f\left(\mathrm{x}\right)$: the minus sign means we step against the gradient $\nabla f$, and the inverse Hessian acts like a step size / learning rate, so it should be positive.
  i. $\nabla ^2 f\left(\mathrm{x}\right) > 0$ $\Rightarrow$ descent (minimization)
  ii. $\nabla ^2 f\left(\mathrm{x}\right) < 0$ $\Rightarrow$ ascent (maximization)
- Vector case ($\mathbb{R}^n$):
  $\nabla ^2 f\left(\mathrm{x}\right) \succ 0$ (i.e. $\nabla ^2 f\left(\mathrm{x}\right) \in \mathbb{S}^n_{++}$) $\Rightarrow$ descent (minimization).
  If $\nabla ^2 f\left(\mathrm{x}\right) \succ 0$ for all $\mathrm{x}$, then $f\left(\mathrm{x}\right)$ is strictly convex and Newton's method converges to the global minimum.
But this is often not the case for hard/nonlinear problems.
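As a quick worked check of these conditions on the earlier example $f(x) = x^4 + x^3 - x^2 - x$: the gradient factors as $\nabla f(x) = 4x^3 + 3x^2 - 2x - 1 = (x+1)(4x^2 - x - 1)$, giving stationary points $x = -1$ and $x = (1 \pm \sqrt{17})/8 \approx 0.64,\ -0.39$. The Hessian $\nabla^2 f(x) = 12x^2 + 6x - 2$ evaluates to $4 > 0$ at $x = -1$ (local minimum), $\approx 6.8 > 0$ at $x \approx 0.64$ (the global minimum), and $\approx -2.5 < 0$ at $x \approx -0.39$ (the local maximum), consistent with the three Newton runs above.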
5. Regularization
Regularization aims to ensure that the update direction is always a descent direction. It is a practical fix for making sure we are always minimizing: check whether the Hessian is positive definite, and if not, shift it.
$$\begin{aligned}
& H \leftarrow \nabla ^2 f\left(\mathrm{x}\right) \\
& \textbf{while } H \nsucc 0: \\
& \qquad H \leftarrow H + \beta I \quad (\beta > 0 \text{ is a hyperparameter}) \\
& \textbf{end} \\
& \Delta x = - H^{-1} \nabla f\left(\mathrm{x}\right) \\
& \mathrm{x} \leftarrow \mathrm{x} + \Delta \mathrm{x}
\end{aligned}$$
```python
import numpy as np

def regularized_newton_step(x, grad_f, hess_f, beta=1.0):
    # Regularization: shift the Hessian until it is positive definite
    H = hess_f(x)
    while np.any(np.linalg.eigvalsh(H) <= 0):   # H not positive definite
        H = H + beta * np.eye(H.shape[0])       # beta > 0 is a hyperparameter
    # Damped Newton update
    dx = -np.linalg.solve(H, grad_f(x))
    return x + dx
```
This is also called the damped Newton's method. It guarantees that the update direction is always a descent direction and it shrinks the step (adding $\beta I$ is equivalent to adding a quadratic penalty on $\Delta x$).
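To spell out that equivalence (a short derivation consistent with the update above, not quoted from the lecture): once $H = \nabla^2 f(\mathrm{x}) + \beta I \succ 0$, the regularized step is the exact minimizer of the penalized quadratic model
$$\Delta x = \arg\min_{\Delta x}\ f(\mathrm{x}) + \nabla f(\mathrm{x})^T \Delta x + \tfrac{1}{2}\Delta x^T \nabla^2 f(\mathrm{x}) \Delta x + \tfrac{\beta}{2}\|\Delta x\|^2,$$
since setting the gradient of the model to zero gives $\nabla f(\mathrm{x}) + \left(\nabla^2 f(\mathrm{x}) + \beta I\right)\Delta x = 0$, i.e. $\Delta x = -H^{-1}\nabla f(\mathrm{x})$. A larger $\beta$ penalizes large steps more, pushing the update toward (scaled) gradient descent.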
Start at $x = 0.0$: