Advanced Optimization Theory and Methods (Part 1)

liuzibujian

Last modified: 2024-06-26 10:21:03


First published: 2024-03-04 15:57:36

Article link: https://blog.csdn.net/liuzibujian/article/details/136448422



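Preface

This is a new series. This semester I am taking a course called Advanced Optimization Theory and Methods, and since I have to take notes anyway, I decided to take them electronically, which is how this series came about. The course meets once a week, so I expect to post roughly one update per week. The material will go from easy to hard.

Our lecturer's slides and board work are all in English, so for brevity these notes keep the original board notation. Remarks marked "Note" are most likely annotations I added myself.

Because these are class notes, they may contain small errors or spots that are not fully rigorous. Corrections and feedback are welcome.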

Basic Concepts

The Concept of Optimization

Definition

Def: Given a function $f:A\rightarrow \mathbb{R}$ , where $A\subseteq \mathbb{R}^n$ ,
find $x_0\in A$ s.t.
$\begin{cases} f(x_0)\leq f(x), \ \forall x \in A \quad \text{(minimizer)}\\ f(x_0)\geq f(x), \ \forall x \in A \quad \text{(maximizer)} \end{cases}$

Note: throughout this series, unless stated otherwise, we minimize by default.

$f$ : objective function
$A$ : constraint set

Notation for an Optimization Problem

An optimization problem can be written compactly as
min/max $f(x)$
subject to $x\in A$

Neighborhood

Neighborhood of $x^* \in \mathbb{R}^n$ for $\epsilon>0$ : $N_{\epsilon}(x^*)=\{x\in \mathbb{R}^n:||x-x^*||\leq\epsilon\}$

Local Optimality

$x^*\in A$ is a local minimizer if $\exists \epsilon>0$ s.t. $\forall x\in N_{\epsilon}(x^*)\cap A$ , $f(x)\geq f(x^*)$

Terms:

$A$ : feasible set
$x\in A$ : feasible solution/vector
$x^*$ : local/global optimal (feasible) solution
$f(x^*)=\min_{x\in A}f(x),\ x^*=\arg\min_{x \in A}f(x)$

Feasible Directions

Definition

Def: feasible direction
A vector $d\in \mathbb{R}^n$ ( $d\neq0$ ) is a feasible direction at $x\in A\subseteq \mathbb{R}^n$ if $\exists \alpha_0>0$ s.t. $\forall \alpha\in[0,\alpha_0]$ , $x+\alpha d\in A$

Note: for a direction $d\in \mathbb{R}^n$ , we assume $\|d\| = 1$ by default.

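Interior Point

interior point: a point at which every direction is a feasible direction.

Extreme Point

extreme point: a point at which some directions are feasible and others are not (more commonly called a boundary point).

Boundary

boundary: the set of extreme points.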

Derivatives

First-Order Derivative

Def: First-Order Derivative
$f:\mathbb{R}^n\rightarrow\mathbb{R},x=[x_1,x_2,\cdots,x_n]^T\in \mathbb{R}^n$
$Df\stackrel{\Delta}{=}[\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\cdots,\frac{\partial f}{\partial x_n}]$
Gradient: $\nabla f=(Df)^T$
Note: vectors are column vectors by default, so $x$ is a column vector and the gradient is a column vector too.

Second-Order Derivative

Second-Order Derivative: Hessian Matrix
$F(x)=\begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \cdots &\frac{\partial^2 f}{\partial x_n^2} \end{bmatrix}$
Note: the Hessian matrix is also written $H(x)$ .

Example

$f(x_1,x_2)=5x_1+8x_2+x_1x_2-x_1^2-2x_2^2$

$Df(x)=(\nabla f(x))^T=[5+x_2-2x_1,8+x_1-4x_2]$
$H(x)=\begin{bmatrix} -2 & 1 \\ 1 & -4 \end{bmatrix}$
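
The gradient and Hessian above can be sanity-checked numerically. The following is a minimal sketch in plain Python, not part of the lecture: the helper names `grad_numeric` and `hessian_numeric`, the step sizes, and the test point $[1,2]^T$ are all assumptions chosen for illustration. It compares central finite differences with the closed-form expressions.

```python
def f(x1, x2):
    # Objective from the example above
    return 5 * x1 + 8 * x2 + x1 * x2 - x1 ** 2 - 2 * x2 ** 2


def grad_analytic(x1, x2):
    # Closed-form row vector Df(x) from the example above
    return [5 + x2 - 2 * x1, 8 + x1 - 4 * x2]


def grad_numeric(func, x, h=1e-5):
    # Central differences: (f(x + h e_i) - f(x - h e_i)) / (2h)
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((func(*xp) - func(*xm)) / (2 * h))
    return g


def hessian_numeric(func, x, h=1e-4):
    # Second-order central differences for every pair (i, j)
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                y = list(x)
                y[i] += si * h
                y[j] += sj * h
                return func(*y)
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)
    return H


point = [1.0, 2.0]  # arbitrary test point (an assumption)
print("analytic gradient:", grad_analytic(*point))
print("numeric gradient: ", grad_numeric(f, point))
print("numeric Hessian:  ", hessian_numeric(f, point))
```

Because $f$ is quadratic, the numerical Hessian should agree with $H(x)$ at every point up to rounding error, which matches the fact that $H(x)$ above does not depend on $x$.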

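Directional Derivative

$x=x_0+\alpha d$

$\frac{\partial f}{\partial d}(x_0)=\left . \frac{d}{d\alpha}f(x_0+\alpha d) \right |_{\alpha=0}=d^T \nabla f(x_0)=\langle\nabla f(x_0),d\rangle$

$\langle\cdot,\cdot\rangle$ denotes the inner product of vectors.

Note: one way to read this formula is that along the direction $d$ , $f(x_0+\alpha d)$ is simply a function of the single variable $\alpha$ , and the directional derivative is its ordinary derivative at $\alpha=0$ .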

Example

$f(x)=x_1x_2x_3,d=[\frac{1}{2},\frac{1}{2},\frac{1}{\sqrt{2}}]^T$
$\frac{\partial f}{\partial d}(x)=\nabla f(x)^T\cdot d=[x_2x_3,x_1x_3,x_1x_2]\cdot \begin{bmatrix} \frac{1}{2} \\ \frac{1}{2}\\ \frac{1}{\sqrt{2}} \end{bmatrix}=\frac{x_2x_3+x_1x_3+\sqrt{2}x_1x_2}{2}$
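
As a quick check of this example, the sketch below evaluates the directional derivative three ways: as $d^T\nabla f(x)$, through the closed form derived above, and as a finite-difference derivative of $\alpha\mapsto f(x+\alpha d)$ at $\alpha=0$. The function names, the step size, and the test point $[1,2,3]^T$ are assumptions made for illustration only.

```python
import math


def f(x):
    return x[0] * x[1] * x[2]


def grad_f(x):
    # Gradient of x1*x2*x3
    return [x[1] * x[2], x[0] * x[2], x[0] * x[1]]


# Direction from the example; its norm is 1 since 1/4 + 1/4 + 1/2 = 1
d = [0.5, 0.5, 1 / math.sqrt(2)]


def directional_derivative(x, d):
    # Inner product <grad f(x), d>
    return sum(di * gi for di, gi in zip(d, grad_f(x)))


def directional_derivative_fd(x, d, h=1e-6):
    # Central difference of phi(alpha) = f(x + alpha d) at alpha = 0
    xp = [xi + h * di for xi, di in zip(x, d)]
    xm = [xi - h * di for xi, di in zip(x, d)]
    return (f(xp) - f(xm)) / (2 * h)


def closed_form(x):
    # (x2 x3 + x1 x3 + sqrt(2) x1 x2) / 2, as derived in the example
    return (x[1] * x[2] + x[0] * x[2] + math.sqrt(2) * x[0] * x[1]) / 2


x0 = [1.0, 2.0, 3.0]  # arbitrary test point (an assumption)
print(directional_derivative(x0, d))
print(directional_derivative_fd(x0, d))
print(closed_form(x0))
```

All three values should agree up to the finite-difference error.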

Unconstrained Optimization

FONC

First-Order Necessary Condition (FONC):
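Theorem: Let $\Omega$ be a subset of $\mathbb{R}^n$ and $f\in C^1$ a real-valued function on $\Omega$ . If $x^*$ is a local minimizer of $f$ over $\Omega$ , then for any feasible direction $d$ at $x^*$ , we have $d^T\cdot \nabla f(x^*)\geq0$ .

In other words, at a local minimizer of a continuously differentiable function, the directional derivative along every feasible direction is nonnegative.

Note: $f\in C^1$ means $f$ is continuously differentiable, i.e. its first-order partial derivatives exist and are continuous at every point;
$f\in C^2$ means $f$ is twice continuously differentiable, i.e. its second-order partial derivatives exist and are continuous at every point.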

Proof

Pick an arbitrary feasible direction $d$ at $x^*$ .

Define $x(\alpha)=x^*+\alpha d$ for $\alpha\geq0$ , so that $x(0)=x^*$

$\phi(\alpha)=f(x(\alpha))$

Taylor's Theorem: $f(x^*+\alpha d)-f(x^*)=\phi(\alpha)-\phi(0)=\phi'(0)\alpha+o(\alpha)$

$\because x^*$ is a local minimizer and $d$ is feasible, $\phi(\alpha)\geq\phi(0)$ for all sufficiently small $\alpha>0$

$\therefore \phi'(0)\alpha+o(\alpha)\geq0$

$\therefore \phi'(0)\geq -\frac{o(\alpha)}{\alpha}\rightarrow0$ as $\alpha\rightarrow0^+$

$\therefore \phi'(0)\geq0$

$\therefore \phi'(0)=d^T\cdot \nabla f(x^*)\geq0$

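Corollary

Corollary (interior point): If $x^*$ is an interior point and a local minimizer, then $\nabla f(x^*)=0$

Proof of the Corollary

At an interior point every direction is feasible, so the FONC applies to both $d$ and $-d$ :
$\forall d\in \mathbb{R}^n, \begin{cases} d^T\nabla f(x^*)\geq 0\\ -d^T\nabla f(x^*)\geq 0 \end{cases}\Rightarrow d^T\nabla f(x^*)=0\ \forall d\Rightarrow \nabla f(x^*)=0$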

Example

min $x_1^2+0.5x_2^2+3x_2+4.5$
s.t. $x_1,x_2\geq 0$

$\Omega=\{x|x_1\geq0,x_2\geq0\}$

$\nabla f(x)=[2x_1,x_2+3]^T$

① $x^*=[1,3]^T,\nabla f(x^*)=[2,6]^T$
$d=[d_1,d_2]^T$
$d^T\nabla f(x^*)=2d_1+6d_2$
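$x^*$ is an interior point, so every direction is feasible. Taking $d_1=0,d_2=-1$ gives $d^T\nabla f(x^*)=-6<0$ , so $[1,3]^T$ is not a minimizer.

② $x^*=[0,3]^T,\nabla f(x^*)=[0,6]^T$
Here feasibility requires $d_1\geq0$ , while $d_2$ may have either sign.
$d^T\nabla f(x^*)=6d_2$ . Taking $d_1=0,d_2=-1$ gives $-6<0$ , so $[0,3]^T$ is not a minimizer.

③ $x^*=[1,0]^T,\nabla f(x^*)=[2,3]^T$
Here feasibility requires $d_2\geq0$ , while $d_1$ may have either sign.
$d^T\nabla f(x^*)=2d_1+3d_2$ . Taking $d_1=-1,d_2=0$ gives $-2<0$ , so $[1,0]^T$ is not a minimizer.

④ $x^*=[0,0]^T,\nabla f(x^*)=[0,3]^T$
Here feasibility requires $d_1\geq0$ and $d_2\geq0$ .
$d^T\nabla f(x^*)=3d_2\geq 0$
$[0,0]^T$ satisfies the FONC, so it may be a minimizer. A direct check confirms that it is: on $\Omega$ we have $x_1^2\geq0$ and $0.5x_2^2+3x_2\geq0$ , so $f(x)\geq4.5=f(0,0)$ .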
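
The four cases above can be reproduced with a small script. This is only a sketch under stated assumptions: it samples unit directions on a grid of angles instead of reasoning over all directions, so finding no violation is evidence, not a proof. The helper names `is_feasible_direction` and `find_fonc_violation` and the tolerances are hypothetical choices, and the feasibility test is specific to the nonnegative orthant $\Omega$ of this example.

```python
import math


def grad_f(x):
    # Gradient of x1^2 + 0.5*x2^2 + 3*x2 + 4.5
    return [2 * x[0], x[1] + 3]


def is_feasible_direction(x, d, tol=1e-12):
    # For Omega = {x1 >= 0, x2 >= 0}: d is feasible at x iff d_i >= 0
    # for every coordinate whose constraint is active (x_i == 0).
    return all(di >= -tol for xi, di in zip(x, d) if xi == 0.0)


def find_fonc_violation(x, n_dirs=360, tol=1e-9):
    # Return a sampled feasible unit direction with d^T grad f(x) < 0,
    # or None if no sampled direction violates the FONC.
    g = grad_f(x)
    for k in range(n_dirs):
        theta = 2 * math.pi * k / n_dirs
        d = [math.cos(theta), math.sin(theta)]
        if not is_feasible_direction(x, d):
            continue
        if d[0] * g[0] + d[1] * g[1] < -tol:
            return d
    return None


for point in ([1.0, 3.0], [0.0, 3.0], [1.0, 0.0], [0.0, 0.0]):
    d = find_fonc_violation(point)
    status = "FONC violated along " + str(d) if d else "no violation found"
    print(point, "->", status)
```

Only $[0,0]^T$ should report no violation, in line with the hand computation.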

SONC

Second-Order Necessary Condition:
Theorem: Let $\Omega$ be a subset of $\mathbb{R}^n$ and $f\in C^2$ : a real-valued function on $\Omega$ , $x^*$ a local minimizer of $f$ over $\Omega$ , and $d$ a feasible direction at $x^*$ . If $d^T\nabla f(x^*)=0$ , then $d^T F(x^*)d\geq0$ , where F(x) is the Hessian of $f$ .

Proof

Suppose, for contradiction, that $d^TF(x^*)d<0$

Define $x(\alpha)=x^*+\alpha d,\ \phi(\alpha)=f(x(\alpha))$

Taylor's Theorem: $\phi(\alpha)=\phi(0)+\phi'(0)\alpha+\frac{\alpha^2}{2}\phi''(0)+o(\alpha^2)$

$\because d^T\nabla f(x^*)=0$

$\therefore \phi'(0)=0$

$\because x^*$ is a local minimizer, for all sufficiently small $\alpha>0$ :

$\phi(\alpha)-\phi(0)=\frac{\alpha^2}{2}\phi''(0)+o(\alpha^2)\geq0$

$\therefore \phi''(0)\geq -\frac{2o(\alpha^2)}{\alpha^2}\rightarrow 0$ as $\alpha\rightarrow0^+$

$\therefore \phi''(0)\geq0$

$\therefore d^TF(x^*)d=\phi''(0)\geq0$ , which contradicts the assumption.

Corollary

Corollary (interior point):
If $x^*$ is an interior point and local minimizer, then $\forall d\in \mathbb{R}^n: d^TF(x^*)d\geq0$ .
Note: in this case $F(x^*)$ is positive semidefinite, i.e. $F(x^*)\succeq0$ .
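
To see the SONC rule out a candidate, consider again the earlier example $f(x_1,x_2)=5x_1+8x_2+x_1x_2-x_1^2-2x_2^2$ , now without constraints. Solving $\nabla f(x)=0$ gives the stationary point $[4,3]^T$ , and the sketch below tests whether the Hessian there is positive semidefinite. The closed-form eigenvalue helper for symmetric $2\times2$ matrices and its name are assumptions introduced for this illustration.

```python
import math


def grad_f(x):
    # Gradient of 5*x1 + 8*x2 + x1*x2 - x1^2 - 2*x2^2
    return [5 + x[1] - 2 * x[0], 8 + x[0] - 4 * x[1]]


def eigenvalues_sym_2x2(a, b, c):
    # Eigenvalues of the symmetric matrix [[a, b], [b, c]], smallest first
    mean = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    return mean - radius, mean + radius


def is_psd_2x2(a, b, c, tol=1e-12):
    # Positive semidefinite iff the smallest eigenvalue is nonnegative
    return eigenvalues_sym_2x2(a, b, c)[0] >= -tol


stationary = [4.0, 3.0]
print("gradient at stationary point:", grad_f(stationary))

# Hessian of the example: [[-2, 1], [1, -4]], constant in x
print("eigenvalues:", eigenvalues_sym_2x2(-2.0, 1.0, -4.0))
print("satisfies SONC for a minimizer:", is_psd_2x2(-2.0, 1.0, -4.0))
```

Both eigenvalues are $-3\pm\sqrt{2}<0$ , so the Hessian is not positive semidefinite and $[4,3]^T$ cannot be a local minimizer, even though it satisfies $\nabla f(x^*)=0$ .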

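Summary

This lecture introduced the most basic concepts of optimization theory, along with the relevant notions of derivatives. It then covered several of the most fundamental theorems in unconstrained optimization. So far we have two necessary conditions for an optimizer, the FONC and the SONC. They can be used to rule out points as candidates, but they are not well suited to actually finding optimizers. The next lecture will present a sufficient condition, the SOSC, along with some methods for finding optimizers.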