# 陆吾生 Lecture: Mathematical Foundations of Optimization Problems

## Ⅰ. Taylor expansion

### One-variable case

<script id="MathJax-Element-60" type="math/tex; mode=display">f(x+\delta)=f(x)+f'(x)\delta+{1\over 2}f''(x)\delta^2+…</script>

### Multi-variable case

def. Hessian matrix: <script id="MathJax-Element-61" type="math/tex">\nabla^2f(x)</script>

<script id="MathJax-Element-62" type="math/tex; mode=display">f(x+\delta)=f(x)+\nabla^Tf(x)\delta+{1\over 2}\delta^T\nabla^2f(x)\delta+…</script>

### Linear approximation of f(x) at x

As <script id="MathJax-Element-63" type="math/tex">\delta\rightarrow 0</script>, f(x+δ) becomes approximately a linear function of <script id="MathJax-Element-64" type="math/tex">\delta</script>:

<script id="MathJax-Element-65" type="math/tex; mode=display">f(x+\delta)\approx f(x)+\nabla^Tf(x)\delta</script>

### Quadratic approximation of f(x) at x

As <script id="MathJax-Element-66" type="math/tex">\delta\rightarrow 0</script>, f(x+δ) becomes approximately a quadratic function of <script id="MathJax-Element-67" type="math/tex">\delta</script>:

<script id="MathJax-Element-68" type="math/tex; mode=display">f(x+\delta)\approx f(x)+\nabla^Tf(x)\delta+{1\over 2}\delta^T\nabla^2f(x)\delta</script>
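Both approximations can be checked numerically. Below is a minimal sketch in which the example function f(x) = eˣ is my own arbitrary choice (not from the lecture); its derivatives of every order are also eˣ:

```python
import math

# Example function (an arbitrary choice for illustration): f(x) = e^x,
# whose first and second derivatives are also e^x.
def f(x):  return math.exp(x)
def f1(x): return math.exp(x)  # f'(x)
def f2(x): return math.exp(x)  # f''(x)

x, delta = 1.0, 0.1
exact     = f(x + delta)
linear    = f(x) + f1(x) * delta               # first-order model
quadratic = linear + 0.5 * f2(x) * delta ** 2  # second-order model

# The quadratic model should track f(x + delta) more closely than
# the linear one as delta -> 0.
print(abs(exact - linear), abs(exact - quadratic))
```

Shrinking `delta` further makes the linear error fall like δ² and the quadratic error like δ³, matching the truncated Taylor terms.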

## Ⅱ. Optimization

<script id="MathJax-Element-69" type="math/tex">\min_{x\in \mathcal{R}^{n\times 1}}f(x)</script>

• Naive approach
Set the gradient to zero, which yields a system of equations; but for complicated functions, differentiating and then solving these equations is extremely hard.

<script id="MathJax-Element-70" type="math/tex; mode=display">\nabla f(x)=\begin{bmatrix} {\partial f(x)\over \partial x_1}\\ {\partial f(x)\over \partial x_2}\\ \end{bmatrix}=0</script>

• A. Cauchy's method (steepest descent)
Find the minimizer without solving the equations directly:
pick an arbitrary <script id="MathJax-Element-71" type="math/tex">x_k</script>, then move it to obtain a smaller function value, iteratively, until <script id="MathJax-Element-72" type="math/tex">\nabla f(x)\rightarrow 0</script>.
Take <script id="MathJax-Element-73" type="math/tex">\delta=-\nabla f(x_k)</script>, i.e. <script id="MathJax-Element-74" type="math/tex">x_{k+1}=x_k-\nabla f(x_k)</script>.

<script id="MathJax-Element-75" type="math/tex; mode=display">f(x_k+\delta)-f(x_k)\approx\nabla^Tf(x_k)\delta=-||\nabla f(x_k)||^2<0 </script>so the new value <script id="MathJax-Element-76" type="math/tex">f(x_k+\delta)</script> is guaranteed to be smaller than the previous <script id="MathJax-Element-77" type="math/tex">f(x_k)</script>.
On this basis, a step size <script id="MathJax-Element-78" type="math/tex">\alpha</script> can also be introduced:
<script id="MathJax-Element-79" type="math/tex; mode=display">F(\alpha)=f(x_k-\alpha\nabla f(x_k)) </script>This is a nonlinear function of <script id="MathJax-Element-80" type="math/tex">\alpha</script>; at the optimal <script id="MathJax-Element-81" type="math/tex">\alpha_{opt}</script> the smallest function value along this direction is attained.

For the choice of <script id="MathJax-Element-82" type="math/tex">\alpha</script>, see line search methods.
Iterating in this way produces
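One common line-search strategy can be sketched as a minimal backtracking (Armijo) routine; the quadratic test function, the shrink factor `rho`, and the constant `c` below are my own example choices, not anything specified in the lecture:

```python
import numpy as np

# Backtracking (Armijo) line-search sketch; rho and c are conventional
# example values, not prescribed by the lecture.
def backtracking(f, grad_f, x, alpha0=1.0, rho=0.5, c=1e-4):
    g = grad_f(x)
    alpha = alpha0
    # Shrink alpha until f decreases sufficiently along -grad_f(x).
    while f(x - alpha * g) > f(x) - c * alpha * (g @ g):
        alpha *= rho
    return alpha

# Example: a convex quadratic f(x) = 0.5 x^T A x - b^T x.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad_f = lambda x: A @ x - b

alpha = backtracking(f, grad_f, np.zeros(2))
print(alpha)
```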
<script id="MathJax-Element-83" type="math/tex; mode=display">x_{k+1}=x_k-\alpha_k\nabla f(x_k) </script>
<script id="MathJax-Element-84" type="math/tex; mode=display">x_{k+2}=x_{k+1}-\alpha_{k+1}\nabla f(x_{k+1}) </script>
<script id="MathJax-Element-85" type="math/tex; mode=display">...</script> until <script id="MathJax-Element-86" type="math/tex">\nabla f(x)\rightarrow 0</script>.
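The full iteration can be sketched as follows; the quadratic objective, the fixed step size (standing in for a line search), and the stopping tolerance are all my own example choices:

```python
import numpy as np

# Steepest-descent sketch on a convex quadratic f(x) = 0.5 x^T A x - b^T x,
# whose minimizer is the solution of A x = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

def grad(x):
    return A @ x - b                 # gradient of f

x = np.zeros(2)                      # arbitrary starting point x_k
alpha = 0.1                          # fixed step (a line search would adapt it)
for _ in range(500):
    g = grad(x)
    if np.linalg.norm(g) < 1e-8:     # stop when the gradient vanishes
        break
    x = x - alpha * g                # x_{k+1} = x_k - alpha * grad f(x_k)

print(x)  # should approach the solution of A x = b
```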

• Newton's method
The step size <script id="MathJax-Element-87" type="math/tex">\alpha</script> above is cumbersome to compute and hard to estimate. Since <script id="MathJax-Element-88" type="math/tex">f</script> is, to second order, a quadratic polynomial in <script id="MathJax-Element-89" type="math/tex">\delta</script>:
<script id="MathJax-Element-90" type="math/tex; mode=display">f(x+\delta)\approx f(x)+\nabla^Tf(x)\delta+{1\over 2}\delta^T\nabla^2f(x)\delta </script>the minimum over <script id="MathJax-Element-91" type="math/tex">\delta</script> can be found by differentiating with respect to <script id="MathJax-Element-92" type="math/tex">\delta</script>:
<script id="MathJax-Element-93" type="math/tex; mode=display">\nabla_{\delta}(~f(x_k)+\nabla^Tf(x_k)\delta+{1\over 2}\delta^T\nabla^2f(x_k)\delta~)=0 </script>Using the identity
<script id="MathJax-Element-94" type="math/tex; mode=display">\nabla(c^Tx)=c</script>together with the analogous quadratic-form identity (for symmetric H, the gradient of x^T H x is 2Hx), we obtain
<script id="MathJax-Element-95" type="math/tex; mode=display">\nabla f(x_k)+\nabla^2f(x_k)\delta=0</script>so
<script id="MathJax-Element-96" type="math/tex; mode=display">\delta=-(\nabla^2f(x_k))^{-1}\nabla f(x_k)</script>which determines the update
<script id="MathJax-Element-97" type="math/tex; mode=display">x_{k+1}=x_k-(\nabla^2f(x_k))^{-1}\nabla f(x_k)</script>
As shown in the figure, the true function is the black curve. Starting from an arbitrary <script id="MathJax-Element-98" type="math/tex">x_k</script>, its Taylor expansion gives the red-curve approximation of the function. The minimizer of the red curve is <script id="MathJax-Element-99" type="math/tex">x_{k+1}</script>, which yields the green approximation; the minimizer of the green curve is <script id="MathJax-Element-100" type="math/tex">x_{k+2}</script>. Repeated iteration keeps approaching the true minimizer.
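A minimal sketch of the Newton iteration, with an example function and hand-computed gradient/Hessian of my own choosing:

```python
import numpy as np

# Newton's-method sketch on an example function (my own choice):
# f(x1, x2) = x1^4 + x2^2, with gradient and Hessian computed by hand.
def grad(x):
    return np.array([4 * x[0] ** 3, 2 * x[1]])

def hess(x):
    return np.array([[12 * x[0] ** 2, 0.0], [0.0, 2.0]])

x = np.array([1.0, 1.0])                 # arbitrary starting point
for _ in range(50):
    g = grad(x)
    if np.linalg.norm(g) < 1e-10:        # stop when the gradient vanishes
        break
    # x_{k+1} = x_k - (hess f)^{-1} grad f; solve rather than invert.
    x = x - np.linalg.solve(hess(x), g)

print(x)  # should approach the minimizer (0, 0)
```

Solving the linear system instead of forming the inverse is the standard way to apply the Newton step in practice.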

### Comparing the Cauchy and Newton methods

<script id="MathJax-Element-102" type="math/tex; mode=display"> \begin{array}{c|cc} Method & \text{Cauchy} & \text{Newton}\\ \hline \text{Convergence speed} & \text{slow} & \text{fast}\\ \text{Preprocessing (derivatives, etc.)} & \text{cheap} & \text{expensive}\\ \text{Memory usage} & \text{small} & \text{large} \end{array} </script>

Recall the quadratic approximation of f(x) at x:

<script id="MathJax-Element-103" type="math/tex; mode=display">f(x+\delta)\approx f(x)+\nabla^Tf(x)\delta+{1\over 2}\delta^T\nabla^2f(x)\delta</script>The Hessian matrix <script id="MathJax-Element-104" type="math/tex">H=\nabla^2f(x)</script> appears here, and the last term is a quadratic form.

<script id="MathJax-Element-106" type="math/tex; mode=display">def.\begin{cases} x^THx>0 & \text{positive definite P.D} & \text{iff} &\lambda_i>0 \\ x^THx\geq0 & \text{positive semidefinite P.S.D} & \text{iff} & \lambda_i\geq0\\ x^THx<0 & \text{negative definite N.D} & \text{iff} &\lambda_i<0\\ x^THx\leq0 & \text{negative semidefinite N.S.D} & \text{iff} &\lambda_i\leq0\\ x^THx\gtrless0 & \text{indefinite} & \text{iff} & \text{mixed-sign }\lambda_i\\ \end{cases} </script>

<script id="MathJax-Element-107" type="math/tex; mode=display">H=\begin{bmatrix} 1&2.5\\ 2.5&4\\ \end{bmatrix} </script>
<script id="MathJax-Element-108" type="math/tex; mode=display">f(x)=x^THx=x_1^2+5x_1x_2+4x_2^2</script>
<script id="MathJax-Element-109" type="math/tex; mode=display">\det(\lambda I-H)= \begin{vmatrix} \lambda-1&-2.5\\ -2.5&\lambda-4\\ \end{vmatrix} =(\lambda-1)(\lambda-4)-6.25=0</script>

<script id="MathJax-Element-110" type="math/tex; mode=display">1\times4-6.25<0</script>Since this determinant equals the product of the eigenvalues, the two eigenvalues have opposite signs, so this H is indefinite.

### Convex Function

For a convex function, the tangent line at x lies below the graph. With θ the tangent's angle of inclination, the rise of the tangent from x to x_1 is
<script id="MathJax-Element-113" type="math/tex; mode=display">h=\tan\theta(x_1-x)=f'(x)(x_1-x) </script>and, with p ≥ 0 the gap between the curve and the tangent at x_1,
<script id="MathJax-Element-114" type="math/tex; mode=display">f(x_1)=f(x)+h+p </script>By the Taylor expansion
<script id="MathJax-Element-115" type="math/tex; mode=display"> f(x+\delta)\approx f(x)+\nabla^Tf(x)\delta+{1\over 2}\delta^T\nabla^2f(x)\delta </script>substituting into the convex function (with δ = x_1 − x) gives
<script id="MathJax-Element-116" type="math/tex; mode=display">f(x_1)\approx f(x)+\nabla^Tf(x)(x_1-x)+{1\over 2}(x_1-x)^T\nabla^2f(x)(x_1-x) </script>so
<script id="MathJax-Element-117" type="math/tex; mode=display">f(x_1)-f(x)-\nabla^Tf(x)(x_1-x)\approx {1\over 2}(x_1-x)^T\nabla^2f(x)(x_1-x)\geq0 </script>i.e. the quadratic form is positive semidefinite: the Hessian of a convex function is P.S.D.

<script id="MathJax-Element-118" type="math/tex; mode=display"> \nabla^2 f(\theta)={1\over N}\sum_{i=1}^N{(1-2l_i)^2e^{(1-2l_i)\theta^T\hat x_i}\hat x_i\hat x_i^T \over (1+e^{(1-2l_i)\theta^T\hat x_i})^2} </script>Since each summand is a nonnegative scalar multiple of the rank-one P.S.D matrix formed from x̂_i, this Hessian is P.S.D, from which one can also conclude that the original function is convex.
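This argument can be checked numerically; in the sketch below, the randomly generated data standing in for the x̂_i, the labels l_i, and θ are all example choices of mine:

```python
import numpy as np

# Numerical check (sketch) that the Hessian above is P.S.D:
# each summand is a nonnegative scalar times the outer product x_i x_i^T.
rng = np.random.default_rng(0)
N, d = 20, 3
X = rng.normal(size=(N, d))           # rows play the role of x-hat_i
l = rng.integers(0, 2, size=N)        # labels l_i in {0, 1}
theta = rng.normal(size=d)

H = np.zeros((d, d))
for xi, li in zip(X, l):
    s = (1 - 2 * li) * (theta @ xi)
    w = (1 - 2 * li) ** 2 * np.exp(s) / (1 + np.exp(s)) ** 2  # weight >= 0
    H += w * np.outer(xi, xi)
H /= N

print(np.linalg.eigvalsh(H).min())    # smallest eigenvalue, >= 0 up to round-off
```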
