For more details, follow the WeChat public account 运筹优化与数据科学, ID: pomelo_tree_opt.
This section is a review of optimization.
-------------------------------------
First, the generalized optimization model:
- The first part of f(x) is an entropy function, and the second part is a norm function.
- g(x) is a set of linear functions.
- h(x) is a set of quadratic functions.
- C says that the values we actually care about are only {-1, 1}.
- X is the first (nonnegative) orthant.
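To make the pieces above concrete, one possible written-out form of the generalized model is sketched below in LaTeX; the specific entropy and norm terms, the weights, and the way the set constraints are combined are assumptions of mine, since the original formula is not reproduced in this text.

\min_{x}\ f(x) = \sum_{i=1}^{n} x_i \log x_i + \lambda \lVert x \rVert_2
\text{s.t.}\quad g_j(x) = a_j^{\top} x - b_j \le 0, \quad j = 1, \dots, m
\phantom{\text{s.t.}}\quad h_k(x) = x^{\top} Q_k x + c_k^{\top} x - d_k = 0, \quad k = 1, \dots, \ell
\phantom{\text{s.t.}}\quad x \in C = \{-1, 1\}^n \ \text{or} \ x \in X = \mathbb{R}^n_{+}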
-----------------------------------
First, understand the objective function.
A function can be expressed algebraically (the form f(x)), geometrically (the graph gra(f)), or through its contours (level sets projected onto a two-dimensional plane). For any given function, we ask how much information it carries.
The second-order information is a symmetric matrix (the Hessian), and symmetric matrices have many special properties, introduced in the linear algebra chapter.
Next, understand the constraints: a collection of functions forms a system, made up of a set of equalities and/or inequalities.
The objective function combined with the constraints gives an optimization problem.
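As a small, self-contained illustration of "how much information a function carries" (my own example, not from the original notes), the snippet below evaluates the value, gradient, and Hessian of an entropy-plus-norm style function; the Hessian indeed comes out as a symmetric matrix.

import numpy as np

def f(x):
    # entropy term plus a squared 2-norm term (illustrative objective)
    return np.sum(x * np.log(x)) + 0.5 * np.sum(x ** 2)

def grad_f(x):
    # first-order information: the gradient
    return np.log(x) + 1.0 + x

def hess_f(x):
    # second-order information: the Hessian, a symmetric matrix
    return np.diag(1.0 / x) + np.eye(len(x))

x0 = np.array([0.5, 1.0, 2.0])
print(f(x0))                   # zeroth-order information: the function value
print(grad_f(x0))              # first-order information
H = hess_f(x0)
print(np.allclose(H, H.T))     # True: the Hessian is symmetric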
==========================
Principle of ARR for optimization
- Approximation: "accept good-enough" to make the problem easy.
- Reformulation: "change view" to make the problem clear and simple.
- Relaxation: "drop difficulties" to get a bound on the problem.
-------------------------------
1. Optimality by approximation
(1) Differentiable function
- Taylor's theorem
(2) Non-differentiable function
- activation (step) function by sigmoid function (Neural networks)
- max function by square-root function
(3) Continuous function
- linear, piecewise linear, quadratic, ... (Support vector machine)
(4) Norm function
- 0-norm by 1-norm, 2-norm (Machine learning)
- 0-norm by p-sub-norm (Sparse solutions)
-------------------------------
(1) For a smooth, differentiable function, Taylor's theorem applies: we can approximate it using first-order and second-order information. At a minimum we can approximate with first-order information alone, i.e., with a linear function. We can also use first-order plus second-order information; the latter approximates better, but the computational cost grows accordingly.
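A minimal numerical sketch of this point (my own illustration, using f(x) = exp(x) expanded around x0 = 0): the first-order Taylor model is a linear function, and adding the second-order term gives a visibly better approximation near the expansion point.

import numpy as np

# Taylor approximations of f(x) = exp(x) around x0 = 0
f = np.exp
x0, x = 0.0, 0.5

first_order = f(x0) + f(x0) * (x - x0)                      # linear approximation
second_order = first_order + 0.5 * f(x0) * (x - x0) ** 2    # adds curvature information

print(f(x), first_order, second_order)
# exp(0.5) ~ 1.6487; the first-order model gives 1.5, the second-order model 1.625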
(2) For a non-differentiable function, such as the sign (step) function in neural networks, the usual approach is to first approximate it with a smooth, differentiable function such as the sigmoid function or the hyperbolic tangent function.
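For instance (an illustration of mine, not taken from the text), a steepness parameter k turns the sigmoid into an increasingly sharp surrogate for the step function:

import numpy as np

def step(x):
    # non-differentiable activation: 0 for x < 0, 1 for x >= 0
    return (x >= 0).astype(float)

def sigmoid(x, k=10.0):
    # smooth, differentiable surrogate; larger k brings it closer to the step
    return 1.0 / (1.0 + np.exp(-k * x))

x = np.linspace(-1.0, 1.0, 5)
print(step(x))
print(np.round(sigmoid(x, k=10.0), 3))
print(np.round(sigmoid(x, k=100.0), 3))   # nearly indistinguishable from the step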
(3) For a continuous function, for example one of high degree, we can approximate with linear, piecewise-linear, or quadratic functions. For instance, when doing regression with support vector machines, the fitted regression curve may be linear, kernel-based, or quadratic.
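A quick sketch with scikit-learn's SVR (the data set and parameter choices are my assumptions; the original notes give no code), fitting the same noisy data with a linear, a quadratic-polynomial, and an RBF kernel:

import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 100).reshape(-1, 1)
y = X.ravel() ** 2 + rng.normal(scale=0.5, size=100)   # noisy quadratic data

models = {
    "linear":    SVR(kernel="linear"),
    "quadratic": SVR(kernel="poly", degree=2),
    "rbf":       SVR(kernel="rbf"),
}
for name, model in models.items():
    model.fit(X, y)
    print(name, round(model.score(X, y), 3))   # R^2 of each approximation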
(4) Norm functions, the 0-norm, 1-norm, 2-norm, etc., all measure distance or difference in some sense, and under suitable conditions one can be substituted for another.
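A small illustration (my own example): the 0-"norm" just counts nonzero entries and is combinatorial to optimize directly, which is why the convex 1-norm is the standard surrogate when sparse solutions are wanted.

import numpy as np

x = np.array([0.0, 3.0, 0.0, -4.0, 0.0])
print(np.linalg.norm(x, ord=0))   # 0-"norm": number of nonzero entries -> 2.0
print(np.linalg.norm(x, ord=1))   # 1-norm: sum of absolute values -> 7.0
print(np.linalg.norm(x, ord=2))   # 2-norm: Euclidean length -> 5.0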
================================
2. Optimality by reformulation
(1) High-dimensional formulation for easy manipulation
- Kernel function method (SVM)
- Augmented Lagrangian method
- Multi-integer problem by 0-1 binary reformulation
(2) Discrete formulation by continuous reformulation
- 0-1 binary constraint by quadratic constraint
(3) Nonconvexity by "difference of convex functions"
- DC programming
(4) Matrix requirement by linear matrix inequality (LMI)
- Semidefinite programming (SDP)
(5) Nonlinearity by conic reformulation
- Linear conic programming (LCoP)
--------------------------------------
(1) When the original problem is hard to handle in a low-dimensional space, lift it into a higher-dimensional space and work there.
- The most common instance is the kernel-based linear soft-margin SVM for classification: data points that are not linearly separable in the original space can be separated by a hyperplane once mapped into a high-dimensional space (see the kernel-trick sketch after this list).
- There is also ADMM, a workhorse algorithm in machine learning; its core is precisely a reformulation.
- A multi-valued integer variable can be reformulated using a collection of 0-1 binary variables.
(2) Recast something discrete as something continuous (discrete formulation by continuous reformulation); for example, the binary requirement x in {0, 1} is equivalent to the quadratic constraint x^2 - x = 0, i.e., x(x - 1) = 0. Note, however, that this turns a linear constraint into a quadratic one.
(3) Rewrite a nonconvex function as a difference of convex functions (DC programming).
(4) Matrix requirement by linear matrix inequality reformulation: this is the domain of semidefinite programming (SDP).
(5) Nonlinearity by conic reformulation: this is the linear conic programming (LCoP) case; for example, certain nonlinear problems can be recast as second-order cone programs (SOCP).
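Here is the kernel-trick sketch referenced in item (1) above (a toy example of mine with scikit-learn; the data set and parameters are assumptions): concentric circles are not linearly separable in the plane, but an RBF-kernel SVM, which implicitly maps the data into a higher-dimensional feature space, separates them easily.

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: no straight line separates them in the original 2-D space
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
kernel_svm = SVC(kernel="rbf").fit(X, y)

print("linear SVM accuracy:", round(linear_svm.score(X, y), 3))       # roughly 0.5
print("RBF-kernel SVM accuracy:", round(kernel_svm.score(X, y), 3))   # close to 1.0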
----------------------------------
3. Optimality by relaxation
(1) Integer relaxation
- Total unimodularity
- Replace the {0, 1} requirement by the [0, 1] interval
(2) Lagrangian relaxation
- Integrate constraints into the objective with multipliers
(3) Non-convexity relaxation
- Drop the rank-one requirement in the SDP formulation.
-------------------------
(1) This is the most common linear relaxation of an integer program (see the sketch after this list).
(2) Lagrangian relaxation is essentially the gateway to all the more advanced algorithms; a whole family of methods grows out of it.
(3) Relax away the nonconvex part so that the problem becomes a convex optimization problem.
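Here is the LP-relaxation sketch referenced in item (1) (a toy 0-1 knapsack of my own, solved with scipy): replacing x in {0, 1} by the interval 0 <= x <= 1 yields a linear program whose optimal value is an upper bound on the integer optimum.

from scipy.optimize import linprog

# Toy 0-1 knapsack: maximize 10*x1 + 6*x2 + 4*x3
# subject to 5*x1 + 4*x2 + 3*x3 <= 8, x in {0, 1}^3.
# LP relaxation: replace x in {0, 1} by 0 <= x <= 1 (linprog minimizes, so negate c).
c = [-10, -6, -4]
A = [[5, 4, 3]]
b = [8]
res = linprog(c, A_ub=A, b_ub=b, bounds=[(0, 1)] * 3)

print("relaxed solution:", res.x)                    # fractional: [1, 0.75, 0]
print("upper bound on integer optimum:", -res.fun)   # 14.5, while the integer optimum is 14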
========================
To summarize: this ARR idea is extremely important in optimization and data science, and it guides how we approach problems.
Principle of ARR for optimization
- Approximation: "accept good-enough" to make the problem easy.
- Reformulation: "change view" to make the problem clear and simple.
- Relaxation: "drop difficulties" to get a bound on the problem.
In addition, as mentioned before:
- Doing machine learning, data science, or data analytics is not really doing optimization. The core difference is that data analytics cares more about generalizability: the goal is to do well in the future, that is, to predict and classify well on future data, whereas optimization is about doing the best possible right now.
- Optimization is more of a mathematical solution, while data science is more of an engineering solution. Engineering work requires repeated hands-on effort: tuning, running computational experiments, and arriving at a solution that "performs well", even though one usually cannot prove it is the "best" solution; moreover, such a "well-performing" solution often depends on the data itself.
- The most critical pieces of optimization knowledge used in machine learning are linear programming and quadratic programming.