A Little About Optimization

Optimization is important and useful in many fields, and no one can cover its complex mathematical concepts in such a short article. Here, however, I present some basic knowledge about optimization problems, especially unconstrained optimization. The content of this article was also greatly aided by Amos Storkey.

Unconstrained Optimization Problems

Unconstrained nonlinear optimization methods can be classified by what information they use:

1) If we use only zeroth-order information (function values), we can use basic search approaches such as bracketing.
2) If we use first-order information, we can use gradient methods such as steepest descent.
3) If we use first-order information to approximate the second order, we can use algorithms such as conjugate gradients or quasi-Newton methods.
4) If we use second-order information, we can use the Newton-Raphson approach.
5) We rarely use higher-order information, as the additional computational cost is not worth the accuracy gain.


Usually we get a better result by using higher-order information, so why not just use higher-order methods?

1) Computing higher-order information may be costly. Suppose the problem has N dimensions. Then at each point we have 1 function value, N first derivatives, and N^2 second derivatives. Computing and storing these values takes time and memory.
2) Using higher-order information may also be costly. For example, computing the inverse of the Hessian has computational complexity O(N^3).

To get the benefits of higher-order information at lower cost, we usually use lower-order approximation methods.

Suppose now we want to use first-order information to optimize some function. How can we utilize the first derivative to guide our search through the parameter space? The most common approach is to use the gradient to choose a direction, then move along this direction for one step. However, we then face the problem of choosing the length of the step. The line-search approach takes this problem into consideration; a code sketch follows the steps below.

1) Suppose we are at the position $\theta_t$.
2) We take the first derivative to obtain a direction $v$ to move in the parameter space, e.g. $v = -\nabla E(\theta_t)$.
3) Now we move along this direction, choosing the step length that minimizes the cost function.
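
To make this concrete, here is a minimal sketch in Python of gradient descent with a crude grid-based line search. The quadratic test function and the grid of candidate step sizes are my own choices for illustration, not part of the original description.

```python
import numpy as np

def line_search_gradient_descent(f, grad, theta0, n_steps=50, tol=1e-8):
    """Gradient descent where each step length is chosen by a crude line search."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_steps):
        v = -grad(theta)                      # step 2: direction from the first derivative
        if np.linalg.norm(v) < tol:
            break
        # step 3: move along v, picking the step from a small grid of candidates
        # (a backtracking or exact line search could be used instead)
        etas = np.logspace(-4, 0, 20)
        losses = [f(theta + eta * v) for eta in etas]
        theta = theta + etas[int(np.argmin(losses))] * v
    return theta

# Toy quadratic example (chosen only for illustration)
H = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])
f = lambda th: b @ th + 0.5 * th @ H @ th
grad = lambda th: b + H @ th

print(line_search_gradient_descent(f, grad, theta0=[0.0, 0.0]))   # close to -H^{-1} b
```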



It seems that gradient descent with line search is clever enough to find the optimal solution. However, it has a problem: the path it follows often zig-zags through the space rather than heading straight for the minimum. So it occurs to us that we need some more information.

Now let us have a look at second-order information. How can we get this second-order information? We can use the Taylor expansion around the current point $\theta_t$:

$$E(\theta) \approx E(\theta_t) + (\theta-\theta_t)^\top \nabla E(\theta_t) + \tfrac{1}{2}\,(\theta-\theta_t)^\top H\,(\theta-\theta_t)$$
where the second-order information is the Hessian matrix:

$$H_{ij} = \left.\frac{\partial^2 E}{\partial \theta_i\,\partial \theta_j}\right|_{\theta_t}$$
If H is positive definite, this models the error surface as a quadratic bowl, as shown in the figure below.

[Figure: error surface modeled as a quadratic bowl]
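
To make the quadratic model concrete, here is a small numerical check that the second-order Taylor model matches the true function near $\theta_t$. The test function, its derivatives, and the expansion point are arbitrary choices for illustration.

```python
import numpy as np

# A toy smooth function with its gradient and Hessian (chosen for illustration).
E    = lambda th: np.exp(th[0]) + th[0] * th[1] + th[1] ** 2
grad = lambda th: np.array([np.exp(th[0]) + th[1], th[0] + 2.0 * th[1]])
hess = lambda th: np.array([[np.exp(th[0]), 1.0], [1.0, 2.0]])

theta_t = np.array([0.5, -1.0])

def taylor2(theta):
    """Second-order Taylor model of E around theta_t."""
    d = theta - theta_t
    return E(theta_t) + d @ grad(theta_t) + 0.5 * d @ hess(theta_t) @ d

theta = theta_t + np.array([0.05, -0.02])   # a nearby point
print(E(theta), taylor2(theta))             # the two values should be very close
```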


Now how can we use this information? We start from a simple problem: suppose we have a quadratic error function

$$E(\theta) = E_0 + b^\top \theta + \tfrac{1}{2}\,\theta^\top H \theta$$
We can obtain the answer directly by setting the gradient to zero:

$$\nabla E(\theta) = b + H\theta = 0 \quad\Longrightarrow\quad \theta^* = -H^{-1} b$$
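
As a concrete sketch of that direct answer, the solve can be done with a single linear solve instead of forming the inverse explicitly (the particular H and b below are toy values chosen for illustration):

```python
import numpy as np

# E(theta) = E0 + b^T theta + 0.5 * theta^T H theta, with H positive definite
H = np.array([[3.0, 0.5],
              [0.5, 1.0]])
b = np.array([1.0, -2.0])

# Setting the gradient b + H theta to zero gives theta* = -H^{-1} b.
# np.linalg.solve avoids forming the inverse, but is still O(N^3) in general.
theta_star = np.linalg.solve(H, -b)
print(theta_star)
```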
However, as mentioned above, computing the Hessian takes O(N^2) and computing its inverse takes O(N^3). Is there a cleverer way to do this?

One good solution is conjugate gradients. Suppose we are still working with the quadratic function above.

1) We can find a basis $V = [v_1, v_2, \dots, v_N]$ with respect to $H$ such that $V^\top H V$ is diagonal. Without loss of generality (rescaling each $v_i$), we can have

$$v_i^\top H v_j = \delta_{ij}$$
2) Now we can express $\theta$ in the new basis:

$$\theta = V\alpha = \sum_{i=1}^{N} \alpha_i v_i$$
3) Now the function becomes:

$$E(\theta) = E_0 + \sum_{i=1}^{N}\Big( (b^\top v_i)\,\alpha_i + \tfrac{1}{2}\alpha_i^2 \Big)$$
Now we have successfully decomposed the original problem into N independent one-dimensional subproblems. Each direction $v_i$ is conjugate to the others ($v_i^\top H v_j = 0$ for $i \neq j$), which means we have good directions to follow, and they can be built up iteratively without computing the Hessian explicitly.

So the conjugate gradients algorithm is roughly:
1) Pick a direction that is conjugate to the previous ones
2) Optimize along that direction with a line search

For a quadratic function it reaches the optimum in at most N such steps, and usually it gets near the optimum in only a few.
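
Below is a minimal sketch of the (linear) conjugate gradient iteration for the quadratic case; it builds each new direction to be conjugate to the previous ones using only Hessian-vector products. The stopping rule and the toy H and b are my own choices for illustration.

```python
import numpy as np

def conjugate_gradient(H, b, theta0=None, n_steps=None, tol=1e-10):
    """Minimize E(theta) = b^T theta + 0.5 theta^T H theta for symmetric
    positive-definite H, i.e. solve H theta = -b, using conjugate directions."""
    N = len(b)
    theta = np.zeros(N) if theta0 is None else np.asarray(theta0, float)
    r = -b - H @ theta          # negative gradient (residual)
    d = r.copy()                # first direction: steepest descent
    for _ in range(n_steps or N):
        if np.linalg.norm(r) < tol:
            break
        Hd = H @ d
        alpha = (r @ r) / (d @ Hd)      # exact line search along d
        theta = theta + alpha * d
        r_new = r - alpha * Hd
        beta = (r_new @ r_new) / (r @ r)
        d = r_new + beta * d            # new direction, conjugate to the old ones
        r = r_new
    return theta

H = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])
print(conjugate_gradient(H, b))          # matches -H^{-1} b after at most N steps
```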

Can this approach be adapted to general nonlinear functions? Usually we can still use this method, but we face the problem that the Hessian might not be positive definite everywhere. One approach is to add a scaled additive term (a multiple of the identity) to cope with the non-positive-definite matrix, and to use a dynamic step size instead of a line search during the search procedure.
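
One way to read "scaled additive term" is to add $\lambda I$ to the Hessian until it becomes positive definite. The sketch below is my own illustration of that idea (it tests positive definiteness with a Cholesky factorization), not the article's exact recipe.

```python
import numpy as np

def damped_newton_direction(grad, hess, lam0=1e-3, factor=10.0):
    """Newton-like direction where H is replaced by H + lam*I when H is not
    positive definite (lam is increased until a Cholesky factorization succeeds)."""
    lam = 0.0
    while True:
        try:
            np.linalg.cholesky(hess + lam * np.eye(len(hess)))   # succeeds iff PD
            break
        except np.linalg.LinAlgError:
            lam = lam0 if lam == 0.0 else lam * factor
    return np.linalg.solve(hess + lam * np.eye(len(hess)), -grad)

# Example: an indefinite Hessian (one negative eigenvalue), chosen for illustration
hess = np.array([[2.0, 0.0], [0.0, -1.0]])
grad = np.array([1.0, 1.0])
print(damped_newton_direction(grad, hess))
```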

A Little Extension


Now we have some basic ideas about unconstrained optimization problems. What if the problem is under some constraints?

One basic and very useful idea is to remove the constraints by reparametrization. Suppose we have the constraint $\theta > 0$. We can define $\theta$ as a function of a new parameter $\phi$, for example

$$\theta = e^{\phi}$$

Now $\phi$ is unconstrained, and we can optimize over it with any of the methods above.
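
As a sketch of how such a substitution looks in code, here is a toy objective with the constraint $\theta > 0$ handled by the $\theta = e^{\phi}$ substitution; the objective, the use of scipy's minimize, and the starting point are all arbitrary choices for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Constrained problem: minimize E(theta) = (log(theta) - 1)^2 subject to theta > 0.
E = lambda theta: (np.log(theta) - 1.0) ** 2

# Substitute theta = exp(phi); phi is now unconstrained.
E_unconstrained = lambda phi: E(np.exp(phi[0]))

res = minimize(E_unconstrained, x0=[0.0])   # any unconstrained optimizer works here
theta_opt = np.exp(res.x[0])
print(theta_opt)   # close to e, and theta > 0 is satisfied automatically
```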

Another way is to use dedicated constrained optimization methods, which I will not discuss in detail here. These methods can usually be classified into two groups: Linear Programming and Quadratic Programming.

Another question we might ask is how to find the global optimum. Most algorithms will not guarantee this; one simple approach to deal with local minima is to run the optimizer from several different initial points and keep the best result.
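
A minimal sketch of that restart strategy, using scipy's general-purpose minimizer on a toy function with several local minima (the function, the number of restarts, and the sampling range are arbitrary choices for illustration):

```python
import numpy as np
from scipy.optimize import minimize

# A 1-D function with several local minima, chosen only for illustration.
f = lambda x: float(np.sin(3.0 * x[0]) + 0.1 * x[0] ** 2)

rng = np.random.default_rng(0)
results = [minimize(f, x0=rng.uniform(-5.0, 5.0, size=1)) for _ in range(10)]
best = min(results, key=lambda r: r.fun)   # keep the best of the restarts
print(best.x, best.fun)
```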
