Second derivative & Hessian matrix

We are also sometimes interested in a derivative of a derivative. This is known as a second derivative. For example, $\frac{\partial^2}{\partial x_i \partial x_j} f$ is the derivative with respect to $x_i$ of the derivative of $f$ with respect to $x_j$. Note that the order of differentiation can be swapped, so that $\frac{\partial^2}{\partial x_i \partial x_j} f = \frac{\partial^2}{\partial x_j \partial x_i} f$. In a single dimension, we can denote $\frac{d^2}{dx^2} f$ by $f''(x)$.

The second derivative tells us how the first derivative will change as we vary the input. This means it can be useful for determining whether a critical point is a local maximum, a local minimum, or a saddle point. Recall that at a critical point, $f'(x) = 0$. When $f''(x) > 0$, the first derivative $f'(x)$ increases as we move to the right and decreases as we move to the left. This means $f'(x - \epsilon) < 0$ and $f'(x + \epsilon) > 0$ for small enough $\epsilon$. In other words, as we move right, the slope begins to point uphill to the right, and as we move left, the slope begins to point uphill to the left. Thus, when $f'(x) = 0$ and $f''(x) > 0$, we can conclude that $x$ is a local minimum. Similarly, when $f'(x) = 0$ and $f''(x) < 0$, we can conclude that $x$ is a local maximum. This is known as the second derivative test. Unfortunately, when $f''(x) = 0$, the test is inconclusive. In this case $x$ may be a saddle point or part of a flat region.
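To make the univariate test concrete, here is a minimal numerical sketch in Python. The example function `f`, the candidate point `x0`, and the step size `eps` are illustrative assumptions, not anything prescribed above; the second derivative is approximated with a central finite difference.

```python
def second_derivative(f, x, eps=1e-5):
    # Central finite-difference approximation of f''(x).
    return (f(x + eps) - 2.0 * f(x) + f(x - eps)) / eps**2

def f(x):
    return x**2  # f'(0) = 0 and f''(0) = 2 > 0, so x = 0 is a local minimum

x0 = 0.0
d2 = second_derivative(f, x0)
if d2 > 0:
    print("local minimum")    # fires for f(x) = x^2
elif d2 < 0:
    print("local maximum")
else:
    print("inconclusive")     # could be a saddle point or part of a flat region
```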

In multiple dimensions, we need to examine all of the second derivatives of the function. These derivatives can be collected together into a matrix called the Hessian matrix. The Hessian matrix $H(f)(\boldsymbol{x})$ is defined such that

$$H(f)(\boldsymbol{x})_{i,j} = \frac{\partial^2}{\partial x_i \partial x_j} f(\boldsymbol{x}).$$

Equivalently, the Hessian is the Jacobian of the gradient.
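As a rough illustration of this definition, the sketch below builds the Hessian entry by entry with central finite differences. The quadratic test function `f` is an assumed example (its exact Hessian is [[2, 3], [3, 4]]); in practice one would typically use automatic differentiation instead.

```python
import numpy as np

def f(x):
    # Assumed example: f(x) = x0^2 + 3*x0*x1 + 2*x1^2
    return x[0]**2 + 3.0 * x[0] * x[1] + 2.0 * x[1]**2

def hessian(f, x, eps=1e-4):
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i = np.zeros(n); e_i[i] = eps
            e_j = np.zeros(n); e_j[j] = eps
            # H[i, j] approximates the second partial derivative
            # of f with respect to x_i and x_j.
            H[i, j] = (f(x + e_i + e_j) - f(x + e_i - e_j)
                       - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4.0 * eps**2)
    return H

x = np.array([1.0, -2.0])
print(hessian(f, x))  # close to [[2., 3.], [3., 4.]], and symmetric
```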

Anywhere that the second partial derivatives are continuous, the differential operators are commutative:

$$\frac{\partial^2}{\partial x_i \partial x_j} f(\boldsymbol{x}) = \frac{\partial^2}{\partial x_j \partial x_i} f(\boldsymbol{x}).$$

This implies that $H_{i,j} = H_{j,i}$, so the Hessian matrix is symmetric at such points (which includes nearly all inputs to nearly all functions we encounter in deep learning). Because the Hessian matrix is real and symmetric, we can decompose it into a set of real eigenvalues and an orthogonal basis of eigenvectors.
Using the eigendecomposition of the Hessian matrix, we can generalize the second derivative test to multiple dimensions. At a critical point, where $\nabla_{\boldsymbol{x}} f(\boldsymbol{x}) = 0$, we can examine the eigenvalues of the Hessian to determine whether the critical point is a local maximum, a local minimum, or a saddle point. When the Hessian is positive definite (all its eigenvalues are positive), the point is a local minimum. This can be seen by observing that the directional second derivative in any direction must be positive, and making reference to the univariate second derivative test. Likewise, when the Hessian is negative definite (all its eigenvalues are negative), the point is a local maximum. In multiple dimensions, it is actually possible to find positive evidence of saddle points in some cases. When at least one eigenvalue is positive and at least one eigenvalue is negative, we know that $\boldsymbol{x}$ is a local maximum on one cross section of $f$ but a local minimum on another cross section. Finally, the multidimensional second derivative test can be inconclusive, just like the univariate version. The test is inconclusive whenever all of the nonzero eigenvalues have the same sign but at least one eigenvalue is zero. This is because the univariate second derivative test is inconclusive in the cross section corresponding to the zero eigenvalue.
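A short sketch of this multidimensional test, assuming the Hessian at the critical point has already been computed: classify the point by the signs of the Hessian's eigenvalues. The example matrix below is an assumption, chosen to have one positive and one negative eigenvalue (e.g. the origin of $f(x, y) = x^2 - y^2$).

```python
import numpy as np

def classify_critical_point(H, tol=1e-10):
    # eigvalsh is appropriate because the Hessian is real and symmetric.
    eigs = np.linalg.eigvalsh(H)
    if np.all(eigs > tol):
        return "local minimum"    # positive definite
    if np.all(eigs < -tol):
        return "local maximum"    # negative definite
    if np.any(eigs > tol) and np.any(eigs < -tol):
        return "saddle point"     # mixed signs
    return "inconclusive"         # some eigenvalue is (numerically) zero

H = np.array([[2.0, 0.0],
              [0.0, -2.0]])
print(classify_critical_point(H))  # saddle point
```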

The Hessian can also be useful for understanding the performance of gradient descent. When the Hessian has a poor condition number, gradient descent performs poorly. This is because in one direction the derivative increases rapidly, while in another direction it increases slowly. Gradient descent is unaware of this change in the derivative, so it does not know that it needs to explore preferentially in the direction where the derivative remains negative for longer.
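The effect of a poor condition number can be seen on a simple quadratic. The sketch below is an assumed example: $f(\boldsymbol{x}) = \frac{1}{2}\boldsymbol{x}^\top H \boldsymbol{x}$ with $H = \mathrm{diag}(1, 100)$, so the Hessian's condition number is 100. The step size must be small enough for the steep direction, which makes progress along the shallow direction very slow.

```python
import numpy as np

H = np.diag([1.0, 100.0])          # Hessian of f(x) = 0.5 * x^T H x
print(np.linalg.cond(H))           # condition number: 100.0

x = np.array([1.0, 1.0])
lr = 1.0 / np.max(np.linalg.eigvalsh(H))  # step size limited by the largest eigenvalue
for _ in range(100):
    grad = H @ x                   # gradient of the quadratic is H x
    x = x - lr * grad

print(x)  # the steep coordinate is essentially 0; the shallow one is still ~0.37
```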
