What is gradient descent?
Calculus tells us that the gradient points in the direction in which a function increases fastest, so the opposite direction is the direction of steepest decrease. When optimizing a machine learning model, if we need to find a minimum, repeatedly stepping in the direction opposite to the gradient moves us toward an optimum.
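In symbols, with a step size (learning rate) η chosen by the user, each iteration updates the current point as follows:

```latex
x_{k+1} = x_k - \eta \, \nabla f(x_k)
```

For a well-chosen η this sequence decreases f at each step; too large an η can overshoot and diverge, too small an η converges slowly.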
Convex functions
For example, consider minimizing the function f(x) = x² − 2x + 1 = (x − 1)², whose minimum is at x = 1:
# -*- coding: utf-8 -*-
"""
Created on Fri Dec 23 09:05:43 2016
@author: CrazyVertigo
"""
import numpy as np
import matplotlib.pyplot as plt

def gd(x_start, step, g):
    """Gradient descent: start at x_start, step along -g(x) with step size `step`."""
    x = x_start
    for i in range(50):
        grad = g(x)
        x -= grad * step
        print('[Epoch {0}] grad = {1},x = {2}'.format(i, grad, x))
        if abs(grad) < 1e-3:  # stop once the gradient is nearly zero
            break
    return x

def f(x):
    return x * x - 2 * x + 1  # f(x) = (x - 1)^2, minimum at x = 1

def g(x):
    return 2 * x - 2          # derivative of f

x = np.linspace(-9, 10, 100)
y = f(x)
plt.plot(x, y)
gd(5, 0.1, g)
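Because f is convex, every starting point converges to the same minimum x = 1. As a quick check, here is a silent variant of the gd routine above (the names gd_quiet, n_iter, and tol are mine, not from the original post):

```python
def gd_quiet(x_start, step, g, n_iter=50, tol=1e-3):
    """Same loop as gd() above, but without printing; returns the final iterate."""
    x = x_start
    for _ in range(n_iter):
        grad = g(x)
        x -= grad * step
        if abs(grad) < tol:  # gradient nearly zero: close enough to the minimum
            break
    return x

g = lambda x: 2 * x - 2  # gradient of f(x) = (x - 1)^2

# Different starting points all land near the unique minimum x = 1.
print(gd_quiet(5.0, 0.1, g))   # close to 1
print(gd_quiet(-9.0, 0.1, g))  # close to 1
```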
Non-convex functions
For a non-convex function with more than one extremum, plain gradient descent breaks down: it can easily get stuck in a local minimum.
# -*- coding: utf-8 -*-
"""
Created on Fri Dec 23 09:05:43 2016
@author: HDU
"""
import numpy as np
import matplotlib.pyplot as plt

def gd(x_start, step, g):
    """Gradient descent: start at x_start, step along -g(x) with step size `step`."""
    x = x_start
    for i in range(100):
        grad = g(x)
        x -= grad * step
        print('[Epoch {0}] grad = {1},x = {2}'.format(i, grad, x))
        if abs(grad) < 1e-3:  # stop once the gradient is nearly zero
            break
    return x

def f(x):
    # Piecewise quadratic with two minima: a local one at x = -2 (f = 1)
    # and the global one at x = 2 (f = 0).
    l = x.size
    new = np.zeros(l)
    for i in range(l):
        if x[i] > 0:
            new[i] = (x[i] - 2) * (x[i] - 2)
        else:
            new[i] = (x[i] + 2) * (x[i] + 2) + 1
    return new

def g(x):
    if x > 0:
        return 2 * (x - 2)
    else:
        return 2 * (x + 2)

x = np.linspace(-5, 5, 100)
y = f(x)
plt.plot(x, y)
gd(-3, 0.1, g)
(Figure: curve of the non-convex function, with a local minimum at x = −2 and the global minimum at x = 2.)
The output is:
[Epoch 0] grad = -2,x = -2.8
[Epoch 1] grad = -1.6,x = -2.64
[Epoch 2] grad = -1.28,x = -2.512
[Epoch 3] grad = -1.024,x = -2.4096
...
[Epoch 34] grad = -0.00101412048018,x = -2.00040564819
[Epoch 35] grad = -0.000811296384146,x = -2.00032451855
Result: the descent gets stuck in the local minimum at x = −2.
If we change the last line of the code above to:
gd(3,0.1,g)
then the output is:
[Epoch 0] grad = 2,x = 2.8
[Epoch 1] grad = 1.6,x = 2.64
[Epoch 2] grad = 1.28,x = 2.512
...
[Epoch 34] grad = 0.00101412048018,x = 2.00040564819
[Epoch 35] grad = 0.000811296384146,x = 2.00032451855
Result: the global minimum at x = 2 is found.
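A common workaround (not covered in the post itself) is to run gradient descent from several random starting points and keep the result with the lowest function value. A minimal sketch on the same piecewise function, using a scalar version f_scalar of f (the names gd, f_scalar, and g mirror the code above; the restart logic is an added assumption):

```python
import numpy as np

def gd(x_start, step, g, n_iter=100, tol=1e-3):
    """Gradient descent on a scalar function, as above but without printing."""
    x = x_start
    for _ in range(n_iter):
        grad = g(x)
        x -= grad * step
        if abs(grad) < tol:
            break
    return x

def f_scalar(x):
    # Scalar version of the piecewise f above (two minima: x = -2 and x = 2).
    return (x - 2) ** 2 if x > 0 else (x + 2) ** 2 + 1

def g(x):
    return 2 * (x - 2) if x > 0 else 2 * (x + 2)

rng = np.random.default_rng(0)
starts = rng.uniform(-5, 5, size=10)        # 10 random initializations
candidates = [gd(s, 0.1, g) for s in starts]
best = min(candidates, key=f_scalar)        # keep the lowest f value
print(best)  # near the global minimum x = 2, provided some start was > 0
```

Random restarts do not guarantee the global optimum, but with enough starting points the chance of all of them landing in the same bad basin drops quickly.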
References
https://github.com/hsmyy/zhihuzhuanlan/blob/master/gd.ipynb