一维梯度下降
证明:沿梯度反方向移动自变量可以减小函数值
泰勒展开:
f(x+ϵ)=f(x)+ϵf′(x)+O(ϵ2)
代入沿梯度方向的移动量 ηf′(x):
f(x−ηf′(x))=f(x)−ηf′2(x)+O(η2f′2(x))
f(x−ηf′(x))≲f(x)
x←x−ηf′(x)
e.g.
f(x)=x2
%matplotlib inline
import numpy as np
import torch
import time
from torch import nn, optim
import math
import sys
sys.path.append('/home/kesci/input')
import d2lzh1981 as d2l
def f(x):
return x**2 # Objective function
def gradf(x):
return 2 * x # Its derivative
def gd(eta):
x = 10
results = [x]
for i in range(10):
x -= eta * gradf(x)
results.append(x)
print('epoch 10, x:', x)
return results
res = gd(0.2)
def show_trace(res):
n = max(abs(min(res)), abs(max(res)))
f_line = np.arange(-n, n, 0.01)
d2l.set_figsize((3.5, 2.5))
d2l.plt.plot(f_line, [f(x) for x in f_line],'-')
d2l.plt.plot(res, [f(x) for x in res],'-o')
d2l.plt.xlabel('x')
d2l.plt.ylabel('f(x)')
学习率
学习率过小的情况
show_trace(gd(0.05))
学习率过大的情况
show_trace(gd(1.1))