PyTorch Deep Learning: Gradient Descent (Part 2)

河图洛水

Last modified 2023-09-14 21:36:54


First published 2022-03-29 23:14:54

Article link: https://blog.csdn.net/HETUW/article/details/123833951


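1. Divide and Conquer

Drawback: it can easily miss the global optimum, because a coarse search may discard the region that actually contains it.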
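As a rough illustration, the sketch below assumes that "divide and conquer" here means a coarse-to-fine search over the weight: evaluate a few evenly spaced candidates, keep the best one, shrink the interval around it, and repeat. The function name `coarse_to_fine` and its parameters are hypothetical, and the data is the same toy set used later in this post.

```python
# Hypothetical sketch: coarse-to-fine ("divide and conquer") search for w.
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

def cost(w):
    """Mean squared error of the linear model y = w * x."""
    return sum((x * w - y) ** 2 for x, y in zip(x_data, y_data)) / len(x_data)

def coarse_to_fine(lo, hi, points=5, rounds=4):
    """Evaluate `points` evenly spaced candidates in [lo, hi], keep the best,
    then shrink the interval to one grid step on each side and repeat."""
    best = lo
    for _ in range(rounds):
        step = (hi - lo) / (points - 1)
        candidates = [lo + i * step for i in range(points)]
        best = min(candidates, key=cost)
        lo, hi = best - step, best + step
    return best

w_best = coarse_to_fine(0.0, 4.0)
print('w found by coarse-to-fine search:', w_best)
```

On this convex toy problem the search lands on the right weight. On a cost surface with several valleys, the first coarse grid can pick the wrong valley, and every later round only refines inside it.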

2. Greedy Approach (Gradient Descent)

2.1 Gradient (slope):

$gradient=\frac{\partial cost}{\partial w}$

2.2 Update:

$w=w-\alpha \frac{\partial cost}{\partial w}$

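$\alpha$ is the learning rate (step size):

Too large: the updates overshoot and training fails to converge.
Too small: progress is slow, and the search may settle in a local optimum instead of reaching the global one.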
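The effect of the step size can be seen on the toy data used later in this post. The sketch below is only an illustration: the helper `train` and the four learning rates are arbitrary choices, and the gradient is the MSE gradient for the model $y=wx$ derived in section 2.4.

```python
# Toy data: samples of y = 2 * x, so the best weight is w = 2.
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

def gradient(w):
    """Gradient of the mean squared error with respect to w."""
    return sum(2 * x * (x * w - y) for x, y in zip(x_data, y_data)) / len(x_data)

def train(lr, epochs=20, w=1.0):
    """Run `epochs` plain gradient descent updates with learning rate `lr`."""
    for _ in range(epochs):
        w -= lr * gradient(w)
    return w

# Illustrative learning rates, from very small to too large.
for lr in (0.001, 0.01, 0.1, 0.3):
    print('lr =', lr, '-> w after 20 epochs =', train(lr))
```

With the two small rates the weight is still short of 2 after 20 epochs, with 0.1 it has essentially arrived, and with 0.3 each update overshoots further than the last, so the weight runs away from 2.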

In deep learning, cost surfaces do not have many local optima in practice; the bigger difficulty is dealing with saddle points.

2.3 Saddle point:

At a saddle point the gradient is zero.

$g=0$ means the update $w=w-\alpha g$ leaves $w$ unchanged, so the iteration stalls.
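A two-variable toy function makes the stall easy to see. The sketch below is an assumption made for illustration, not part of the original lecture material: it uses $f(u,v)=u^2-v^2$, which has a saddle point at $(0,0)$, and the names `grad_f` and `descend` are hypothetical.

```python
# f(u, v) = u**2 - v**2 has a saddle point at (0, 0):
# a minimum along u, a maximum along v.
def grad_f(u, v):
    """Gradient of f with respect to (u, v)."""
    return 2 * u, -2 * v

def descend(u, v, lr=0.1, steps=50):
    """Plain gradient descent on f starting from (u, v)."""
    for _ in range(steps):
        gu, gv = grad_f(u, v)
        u, v = u - lr * gu, v - lr * gv
    return u, v

print(descend(0.0, 0.0))    # starts on the saddle: gradient is zero, never moves
print(descend(1.0, 0.0))    # slides along u toward the saddle and stalls there
print(descend(1.0, 1e-6))   # a tiny offset in v grows every step and moves away
```

The third call shows why a small perturbation matters: once the iterate is even slightly off the saddle in the descending direction, the updates carry it away.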

2.4 Derivation steps:

$\frac{\partial cost}{\partial w}=\frac{\partial }{\partial w}\frac{1}{N}\sum \left ( x_{n}w-y_{n} \right )^2$

$z=x_{n}w-y_{n}$

$\frac{\partial cost}{\partial w}=\frac{1}{N}\sum\frac{\partial }{\partial w}z^2$

$\frac{\partial cost}{\partial w}=\frac{1}{N}\sum2\frac{\partial \left (x_{n}w-y_{n} \right )}{\partial w} \left ( x_{n}w-y_{n} \right )$

$\frac{\partial cost}{\partial w}=\frac{1}{N}\sum2 \cdot x_{n}\cdot \left ( x_{n}w-y_{n} \right )$

Update rule:

$w=w-\alpha \frac{1}{N}\sum2 \cdot x_{n}\cdot \left ( x_{n}w-y_{n} \right )$
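One way to sanity-check a derivation like this is to compare the analytic gradient with a central finite difference of the cost. The sketch below does that on the toy data used in this post; the helper names and the step `eps` are illustrative choices.

```python
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

def cost(w):
    """Mean squared error of the model y = w * x."""
    return sum((x * w - y) ** 2 for x, y in zip(x_data, y_data)) / len(x_data)

def analytic_gradient(w):
    """(1/N) * sum of 2 * x_n * (x_n * w - y_n), as derived above."""
    return sum(2 * x * (x * w - y) for x, y in zip(x_data, y_data)) / len(x_data)

def numeric_gradient(w, eps=1e-6):
    """Central finite-difference approximation of d(cost)/dw."""
    return (cost(w + eps) - cost(w - eps)) / (2 * eps)

for w in (0.0, 1.0, 2.0, 3.5):
    print('w =', w, 'analytic =', analytic_gradient(w), 'numeric =', numeric_gradient(w))
```

The two columns should agree up to floating-point rounding, and both are zero at $w=2$, where the cost is minimal.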

2.5 Code:

import matplotlib.pyplot as plt

# Training set: samples of y = 2 * x
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

# Initial guess for the weight
w = 1.0

# Linear model: y = w * x
def forward(x):
    return x * w

# Cost function: mean squared error (MSE) over the whole training set
def cost(xs, ys):
    total = 0
    for x, y in zip(xs, ys):
        y_pred = forward(x)
        total += (y_pred - y) ** 2
    return total / len(xs)

# Gradient of the cost with respect to w, averaged over the whole training set
def gradient(xs, ys):
    grad = 0
    for x, y in zip(xs, ys):
        grad += 2 * x * (x * w - y)
    return grad / len(xs)

epoch_list = []
cost_list = []
print('predict (before training)', 4, forward(4))
for epoch in range(100):
    cost_val = cost(x_data, y_data)      # cost before this epoch's update
    grad_val = gradient(x_data, y_data)
    w -= 0.01 * grad_val                 # learning rate 0.01
    print('epoch:', epoch, 'w=', w, 'loss=', cost_val)
    epoch_list.append(epoch)
    cost_list.append(cost_val)

print('predict (after training)', 4, forward(4))
plt.plot(epoch_list, cost_list)
plt.ylabel('cost')
plt.xlabel('epoch')
plt.show()

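P.S. When training converges, the cost curve can still be noisy; plotting its exponentially weighted moving average gives a smoother curve: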

${c_{i}}'=\beta c_{i}+\left ( 1-\beta \right ){c_{i-1}}'$
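A minimal sketch of this smoothing, following the formula above term by term. The starting value ${c_{0}}'=c_{0}$ is an assumption, since the formula does not say how to initialize, and the function name `smooth` and the sample values are made up for illustration.

```python
def smooth(costs, beta=0.3):
    """Exponentially weighted average: c'_i = beta * c_i + (1 - beta) * c'_{i-1}.

    Assumption: the first smoothed value is the first raw value (c'_0 = c_0).
    """
    smoothed = []
    for i, c in enumerate(costs):
        if i == 0:
            smoothed.append(c)
        else:
            smoothed.append(beta * c + (1 - beta) * smoothed[-1])
    return smoothed

# Made-up noisy cost values, only to exercise the function.
noisy = [4.0, 2.0, 3.0, 1.0, 2.0, 0.5]
print(smooth(noisy))
```

In this form $\beta$ is the weight on the newest value, so a smaller $\beta$ smooths more heavily.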

If the cost diverges instead, training has failed, most often because the learning rate is too large.

3. Stochastic Gradient Descent

3.1 Comparison

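	Gradient descent	Stochastic gradient descent
Model performance	Lower	Higher
Time complexity	Low (samples can be processed in parallel, so it is efficient)	High (updates are sequential and cannot be parallelized)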

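3.2 Formulas

Pick one of the N samples at random and use the gradient of its loss for the descent step, instead of the gradient accumulated over all samples. The cost function covers every sample, whereas a single sample carries random noise, and that noise can help the update move past a saddle point. In the formulas, cost is therefore replaced by loss: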

$w=w-\alpha \frac{\partial loss}{\partial w}$

$\frac{\partial loss}{\partial w}=2 \cdot x_{n}\cdot \left ( x_{n}w-y_{n} \right )$
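The two formulas can be sketched with an explicit random draw per update. This is a minimal illustration under stated assumptions: the helper names, the fixed seed, the step count, and the learning rate are arbitrary choices made here, and only the standard-library `random` module is used.

```python
import random

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

def sample_gradient(w, x, y):
    """Gradient of the single-sample loss (x * w - y)**2 with respect to w."""
    return 2 * x * (x * w - y)

def sgd(steps=300, lr=0.01, w=1.0, seed=0):
    """Each step draws one sample at random and updates w from its loss alone."""
    rng = random.Random(seed)          # fixed seed so runs are repeatable
    samples = list(zip(x_data, y_data))
    for _ in range(steps):
        x, y = rng.choice(samples)
        w -= lr * sample_gradient(w, x, y)
    return w

w_sgd = sgd()
print('w after SGD:', w_sgd)
```

Because every sample in this toy set lies exactly on $y=2x$, each single-sample step pulls the weight toward 2, so the result does not depend much on the seed. On real data the per-sample gradients disagree, which is the source of the noise described above.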

3.3 Code

import matplotlib.pyplot as plt

# Training set: samples of y = 2 * x
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

# Initial guess for the weight
w = 1.0

# Linear model: y = w * x
def forward(x):
    return x * w

# Loss for a single sample: squared error
def loss(x, y):
    y_pred = forward(x)
    return (y_pred - y) ** 2

# Gradient of the single-sample loss with respect to w
def gradient(x, y):
    return 2 * x * (x * w - y)

epoch_list = []
loss_list = []
print('predict (before training)', 4, forward(4))
for epoch in range(100):
    # Note: this loop visits every sample in order each epoch
    # rather than drawing one sample at random.
    for x, y in zip(x_data, y_data):
        grad = gradient(x, y)
        w = w - 0.01 * grad          # update the weight after every single sample
        print("\tgrad:", x, y, grad)
        loss_val = loss(x, y)        # loss on this sample after the update
    print("progress:", epoch, "w=", w, "loss=", loss_val)
    epoch_list.append(epoch)
    loss_list.append(loss_val)       # loss of the last sample in the epoch

print('predict (after training)', 4, forward(4))
plt.plot(epoch_list, loss_list)
plt.ylabel('loss')
plt.xlabel('epoch')
plt.show()

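3.4 Mini-batch stochastic gradient descent (Batch)

Group the losses of several samples into one batch and update the weight once per batch.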
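A minimal sketch of the idea, assuming the update uses the average gradient over each batch and that the samples are reshuffled every epoch. Both are common choices rather than something this post specifies, and the function names, batch size, and seed are hypothetical.

```python
import random

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

def batch_gradient(w, batch):
    """Average gradient of the squared-error loss over one batch of (x, y) pairs."""
    return sum(2 * x * (x * w - y) for x, y in batch) / len(batch)

def minibatch_sgd(batch_size=2, epochs=50, lr=0.01, w=1.0, seed=0):
    """Shuffle the samples each epoch, then update w once per batch."""
    rng = random.Random(seed)          # fixed seed so runs are repeatable
    samples = list(zip(x_data, y_data))
    for _ in range(epochs):
        rng.shuffle(samples)
        for start in range(0, len(samples), batch_size):
            batch = samples[start:start + batch_size]   # last batch may be smaller
            w -= lr * batch_gradient(w, batch)
    return w

w_batch = minibatch_sgd()
print('w after mini-batch SGD:', w_batch)
```

Setting `batch_size=1` gives one update per sample, and setting it to the size of the training set gives the full gradient descent of section 2, so the batch size trades noise in the updates against how much work can be done in parallel.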
