NNDL 作业13 优化算法3D可视化

 代码准备:

优化器代码:

class Op(object):
    def __init__(self):
        pass

    def __call__(self, inputs):
        return self.forward(inputs)

    # 输入:张量inputs
    # 输出:张量outputs
    def forward(self, inputs):
        # return outputs
        raise NotImplementedError

    # 输入:最终输出对outputs的梯度outputs_grads
    # 输出:最终输出对inputs的梯度inputs_grads
    def backward(self, outputs_grads):
        # return inputs_grads
        raise NotImplementedError


class Optimizer(object):  # 优化器基类
    def __init__(self, init_lr, model):
        """
        优化器类初始化
        """
        # 初始化学习率,用于参数更新的计算
        self.init_lr = init_lr
        # 指定优化器需要优化的模型
        self.model = model

    def step(self):
        """
        定义每次迭代如何更新参数
        """
        pass


class SimpleBatchGD(Optimizer):
    def __init__(self, init_lr, model):
        super(SimpleBatchGD, self).__init__(init_lr=init_lr, model=model)

    def step(self):
        # 参数更新
        if isinstance(self.model.params, dict):
            for key in self.model.params.keys():
                self.model.params[key] = self.model.params[key] - self.init_lr * self.model.grads[key]


class Adagrad(Optimizer):
    def __init__(self, init_lr, model, epsilon):
        """
        Adagrad 优化器初始化
        输入:
            - init_lr: 初始学习率 - model:模型,model.params存储模型参数值  - epsilon:保持数值稳定性而设置的非常小的常数
        """
        super(Adagrad, self).__init__(init_lr=init_lr, model=model)
        self.G = {}
        for key in self.model.params.keys():
            self.G[key] = 0
        self.epsilon = epsilon

    def adagrad(self, x, gradient_x, G, init_lr):
        """
        adagrad算法更新参数,G为参数梯度平方的累计值。
        """
        G += gradient_x ** 2
        x -= init_lr / torch.sqrt(G + self.epsilon) * gradient_x
        return x, G

    def step(self):
        """
        参数更新
        """
        for key in self.model.params.keys():
            self.model.params[key], self.G[key] = self.adagrad(self.model.params[key],
                                                               self.model.grads[key],
                                                               self.G[key],
                                                               self.init_lr)


class RMSprop(Optimizer):
    def __init__(self, init_lr, model, beta, epsilon):
        """
        RMSprop优化器初始化
        输入:
            - init_lr:初始学习率
            - model:模型,model.params存储模型参数值
            - beta:衰减率
            - epsilon:保持数值稳定性而设置的常数
        """
        super(RMSprop, self).__init__(init_lr=init_lr, model=model)
        self.G = {}
        for key in self.model.params.keys():
            self.G[key] = 0
        self.beta = beta
        self.epsilon = epsilon

    def rmsprop(self, x, gradient_x, G, init_lr):
        """
        rmsprop算法更新参数,G为迭代梯度平方的加权移动平均
        """
        G = self.beta * G + (1 - self.beta) * gradient_x ** 2
        x -= init_lr / torch.sqrt(G + self.epsilon) * gradient_x
        return x, G

    def step(self):
        """参数更新"""
        for key in self.model.params.keys():
            self.model.params[key], self.G[key] = self.rmsprop(self.model.params[key],
                                                               self.model.grads[key],
                                                               self.G[key],
                                                               self.init_lr)


class Momentum(Optimizer):
    def __init__(self, init_lr, model, rho):
        """
        Momentum优化器初始化
        输入:
            - init_lr:初始学习率
            - model:模型,model.params存储模型参数值
            - rho:动量因子
        """
        super(Momentum, self).__init__(init_lr=init_lr, model=model)
        self.delta_x = {}
        for key in self.model.params.keys():
            self.delta_x[key] = 0
        self.rho = rho

    def momentum(self, x, gradient_x, delta_x, init_lr):
        """
        momentum算法更新参数,delta_x为梯度的加权移动平均
        """
        delta_x = self.rho * delta_x - init_lr * gradient_x
        x += delta_x
        return x, delta_x

    def step(self):
        """参数更新"""
        for key in self.model.params.keys():
            self.model.params[key], self.delta_x[key] = self.momentum(self.model.params[key],
                                                                      self.model.grads[key],
                                                                      self.delta_x[key],
                                                                      self.init_lr)


class Adam(Optimizer):
    def __init__(self, init_lr, model, beta1, beta2, epsilon):
        """
        Adam优化器初始化
        输入:
            - init_lr:初始学习率
            - model:模型,model.params存储模型参数值
            - beta1, beta2:移动平均的衰减率
            - epsilon:保持数值稳定性而设置的常数
        """
        super(Adam, self).__init__(init_lr=init_lr, model=model)
        self.beta1 = beta1
        self.beta2 = beta2
        self.epsilon = epsilon
        self.M, self.G = {}, {}
        for key in self.model.params.keys():
            self.M[key] = 0
            self.G[key] = 0
        self.t = 1

    def adam(self, x, gradient_x, G, M, t, init_lr):
        """
        adam算法更新参数
        输入:
            - x:参数
            - G:梯度平方的加权移动平均
            - M:梯度的加权移动平均
            - t:迭代次数
            - init_lr:初始学习率
        """
        M = self.beta1 * M + (1 - self.beta1) * gradient_x
        G = self.beta2 * G + (1 - self.beta2) * gradient_x ** 2
        M_hat = M / (1 - self.beta1 ** t)
        G_hat = G / (1 - self.beta2 ** t)
        t += 1
        x -= init_lr / torch.sqrt(G_hat + self.epsilon) * M_hat
        return x, G, M, t

    def step(self):
        """参数更新"""
        for key in self.model.params.keys():
            self.model.params[key], self.G[key], self.M[key], self.t = self.adam(self.model.params[key],
                                                                                 self.model.grads[key],
                                                                                 self.G[key],
                                                                                 self.M[key],
                                                                                 self.t,
                                                                                 self.init_lr)

放函数的地方: 

x[0]^{2}+x[1]^{2}+x[1]^{3}+x[0]*x[1]

class OptimizedFunction3D(Op):
    def __init__(self):
        super(OptimizedFunction3D, self).__init__()
        self.params = {'x': 0}
        self.grads = {'x': 0}

    def forward(self, x):
        self.params['x'] = x
        return x[0] ** 2 + x[1] ** 2 + x[1] ** 3 + x[0] * x[1]

    def backward(self):
        x = self.params['x']
        gradient1 = 2 * x[0] + x[1]
        gradient2 = 2 * x[1] + 3 * x[1] ** 2 + x[0]
        grad1 = torch.Tensor([gradient1])
        grad2 = torch.Tensor([gradient2])
        self.grads['x'] = torch.cat([grad1, grad2])

 x[0]^{2}/20+x[1]^{2}

class OptimizedFunction3D(Op):
    def __init__(self):
        super(OptimizedFunction3D, self).__init__()
        self.params = {'x': 0}
        self.grads = {'x': 0}
 
    def forward(self, x):
        self.params['x'] = x
        return x[0] * x[0] / 20 + x[1] * x[1] / 1
        # return x[0] ** 2 + x[1] ** 2 + x[1] ** 3 + x[0] * x[1]
 
    def backward(self):
        x = self.params['x']
        gradient1 = 2 * x[0] / 20
        gradient2 = 2 * x[1] / 1
        grad1 = torch.Tensor([gradient1])
        grad2 = torch.Tensor([gradient2])
        self.grads['x'] = torch.cat([grad1, grad2])

- x[0] * x[0] / 2 + x[1] * x[1] / 1:(致敬经典)(鞍点)

class OptimizedFunction3D(Op):
    def __init__(self):
        super(OptimizedFunction3D, self).__init__()
        self.params = {'x': 0}
        self.grads = {'x': 0}
 
    def forward(self, x):
        self.params['x'] = x
        return - x[0] * x[0] / 2 + x[1] * x[1] / 1  # x[0] ** 2 + x[1] ** 2 + x[1] ** 3 + x[0] * x[1]
 
    def backward(self):
        x = self.params['x']
        gradient1 = - 2 * x[0] / 2
        gradient2 = 2 * x[1] / 1
        grad1 = torch.Tensor([gradient1])
        grad2 = torch.Tensor([gradient2])
        self.grads['x'] = torch.cat([grad1, grad2])

 训练:

def train_f(model, optimizer, x_init, epoch):
    x = x_init
    all_x = []
    losses = []
    for i in range(epoch):
        all_x.append(copy.deepcopy(x.numpy()))  # 浅拷贝 改为 深拷贝, 否则List的原值会被改变。 Edit by David 2022.12.4.
        loss = model(x)
        losses.append(loss)
        model.backward()
        optimizer.step()
        x = model.params['x']
    return torch.Tensor(np.array(all_x)), losses

关于各种超参数(比如初始点和初始学习率)的不同选择和绘图方式(可以关注一下)会使结果会有很大的差异,所以具体的代码细节会在接下来的问题中给出来:

比如初始点:

x[0]^{2}+x[1]^{2}+x[1]^{3}+x[0]*x[1]

 x_init = torch.FloatTensor([2, 3])

x[0]^{2}/20+x[1]^{2}

x_init = torch.FloatTensor([-7, 2])

- x[0] * x[0] / 2 + x[1] * x[1] / 1:(致敬经典)(鞍点)

 x_init = torch.FloatTensor([0.00001, 0.5])

编程实现优化算法,并3D可视化:

1. 函数3D可视化

分别画出x[0]^{2}+x[1]^{2}+x[1]^{3}+x[0]*x[1]x[0]^{2}/20+x[1]^{2}的3D图

from mpl_toolkits.mplot3d import Axes3D
import numpy as np
from matplotlib import pyplot as plt
import torch
class Op(object):
    def __init__(self):
        pass

    def __call__(self, inputs):
        return self.forward(inputs)

    # 输入:张量inputs
    # 输出:张量outputs
    def forward(self, inputs):
        # return outputs
        raise NotImplementedError

    # 输入:最终输出对outputs的梯度outputs_grads
    # 输出:最终输出对inputs的梯度inputs_grads
    def backward(self, outputs_grads):
        # return inputs_grads
        raise NotImplementedError

# 画出x**2
class OptimizedFunction3D(Op):
    def __init__(self):
        super(OptimizedFunction3D, self).__init__()
        self.params = {'x': 0}
        self.grads = {'x': 0}

    def forward(self, x):
        self.params['x'] = x
        return x[0] ** 2 + x[1] ** 2 + x[1] ** 3 + x[0] * x[1]

    def backward(self):
        x = self.params['x']
        gradient1 = 2 * x[0] + x[1]
        gradient2 = 2 * x[1] + 3 * x[1] ** 2 + x[0]
        grad1 = torch.Tensor([gradient1])
        grad2 = torch.Tensor([gradient2])
        self.grads['x'] = torch.cat([grad1, grad2])


# 使用numpy.meshgrid生成x1,x2矩阵,矩阵的每一行为[-3, 3],以0.1为间隔的数值
x1 = np.arange(-3, 3, 0.1)
x2 = np.arange(-3, 3, 0.1)
x1, x2 = np.meshgrid(x1, x2)
init_x = torch.Tensor(np.array([x1, x2]))

model = OptimizedFunction3D()

# 绘制 f_3d函数 的 三维图像
fig = plt.figure()
ax = plt.axes(projection='3d')
X = init_x[0].numpy()
Y = init_x[1].numpy()
Z = model(init_x).numpy()
ax.plot_surface(X, Y, Z, cmap='rainbow')

ax.set_xlabel('x1')
ax.set_ylabel('x2')
ax.set_zlabel('f(x1,x2)')
plt.show()


# 画出x * x / 20 + y * y
def func(x, y):
    return x * x / 20 + y * y


def paint_loss_func():
    x = np.linspace(-50, 50, 100)  # x的绘制范围是-50到50,从改区间均匀取100个数
    y = np.linspace(-50, 50, 100)  # y的绘制范围是-50到50,从改区间均匀取100个数

    X, Y = np.meshgrid(x, y)
    Z = func(X, Y)

    fig = plt.figure()  # figsize=(10, 10))
    ax = Axes3D(fig)
    plt.xlabel('x')
    plt.ylabel('y')

    ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap='rainbow')
    plt.show()


paint_loss_func()

               x[0]^{2}+x[1]^{2}+x[1]^{3}+x[0]*x[1]                         x[0]^{2}/20+x[1]^{2}

        这里是使用了两种方式定义函数,一种是定义到了类,一种是函数,两者绘图的道理相同,详细请看代码。 

2.加入优化算法,画出轨迹

分别画出x[0]^{2}+x[1]^{2}+x[1]^{3}+x[0]*x[1]x[0]^{2}/20+x[1]^{2}的3D轨迹图

x[0]^{2}+x[1]^{2}+x[1]^{3}+x[0]*x[1]

from mpl_toolkits.mplot3d import Axes3D
import numpy as np
from matplotlib import pyplot as plt
import torch
class Op(object):
    def __init__(self):
        pass

    def __call__(self, inputs):
        return self.forward(inputs)

    # 输入:张量inputs
    # 输出:张量outputs
    def forward(self, inputs):
        # return outputs
        raise NotImplementedError

    # 输入:最终输出对outputs的梯度outputs_grads
    # 输出:最终输出对inputs的梯度inputs_grads
    def backward(self, outputs_grads):
        # return inputs_grads
        raise NotImplementedError


import torch
import numpy as np
import copy
from matplotlib import pyplot as plt
from matplotlib import animation
from itertools import zip_longest



class Optimizer(object):  # 优化器基类
    def __init__(self, init_lr, model):
        """
        优化器类初始化
        """
        # 初始化学习率,用于参数更新的计算
        self.init_lr = init_lr
        # 指定优化器需要优化的模型
        self.model = model

    def step(self):
        """
        定义每次迭代如何更新参数
        """
        pass


class SimpleBatchGD(Optimizer):
    def __init__(self, init_lr, model):
        super(SimpleBatchGD, self).__init__(init_lr=init_lr, model=model)

    def step(self):
        # 参数更新
        if isinstance(self.model.params, dict):
            for key in self.model.params.keys():
                self.model.params[key] = self.model.params[key] - self.init_lr * self.model.grads[key]


class Adagrad(Optimizer):
    def __init__(self, init_lr, model, epsilon):
        """
        Adagrad 优化器初始化
        输入:
            - init_lr: 初始学习率 - model:模型,model.params存储模型参数值  - epsilon:保持数值稳定性而设置的非常小的常数
        """
        super(Adagrad, self).__init__(init_lr=init_lr, model=model)
        self.G = {}
        for key in self.model.params.keys():
            self.G[key] = 0
        self.epsilon = epsilon

    def adagrad(self, x, gradient_x, G, init_lr):
        """
        adagrad算法更新参数,G为参数梯度平方的累计值。
        """
        G += gradient_x ** 2
        x -= init_lr / torch.sqrt(G + self.epsilon) * gradient_x
        return x, G

    def step(self):
        """
        参数更新
        """
        for key in self.model.params.keys():
            self.model.params[key], self.G[key] = self.adagrad(self.model.params[key],
                                                               self.model.grads[key],
                                                               self.G[key],
                                                               self.init_lr)


class RMSprop(Optimizer):
    def __init__(self, init_lr, model, beta, epsilon):
        """
        RMSprop优化器初始化
        输入:
            - init_lr:初始学习率
            - model:模型,model.params存储模型参数值
            - beta:衰减率
            - epsilon:保持数值稳定性而设置的常数
        """
        super(RMSprop, self).__init__(init_lr=init_lr, model=model)
        self.G = {}
        for key in self.model.params.keys():
            self.G[key] = 0
        self.beta = beta
        self.epsilon = epsilon

    def rmsprop(self, x, gradient_x, G, init_lr):
        """
        rmsprop算法更新参数,G为迭代梯度平方的加权移动平均
        """
        G = self.beta * G + (1 - self.beta) * gradient_x ** 2
        x -= init_lr / torch.sqrt(G + self.epsilon) * gradient_x
        return x, G

    def step(self):
        """参数更新"""
        for key in self.model.params.keys():
            self.model.params[key], self.G[key] = self.rmsprop(self.model.params[key],
                                                               self.model.grads[key],
                                                               self.G[key],
                                                               self.init_lr)


class Momentum(Optimizer):
    def __init__(self, init_lr, model, rho):
        """
        Momentum优化器初始化
        输入:
            - init_lr:初始学习率
            - model:模型,model.params存储模型参数值
            - rho:动量因子
        """
        super(Momentum, self).__init__(init_lr=init_lr, model=model)
        self.delta_x = {}
        for key in self.model.params.keys():
            self.delta_x[key] = 0
        self.rho = rho

    def momentum(self, x, gradient_x, delta_x, init_lr):
        """
        momentum算法更新参数,delta_x为梯度的加权移动平均
        """
        delta_x = self.rho * delta_x - init_lr * gradient_x
        x += delta_x
        return x, delta_x

    def step(self):
        """参数更新"""
        for key in self.model.params.keys():
            self.model.params[key], self.delta_x[key] = self.momentum(self.model.params[key],
                                                                      self.model.grads[key],
                                                                      self.delta_x[key],
                                                                      self.init_lr)


class Nesterov(Optimizer):
    def __init__(self, init_lr, model, rho):
        """
        Nesterov优化器初始化
        输入:
            - init_lr:初始学习率
            - model:模型,model.params存储模型参数值
            - rho:动量因子
        """
        super(Nesterov, self).__init__(init_lr=init_lr, model=model)
        self.delta_x = {}
        for key in self.model.params.keys():
            self.delta_x[key] = 0
        self.rho = rho

    def nesterov(self, x, gradient_x, delta_x, init_lr):
        """
        Nesterov算法更新参数,delta_x为梯度的加权移动平均
        """
        delta_x_prev = delta_x
        delta_x = self.rho * delta_x - init_lr * gradient_x
        x += -self.rho * delta_x_prev + (1 + self.rho) * delta_x
        return x, delta_x

    def step(self):
        """参数更新"""
        for key in self.model.params.keys():
            self.model.params[key], self.delta_x[key] = self.nesterov(self.model.params[key],
                                                                      self.model.grads[key],
                                                                      self.delta_x[key],
                                                                      self.init_lr)


class Adam(Optimizer):
    def __init__(self, init_lr, model, beta1, beta2, epsilon):
        """
        Adam优化器初始化
        输入:
            - init_lr:初始学习率
            - model:模型,model.params存储模型参数值
            - beta1, beta2:移动平均的衰减率
            - epsilon:保持数值稳定性而设置的常数
        """
        super(Adam, self).__init__(init_lr=init_lr, model=model)
        self.beta1 = beta1
        self.beta2 = beta2
        self.epsilon = epsilon
        self.M, self.G = {}, {}
        for key in self.model.params.keys():
            self.M[key] = 0
            self.G[key] = 0
        self.t = 1

    def adam(self, x, gradient_x, G, M, t, init_lr):
        """
        adam算法更新参数
        输入:
            - x:参数
            - G:梯度平方的加权移动平均
            - M:梯度的加权移动平均
            - t:迭代次数
            - init_lr:初始学习率
        """
        M = self.beta1 * M + (1 - self.beta1) * gradient_x
        G = self.beta2 * G + (1 - self.beta2) * gradient_x ** 2
        M_hat = M / (1 - self.beta1 ** t)
        G_hat = G / (1 - self.beta2 ** t)
        t += 1
        x -= init_lr / torch.sqrt(G_hat + self.epsilon) * M_hat
        return x, G, M, t

    def step(self):
        """参数更新"""
        for key in self.model.params.keys():
            self.model.params[key], self.G[key], self.M[key], self.t = self.adam(self.model.params[key],
                                                                                 self.model.grads[key],
                                                                                 self.G[key],
                                                                                 self.M[key],
                                                                                 self.t,
                                                                                 self.init_lr)


class OptimizedFunction3D(Op):
    def __init__(self):
        super(OptimizedFunction3D, self).__init__()
        self.params = {'x': 0}
        self.grads = {'x': 0}

    def forward(self, x):
        self.params['x'] = x
        return x[0] ** 2 + x[1] ** 2 + x[1] ** 3 + x[0] * x[1]

    def backward(self):
        x = self.params['x']
        gradient1 = 2 * x[0] + x[1]
        gradient2 = 2 * x[1] + 3 * x[1] ** 2 + x[0]
        grad1 = torch.Tensor([gradient1])
        grad2 = torch.Tensor([gradient2])
        self.grads['x'] = torch.cat([grad1, grad2])


class Visualization3D(animation.FuncAnimation):
    """    绘制动态图像,可视化参数更新轨迹    """

    def __init__(self, *xy_values, z_values, labels=[], colors=[], fig, ax, interval=600, blit=True, **kwargs):
        """
        初始化3d可视化类
        输入:
            xy_values:三维中x,y维度的值
            z_values:三维中z维度的值
            labels:每个参数更新轨迹的标签
            colors:每个轨迹的颜色
            interval:帧之间的延迟(以毫秒为单位)
            blit:是否优化绘图
        """
        self.fig = fig
        self.ax = ax
        self.xy_values = xy_values
        self.z_values = z_values

        frames = max(xy_value.shape[0] for xy_value in xy_values)
        self.lines = [ax.plot([], [], [], label=label, color=color, lw=2)[0]
                      for _, label, color in zip_longest(xy_values, labels, colors)]
        super(Visualization3D, self).__init__(fig, self.animate, init_func=self.init_animation, frames=frames,
                                              interval=interval, blit=blit, **kwargs)

    def init_animation(self):
        # 数值初始化
        for line in self.lines:
            line.set_data([], [])
            # line.set_3d_properties(np.asarray([]))  # 源程序中有这一行,加上会报错。 Edit by David 2022.12.4
        return self.lines

    def animate(self, i):
        # 将x,y,z三个数据传入,绘制三维图像
        for line, xy_value, z_value in zip(self.lines, self.xy_values, self.z_values):
            line.set_data(xy_value[:i, 0], xy_value[:i, 1])
            line.set_3d_properties(z_value[:i])
        return self.lines


def train_f(model, optimizer, x_init, epoch):
    x = x_init
    all_x = []
    losses = []
    for i in range(epoch):
        all_x.append(copy.deepcopy(x.numpy()))  # 浅拷贝 改为 深拷贝, 否则List的原值会被改变。 Edit by David 2022.12.4.
        loss = model(x)
        losses.append(loss)
        model.backward()
        optimizer.step()
        x = model.params['x']
    return torch.Tensor(np.array(all_x)), losses


# 构建6个模型,分别配备不同的优化器
model1 = OptimizedFunction3D()
opt_gd = SimpleBatchGD(init_lr=0.01, model=model1)

model2 = OptimizedFunction3D()
opt_adagrad = Adagrad(init_lr=0.5, model=model2, epsilon=1e-7)

model3 = OptimizedFunction3D()
opt_rmsprop = RMSprop(init_lr=0.1, model=model3, beta=0.9, epsilon=1e-7)

model4 = OptimizedFunction3D()
opt_momentum = Momentum(init_lr=0.01, model=model4, rho=0.9)

model5 = OptimizedFunction3D()
opt_adam = Adam(init_lr=0.1, model=model5, beta1=0.9, beta2=0.99, epsilon=1e-7)

model6 = OptimizedFunction3D()
opt_Nesterov = Nesterov(init_lr=0.1, model=model6, rho=0.9)

models = [model1, model2, model3, model4, model5, model6]
opts = [opt_gd, opt_adagrad, opt_rmsprop, opt_momentum, opt_adam, opt_Nesterov]

x_all_opts = []
z_all_opts = []

# 使用不同优化器训练

for model, opt in zip(models, opts):
    x_init = torch.FloatTensor([2, 3])
    x_one_opt, z_one_opt = train_f(model, opt, x_init, 150)  # epoch
    # 保存参数值
    x_all_opts.append(x_one_opt.numpy())
    z_all_opts.append(np.squeeze(z_one_opt))

# 使用numpy.meshgrid生成x1,x2矩阵,矩阵的每一行为[-3, 3],以0.1为间隔的数值
x1 = np.arange(-3, 3, 0.1)
x2 = np.arange(-3, 3, 0.1)
x1, x2 = np.meshgrid(x1, x2)
init_x = torch.Tensor(np.array([x1, x2]))

model = OptimizedFunction3D()

# 绘制 f_3d函数 的 三维图像
fig = plt.figure()
ax = plt.axes(projection='3d')
X = init_x[0].numpy()
Y = init_x[1].numpy()
Z = model(init_x).numpy()  # 改为 model(init_x).numpy() David 2022.12.4
ax.plot_surface(X, Y, Z, cmap='rainbow')

ax.set_xlabel('x1')
ax.set_ylabel('x2')
ax.set_zlabel('f(x1,x2)')

labels = ['SGD', 'AdaGrad', 'RMSprop', 'Momentum', 'Adam', 'Nesterov']
colors = ['#8B0000', '#0000FF', '#000000', '#008B00', '#FF0000']

animator = Visualization3D(*x_all_opts, z_values=z_all_opts, labels=labels, colors=colors, fig=fig, ax=ax)
ax.legend(loc='upper left')

plt.show()
animator.save('animation.gif')  # 效果不好,估计被挡住了…… 有待进一步提高 Edit by David 2022.12.4

 

x[0]^{2}/20+x[1]^{2}

import torch
import numpy as np
import copy
from matplotlib import pyplot as plt
from matplotlib import animation
from itertools import zip_longest
from matplotlib import cm


class Op(object):
    def __init__(self):
        pass

    def __call__(self, inputs):
        return self.forward(inputs)

    # 输入:张量inputs
    # 输出:张量outputs
    def forward(self, inputs):
        # return outputs
        raise NotImplementedError

    # 输入:最终输出对outputs的梯度outputs_grads
    # 输出:最终输出对inputs的梯度inputs_grads
    def backward(self, outputs_grads):
        # return inputs_grads
        raise NotImplementedError


class Optimizer(object):  # 优化器基类
    def __init__(self, init_lr, model):
        """
        优化器类初始化
        """
        # 初始化学习率,用于参数更新的计算
        self.init_lr = init_lr
        # 指定优化器需要优化的模型
        self.model = model

    def step(self):
        """
        定义每次迭代如何更新参数
        """
        pass


class SimpleBatchGD(Optimizer):
    def __init__(self, init_lr, model):
        super(SimpleBatchGD, self).__init__(init_lr=init_lr, model=model)

    def step(self):
        # 参数更新
        if isinstance(self.model.params, dict):
            for key in self.model.params.keys():
                self.model.params[key] = self.model.params[key] - self.init_lr * self.model.grads[key]


class Adagrad(Optimizer):
    def __init__(self, init_lr, model, epsilon):
        """
        Adagrad 优化器初始化
        输入:
            - init_lr: 初始学习率 - model:模型,model.params存储模型参数值  - epsilon:保持数值稳定性而设置的非常小的常数
        """
        super(Adagrad, self).__init__(init_lr=init_lr, model=model)
        self.G = {}
        for key in self.model.params.keys():
            self.G[key] = 0
        self.epsilon = epsilon

    def adagrad(self, x, gradient_x, G, init_lr):
        """
        adagrad算法更新参数,G为参数梯度平方的累计值。
        """
        G += gradient_x ** 2
        x -= init_lr / torch.sqrt(G + self.epsilon) * gradient_x
        return x, G

    def step(self):
        """
        参数更新
        """
        for key in self.model.params.keys():
            self.model.params[key], self.G[key] = self.adagrad(self.model.params[key],
                                                               self.model.grads[key],
                                                               self.G[key],
                                                               self.init_lr)


class RMSprop(Optimizer):
    def __init__(self, init_lr, model, beta, epsilon):
        """
        RMSprop优化器初始化
        输入:
            - init_lr:初始学习率
            - model:模型,model.params存储模型参数值
            - beta:衰减率
            - epsilon:保持数值稳定性而设置的常数
        """
        super(RMSprop, self).__init__(init_lr=init_lr, model=model)
        self.G = {}
        for key in self.model.params.keys():
            self.G[key] = 0
        self.beta = beta
        self.epsilon = epsilon

    def rmsprop(self, x, gradient_x, G, init_lr):
        """
        rmsprop算法更新参数,G为迭代梯度平方的加权移动平均
        """
        G = self.beta * G + (1 - self.beta) * gradient_x ** 2
        x -= init_lr / torch.sqrt(G + self.epsilon) * gradient_x
        return x, G

    def step(self):
        """参数更新"""
        for key in self.model.params.keys():
            self.model.params[key], self.G[key] = self.rmsprop(self.model.params[key],
                                                               self.model.grads[key],
                                                               self.G[key],
                                                               self.init_lr)


class Momentum(Optimizer):
    def __init__(self, init_lr, model, rho):
        """
        Momentum优化器初始化
        输入:
            - init_lr:初始学习率
            - model:模型,model.params存储模型参数值
            - rho:动量因子
        """
        super(Momentum, self).__init__(init_lr=init_lr, model=model)
        self.delta_x = {}
        for key in self.model.params.keys():
            self.delta_x[key] = 0
        self.rho = rho

    def momentum(self, x, gradient_x, delta_x, init_lr):
        """
        momentum算法更新参数,delta_x为梯度的加权移动平均
        """
        delta_x = self.rho * delta_x - init_lr * gradient_x
        x += delta_x
        return x, delta_x

    def step(self):
        """参数更新"""
        for key in self.model.params.keys():
            self.model.params[key], self.delta_x[key] = self.momentum(self.model.params[key],
                                                                      self.model.grads[key],
                                                                      self.delta_x[key],
                                                                      self.init_lr)


class Nesterov(Optimizer):
    def __init__(self, init_lr, model, rho):
        """
        Nesterov优化器初始化
        输入:
            - init_lr:初始学习率
            - model:模型,model.params存储模型参数值
            - rho:动量因子
        """
        super(Nesterov, self).__init__(init_lr=init_lr, model=model)
        self.delta_x = {}
        for key in self.model.params.keys():
            self.delta_x[key] = 0
        self.rho = rho

    def nesterov(self, x, gradient_x, delta_x, init_lr):
        """
        Nesterov算法更新参数,delta_x为梯度的加权移动平均
        """
        delta_x_prev = delta_x
        delta_x = self.rho * delta_x - init_lr * gradient_x
        x += -self.rho * delta_x_prev + (1 + self.rho) * delta_x
        return x, delta_x

    def step(self):
        """参数更新"""
        for key in self.model.params.keys():
            self.model.params[key], self.delta_x[key] = self.nesterov(self.model.params[key],
                                                                      self.model.grads[key],
                                                                      self.delta_x[key],
                                                                      self.init_lr)


class Adam(Optimizer):
    def __init__(self, init_lr, model, beta1, beta2, epsilon):
        """
        Adam优化器初始化
        输入:
            - init_lr:初始学习率
            - model:模型,model.params存储模型参数值
            - beta1, beta2:移动平均的衰减率
            - epsilon:保持数值稳定性而设置的常数
        """
        super(Adam, self).__init__(init_lr=init_lr, model=model)
        self.beta1 = beta1
        self.beta2 = beta2
        self.epsilon = epsilon
        self.M, self.G = {}, {}
        for key in self.model.params.keys():
            self.M[key] = 0
            self.G[key] = 0
        self.t = 1

    def adam(self, x, gradient_x, G, M, t, init_lr):
        """
        adam算法更新参数
        输入:
            - x:参数
            - G:梯度平方的加权移动平均
            - M:梯度的加权移动平均
            - t:迭代次数
            - init_lr:初始学习率
        """
        M = self.beta1 * M + (1 - self.beta1) * gradient_x
        G = self.beta2 * G + (1 - self.beta2) * gradient_x ** 2
        M_hat = M / (1 - self.beta1 ** t)
        G_hat = G / (1 - self.beta2 ** t)
        t += 1
        x -= init_lr / torch.sqrt(G_hat + self.epsilon) * M_hat
        return x, G, M, t

    def step(self):
        """参数更新"""
        for key in self.model.params.keys():
            self.model.params[key], self.G[key], self.M[key], self.t = self.adam(self.model.params[key],
                                                                                 self.model.grads[key],
                                                                                 self.G[key],
                                                                                 self.M[key],
                                                                                 self.t,
                                                                                 self.init_lr)


class OptimizedFunction3D(Op):
    def __init__(self):
        super(OptimizedFunction3D, self).__init__()
        self.params = {'x': 0}
        self.grads = {'x': 0}

    def forward(self, x):
        self.params['x'] = x
        return x[0] * x[0] / 20 + x[1] * x[1] / 1
        # return x[0] ** 2 + x[1] ** 2 + x[1] ** 3 + x[0] * x[1]

    def backward(self):
        x = self.params['x']
        gradient1 = 2 * x[0] / 20
        gradient2 = 2 * x[1] / 1
        grad1 = torch.Tensor([gradient1])
        grad2 = torch.Tensor([gradient2])
        self.grads['x'] = torch.cat([grad1, grad2])


class Visualization3D(animation.FuncAnimation):
    """    绘制动态图像,可视化参数更新轨迹    """

    def __init__(self, *xy_values, z_values, labels=[], colors=[], fig, ax, interval=100, blit=True, **kwargs):
        """
        初始化3d可视化类
        输入:
            xy_values:三维中x,y维度的值
            z_values:三维中z维度的值
            labels:每个参数更新轨迹的标签
            colors:每个轨迹的颜色
            interval:帧之间的延迟(以毫秒为单位)
            blit:是否优化绘图
        """
        self.fig = fig
        self.ax = ax
        self.xy_values = xy_values
        self.z_values = z_values

        frames = max(xy_value.shape[0] for xy_value in xy_values)

        self.lines = [ax.plot([], [], [], label=label, color=color, lw=2)[0]
                      for _, label, color in zip_longest(xy_values, labels, colors)]
        self.points = [ax.plot([], [], [], color=color, markeredgewidth=1, markeredgecolor='black', marker='o')[0]
                       for _, color in zip_longest(xy_values, colors)]
        # print(self.lines)
        super(Visualization3D, self).__init__(fig, self.animate, init_func=self.init_animation, frames=frames,
                                              interval=interval, blit=blit, **kwargs)

    def init_animation(self):
        # 数值初始化
        for line in self.lines:
            line.set_data_3d([], [], [])
        for point in self.points:
            point.set_data_3d([], [], [])
        return self.points + self.lines

    def animate(self, i):
        # 将x,y,z三个数据传入,绘制三维图像
        for line, xy_value, z_value in zip(self.lines, self.xy_values, self.z_values):
            line.set_data_3d(xy_value[:i, 0], xy_value[:i, 1], z_value[:i])
        for point, xy_value, z_value in zip(self.points, self.xy_values, self.z_values):
            point.set_data_3d(xy_value[i, 0], xy_value[i, 1], z_value[i])
        return self.points + self.lines


def train_f(model, optimizer, x_init, epoch):
    x = x_init
    all_x = []
    losses = []
    for i in range(epoch):
        all_x.append(copy.deepcopy(x.numpy()))  # 浅拷贝 改为 深拷贝, 否则List的原值会被改变。 Edit by David 2022.12.4.
        loss = model(x)
        losses.append(loss)
        model.backward()
        optimizer.step()
        x = model.params['x']
    return torch.Tensor(np.array(all_x)), losses


# 构建6个模型,分别配备不同的优化器
model1 = OptimizedFunction3D()
opt_gd = SimpleBatchGD(init_lr=0.95, model=model1)

model2 = OptimizedFunction3D()
opt_adagrad = Adagrad(init_lr=1.5, model=model2, epsilon=1e-7)

model3 = OptimizedFunction3D()
opt_rmsprop = RMSprop(init_lr=0.05, model=model3, beta=0.9, epsilon=1e-7)

model4 = OptimizedFunction3D()
opt_momentum = Momentum(init_lr=0.1, model=model4, rho=0.9)

model5 = OptimizedFunction3D()
opt_adam = Adam(init_lr=0.3, model=model5, beta1=0.9, beta2=0.99, epsilon=1e-7)

model6 = OptimizedFunction3D()
opt_Nesterov = Nesterov(init_lr=0.1, model=model6, rho=0.9)  # 将 model4 改为 model6

models = [model1, model2, model3, model4, model5, model6]
opts = [opt_gd, opt_adagrad, opt_rmsprop, opt_momentum, opt_adam, opt_Nesterov]

x_all_opts = []
z_all_opts = []

# 使用不同优化器训练
for model, opt in zip(models, opts):
    x_init = torch.FloatTensor([-7, 2])
    x_one_opt, z_one_opt = train_f(model, opt, x_init, 100)  # epoch
    # 保存参数值
    x_all_opts.append(x_one_opt.numpy())
    z_all_opts.append(np.squeeze(z_one_opt))

# 使用numpy.meshgrid生成x1,x2矩阵,矩阵的每一行为[-3, 3],以0.1为间隔的数值
x1 = np.arange(-10, 10, 0.01)
x2 = np.arange(-5, 5, 0.01)
x1, x2 = np.meshgrid(x1, x2)
init_x = torch.Tensor(np.array([x1, x2]))

model = OptimizedFunction3D()

# 绘制 f_3d函数 的 三维图像
fig = plt.figure()
ax = plt.axes(projection='3d')
X = init_x[0].numpy()
Y = init_x[1].numpy()
Z = model(init_x).numpy()  # 改为 model(init_x).numpy() David 2022.12.4
surf = ax.plot_surface(X, Y, Z, edgecolor='grey', cmap=cm.coolwarm)
# fig.colorbar(surf, shrink=0.5, aspect=1)
# ax.set_zlim(-3, 2)
ax.set_xlabel('x1')
ax.set_ylabel('x2')
ax.set_zlabel('f(x1,x2)')

labels = ['SGD', 'AdaGrad', 'RMSprop', 'Momentum', 'Adam', 'Nesterov']
colors = ['#8B0000', '#0000FF', '#000000', '#008B00', '#FF0000']

animator = Visualization3D(*x_all_opts, z_values=z_all_opts, labels=labels, colors=colors, fig=fig, ax=ax)
ax.legend(loc='upper right')

plt.show()
# animator.save('teaser' + '.gif', writer='imagemagick',fps=10) # 效果不好,估计被挡住了…… 有待进一步提高 Edit by David 2022.12.4

3.复现CS231经典动画
 

Animations that may help your intuitions about the learning process dynamics. 

Left: Contours of a loss surface and time evolution of different optimization algorithms. Notice the "overshooting" behavior of momentum-based methods, which make the optimization look like a ball rolling down the hill. 

Right: A visualization of a saddle point in the optimization landscape, where the curvature along different dimension has different signs (one dimension curves up and another down). Notice that SGD has a very hard time breaking symmetry and gets stuck on the top. Conversely, algorithms such as RMSprop will see very low gradients in the saddle direction. Due to the denominator term in the RMSprop update, this will increase the effective learning rate along this direction, helping RMSProp proceed. 
 

- x[0] * x[0] / 2 + x[1] * x[1] / 1:(致敬经典)(鞍点)

import torch
import numpy as np
import copy
from matplotlib import pyplot as plt
from matplotlib import animation
from itertools import zip_longest
from matplotlib import cm


class Op(object):
    def __init__(self):
        pass

    def __call__(self, inputs):
        return self.forward(inputs)

    # 输入:张量inputs
    # 输出:张量outputs
    def forward(self, inputs):
        # return outputs
        raise NotImplementedError

    # 输入:最终输出对outputs的梯度outputs_grads
    # 输出:最终输出对inputs的梯度inputs_grads
    def backward(self, outputs_grads):
        # return inputs_grads
        raise NotImplementedError


class Optimizer(object):  # 优化器基类
    def __init__(self, init_lr, model):
        """
        优化器类初始化
        """
        # 初始化学习率,用于参数更新的计算
        self.init_lr = init_lr
        # 指定优化器需要优化的模型
        self.model = model

    def step(self):
        """
        定义每次迭代如何更新参数
        """
        pass


class SimpleBatchGD(Optimizer):
    def __init__(self, init_lr, model):
        super(SimpleBatchGD, self).__init__(init_lr=init_lr, model=model)

    def step(self):
        # 参数更新
        if isinstance(self.model.params, dict):
            for key in self.model.params.keys():
                self.model.params[key] = self.model.params[key] - self.init_lr * self.model.grads[key]


class Adagrad(Optimizer):
    def __init__(self, init_lr, model, epsilon):
        """
        Adagrad 优化器初始化
        输入:
            - init_lr: 初始学习率 - model:模型,model.params存储模型参数值  - epsilon:保持数值稳定性而设置的非常小的常数
        """
        super(Adagrad, self).__init__(init_lr=init_lr, model=model)
        self.G = {}
        for key in self.model.params.keys():
            self.G[key] = 0
        self.epsilon = epsilon

    def adagrad(self, x, gradient_x, G, init_lr):
        """
        adagrad算法更新参数,G为参数梯度平方的累计值。
        """
        G += gradient_x ** 2
        x -= init_lr / torch.sqrt(G + self.epsilon) * gradient_x
        return x, G

    def step(self):
        """
        参数更新
        """
        for key in self.model.params.keys():
            self.model.params[key], self.G[key] = self.adagrad(self.model.params[key],
                                                               self.model.grads[key],
                                                               self.G[key],
                                                               self.init_lr)


class RMSprop(Optimizer):
    def __init__(self, init_lr, model, beta, epsilon):
        """
        RMSprop优化器初始化
        输入:
            - init_lr:初始学习率
            - model:模型,model.params存储模型参数值
            - beta:衰减率
            - epsilon:保持数值稳定性而设置的常数
        """
        super(RMSprop, self).__init__(init_lr=init_lr, model=model)
        self.G = {}
        for key in self.model.params.keys():
            self.G[key] = 0
        self.beta = beta
        self.epsilon = epsilon

    def rmsprop(self, x, gradient_x, G, init_lr):
        """
        rmsprop算法更新参数,G为迭代梯度平方的加权移动平均
        """
        G = self.beta * G + (1 - self.beta) * gradient_x ** 2
        x -= init_lr / torch.sqrt(G + self.epsilon) * gradient_x
        return x, G

    def step(self):
        """参数更新"""
        for key in self.model.params.keys():
            self.model.params[key], self.G[key] = self.rmsprop(self.model.params[key],
                                                               self.model.grads[key],
                                                               self.G[key],
                                                               self.init_lr)


class Momentum(Optimizer):
    def __init__(self, init_lr, model, rho):
        """
        Momentum优化器初始化
        输入:
            - init_lr:初始学习率
            - model:模型,model.params存储模型参数值
            - rho:动量因子
        """
        super(Momentum, self).__init__(init_lr=init_lr, model=model)
        self.delta_x = {}
        for key in self.model.params.keys():
            self.delta_x[key] = 0
        self.rho = rho

    def momentum(self, x, gradient_x, delta_x, init_lr):
        """
        momentum算法更新参数,delta_x为梯度的加权移动平均
        """
        delta_x = self.rho * delta_x - init_lr * gradient_x
        x += delta_x
        return x, delta_x

    def step(self):
        """参数更新"""
        for key in self.model.params.keys():
            self.model.params[key], self.delta_x[key] = self.momentum(self.model.params[key],
                                                                      self.model.grads[key],
                                                                      self.delta_x[key],
                                                                      self.init_lr)


class Nesterov(Optimizer):
    def __init__(self, init_lr, model, rho):
        """
        Nesterov优化器初始化
        输入:
            - init_lr:初始学习率
            - model:模型,model.params存储模型参数值
            - rho:动量因子
        """
        super(Nesterov, self).__init__(init_lr=init_lr, model=model)
        self.delta_x = {}
        for key in self.model.params.keys():
            self.delta_x[key] = 0
        self.rho = rho

    def nesterov(self, x, gradient_x, delta_x, init_lr):
        """
        Nesterov算法更新参数,delta_x为梯度的加权移动平均
        """
        delta_x_prev = delta_x
        delta_x = self.rho * delta_x - init_lr * gradient_x
        x += -self.rho * delta_x_prev + (1 + self.rho) * delta_x
        return x, delta_x

    def step(self):
        """参数更新"""
        for key in self.model.params.keys():
            self.model.params[key], self.delta_x[key] = self.nesterov(self.model.params[key],
                                                                      self.model.grads[key],
                                                                      self.delta_x[key],
                                                                      self.init_lr)


class Adam(Optimizer):
    def __init__(self, init_lr, model, beta1, beta2, epsilon):
        """
        Adam优化器初始化
        输入:
            - init_lr:初始学习率
            - model:模型,model.params存储模型参数值
            - beta1, beta2:移动平均的衰减率
            - epsilon:保持数值稳定性而设置的常数
        """
        super(Adam, self).__init__(init_lr=init_lr, model=model)
        self.beta1 = beta1
        self.beta2 = beta2
        self.epsilon = epsilon
        self.M, self.G = {}, {}
        for key in self.model.params.keys():
            self.M[key] = 0
            self.G[key] = 0
        self.t = 1

    def adam(self, x, gradient_x, G, M, t, init_lr):
        """
        adam算法更新参数
        输入:
            - x:参数
            - G:梯度平方的加权移动平均
            - M:梯度的加权移动平均
            - t:迭代次数
            - init_lr:初始学习率
        """
        M = self.beta1 * M + (1 - self.beta1) * gradient_x
        G = self.beta2 * G + (1 - self.beta2) * gradient_x ** 2
        M_hat = M / (1 - self.beta1 ** t)
        G_hat = G / (1 - self.beta2 ** t)
        t += 1
        x -= init_lr / torch.sqrt(G_hat + self.epsilon) * M_hat
        return x, G, M, t

    def step(self):
        """参数更新"""
        for key in self.model.params.keys():
            self.model.params[key], self.G[key], self.M[key], self.t = self.adam(self.model.params[key],
                                                                                 self.model.grads[key],
                                                                                 self.G[key],
                                                                                 self.M[key],
                                                                                 self.t,
                                                                                 self.init_lr)


class OptimizedFunction3D(Op):
    def __init__(self):
        super(OptimizedFunction3D, self).__init__()
        self.params = {'x': 0}
        self.grads = {'x': 0}

    def forward(self, x):
        self.params['x'] = x
        return - x[0] * x[0] / 2 + x[1] * x[1] / 1  # x[0] ** 2 + x[1] ** 2 + x[1] ** 3 + x[0] * x[1]

    def backward(self):
        x = self.params['x']
        gradient1 = - 2 * x[0] / 2
        gradient2 = 2 * x[1] / 1
        grad1 = torch.Tensor([gradient1])
        grad2 = torch.Tensor([gradient2])
        self.grads['x'] = torch.cat([grad1, grad2])


class Visualization3D(animation.FuncAnimation):
    """    绘制动态图像,可视化参数更新轨迹    """

    def __init__(self, *xy_values, z_values, labels=[], colors=[], fig, ax, interval=100, blit=True, **kwargs):
        """
        初始化3d可视化类
        输入:
            xy_values:三维中x,y维度的值
            z_values:三维中z维度的值
            labels:每个参数更新轨迹的标签
            colors:每个轨迹的颜色
            interval:帧之间的延迟(以毫秒为单位)
            blit:是否优化绘图
        """
        self.fig = fig
        self.ax = ax
        self.xy_values = xy_values
        self.z_values = z_values

        frames = max(xy_value.shape[0] for xy_value in xy_values)

        self.lines = [ax.plot([], [], [], label=label, color=color, lw=2)[0]
                      for _, label, color in zip_longest(xy_values, labels, colors)]
        self.points = [ax.plot([], [], [], color=color, markeredgewidth=1, markeredgecolor='black', marker='o')[0]
                       for _, color in zip_longest(xy_values, colors)]
        # print(self.lines)
        super(Visualization3D, self).__init__(fig, self.animate, init_func=self.init_animation, frames=frames,
                                              interval=interval, blit=blit, **kwargs)

    def init_animation(self):
        # 数值初始化
        for line in self.lines:
            line.set_data_3d([], [], [])
        for point in self.points:
            point.set_data_3d([], [], [])
        return self.points + self.lines

    def animate(self, i):
        # 将x,y,z三个数据传入,绘制三维图像
        for line, xy_value, z_value in zip(self.lines, self.xy_values, self.z_values):
            line.set_data_3d(xy_value[:i, 0], xy_value[:i, 1], z_value[:i])
        for point, xy_value, z_value in zip(self.points, self.xy_values, self.z_values):
            point.set_data_3d(xy_value[i, 0], xy_value[i, 1], z_value[i])
        return self.points + self.lines


def train_f(model, optimizer, x_init, epoch):
    x = x_init
    all_x = []
    losses = []
    for i in range(epoch):
        all_x.append(copy.deepcopy(x.numpy()))  # 浅拷贝 改为 深拷贝, 否则List的原值会被改变。 Edit by David 2022.12.4.
        loss = model(x)
        losses.append(loss)
        model.backward()
        optimizer.step()
        x = model.params['x']
    return torch.Tensor(np.array(all_x)), losses


# 构建5个模型,分别配备不同的优化器
model1 = OptimizedFunction3D()
opt_gd = SimpleBatchGD(init_lr=0.05, model=model1)

model2 = OptimizedFunction3D()
opt_adagrad = Adagrad(init_lr=0.05, model=model2, epsilon=1e-7)

model3 = OptimizedFunction3D()
opt_rmsprop = RMSprop(init_lr=0.05, model=model3, beta=0.9, epsilon=1e-7)

model4 = OptimizedFunction3D()
opt_momentum = Momentum(init_lr=0.05, model=model4, rho=0.9)

model5 = OptimizedFunction3D()
opt_adam = Adam(init_lr=0.05, model=model5, beta1=0.9, beta2=0.99, epsilon=1e-7)

model6 = OptimizedFunction3D()
opt_Nesterov = Nesterov(init_lr=0.1, model=model6, rho=0.9)

models = [model1, model2, model3, model4, model5, model6]
opts = [opt_gd, opt_adagrad, opt_rmsprop, opt_momentum, opt_adam, opt_Nesterov]

x_all_opts = []
z_all_opts = []

# 使用不同优化器训练

for model, opt in zip(models, opts):
    x_init = torch.FloatTensor([0.00001, 0.5])
    x_one_opt, z_one_opt = train_f(model, opt, x_init, 100)  # epoch
    # 保存参数值
    x_all_opts.append(x_one_opt.numpy())
    z_all_opts.append(np.squeeze(z_one_opt))

# 使用numpy.meshgrid生成x1,x2矩阵,矩阵的每一行为[-3, 3],以0.1为间隔的数值
x1 = np.arange(-1, 2, 0.01)
x2 = np.arange(-1, 1, 0.05)
x1, x2 = np.meshgrid(x1, x2)
init_x = torch.Tensor(np.array([x1, x2]))

model = OptimizedFunction3D()

# 绘制 f_3d函数 的 三维图像
fig = plt.figure()
ax = plt.axes(projection='3d')
X = init_x[0].numpy()
Y = init_x[1].numpy()
Z = model(init_x).numpy()  # 改为 model(init_x).numpy() David 2022.12.4
surf = ax.plot_surface(X, Y, Z, edgecolor='grey', cmap=cm.coolwarm)
# fig.colorbar(surf, shrink=0.5, aspect=1)
ax.set_zlim(-3, 2)
ax.set_xlabel('x1')
ax.set_ylabel('x2')
ax.set_zlabel('f(x1,x2)')

labels = ['SGD', 'AdaGrad', 'RMSprop', 'Momentum', 'Adam', 'Nesterov']
colors = ['#8B0000', '#0000FF', '#000000', '#008B00', '#FF0000']

animator = Visualization3D(*x_all_opts, z_values=z_all_opts, labels=labels, colors=colors, fig=fig, ax=ax)
ax.legend(loc='upper right')

plt.show()
# animator.save('teaser' + '.gif', writer='imagemagick',fps=10) # 效果不好,估计被挡住了…… 有待进一步提高 Edit by David 2022.12.4
# save不好用,不费劲了,安装个软件做gif https://pc.qq.com/detail/13/detail_23913.html

 

4、结合3D动画,用自己的语言,从轨迹、速度等多个角度讲解各个算法优缺点

重温一下自己的描述:

            SGD总是需要经过反反复复的上上下下才可以最终找到谷底。在上述的情况下,SGD的性能非常差,这就导致了模型的训练非常慢。针对该问题,总的来说就是动量法改变的是梯度的方向,为避免y方向走的步子很大而x方向的步子很小的问题,y方向拽着他一点,x方向推着他一点,(这个操作让动量来)去使方向更加偏x一点。Nesterov动量法在迭代过程中能够更偏向理想状态下降路径。关于学习率调整的算法,AdaGrad算法刚开始步子迈的大点(为了快点到最优),越来越往后离最优值越近时要更加谨慎,步子更小点,省得跨过它。上面这个算法可能出现,学习率减少到0,还没到最优值呢就动不了是不行的,所以出现了RMSProp算法,滑动着只记一部分梯度平方,使用逐渐遗忘过去的操作,将做加法时将新的梯度信息看的更重要。这种操作叫做“指数移动平均”。最后关于Adam算法,将梯度缩放和偏置修正相结合,很稳定。

接下来是优缺点的分析:

SGD:

优点简洁

缺点

  1. 超参数学习率比较敏感(过小导致收敛速度过慢,过大又越过极值点)。
  2. 学习率除了敏感,有时还会因其在迭代过程中保持不变,很容易造成算法被卡在鞍点的位置。
  3. 在较平坦的区域,由于梯度接近于0,优化算法会因误判,在还未到达极值点时,就提前结束迭代,陷入局部极小值
Adagrad:

优点:
“之”字形的变动程度衰减,呈现稳定的向最优点收敛,Adagrad算法能够在训练中自动的对学习率进行调整,对于出现频率较低参数采用较大的α更新;相反,对于出现频率较高的参数采用较小的α更新。因此,Adagrad非常适合处理稀疏数据
缺点
在训练的中后期,分母上梯度平方的累加将会越来越大,从而梯度趋近于0,使得训练提前结束

RMSProp:
:可缓解Adagrad算法学习率下降较快的问题;
:可能收敛到局部极小值
这一点可以不用太担心,因为对深层神经网络来说,目标函数具有局部极小值的概率很小:局部极小值意味着在每一个参数维度都是极小,假设在每一个维度极小值出现的概率为p,n个参数,出现局部极小概率为 p^n ,对深层神经网络,n很大,这概率很小

Momentum:

优点

1、与梯度下降相比,下降速度快,因为如果方向是一直下降的,那么速度将是之前梯度的和,所以比仅用当前梯度下降快
2、对于窄长的等梯度图,会减轻梯度下降的震荡程度,因为考虑了当前时刻是考虑了之前的梯度方向,加快收敛

缺点

盲目沿着坡滚,如果能具备先知,快上坡就减速,适应性会更好(NAG)

Adam:

  • Adam的优点主要在于经过偏置校正后,每一次迭代学习率都有个确定范围,使得参数比较平稳。
  • Adam 方法也会比 RMSprop方法收敛的结果要好一些。就速度和方向的正确性、稳定性而言,都是居中。

Adam通常被认为对超参数的选择相当鲁棒,尽管学习率有时需要从建议的默认修改。 

       个人认为,每个算法都有各自的优点,只不过是每个算法有适合自己的场景,并且一些超参数的设置(比如初始学习率等)的不同,其结果也会相差很大,很难做出平等的对比。

根据小遊同学https://blog.csdn.net/qq_62591797/article/details/135191975?spm=1001.2014.3001.5502

        其中天蓝色的是SGD算法、粉红色的是Momentum算法、深蓝色的是Adam、白色的是Adagrad、绿色的是RMSprop,去进一步理解这些算法。 

小遗憾:动图不会插,保存的图片中的小球还在2背面,看不好,之后可以再研究一下。

心得体会:

(抒情版)总结:连续写了半年的CSDN了,确实感触很深,收获的不只是nndl的知识,感觉自己也更爱表达了,也有了更大的探索欲,感觉这是自己收获最大的地方。最后感谢老师的辛苦付出和悉心教导,更要感谢俺的舍友们的帮助,也要感谢同学和学长们的“参考博客”。

今天也正好是元旦,新的一年祝大家元旦快乐!!!

NNDL 作业13 优化算法3D可视化-CSDN博客

【23-24 秋学期】NNDL 作业13 优化算法3D可视化-CSDN博客

梯度下降的可视化解释(Adam,AdaGrad,Momentum,RMSProp)_3D视觉工坊-商业新知 (shangyexinzhi.com)

NNDL 作业12 优化算法2D可视化-CSDN博客

优化器(Optimizer)(SGD、Momentum、AdaGrad、RMSProp、Adam)-CSDN博客

【DL】深度学习优化方法:SGD、SGDM、Adagrad、RMSProp、Adam_深度学习sgdm-CSDN博客

  • 15
    点赞
  • 17
    收藏
    觉得还不错? 一键收藏
  • 2
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值