[PyTorch Study Notes] 3. Deep Learning Basics


Compiled from Long Liangqu's PyTorch course videos. Video:
【计算机-AI】PyTorch学这个就够了!
(好课推荐)深度学习与PyTorch入门实战——主讲人龙良曲

13. Gradients

  • Derivative
  • Partial derivative
  • Gradient (a vector)

How to search for minima?

  • $\theta_{t+1}=\theta_t-\alpha_t\nabla f(\theta_t)$
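A minimal sketch of this update rule on a toy function (the function f(θ) = θ² and the learning rate are made up for illustration):

import torch

theta = torch.tensor(5.0, requires_grad=True)
alpha = 0.1                                    # learning rate alpha_t (kept fixed here)

for t in range(50):
    f = theta ** 2                             # f(theta_t)
    f.backward()                               # computes grad f(theta_t) into theta.grad
    with torch.no_grad():
        theta -= alpha * theta.grad            # theta_{t+1} = theta_t - alpha_t * grad
    theta.grad.zero_()

print(theta.item())                            # close to the minimum at 0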

Optimizer performance

14. Activation Functions

  • Continuous but not differentiable everywhere
  • Sigmoid / Logistic: $\sigma'=\sigma(1-\sigma)$
    torch.sigmoid()
    F.sigmoid() (import torch.nn.functional as F)
  • Tanh
    torch.tanh()
  • ReLU
    torch.relu()
    F.relu() (import torch.nn.functional as F)
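A quick look at all three activations above on a sample tensor (the values are made up):

import torch
import torch.nn.functional as F

z = torch.linspace(-3, 3, 7)
print(torch.sigmoid(z))   # squashed into (0, 1)
print(torch.tanh(z))      # squashed into (-1, 1)
print(torch.relu(z))      # negative values clipped to 0
print(F.relu(z))          # same result via the functional API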

Typical Loss

  • Mean Squared Error
    MSE: $loss = \sum [y-(xw+b)]^2$
    L2-norm: $||y-(xw+b)||_2$ (see the check after this list)
  • Cross Entropy Loss
    binary
    multi-class
    +softmax
    Leave it to the Logistic Regression part
  • Softmax
    soft version of max
    $S(y_i)=\frac{e^{y_i}}{\sum_j e^{y_j}}$
    $\frac{\partial p_i}{\partial y_j}=\begin{cases} p_i(1-p_i) & i=j \\ -p_j\,p_i & i\neq j \end{cases}$
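A quick check of the MSE / L2-norm relation above, with made-up tensors. Note that F.mse_loss defaults to reduction='mean', so reduction='sum' is used here to match the summed formula:

import torch
import torch.nn.functional as F

y = torch.rand(5)
pred = torch.rand(5)                          # stands in for x*w + b

mse = F.mse_loss(pred, y, reduction='sum')    # sum of [y - pred]^2
l2 = torch.norm(y - pred, p=2)                # ||y - pred||_2
print(torch.allclose(mse, l2 ** 2))           # True: summed MSE equals the squared L2 norm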

Gradient API

  • torch.autograd.grad(loss, [w1, w2,...])
  • loss.backward()
import torch
import torch.nn.functional as F

x = torch.ones(1)
w = torch.full([1], 2.)
mse = F.mse_loss(torch.ones(1), x*w)
print(mse)  # tensor(1., grad_fn=<MseLossBackward>)

# torch.autograd.grad(mse, [w]) # RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

print(w.requires_grad_())   # tensor([2.], requires_grad=True)

# print(torch.autograd.grad(mse, [w]))    # still errors: the graph was built before requires_grad_(), so it must be rebuilt (RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn)

mse = F.mse_loss(torch.ones(1), x*w)
# print(torch.autograd.grad(mse, [w]))    # (tensor([2.]),

mse.backward()
print(w.grad)   # tensor([2.])


a = torch.rand(3, requires_grad=True)
print(a)    # tensor([0.0377, 0.4542, 0.1386], requires_grad=True)
p = F.softmax(a, dim=0)
# p.backward()    # 报错 RuntimeError: grad can be implicitly created only for scalar outputs
# retain_graph=True keeps the computation graph so backward/grad can be called again
print(torch.autograd.grad(p[0], [a], retain_graph=True))    # (tensor([ 0.1998, -0.1156, -0.0843]),)
print(torch.autograd.grad(p[1], [a], retain_graph=True))    # (tensor([-0.1156,  0.2434, -0.1278]),)
print(torch.autograd.grad(p[2], [a], retain_graph=True))    # (tensor([-0.0843, -0.1278,  0.2121]),)

15. Perceptron

Derivative of a single-output perceptron
P32
$\frac{\partial E}{\partial w_{j0}}=(O_0-t)O_0(1-O_0)x^0_j$

Multi-output perceptron
$\frac{\partial E}{\partial w_{jk}}=(O_k-t_k)O_k(1-O_k)x^0_j$

import torch
import torch.nn.functional as F

x = torch.randn(1, 10)
# w = torch.randn(1, 10, requires_grad=True)  # single-output perceptron
w = torch.randn(2, 10, requires_grad=True)  # multi-output perceptron layer
o = torch.sigmoid(x@w.t())
print(o.shape)  # torch.Size([1, 2])

loss = F.mse_loss(torch.ones(1, 1), o)  # broadcasting
print(loss.shape)   # torch.Size([])
print(loss)   # tensor(0.2094, grad_fn=<MseLossBackward>)

loss.backward()
print(w.grad)
"""
tensor([[-2.0498e-01,  2.4619e-02, -8.0208e-04, -1.3723e-01, -1.3014e-01,
         -1.4648e-01, -7.5119e-02,  4.9381e-02,  2.7161e-01,  4.8075e-02],
        [-4.8705e-03,  5.8495e-04, -1.9058e-05, -3.2607e-03, -3.0922e-03,
         -3.4804e-03, -1.7849e-03,  1.1733e-03,  6.4536e-03,  1.1423e-03]])
"""

16. Chain Rule

import torch
import torch.nn.functional as F

x = torch.tensor(1.)
w1 = torch.tensor(2., requires_grad=True)
b1 = torch.tensor(1.)
w2 = torch.tensor(2., requires_grad=True)
b2 = torch.tensor(1.)
y1 = x * w1 + b1
y2 = y1 * w2 + b2

dy2_dy1 = torch.autograd.grad(y2, [y1], retain_graph=True)[0]
dy1_dw1 = torch.autograd.grad(y1, [w1], retain_graph=True)[0]
dy2_dw1 = torch.autograd.grad(y2, [w1], retain_graph=True)[0]

print(dy2_dy1 * dy1_dw1)    # tensor(2.)
print(dy2_dw1)  # tensor(2.)

17. Backpropagation

For an output-layer node $k \in K$: $\frac{\partial E}{\partial W_{jk}}=O_j\delta_k$

where $\delta_k = O_k(1-O_k)(O_k-t_k)$

For a hidden-layer node $j \in J$: $\frac{\partial E}{\partial W_{ij}}=O_i\delta_j$

where $\delta_j = O_j(1-O_j)\sum_{k \in K}\delta_k W_{jk}$
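A minimal numerical check of these two formulas against autograd, for a made-up 2-10-3 network with sigmoid activations and E = 0.5 * Σ_k (O_k − t_k)^2:

import torch

torch.manual_seed(0)
x = torch.rand(2)                              # input O_i
t = torch.rand(3)                              # targets t_k
W1 = torch.rand(10, 2, requires_grad=True)     # hidden weights W_ij (stored as [j, i])
W2 = torch.rand(3, 10, requires_grad=True)     # output weights W_jk (stored as [k, j])

O_j = torch.sigmoid(W1 @ x)                    # hidden activations
O_k = torch.sigmoid(W2 @ O_j)                  # output activations
E = 0.5 * ((O_k - t) ** 2).sum()
E.backward()

delta_k = O_k * (1 - O_k) * (O_k - t)                  # output-layer delta
delta_j = O_j * (1 - O_j) * (W2.t() @ delta_k)         # hidden-layer delta
print(torch.allclose(W2.grad, delta_k.unsqueeze(1) * O_j.unsqueeze(0)))   # True
print(torch.allclose(W1.grad, delta_j.unsqueeze(1) * x.unsqueeze(0)))     # True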

18. 2D Function Optimization Example

import torch
import numpy as np
import matplotlib.pyplot as plt

def himmelblau(x):
    return (x[0] ** 2 + x[1] - 11) ** 2 + (x[0] + x[1] ** 2 - 7) ** 2

x = np.arange(-6, 6, 0.1)
y = np.arange(-6, 6, 0.1)
print('x, y range:', x.shape, y.shape)
X, Y = np.meshgrid(x, y)
print('X, Y maps', X.shape, Y.shape)
Z = himmelblau([X, Y])

fig = plt.figure('himmelblau')
ax = fig.gca(projection='3d')   # deprecated since Matplotlib 3.4; fig.add_subplot(projection='3d') is the modern form
ax.plot_surface(X, Y, Z)
ax.view_init(60, -30)
ax.set_xlabel('x')
ax.set_ylabel('y')
plt.show()

x = torch.tensor([0., 0.], requires_grad=True)
optimizer = torch.optim.Adam([x], lr=1e-3)
for step in range(20000):
    pred = himmelblau(x)
    optimizer.zero_grad()
    pred.backward()
    optimizer.step()
    if step % 2000 == 0:
        print('step {}: x = {}, f(x) = {}'.format(step, x.tolist(), pred.item()))
"""
x, y range: (120,) (120,)
X, Y maps (120, 120) (120, 120)
G:/Project/PYTHON/Demo/Pytorch21_7_29/12himmelblau.py:17: MatplotlibDeprecationWarning: Calling gca() with keyword arguments was deprecated in Matplotlib 3.4. Starting two minor releases later, gca() will take no keyword arguments. The gca() function should only be used to get the current axes, or if no axes exist, create new axes with default keyword arguments. To create a new axes with non-default arguments, use plt.axes() or plt.subplot().
  ax = fig.gca(projection='3d')
step 0: x = [0.0009999999310821295, 0.0009999999310821295], f(x) = 170.0
step 2000: x = [2.3331806659698486, 1.9540694952011108], f(x) = 13.730916023254395
step 4000: x = [2.9820079803466797, 2.0270984172821045], f(x) = 0.014858869835734367
step 6000: x = [2.999983549118042, 2.0000221729278564], f(x) = 1.1074007488787174e-08
step 8000: x = [2.9999938011169434, 2.0000083446502686], f(x) = 1.5572823031106964e-09
step 10000: x = [2.999997854232788, 2.000002861022949], f(x) = 1.8189894035458565e-10
step 12000: x = [2.9999992847442627, 2.0000009536743164], f(x) = 1.6370904631912708e-11
step 14000: x = [2.999999761581421, 2.000000238418579], f(x) = 1.8189894035458565e-12
step 16000: x = [3.0, 2.0], f(x) = 0.0
step 18000: x = [3.0, 2.0], f(x) = 0.0
"""

Different initializations converge to different solutions (the Himmelblau function has four identical local minima).

19. Logistic Regression

Goal vs. Approach

  • For regression
    Goal: $pred = y$
    Approach: minimize $dist(pred, y)$
  • For classification
    Goal: maximize a benchmark, e.g. accuracy
    Approach 1: minimize $dist(p_\theta(y|x), p_r(y|x))$
    Approach 2: minimize $divergence(p_\theta(y|x), p_r(y|x))$

Q1. Why not maximize accuracy directly?

  • $acc.=\frac{\sum_i I(pred_i == y_i)}{len(Y)}$ (a tiny sketch of this computation follows below)
  • Issue 1: the gradient is 0 when small weight changes leave the accuracy unchanged
  • Issue 2: the gradient is discontinuous, since the number of correct predictions jumps in integer steps
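For reference, a tiny sketch of how this accuracy is computed (the tensors are made up); because it is piecewise constant in the weights, its gradient is 0 almost everywhere:

import torch

pred = torch.tensor([0, 2, 1, 1])        # predicted class indices
y = torch.tensor([0, 2, 0, 1])           # ground-truth labels
acc = (pred == y).float().mean()         # sum of indicators / len(Y)
print(acc)                               # tensor(0.7500)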

Q2. Why is it called logistic regression?

  • It uses the sigmoid (logistic) function
  • The naming is controversial:
    MSE => regression
    Cross Entropy => classification

20. Cross Entropy

Entropy

  • Uncertainty
  • Measure of surprise
  • Higher entropy = less info.
    $Entropy=-\sum_i P(i)\log P(i)$

Lottery

import torch

a = torch.full([4], 1/4.)
print(a)    # tensor([0.2500, 0.2500, 0.2500, 0.2500])
print(a * torch.log2(a))    # tensor([-0.5000, -0.5000, -0.5000, -0.5000])
print(-(a * torch.log2(a)).sum())   # tensor(2.)

a = torch.tensor([0.1, 0.1, 0.1, 0.7])
print(a)    # tensor([0.1000, 0.1000, 0.1000, 0.7000])
print(a * torch.log2(a))    # tensor([-0.3322, -0.3322, -0.3322, -0.3602])
print(-(a * torch.log2(a)).sum())   #tensor(1.3568)

Cross Entropy

  • $H(p,q)=-\sum p(x)\log q(x)$
  • $H(p,q)=H(p)+D_{KL}(p\|q)$, where $D_{KL}$ is the KL divergence (relative entropy); see the check after this list
    When p = q: cross entropy = entropy ($D_{KL}=0$)
    For a one-hot target distribution: entropy $=-1\log 1=0$
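A quick numerical check of H(p,q) = H(p) + D_KL(p‖q) on two made-up discrete distributions:

import torch

p = torch.tensor([0.1, 0.2, 0.7])
q = torch.tensor([0.3, 0.3, 0.4])

H_p = -(p * p.log()).sum()               # entropy H(p)
H_pq = -(p * q.log()).sum()              # cross entropy H(p, q)
D_kl = (p * (p / q).log()).sum()         # KL divergence D_KL(p || q)
print(torch.allclose(H_pq, H_p + D_kl))  # True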

Binary Classification
$H(P, Q)=-(y\log(p)+(1-y)\log(1-p))$
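A small sketch checking F.binary_cross_entropy against this formula, with made-up probabilities and labels:

import torch
import torch.nn.functional as F

p = torch.tensor([0.9, 0.2, 0.7])        # predicted probabilities
y = torch.tensor([1., 0., 1.])           # binary targets
manual = -(y * p.log() + (1 - y) * (1 - p).log()).mean()
print(torch.allclose(manual, F.binary_cross_entropy(p, y)))   # True (both use mean reduction)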

Why not use MSE?

  • sigmoid + MSE
    gradients vanish where the sigmoid saturates
  • converges more slowly
  • but sometimes MSE is still preferred,
    e.g. in meta-learning

Numerical Stability

import torch
import torch.nn.functional as F

x = torch.randn(1, 784)
w = torch.randn(10, 784)
logits = x @ w.t()
print(logits.size())    # torch.Size([1, 10])

print(F.cross_entropy(logits, torch.tensor([3])))   # tensor(0.1694)

# the same value computed manually: softmax -> log -> nll_loss
# (F.cross_entropy does log_softmax + nll_loss in one numerically stable step)
pred = F.softmax(logits, dim=1)
print(pred.size())    # torch.Size([1, 10])
pred_log = torch.log(pred)
print(F.nll_loss(pred_log, torch.tensor([3])))   # tensor(0.1694)

21. Multi-class Classification

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms

# initialize
batch_size = 200
learning_rate = 0.01
epochs = 10

# load data
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307, ), (0.3081, ))  # standard MNIST mean/std
                   ])),
    batch_size=batch_size, shuffle=True
)
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=False,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307, ), (0.3081, ))  # standard MNIST mean/std
                   ])),
    batch_size=batch_size, shuffle=True
)

# Network Architecture
w1, b1 = torch.randn(200, 784, requires_grad=True), torch.zeros(200, requires_grad=True)
w2, b2 = torch.randn(200, 200, requires_grad=True), torch.zeros(200, requires_grad=True)
w3, b3 = torch.randn(10, 200, requires_grad=True), torch.zeros(10, requires_grad=True)

torch.nn.init.kaiming_normal_(w1)
torch.nn.init.kaiming_normal_(w2)
torch.nn.init.kaiming_normal_(w3)

def forward(x):
    x = x @ w1.t() + b1
    x = F.relu(x)
    x = x @ w2.t() + b2
    x = F.relu(x)
    x = x @ w3.t() + b3
    x = F.relu(x)   # note: a final ReLU on the logits follows the course code; CrossEntropyLoss applies log-softmax itself
    return x

# Train
optimizer = torch.optim.SGD([w1, b1, w2, b2, w3, b3], lr=learning_rate)
criteon = nn.CrossEntropyLoss()

for epoch in range(epochs):
    for batch_idx, (data, target) in enumerate(train_loader):
        data = data.view(-1, 28 * 28)
        logits = forward(data)
        loss = criteon(logits, target)
        optimizer.zero_grad()
        loss.backward()
        # print(w1.grad.norm(), w2.grad.norm())
        optimizer.step()

        if batch_idx % 100 == 0:
            print('Train Epoch:{} [{} / {} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()
            ))

    test_loss = 0
    correct = 0
    for data, target in test_loader:
        data = data.view(-1, 28 * 28)
        logits = forward(data)
        test_loss += criteon(logits, target).item()
        pred = logits.data.max(1)[1]
        correct += pred.eq(target.data).sum()

    test_loss /= len(test_loader.dataset)
    print('\nTest set Average loss:{:.4f}, Accuracy: {} / {} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
              100. * correct / len(test_loader.dataset)
    ))

"""
Train Epoch:0 [0 / 60000 (0%)]	Loss: 2.486867
Train Epoch:0 [20000 / 60000 (33%)]	Loss: 0.724861
Train Epoch:0 [40000 / 60000 (67%)]	Loss: 0.376032

Test set Average loss:0.0018, Accuracy: 8983 / 10000 (90%)

Train Epoch:1 [0 / 60000 (0%)]	Loss: 0.366988
Train Epoch:1 [20000 / 60000 (33%)]	Loss: 0.377176
Train Epoch:1 [40000 / 60000 (67%)]	Loss: 0.399104

Test set Average loss:0.0014, Accuracy: 9186 / 10000 (92%)

Train Epoch:2 [0 / 60000 (0%)]	Loss: 0.252696
Train Epoch:2 [20000 / 60000 (33%)]	Loss: 0.302346
Train Epoch:2 [40000 / 60000 (67%)]	Loss: 0.266919

Test set Average loss:0.0012, Accuracy: 9284 / 10000 (93%)

Train Epoch:3 [0 / 60000 (0%)]	Loss: 0.320602
Train Epoch:3 [20000 / 60000 (33%)]	Loss: 0.223881
Train Epoch:3 [40000 / 60000 (67%)]	Loss: 0.198832

Test set Average loss:0.0011, Accuracy: 9364 / 10000 (94%)

Train Epoch:4 [0 / 60000 (0%)]	Loss: 0.253680
Train Epoch:4 [20000 / 60000 (33%)]	Loss: 0.147065
Train Epoch:4 [40000 / 60000 (67%)]	Loss: 0.194152

Test set Average loss:0.0010, Accuracy: 9406 / 10000 (94%)

Train Epoch:5 [0 / 60000 (0%)]	Loss: 0.163504
Train Epoch:5 [20000 / 60000 (33%)]	Loss: 0.216691
Train Epoch:5 [40000 / 60000 (67%)]	Loss: 0.166883

Test set Average loss:0.0010, Accuracy: 9460 / 10000 (95%)

Train Epoch:6 [0 / 60000 (0%)]	Loss: 0.120956
Train Epoch:6 [20000 / 60000 (33%)]	Loss: 0.122348
Train Epoch:6 [40000 / 60000 (67%)]	Loss: 0.167381

Test set Average loss:0.0009, Accuracy: 9484 / 10000 (95%)

Train Epoch:7 [0 / 60000 (0%)]	Loss: 0.218382
Train Epoch:7 [20000 / 60000 (33%)]	Loss: 0.141006
Train Epoch:7 [40000 / 60000 (67%)]	Loss: 0.156644

Test set Average loss:0.0009, Accuracy: 9501 / 10000 (95%)

Train Epoch:8 [0 / 60000 (0%)]	Loss: 0.152702
Train Epoch:8 [20000 / 60000 (33%)]	Loss: 0.167587
Train Epoch:8 [40000 / 60000 (67%)]	Loss: 0.182679

Test set Average loss:0.0008, Accuracy: 9528 / 10000 (95%)

Train Epoch:9 [0 / 60000 (0%)]	Loss: 0.210252
Train Epoch:9 [20000 / 60000 (33%)]	Loss: 0.150022
Train Epoch:9 [40000 / 60000 (67%)]	Loss: 0.097077

Test set Average loss:0.0008, Accuracy: 9559 / 10000 (96%)

"""

22. Fully Connected Layers

Concisely:

  • inherit from nn.Module
  • init layers in __init__
  • implement forward()

nn.ReLU vs. F.relu()

  • class-style API
  • function-style API (see the short contrast below)
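A small contrast between the two styles (a sketch; the tensor is made up). The class-style API builds a module that can sit inside nn.Sequential, while the function-style API is called directly on tensors:

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(4, 10)
relu_layer = nn.ReLU()          # class-style: a module (no parameters in this case)
print(relu_layer(x).shape)      # torch.Size([4, 10])
print(F.relu(x).shape)          # function-style: same result, no module object

The full MLP training script below uses class-style layers inside nn.Sequential, trained on MNIST, with Visdom for visualization: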
import torch
import torch.nn as nn
from torchvision import datasets, transforms
import torch.optim as optim
from visdom import Visdom

# initialize
batch_size = 200
learning_rate = 0.01
epochs = 10

# load data
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307, ), (0.3081, ))  # standard MNIST mean/std
                   ])),
    batch_size=batch_size, shuffle=True
)
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=False,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307, ), (0.3081, ))  # standard MNIST mean/std
                   ])),
    batch_size=batch_size, shuffle=True
)


class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()

        self.model = nn.Sequential(
            nn.Linear(784, 200),
            nn.LeakyReLU(inplace=True), # inplace saves (GPU) memory and skips repeated allocation/free, but overwrites the input tensor
            nn.Linear(200,200),
            nn.LeakyReLU(inplace=True),
            nn.Linear(200,10),
            nn.LeakyReLU(inplace=True),
        )

    def forward(self, x):
        x = self.model(x)
        return x

device = torch.device('cuda:0')
net = MLP().to(device)
optimizer = optim.SGD(net.parameters(), lr=learning_rate)
criteon = nn.CrossEntropyLoss().to(device)

viz = Visdom()
viz.line([0.], [0.], win='train_loss', opts=dict(title='train_loss'))
viz.line([[0.0, 0.0]], [0.], win='test', opts=dict(title='test loss&acc.', legend=['loss', 'acc.']))
global_step = -1

for epoch in range(epochs):
    for batch_idx, (data, target) in enumerate(train_loader):
        data = data.view(-1, 28 * 28)
        data, target = data.to(device), target.to(device)

        logits = net(data)
        loss = criteon(logits, target)
        optimizer.zero_grad()
        loss.backward()
        # print(w1.grad.norm(), w2.grad.norm())
        optimizer.step()

        if batch_idx % 100 == 0:
            print('Train Epoch:{} [{} / {} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                       100. * batch_idx / len(train_loader), loss.item()
            ))

        # lines: single trace
        global_step += 1
        viz.line([loss.item()], [global_step], win='train_loss', update='append')

    test_loss = 0
    correct = 0
    for data, target in test_loader:
        data = data.view(-1, 28 * 28)
        data, target = data.to(device), target.to(device)

        logits = net(data)
        test_loss += criteon(logits, target).item()
        # pred = logits.data.max(1)[1]
        pred = logits.argmax(dim=1)
        correct += pred.eq(target).float().sum().item()

    # lines: multi-traces
    viz.line([[test_loss, correct / len(test_loader.dataset)]], [global_step], win='test', update='append')
    # visualize the last test batch and its predictions in Visdom
    viz.images(data.view(-1, 1, 28, 28), win='x')
    viz.text(str(pred.detach().cpu().numpy()), win='pred', opts=dict(title='pred'))

    test_loss /= len(test_loader.dataset)
    print('\nTest set Average loss:{:.4f}, Accuracy: {} / {} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
              100. * correct / len(test_loader.dataset)
    ))


"""
Train Epoch:0 [0 / 60000 (0%)]	Loss: 2.302269
Train Epoch:0 [20000 / 60000 (33%)]	Loss: 1.884911
Train Epoch:0 [40000 / 60000 (67%)]	Loss: 1.336271

Test set Average loss:0.0038, Accuracy: 8169 / 10000 (82%)

Train Epoch:1 [0 / 60000 (0%)]	Loss: 0.721071
Train Epoch:1 [20000 / 60000 (33%)]	Loss: 0.565047
Train Epoch:1 [40000 / 60000 (67%)]	Loss: 0.506850

Test set Average loss:0.0021, Accuracy: 8889 / 10000 (89%)

Train Epoch:2 [0 / 60000 (0%)]	Loss: 0.368552
Train Epoch:2 [20000 / 60000 (33%)]	Loss: 0.301212
Train Epoch:2 [40000 / 60000 (67%)]	Loss: 0.406262

Test set Average loss:0.0017, Accuracy: 9061 / 10000 (91%)

Train Epoch:3 [0 / 60000 (0%)]	Loss: 0.372895
Train Epoch:3 [20000 / 60000 (33%)]	Loss: 0.390528
Train Epoch:3 [40000 / 60000 (67%)]	Loss: 0.389583

Test set Average loss:0.0015, Accuracy: 9141 / 10000 (91%)

Train Epoch:4 [0 / 60000 (0%)]	Loss: 0.220136
Train Epoch:4 [20000 / 60000 (33%)]	Loss: 0.281799
Train Epoch:4 [40000 / 60000 (67%)]	Loss: 0.291274

Test set Average loss:0.0014, Accuracy: 9211 / 10000 (92%)

Train Epoch:5 [0 / 60000 (0%)]	Loss: 0.280618
Train Epoch:5 [20000 / 60000 (33%)]	Loss: 0.305418
Train Epoch:5 [40000 / 60000 (67%)]	Loss: 0.334693

Test set Average loss:0.0014, Accuracy: 9226 / 10000 (92%)

Train Epoch:6 [0 / 60000 (0%)]	Loss: 0.342200
Train Epoch:6 [20000 / 60000 (33%)]	Loss: 0.294665
Train Epoch:6 [40000 / 60000 (67%)]	Loss: 0.220197

Test set Average loss:0.0013, Accuracy: 9280 / 10000 (93%)

Train Epoch:7 [0 / 60000 (0%)]	Loss: 0.211271
Train Epoch:7 [20000 / 60000 (33%)]	Loss: 0.358451
Train Epoch:7 [40000 / 60000 (67%)]	Loss: 0.236865

Test set Average loss:0.0012, Accuracy: 9338 / 10000 (93%)

Train Epoch:8 [0 / 60000 (0%)]	Loss: 0.226759
Train Epoch:8 [20000 / 60000 (33%)]	Loss: 0.288015
Train Epoch:8 [40000 / 60000 (67%)]	Loss: 0.263826

Test set Average loss:0.0012, Accuracy: 9349 / 10000 (93%)

Train Epoch:9 [0 / 60000 (0%)]	Loss: 0.144617
Train Epoch:9 [20000 / 60000 (33%)]	Loss: 0.166465
Train Epoch:9 [40000 / 60000 (67%)]	Loss: 0.296551

Test set Average loss:0.0011, Accuracy: 9367 / 10000 (94%)
"""

23. Activation Functions and GPU Acceleration

ReLU is used most of the time.

  • ReLU: $R(z)=\max(0, z)$
  • Leaky ReLU
  • SELU
  • softplus

GPU accelerated

  • One-line switch (see the sketch below)
    device = torch.device('cuda:0'); data.to(device)
    data.cuda() still works but is the older style and is not recommended
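A minimal sketch of this one-line switch (the layer and tensor shapes are made up; it falls back to CPU if no CUDA device is available):

import torch
import torch.nn as nn

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

net = nn.Linear(784, 10).to(device)       # move the parameters to the selected device
data = torch.randn(4, 784).to(device)     # move the data to the same device
out = net(data)
print(out.device)                         # cuda:0 (or cpu if no GPU is available)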

You can check GPU utilization in Task Manager.

24. Testing

Loss != Accuracy

When to test

  • test once every several batches
  • test once per epoch
  • epoch vs. step?