[PyTorch Study Notes] 3. Deep Learning Basics


Compiled from Long Liangqu's PyTorch course videos. Video:
【计算机-AI】PyTorch学这个就够了!
(好课推荐)深度学习与PyTorch入门实战——主讲人龙良曲

13. Gradients

  • Derivative
  • Partial derivative
  • Gradient (a vector)

How to search for minima?

  • $\theta_{t+1}=\theta_t-\alpha_t\nabla f(\theta_t)$
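A minimal sketch of this update rule on a toy function (the function f(θ) = θ² and the learning rate are made up for illustration):

import torch

theta = torch.tensor(5.0, requires_grad=True)
alpha = 0.1                                    # learning rate alpha_t (kept fixed here)

for t in range(50):
    f = theta ** 2                             # f(theta_t)
    f.backward()                               # computes grad f(theta_t) into theta.grad
    with torch.no_grad():
        theta -= alpha * theta.grad            # theta_{t+1} = theta_t - alpha_t * grad
    theta.grad.zero_()

print(theta.item())                            # close to the minimum at 0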

Optimizer performance

14. Activation Functions

  • Continuous but not differentiable everywhere
  • Sigmoid / Logistic: $\sigma'=\sigma(1-\sigma)$
    torch.sigmoid()
    F.sigmoid() (import torch.nn.functional as F)
  • Tanh
    torch.tanh()
  • ReLU
    torch.relu()
    F.relu() (import torch.nn.functional as F)
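A quick look at all three activations above on a sample tensor (the values are made up):

import torch
import torch.nn.functional as F

z = torch.linspace(-3, 3, 7)
print(torch.sigmoid(z))   # squashed into (0, 1)
print(torch.tanh(z))      # squashed into (-1, 1)
print(torch.relu(z))      # negative values clipped to 0
print(F.relu(z))          # same result via the functional API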

Typical Loss

  • Mean Squared Error
    MSE: $loss = \sum [y-(xw+b)]^2$
    L2-norm: $||y-(xw+b)||_2$ (see the check after this list)
  • Cross Entropy Loss
    binary
    multi-class
    +softmax
    Leave it to the Logistic Regression part
  • Softmax
    soft version of max
    $S(y_i)=\frac{e^{y_i}}{\sum_j e^{y_j}}$
    $\frac{\partial p_i}{\partial y_j}=\begin{cases} p_i(1-p_i) & i=j \\ -p_j\,p_i & i\neq j \end{cases}$
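A quick check of the MSE / L2-norm relation above, with made-up tensors. Note that F.mse_loss defaults to reduction='mean', so reduction='sum' is used here to match the summed formula:

import torch
import torch.nn.functional as F

y = torch.rand(5)
pred = torch.rand(5)                          # stands in for x*w + b

mse = F.mse_loss(pred, y, reduction='sum')    # sum of [y - pred]^2
l2 = torch.norm(y - pred, p=2)                # ||y - pred||_2
print(torch.allclose(mse, l2 ** 2))           # True: summed MSE equals the squared L2 norm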

Gradient API

  • torch.autograd.grad(loss, [w1, w2,...])
  • loss.backward()
import torch
import torch.nn.functional as F

x = torch.ones(1)
w = torch.full([1], 2.)
mse = F.mse_loss(torch.ones(1), x*w)
print(mse)  # tensor(1., grad_fn=<MseLossBackward>)

# torch.autograd.grad(mse, [w]) # RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

print(w.requires_grad_())   # tensor([2.], requires_grad=True)

# print(torch.autograd.grad(mse, [w]))    # still errors: the graph was built before requires_grad_(), so it must be rebuilt (RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn)

mse = F.mse_loss(torch.ones(1), x*w)
# print(torch.autograd.grad(mse, [w]))    # (tensor([2.]),

mse.backward()
print(w.grad)   # tensor([2.])


a = torch.rand(3, requires_grad=True)
print(a)    # tensor([0.0377, 0.4542, 0.1386], requires_grad=True)
p = F.softmax(a, dim=0)
# p.backward()    # 报错 RuntimeError: grad can be implicitly created only for scalar outputs
# retain_graph=True keeps the computation graph so backward/grad can be called again
print(torch.autograd.grad(p[0], [a], retain_graph=True))    # (tensor([ 0.1998, -0.1156, -0.0843]),)
print(torch.autograd.grad(p[1], [a], retain_graph=True))    # (tensor([-0.1156,  0.2434, -0.1278]),)
print(torch.autograd.grad(p[2], [a], retain_graph=True))    # (tensor([-0.0843, -0.1278,  0.2121]),)

15. Perceptron

Derivative of a single-output perceptron
P32
$\frac{\partial E}{\partial w_{j0}}=(O_0-t)O_0(1-O_0)x^0_j$

Multi-output perceptron
$\frac{\partial E}{\partial w_{jk}}=(O_k-t_k)O_k(1-O_k)x^0_j$

import torch
import torch.nn.functional as F

x = torch.randn(1, 10)
# w = torch.randn(1, 10, requires_grad=True)  # single-output perceptron
w = torch.randn(2, 10, requires_grad=True)  # multi-output perceptron layer
o = torch.sigmoid(x@w.t())
print(o.shape)  # torch.Size([1, 2])

loss = F.mse_loss(torch.ones(1, 1), o)  # broadcasting
print(loss.shape)   # torch.Size([])
print(loss)   # tensor(0.2094, grad_fn=<MseLossBackward>)

loss.backward()
print(w.grad)
"""
tensor([[-2.0498e-01,  2.4619e-02, -8.0208e-04, -1.3723e-01, -1.3014e-01,
         -1.4648e-01, -7.5119e-02,  4.9381e-02,  2.7161e-01,  4.8075e-02],
        [-4.8705e-03,  5.8495e-04, -1.9058e-05, -3.2607e-03, -3.0922e-03,
         -3.4804e-03, -1.7849e-03,  1.1733e-03,  6.4536e-03,  1.1423e-03]])
"""

16. Chain Rule

import torch
import torch.nn.functional as F

x = torch.tensor(1.)
w1 = torch.tensor(2., requires_grad=True)
b1 = torch.tensor(1.)
w2 = torch.tensor(2., requires_grad=True)
b2 = torch.tensor(1.)
y1 = x * w1 + b1
y2 = y1 * w2 + b2

dy2_dy1 = torch.autograd.grad(y2, [y1], retain_graph=True)[0]
dy1_dw1 = torch.autograd.grad(y1, [w1], retain_graph=True)[0]
dy2_dw1 = torch.autograd.grad(y2, [w1], retain_graph=True)[0]

print(dy2_dy1 * dy1_dw1)    # tensor(2.)
print(dy2_dw1)  # tensor(2.)

17. Backpropagation

For an output-layer node $k \in K$: $\frac{\partial E}{\partial W_{jk}}=O_j\delta_k$

where $\delta_k = O_k(1-O_k)(O_k-t_k)$

For a hidden-layer node $j \in J$: $\frac{\partial E}{\partial W_{ij}}=O_i\delta_j$

where $\delta_j = O_j(1-O_j)\sum_{k \in K}\delta_k W_{jk}$
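A minimal numerical check of these two formulas against autograd, for a made-up 2-10-3 network with sigmoid activations and E = 0.5 * Σ_k (O_k − t_k)^2:

import torch

torch.manual_seed(0)
x = torch.rand(2)                              # input O_i
t = torch.rand(3)                              # targets t_k
W1 = torch.rand(10, 2, requires_grad=True)     # hidden weights W_ij (stored as [j, i])
W2 = torch.rand(3, 10, requires_grad=True)     # output weights W_jk (stored as [k, j])

O_j = torch.sigmoid(W1 @ x)                    # hidden activations
O_k = torch.sigmoid(W2 @ O_j)                  # output activations
E = 0.5 * ((O_k - t) ** 2).sum()
E.backward()

delta_k = O_k * (1 - O_k) * (O_k - t)                  # output-layer delta
delta_j = O_j * (1 - O_j) * (W2.t() @ delta_k)         # hidden-layer delta
print(torch.allclose(W2.grad, delta_k.unsqueeze(1) * O_j.unsqueeze(0)))   # True
print(torch.allclose(W1.grad, delta_j.unsqueeze(1) * x.unsqueeze(0)))     # True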

18. 2D Function Optimization Example

import torch
import numpy as np
import matplotlib.pyplot as plt

def himmelblau(x):
    return (x[0] ** 2 + x[1] - 11) ** 2 + (x[0] + x[1] ** 2 - 7) ** 2

x = np.arange(-6, 6, 0.1)
y = np.arange(-6, 6, 0.1)
print('x, y range:', x.shape, y.shape)
X, Y = np.meshgrid(x, y)
print('X, Y maps', X.shape, Y.shape)
Z = himmelblau([X, Y])

fig = plt.figure('himmelblau')
ax = fig.gca(projection='3d')   # deprecated since Matplotlib 3.4; fig.add_subplot(projection='3d') is the modern form
ax.plot_surface(X, Y, Z)
ax.view_init(60, -30)
ax.set_xlabel('x')
ax.set_ylabel('y')
plt.show()

x = torch.tensor([0., 0.], requires_grad=True)
optimizer = torch.optim.Adam([x], lr=1e-3)
for step in range(20000):
    pred = himmelblau(x)
    optimizer.zero_grad()
    pred.backward()
    optimizer.step()
    if step % 2000 == 0:
        print('step {}: x = {}, f(x) = {}'.format(step, x.tolist(), pred.item()))
"""
x, y range: (120,) (120,)
X, Y maps (120, 120) (120, 120)
G:/Project/PYTHON/Demo/Pytorch21_7_29/12himmelblau.py:17: MatplotlibDeprecationWarning: Calling gca() with keyword arguments was deprecated in Matplotlib 3.4. Starting two minor releases later, gca() will take no keyword arguments. The gca() function should only be used to get the current axes, or if no axes exist, create new axes with default keyword arguments. To create a new axes with non-default arguments, use plt.axes() or plt.subplot().
  ax = fig.gca(projection='3d')
step 0: x = [0.0009999999310821295, 0.0009999999310821295], f(x) = 170.0
step 2000: x = [2.3331806659698486, 1.9540694952011108], f(x) = 13.730916023254395
step 4000: x = [2.9820079803466797, 2.0270984172821045], f(x) = 0.014858869835734367
step 6000: x = [2.999983549118042, 2.0000221729278564], f(x) = 1.1074007488787174e-08
step 8000: x = [2.9999938011169434, 2.0000083446502686], f(x) = 1.5572823031106964e-09
step 10000: x = [2.999997854232788, 2.000002861022949], f(x) = 1.8189894035458565e-10
step 12000: x = [2.9999992847442627, 2.0000009536743164], f(x) = 1.6370904631912708e-11
step 14000: x = [2.999999761581421, 2.000000238418579], f(x) = 1.8189894035458565e-12
step 16000: x = [3.0, 2.0], f(x) = 0.0
step 18000: x = [3.0, 2.0], f(x) = 0.0
"""

Different initializations converge to different solutions (the Himmelblau function has four identical local minima).

19. Logistic Regression

Goal vs. Approach

  • For regression
    Goal: $pred = y$
    Approach: minimize $dist(pred, y)$
  • For classification
    Goal: maximize a benchmark, e.g. accuracy
    Approach 1: minimize $dist(p_\theta(y|x), p_r(y|x))$
    Approach 2: minimize $divergence(p_\theta(y|x), p_r(y|x))$

Q1. Why not maximize accuracy directly?

  • $acc.=\frac{\sum_i I(pred_i == y_i)}{len(Y)}$ (a tiny sketch of this computation follows below)
  • Issue 1: the gradient is 0 when small weight changes leave the accuracy unchanged
  • Issue 2: the gradient is discontinuous, since the number of correct predictions jumps in integer steps
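For reference, a tiny sketch of how this accuracy is computed (the tensors are made up); because it is piecewise constant in the weights, its gradient is 0 almost everywhere:

import torch

pred = torch.tensor([0, 2, 1, 1])        # predicted class indices
y = torch.tensor([0, 2, 0, 1])           # ground-truth labels
acc = (pred == y).float().mean()         # sum of indicators / len(Y)
print(acc)                               # tensor(0.7500)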

Q2. Why is it called logistic regression?

  • It uses the sigmoid (logistic) function
  • The naming is controversial:
    MSE => regression
    Cross Entropy => classification

20. Cross Entropy

Entropy

  • Uncertainty
  • Measure of surprise
  • Higher entropy = less info.
    $Entropy=-\sum_i P(i)\log P(i)$

Lottery

import torch

a = torch.full([4], 1/4.)
print(a)    # tensor([0.2500, 0.2500, 0.2500, 0.2500])
print(a * torch.log2(a))    # tensor([-0.5000, -0.5000, -0.5000, -0.5000])
print(-(a * torch.log2(a)).sum())   # tensor(2.)

a = torch.tensor([0.1, 0.1, 0.1, 0.7])
print(a)    # tensor([0.1000, 0.1000, 0.1000, 0.7000])
print(a * torch.log2(a))    # tensor([-0.3322, -0.3322, -0.3322, -0.3602])
print(-(a * torch.log2(a)).sum())   #tensor(1.3568)

Cross Entropy

  • $H(p,q)=-\sum p(x)\log q(x)$
  • $H(p,q)=H(p)+D_{KL}(p\|q)$, where $D_{KL}$ is the KL divergence (relative entropy); see the check after this list
    When p = q: cross entropy = entropy ($D_{KL}=0$)
    For a one-hot target distribution: entropy $=-1\log 1=0$
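A quick numerical check of H(p,q) = H(p) + D_KL(p‖q) on two made-up discrete distributions:

import torch

p = torch.tensor([0.1, 0.2, 0.7])
q = torch.tensor([0.3, 0.3, 0.4])

H_p = -(p * p.log()).sum()               # entropy H(p)
H_pq = -(p * q.log()).sum()              # cross entropy H(p, q)
D_kl = (p * (p / q).log()).sum()         # KL divergence D_KL(p || q)
print(torch.allclose(H_pq, H_p + D_kl))  # True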

Binary Classification
$H(P, Q)=-(y\log(p)+(1-y)\log(1-p))$
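A small sketch checking F.binary_cross_entropy against this formula, with made-up probabilities and labels:

import torch
import torch.nn.functional as F

p = torch.tensor([0.9, 0.2, 0.7])        # predicted probabilities
y = torch.tensor([1., 0., 1.])           # binary targets
manual = -(y * p.log() + (1 - y) * (1 - p).log()).mean()
print(torch.allclose(manual, F.binary_cross_entropy(p, y)))   # True (both use mean reduction)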

Why not use MSE?

  • sigmoid + MSE
    gradients vanish where the sigmoid saturates
  • converges more slowly
  • but sometimes MSE is still preferred,
    e.g. in meta-learning

Numerical Stability

import torch
import torch.nn.functional as F

x = torch.randn(1, 784)
w = torch.randn(10, 784)
logits = x @ w.t()
print(logits.size())    # torch.Size([1, 10])

print(F.cross_entropy(logits, torch.tensor([3])))   # tensor(0.1694)

# the same value computed manually: softmax -> log -> nll_loss
# (F.cross_entropy does log_softmax + nll_loss in one numerically stable step)
pred = F.softmax(logits, dim=1)
print(pred.size())    # torch.Size([1, 10])
pred_log = torch.log(pred)
print(F.nll_loss(pred_log, torch.tensor([3])))   # tensor(0.1694)

21. Multi-class Classification

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms

# initialize
batch_size = 200
learning_rate = 0.01
epochs = 10

# load data
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307, ), (0.3081, ))  # standard MNIST mean/std
                   ])),
    batch_size=batch_size, shuffle=True
)
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=False,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307, ), (0.3081, ))  # standard MNIST mean/std
                   ])),
    batch_size=batch_size, shuffle=True
)

# Network Architecture
w1, b1 = torch.randn(200, 784, requires_grad=True), torch.zeros(200, requires_grad=True)
w2, b2 = torch.randn(200, 200, requires_grad=True), torch.zeros(200, requires_grad=True)
w3, b3 = torch.randn(10, 200, requires_grad=True), torch.zeros(10, requires_grad=True)

torch.nn.init.kaiming_normal_(w1)
torch.nn.init.kaiming_normal_(w2)
torch.nn.init.kaiming_normal_(w3)

def forward(x):
    x = x @ w1.t() + b1
    x = F.relu(x)
    x = x @ w2.t() + b2
    x = F.relu(x)
    x = x @ w3.t() + b3
    x = F.relu(x)   # note: a final ReLU on the logits follows the course code; CrossEntropyLoss applies log-softmax itself
    return x

# Train
optimizer = torch.optim.SGD([w1, b1, w2, b2, w3, b3], lr=learning_rate)
criteon = nn.CrossEntropyLoss()

for epoch in range(epochs):
    for batch_idx, (data, target) in enumerate(train_loader):
        data = data.view(-1, 28 * 28)
        logits = forward(data)
        loss = criteon(logits, target)
        optimizer.zero_grad()
        loss.backward()
        # print(w1.grad.norm(), w2.grad.norm())
        optimizer.step()

        if batch_idx % 100 == 0:
            print('Train Epoch:{} [{} / {} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()
            ))

    test_loss = 0
    correct = 0
    for data, target in test_loader:
        data = data.view(-1, 28 * 28)
        logits = forward(data)
        test_loss += criteon(logits, target).item()
        pred = logits.data.max(1)[1]
        correct += pred.eq(target.data).sum()

    test_loss /= len(test_loader.dataset)
    print('\nTest set Average loss:{:.4f}, Accuracy: {} / {} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
              100. * correct / len(test_loader.dataset)
    ))

"""
Train Epoch:0 [0 / 60000 (0%)]	Loss: 2.486867
Train Epoch:0 [20000 / 60000 (33%)]	Loss: 0.724861
Train Epoch:0 [40000 / 60000 (67%)]	Loss: 0.376032

Test set Average loss:0.0018, Accuracy: 8983 / 10000 (90%)

Train Epoch:1 [0 / 60000 (0%)]	Loss: 0.366988
Train Epoch:1 [20000 / 60000 (33%)]	Loss: 0.377176
Train Epoch:1 [40000 / 60000 (67%)]	Loss: 0.399104

Test set Average loss:0.0014, Accuracy: 9186 / 10000 (92%)

Train Epoch:2 [0 / 60000 (0%)]	Loss: 0.252696
Train Epoch:2 [20000 / 60000 (33%)]	Loss: 0.302346
Train Epoch:2 [40000 / 60000 (67%)]	Loss: 0.266919

Test set Average loss:0.0012, Accuracy: 9284 / 10000 (93%)

Train Epoch:3 [0 / 60000 (0%)]	Loss: 0.320602
Train Epoch:3 [20000 / 60000 (33%)]	Loss: 0.223881
Train Epoch:3 [40000 / 60000 (67%)]	Loss: 0.198832

Test set Average loss:0.0011, Accuracy: 9364 / 10000 (94%)

Train Epoch:4 [0 / 60000 (0%)]	Loss: 0.253680
Train Epoch:4 [20000 / 60000 (33%)]	Loss: 0.147065
Train Epoch:4 [40000 / 60000 (67%)]	Loss: 0.194152

Test set Average loss:0.0010, Accuracy: 9406 / 10000 (94%)

Train Epoch:5 [0 / 60000 (0%)]	Loss: 0.163504
Train Epoch:5 [20000 / 60000 (33%)]	Loss: 0.216691
Train Epoch:5 [40000 / 60000 (67%)]	Loss: 0.166883

Test set Average loss:0.0010, Accuracy: 9460 / 10000 (95%)

Train Epoch:6 [0 / 60000 (0%)]	Loss: 0.120956
Train Epoch:6 [20000 / 60000 (33%)]	Loss: 0.122348
Train Epoch:6 [40000 / 60000 (67%)]	Loss: 0.167381

Test set Average loss:0.0009, Accuracy: 9484 / 10000 (95%)

Train Epoch:7 [0 / 60000 (0%)]	Loss: 0.218382
Train Epoch:7 [20000 / 60000 (33%)]	Loss: 0.141006
Train Epoch:7 [40000 / 60000 (67%)]	Loss: 0.156644

Test set Average loss:0.0009, Accuracy: 9501 / 10000 (95%)

Train Epoch:8 [0 / 60000 (0%)]	Loss: 0.152702
Train Epoch:8 [20000 / 60000 (33%)]	Loss: 0.167587
Train Epoch:8 [40000 / 60000 (67%)]	Loss: 0.182679

Test set Average loss:0.0008, Accuracy: 9528 / 10000 (95%)

Train Epoch:9 [0 / 60000 (0%)]	Loss: 0.210252
Train Epoch:9 [20000 / 60000 (33%)]	Loss: 0.150022
Train Epoch:9 [40000 / 60000 (67%)]	Loss: 0.097077

Test set Average loss:0.0008, Accuracy: 9559 / 10000 (96%)

"""

22. Fully Connected Layers

Concisely:

  • inherit from nn.Module
  • init layers in __init__
  • implement forward()

nn.ReLU vs. F.relu()

  • class-style API
  • function-style API (see the short contrast below)
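A small contrast between the two styles (a sketch; the tensor is made up). The class-style API builds a module that can sit inside nn.Sequential, while the function-style API is called directly on tensors:

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(4, 10)
relu_layer = nn.ReLU()          # class-style: a module (no parameters in this case)
print(relu_layer(x).shape)      # torch.Size([4, 10])
print(F.relu(x).shape)          # function-style: same result, no module object

The full MLP training script below uses class-style layers inside nn.Sequential, trained on MNIST, with Visdom for visualization: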
import torch
import torch.nn as nn
from torchvision import datasets, transforms
import torch.optim as optim
from visdom import Visdom

# initialize
batch_size = 200
learning_rate = 0.01
epochs = 10

# load data
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307, ), (0.3081, ))  # standard MNIST mean/std
                   ])),
    batch_size=batch_size, shuffle=True
)
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=False,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307, ), (0.3081, ))  # standard MNIST mean/std
                   ])),
    batch_size=batch_size, shuffle=True
)


class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()

        self.model = nn.Sequential(
            nn.Linear(784, 200),
            nn.LeakyReLU(inplace=True), # inplace saves (GPU) memory and skips repeated allocation/free, but overwrites the input tensor
            nn.Linear(200,200),
            nn.LeakyReLU(inplace=True),
            nn.Linear(200,10),
            nn.LeakyReLU(inplace=True),
        )

    def forward(self, x):
        x = self.model(x)
        return x

device = torch.device('cuda:0')
net = MLP().to(device)
optimizer = optim.SGD(net.parameters(), lr=learning_rate)
criteon = nn.CrossEntropyLoss().to(device)

viz = Visdom()
viz.line([0.], [0.], win='train_loss', opts=dict(title='train_loss'))
viz.line([[0.0, 0.0]], [0.], win='test', opts=dict(title='test loss&acc.', legend=['loss', 'acc.']))
global_step = -1

for epoch in range(epochs):
    for batch_idx, (data, target) in enumerate(train_loader):
        data = data.view(-1, 28 * 28)
        data, target = data.to(device), target.to(device)

        logits = net(data)
        loss = criteon(logits, target)
        optimizer.zero_grad()
        loss.backward()
        # print(w1.grad.norm(), w2.grad.norm())
        optimizer.step()

        if batch_idx % 100 == 0:
            print('Train Epoch:{} [{} / {} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                       100. * batch_idx / len(train_loader), loss.item()
            ))

        # lines: single trace
        global_step += 1
        viz.line([loss.item()], [global_step], win='train_loss', update='append')

    test_loss = 0
    correct = 0
    for data, target in test_loader:
        data = data.view(-1, 28 * 28)
        data, target = data.to(device), target.to(device)

        logits = net(data)
        test_loss += criteon(logits, target).item()
        # pred = logits.data.max(1)[1]
        pred = logits.argmax(dim=1)
        correct += pred.eq(target).float().sum().item()

    # lines: multi-traces
    viz.line([[test_loss, correct / len(test_loader.dataset)]], [global_step], win='test', update='append')
    # visualize the last test batch and its predictions in Visdom
    viz.images(data.view(-1, 1, 28, 28), win='x')
    viz.text(str(pred.detach().cpu().numpy()), win='pred', opts=dict(title='pred'))

    test_loss /= len(test_loader.dataset)
    print('\nTest set Average loss:{:.4f}, Accuracy: {} / {} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
              100. * correct / len(test_loader.dataset)
    ))


"""
Train Epoch:0 [0 / 60000 (0%)]	Loss: 2.302269
Train Epoch:0 [20000 / 60000 (33%)]	Loss: 1.884911
Train Epoch:0 [40000 / 60000 (67%)]	Loss: 1.336271

Test set Average loss:0.0038, Accuracy: 8169 / 10000 (82%)

Train Epoch:1 [0 / 60000 (0%)]	Loss: 0.721071
Train Epoch:1 [20000 / 60000 (33%)]	Loss: 0.565047
Train Epoch:1 [40000 / 60000 (67%)]	Loss: 0.506850

Test set Average loss:0.0021, Accuracy: 8889 / 10000 (89%)

Train Epoch:2 [0 / 60000 (0%)]	Loss: 0.368552
Train Epoch:2 [20000 / 60000 (33%)]	Loss: 0.301212
Train Epoch:2 [40000 / 60000 (67%)]	Loss: 0.406262

Test set Average loss:0.0017, Accuracy: 9061 / 10000 (91%)

Train Epoch:3 [0 / 60000 (0%)]	Loss: 0.372895
Train Epoch:3 [20000 / 60000 (33%)]	Loss: 0.390528
Train Epoch:3 [40000 / 60000 (67%)]	Loss: 0.389583

Test set Average loss:0.0015, Accuracy: 9141 / 10000 (91%)

Train Epoch:4 [0 / 60000 (0%)]	Loss: 0.220136
Train Epoch:4 [20000 / 60000 (33%)]	Loss: 0.281799
Train Epoch:4 [40000 / 60000 (67%)]	Loss: 0.291274

Test set Average loss:0.0014, Accuracy: 9211 / 10000 (92%)

Train Epoch:5 [0 / 60000 (0%)]	Loss: 0.280618
Train Epoch:5 [20000 / 60000 (33%)]	Loss: 0.305418
Train Epoch:5 [40000 / 60000 (67%)]	Loss: 0.334693

Test set Average loss:0.0014, Accuracy: 9226 / 10000 (92%)

Train Epoch:6 [0 / 60000 (0%)]	Loss: 0.342200
Train Epoch:6 [20000 / 60000 (33%)]	Loss: 0.294665
Train Epoch:6 [40000 / 60000 (67%)]	Loss: 0.220197

Test set Average loss:0.0013, Accuracy: 9280 / 10000 (93%)

Train Epoch:7 [0 / 60000 (0%)]	Loss: 0.211271
Train Epoch:7 [20000 / 60000 (33%)]	Loss: 0.358451
Train Epoch:7 [40000 / 60000 (67%)]	Loss: 0.236865

Test set Average loss:0.0012, Accuracy: 9338 / 10000 (93%)

Train Epoch:8 [0 / 60000 (0%)]	Loss: 0.226759
Train Epoch:8 [20000 / 60000 (33%)]	Loss: 0.288015
Train Epoch:8 [40000 / 60000 (67%)]	Loss: 0.263826

Test set Average loss:0.0012, Accuracy: 9349 / 10000 (93%)

Train Epoch:9 [0 / 60000 (0%)]	Loss: 0.144617
Train Epoch:9 [20000 / 60000 (33%)]	Loss: 0.166465
Train Epoch:9 [40000 / 60000 (67%)]	Loss: 0.296551

Test set Average loss:0.0011, Accuracy: 9367 / 10000 (94%)
"""

23. Activation Functions and GPU Acceleration

ReLU is used most of the time.

  • ReLU: $R(z)=\max(0, z)$
  • Leaky ReLU
  • SELU
  • softplus

GPU accelerated

  • One-line switch (see the sketch below)
    device = torch.device('cuda:0'); data.to(device)
    data.cuda() still works but is the older style and is not recommended
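A minimal sketch of this one-line switch (the layer and tensor shapes are made up; it falls back to CPU if no CUDA device is available):

import torch
import torch.nn as nn

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

net = nn.Linear(784, 10).to(device)       # move the parameters to the selected device
data = torch.randn(4, 784).to(device)     # move the data to the same device
out = net(data)
print(out.device)                         # cuda:0 (or cpu if no GPU is available)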

You can check GPU utilization in Task Manager.

24. Testing

Loss != Accuracy

When to test

  • test once every several batches
  • test once per epoch
  • epoch vs. step?