PyTorch Study Notes

Python API

Contents

I. The torch package

1.Tensors

2.Creation Ops

3.Indexing, Slicing, Joining, Mutating Ops

4.Generators

5.Random sampling

II. The torch.nn package

1.Containers

2.Convolution Layers

3.Pooling layers

4.Padding Layers

5.Non-linear Activations (weighted sum, nonlinearity)

6.Non-linear Activations (other)

7.Normalization Layers

8.Recurrent Layers

9.Transformer Layers

10.Linear Layers

11.Dropout Layers

12.Sparse Layers

13.Distance Functions

14.Loss Functions

15.Vision Layers

16.DataParallel Layers (multi-GPU, distributed)

17.Utilities

18.Quantized Functions

III. The torch.optim package

Algorithms

1. CLASS torch.optim.Optimizer(params, defaults)

2. CLASS torch.optim.Adadelta(params, lr=1.0, rho=0.9, eps=1e-06, weight_decay=0)

3. CLASS torch.optim.Adagrad(params, lr=0.01, lr_decay=0, weight_decay=0, initial_accumulator_value=0, eps=1e-10)

4. CLASS torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)

5. CLASS torch.optim.AdamW(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.01, amsgrad=False)

6. CLASS torch.optim.SparseAdam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08)

7. CLASS torch.optim.Adamax(params, lr=0.002, betas=(0.9, 0.999), eps=1e-08, weight_decay=0)

8. CLASS torch.optim.ASGD(params, lr=0.01, lambd=0.0001, alpha=0.75, t0=1000000.0, weight_decay=0)

9. CLASS torch.optim.LBFGS(params, lr=1, max_iter=20, max_eval=None, tolerance_grad=1e-07, tolerance_change=1e-09, history_size=100, line_search_fn=None)

10. CLASS torch.optim.RMSprop(params, lr=0.01, alpha=0.99, eps=1e-08, weight_decay=0, momentum=0, centered=False)

11. CLASS torch.optim.Rprop(params, lr=0.01, etas=(0.5, 1.2), step_sizes=(1e-06, 50))

12. CLASS torch.optim.SGD(params, lr=<required parameter>, momentum=0, dampening=0, weight_decay=0, nesterov=False)

How to adjust learning rate

Schedulers that adjust the learning rate based on the number of epochs:

1. CLASS torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda, last_epoch=-1, verbose=False)

2. CLASS torch.optim.lr_scheduler.MultiplicativeLR(optimizer, lr_lambda, last_epoch=-1, verbose=False)

3. CLASS torch.optim.lr_scheduler.StepLR(optimizer, step_size, gamma=0.1, last_epoch=-1, verbose=False)

4. CLASS torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma=0.1, last_epoch=-1, verbose=False)

5. CLASS torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma, last_epoch=-1, verbose=False)

6. CLASS torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1, verbose=False)

Schedulers that adjust the learning rate based on validation measurements:

1. CLASS torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08, verbose=False)

2. CLASS torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr, max_lr, step_size_up=2000, step_size_down=None, mode='triangular', gamma=1.0, scale_fn=None, scale_mode='cycle', cycle_momentum=True, base_momentum=0.8, max_momentum=0.9, last_epoch=-1, verbose=False)

3. CLASS torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr, total_steps=None, epochs=None, steps_per_epoch=None, pct_start=0.3, anneal_strategy='cos', cycle_momentum=True, base_momentum=0.85, max_momentum=0.95, div_factor=25.0, final_div_factor=10000.0, last_epoch=-1, verbose=False)

4. CLASS torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0, T_mult=1, eta_min=0, last_epoch=-1, verbose=False)

Stochastic Weight Averaging

Constructing averaged models

SWA learning rate schedules

Taking care of batch normalization

Custom averaging strategies

Putting it all together


I. The torch package

torch is a Python package that provides data structures for multi-dimensional tensors and defines mathematical operations over them. It also provides utilities for efficient serialization of tensors and arbitrary types, along with other useful utilities.

It has a CUDA counterpart that lets you run tensor computations on NVIDIA GPUs with compute capability >= 3.0.
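
As a quick sketch of the basics (the tensor shapes and values below are arbitrary examples, not taken from the docs):

import torch

# Create a 2x3 tensor of random values and apply a pointwise math op.
x = torch.randn(2, 3)
y = x * 2 + 1

# Run the same computation on an NVIDIA GPU when one is available.
if torch.cuda.is_available():
    x = x.to("cuda")
    y = (x * 2 + 1).cpu()

print(y.shape)  # torch.Size([2, 3])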

1.Tensors

2.Creation Ops

3.Indexing, Slicing, Joining, Mutating Ops

4.Generators

5.Random sampling

6. Serialization

7. Parallelism

8. Locally disabling gradient computation

9. Math operations

9.1 Pointwise Ops

9.2 Reduction Ops

9.3 Comparison Ops

9.4 Spectral Ops

9.5 Other Operations

9.6 BLAS and LAPACK Operations

10. Utilities

II. The torch.nn package

torch.nn provides the basic building blocks for graphs.

1.Containers

torch.nn.Module
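
As a minimal sketch of what a Container is for: a custom network is normally built by subclassing torch.nn.Module (the layer sizes below are arbitrary):

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Submodules assigned here are registered, so their parameters are tracked.
        self.conv = nn.Conv2d(1, 16, kernel_size=3, padding=1)
        self.fc = nn.Linear(16 * 28 * 28, 10)

    def forward(self, x):
        x = F.relu(self.conv(x))
        x = x.flatten(start_dim=1)
        return self.fc(x)

model = Net()
out = model(torch.randn(4, 1, 28, 28))  # out has shape (4, 10)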

2.Convolution Layers

3.Pooling layers

4.Padding Layers

5.Non-linear Activations (weighted sum, nonlinearity)

6.Non-linear Activations (other)

7.Normalization Layers

8.Recurrent Layers

9.Transformer Layers

10.Linear Layers

11.Dropout Layers

12.Sparse Layers

13.Distance Functions

14.Loss Functions

15.Vision Layers

16.DataParallel Layers (multi-GPU, distributed)

PyTorch Distributed Overview

16.1 nn.DataParallel

     Implements data parallelism at the module level.
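
A hedged sketch of how it is typically used (the nn.Linear model here is just a placeholder):

import torch
import torch.nn as nn

model = nn.Linear(10, 2)
if torch.cuda.device_count() > 1:
    # Replicates the module on each GPU and splits the input batch across them.
    model = nn.DataParallel(model)
model = model.cuda()
output = model(torch.randn(8, 10).cuda())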

16.2 nn.parallel.DistributedDataParallel

     Implements distributed data parallelism, based on the torch.distributed package, at the module level.
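
A hedged sketch of the DistributedDataParallel setup, assuming the script is launched with torchrun (which sets the RANK/WORLD_SIZE/LOCAL_RANK environment variables) and using a placeholder model:

import os

import torch
import torch.distributed as dist
import torch.nn as nn

# One process per GPU; torchrun supplies the environment variables read here.
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = nn.Linear(10, 2).cuda(local_rank)
ddp_model = nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])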

17.Utilities

18.Quantized Functions

III. The torch.optim package

torch.optim is a package implementing various optimization algorithms. The most commonly used methods are already supported, and the interface is general enough that more sophisticated optimizers can easily be integrated in the future.

How to use an optimizer

To use an optimizer, you construct an optimizer object that holds the current state and updates the parameters based on the computed gradients.

Constructing an optimizer:

To construct an optimizer, you need an iterable containing the parameters to optimize (all of them should be Variables). You can then specify optimizer-specific options such as the learning rate, weight decay, etc.

Note:

If you move a model to the GPU with .cuda(), construct the optimizer only after the move. Parameters of a model after calling .cuda() are different objects from the ones before the call (which lived on the CPU).

In general, you should make sure the optimized parameters live in a consistent location when the optimizer is constructed and used.

Example:

optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
optimizer = optim.Adam([var1, var2], lr=0.0001)
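
A small sketch of the note above about .cuda(): move the model to the GPU first, then build the optimizer from the (now CUDA) parameters:

model = model.cuda()   # move the parameters to the GPU first
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)  # then construct the optimizer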

Per-parameter options

Optimizers also support specifying per-parameter options. To do this, instead of passing an iterable of Variables, pass in an iterable of dicts. Each dict defines a separate parameter group and should contain a 'params' key mapping to the list of parameters that belong to it. Other keys should match the keyword arguments accepted by the optimizer and will be used as the optimization options for this group.

Note

You can still pass options as keyword arguments. They will be used as defaults, in the groups that didn’t override them. This is useful when you only want to vary a single option, while keeping all others consistent between parameter groups.

For example, this is very useful when you want to specify per-layer learning rates:

optimizer = optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)

This means that model.base's parameters will use the default learning rate of 1e-2, model.classifier's parameters will use a learning rate of 1e-3, and a momentum of 0.9 will be applied to all parameters.

Taking an optimization step

All optimizers implement a step() method that updates the parameters. It can be used in two ways:

1. optimizer.step()

This is the simplified version supported by most optimizers. The function can be called once the gradients have been computed with .backward().

Example:

for input, target in dataset:
    optimizer.zero_grad()
    output = model(input)
    loss = loss_fn(output, target)
    loss.backward()
    optimizer.step()

2. optimizer.step(closure)

Some optimization algorithms, such as Conjugate Gradient and LBFGS, need to re-evaluate the function multiple times, so you have to pass in a closure that allows them to recompute your model. The closure should clear the gradients, compute the loss, and return it.

Example:

for input, target in dataset:
    def closure():
        optimizer.zero_grad()
        output = model(input)
        loss = loss_fn(output, target)
        loss.backward()
        return loss
    optimizer.step(closure)

Algorithms

1. CLASS torch.optim.Optimizer(params, defaults)

Base class for all optimizers.

Reference: https://pytorch.org/docs/stable/optim.html?highlight=torch%20optim%20optimizer#torch.optim.Optimizer
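
Besides step() and zero_grad(), the base class also provides state_dict(), load_state_dict(), and add_param_group(). A hedged checkpointing sketch (model, optimizer, and the file name are placeholders):

# Save both model and optimizer state so training can resume exactly where it left off.
checkpoint = {
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
}
torch.save(checkpoint, "checkpoint.pt")

# Later, after rebuilding the same model and optimizer:
checkpoint = torch.load("checkpoint.pt")
model.load_state_dict(checkpoint["model"])
optimizer.load_state_dict(checkpoint["optimizer"])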

2. CLASS torch.optim.Adadelta(params, lr=1.0, rho=0.9, eps=1e-06, weight_decay=0)

3. CLASS torch.optim.Adagrad(params, lr=0.01, lr_decay=0, weight_decay=0, initial_accumulator_value=0, eps=1e-10)

4. CLASS torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)

5. CLASS torch.optim.AdamW(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.01, amsgrad=False)

6. CLASS torch.optim.SparseAdam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08)

7. CLASS torch.optim.Adamax(params, lr=0.002, betas=(0.9, 0.999), eps=1e-08, weight_decay=0)

8. CLASS torch.optim.ASGD(params, lr=0.01, lambd=0.0001, alpha=0.75, t0=1000000.0, weight_decay=0)

9. CLASS torch.optim.LBFGS(params, lr=1, max_iter=20, max_eval=None, tolerance_grad=1e-07, tolerance_change=1e-09, history_size=100, line_search_fn=None)

10. CLASS torch.optim.RMSprop(params, lr=0.01, alpha=0.99, eps=1e-08, weight_decay=0, momentum=0, centered=False)

11. CLASS torch.optim.Rprop(params, lr=0.01, etas=(0.5, 1.2), step_sizes=(1e-06, 50))

12. CLASS torch.optim.SGD(params, lr=<required parameter>, momentum=0, dampening=0, weight_decay=0, nesterov=False)

Implements stochastic gradient descent (optionally with momentum).

Nesterov momentum is based on the formula from the paper On the importance of initialization and momentum in deep learning.

Reference: https://pytorch.org/docs/stable/optim.html?highlight=torch%20optim%20sgd#torch.optim.SGD

Example:

>>> optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
>>> optimizer.zero_grad()
>>> loss_fn(model(input), target).backward()
>>> optimizer.step()

How to adjust learning rate

torch.optim.lr_scheduler provides several methods to adjust the learning rate based on the number of epochs.

torch.optim.lr_scheduler.ReduceLROnPlateau allows dynamic learning rate reduction based on validation measurements.

Learning rate scheduling should be applied after the optimizer's update; your code should be structured as follows:

>>> scheduler = ...
>>> for epoch in range(100):
>>>     train(...)
>>>     validate(...)
>>>     scheduler.step()

Warning:

Prior to PyTorch 1.1.0, the learning rate scheduler was expected to be called before the optimizer’s update; 1.1.0 changed this behavior in a BC-breaking way. If you use the learning rate scheduler (calling scheduler.step()) before the optimizer’s update (calling optimizer.step()), this will skip the first value of the learning rate schedule. If you are unable to reproduce results after upgrading to PyTorch 1.1.0, please check if you are calling scheduler.step() at the wrong time.

Schedulers that adjust the learning rate based on the number of epochs (a StepLR usage sketch follows this list):

1. CLASS torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda, last_epoch=-1, verbose=False)

What it does: sets the learning rate of each parameter group to the initial learning rate times a given function. When last_epoch=-1, sets initial lr as lr.

Official documentation link

>>> # Assuming optimizer has two groups.
>>> lambda1 = lambda epoch: epoch // 30
>>> lambda2 = lambda epoch: 0.95 ** epoch
>>> scheduler = LambdaLR(optimizer, lr_lambda=[lambda1, lambda2])
>>> for epoch in range(100):
>>>     train(...)
>>>     validate(...)
>>>     scheduler.step()

2. CLASS torch.optim.lr_scheduler.MultiplicativeLR(optimizer, lr_lambda, last_epoch=-1, verbose=False)

3. CLASS torch.optim.lr_scheduler.StepLR(optimizer, step_size, gamma=0.1, last_epoch=-1, verbose=False)

4. CLASS torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma=0.1, last_epoch=-1, verbose=False)

5. CLASS torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma, last_epoch=-1, verbose=False)

6. CLASS torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1, verbose=False)
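
A hedged usage sketch for one of the schedulers above, StepLR, which decays the learning rate of every parameter group by gamma once every step_size epochs (the optimizer and the train/validate calls are placeholders, as in the examples above):

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# lr = 0.1 for epochs 0-29, 0.01 for epochs 30-59, 0.001 afterwards.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
for epoch in range(100):
    train(...)
    validate(...)
    scheduler.step()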

 

Schedulers that adjust the learning rate based on validation measurements (a OneCycleLR usage sketch follows this list):

1. CLASS torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08, verbose=False)

Official documentation link

What it does: reduces the learning rate when a metric has stopped improving. Models often benefit from reducing the learning rate once learning stagnates.

Reduce learning rate when a metric has stopped improving. Models often benefit from reducing the learning rate by a factor of 2-10 once learning stagnates. This scheduler reads a metrics quantity and if no improvement is seen for a ‘patience’ number of epochs, the learning rate is reduced.

Example:

>>> optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
>>> scheduler = ReduceLROnPlateau(optimizer, 'min')
>>> for epoch in range(10):
>>>     train(...)
>>>     val_loss = validate(...)
>>>     # Note that step should be called after validate()
>>>     scheduler.step(val_loss)

2. CLASS torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr, max_lr, step_size_up=2000, step_size_down=None, mode='triangular', gamma=1.0, scale_fn=None, scale_mode='cycle', cycle_momentum=True, base_momentum=0.8, max_momentum=0.9, last_epoch=-1, verbose=False)

3. CLASS torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr, total_steps=None, epochs=None, steps_per_epoch=None, pct_start=0.3, anneal_strategy='cos', cycle_momentum=True, base_momentum=0.85, max_momentum=0.95, div_factor=25.0, final_div_factor=10000.0, last_epoch=-1, verbose=False)

4. CLASS torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0, T_mult=1, eta_min=0, last_epoch=-1, verbose=False)
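
A hedged sketch for OneCycleLR, which unlike the epoch-based schedulers is normally stepped once per batch (data_loader, model, and loss_fn are placeholders):

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.1, steps_per_epoch=len(data_loader), epochs=10)
for epoch in range(10):
    for input, target in data_loader:
        optimizer.zero_grad()
        loss_fn(model(input), target).backward()
        optimizer.step()
        scheduler.step()  # stepped after every batch, not every epoch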

 

Stochastic Weight Averaging

Constructing averaged models

SWA learning rate schedules

Taking care of batch normalization

Custom averaging strategies

Putting it all together
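
A hedged sketch of how the pieces in torch.optim.swa_utils (available since PyTorch 1.6) fit together; loader, model, optimizer, and loss_fn are placeholders:

from torch.optim.swa_utils import AveragedModel, SWALR, update_bn

swa_model = AveragedModel(model)               # keeps a running average of the weights
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=300)
swa_scheduler = SWALR(optimizer, swa_lr=0.05)  # SWA learning rate schedule
swa_start = 160

for epoch in range(300):
    for input, target in loader:
        optimizer.zero_grad()
        loss_fn(model(input), target).backward()
        optimizer.step()
    if epoch > swa_start:
        swa_model.update_parameters(model)     # fold the current weights into the average
        swa_scheduler.step()
    else:
        scheduler.step()

# Recompute batch-norm statistics for the averaged model before using it for inference.
update_bn(loader, swa_model)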
