https://download.pytorch.org/whl/cu100/torch_stable.html
Via pip
Download the whl file for the desired version with this command (replace 1.0.1 with the version you want):
export CUDA_HOME=/usr/local/cuda
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
https://download.pytorch.org/whl/cpu/torch_stable.html # CPU-only build
https://download.pytorch.org/whl/cu80/torch_stable.html # CUDA 8.0 build
https://download.pytorch.org/whl/cu90/torch_stable.html # CUDA 9.0 build
https://download.pytorch.org/whl/cu92/torch_stable.html # CUDA 9.2 build
https://download.pytorch.org/whl/cu100/torch_stable.html # CUDA 10.0 build
pip install torch==1.0.1 -f https://download.pytorch.org/whl/cu100/torch_stable.html
Note: most PyTorch versions are available only for specific CUDA versions. For example, pytorch=1.0.1 is not available for CUDA 9.2.
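As a small sketch of how the index URLs and the pip command above fit together, the helper below assembles the command from a version and a build tag. The function names are made up for illustration, and the tag list only mirrors the URLs listed in these notes; it does not check whether a given version actually exists for a given tag.

```python
# Sketch: assemble the pip command from a torch version and a build tag.
# wheel_index_url and pip_command are hypothetical helper names.
BASE = "https://download.pytorch.org/whl"
KNOWN_TAGS = ("cpu", "cu80", "cu90", "cu92", "cu100")  # tags listed in these notes


def wheel_index_url(tag):
    """Return the torch_stable.html index URL for one build tag."""
    if tag not in KNOWN_TAGS:
        raise ValueError("unknown build tag: " + tag)
    return "{}/{}/torch_stable.html".format(BASE, tag)


def pip_command(version, tag):
    """Return the pip command that installs torch from that index."""
    return "pip install torch=={} -f {}".format(version, wheel_index_url(tag))


print(pip_command("1.0.1", "cu100"))
```

Whether the resulting command succeeds still depends on the version/CUDA combination being published, as the note above says.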
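Enter the version you want in the terminal. pip prints the download URL and starts downloading, but instead of letting it finish, copy the URL and download the file with a download manager such as Xunlei (Thunder).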
py2.7-cuda9.0-torch1.1.0
https://files.pythonhosted.org/packages/0f/ff/92aea60792d3b45c44ded21d6248690f69a6153af9685aad1424507ffe84/torch-1.1.0-cp27-cp27mu-manylinux1_x86_64.whl
py2.7-cuda9.0-torch0.4.1
http://61.155.190.114/torch-0.4.1-cp27-cp27mu-manylinux1_x86_64.whl?fid=bsMLDf3UtbOMoljzVM9sgaTS2blfIPceAAAAAOkilp7Hgo81WbO9p2GvEPNStQEH&mid=666&threshold=150&tid=15051D55DC4A804114D209EA1C4133BC&srcid=119&verno=1
torchvision-0.3.0-cp27
https://files.pythonhosted.org/packages/91/ec/3a5bd85c2655f4285b4ffb600fc05a2f6e8b317bcbda00b45688d790b914/torchvision-0.3.0-cp27-cp27mu-manylinux1_x86_64.whl
py2.7-numpy-1.16.4
https://files.pythonhosted.org/packages/1f/c7/198496417c9c2f6226616cff7dedf2115a4f4d0276613bab842ec8ac1e23/numpy-1.16.4-cp27-cp27mu-manylinux1_x86_64.whl
# Please make sure that
# - PATH includes /usr/local/cuda-8.0/bin
# - LD_LIBRARY_PATH includes /usr/local/cuda-8.0/lib64, or add /usr/local/cuda-8.0/lib64 to /etc/ld.so.conf and run ldconfig as root
export PATH=/usr/local/cuda/bin:$PATH
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export PYTHONPATH=/home/boyun/software/caffe/caffe-ssd_/python:$PYTHONPATH
#export PATH="/home/boyun/anaconda3/bin:$PATH" # commented out by conda initialize
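To check whether the exports above took effect, a short script along these lines can help. It is only a sketch: the helper name is hypothetical, the default /usr/local/cuda matches the exports in these notes, and it checks only that the directories are listed, not that CUDA is actually installed there.

```python
import os


def missing_cuda_entries(env, cuda_home="/usr/local/cuda"):
    """Return (variable, directory) pairs that the exports should provide but env lacks."""
    required = {
        "PATH": cuda_home + "/bin",
        "LD_LIBRARY_PATH": cuda_home + "/lib64",
    }
    missing = []
    for var, directory in required.items():
        # Linux search paths are colon-separated
        entries = env.get(var, "").split(":")
        if directory not in entries:
            missing.append((var, directory))
    return missing


# An empty list means both directories are present in the current environment.
print(missing_cuda_entries(dict(os.environ)))
```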
https://github.com/lanpa/tensorboardX
Environment requirements for the code in this tutorial:
python 3.6+
PyTorch 0.4.0+
tensorboardX: pip install tensorboardX, pip install tensorflow
# demo.py
import torch
import torchvision.utils as vutils
import numpy as np
import torchvision.models as models
from torchvision import datasets
from tensorboardX import SummaryWriter
resnet18 = models.resnet18(False)
writer = SummaryWriter()
sample_rate = 44100
freqs = [262, 294, 330, 349, 392, 440, 440, 440, 440, 440, 440]
for n_iter in range(100):
    dummy_s1 = torch.rand(1)
    dummy_s2 = torch.rand(1)
    # data grouping by `slash`
    writer.add_scalar('data/scalar1', dummy_s1[0], n_iter)
    writer.add_scalar('data/scalar2', dummy_s2[0], n_iter)
    writer.add_scalars('data/scalar_group', {'xsinx': n_iter * np.sin(n_iter),
                                             'xcosx': n_iter * np.cos(n_iter),
                                             'arctanx': np.arctan(n_iter)}, n_iter)
    dummy_img = torch.rand(32, 3, 64, 64)  # output from network
    if n_iter % 10 == 0:
        x = vutils.make_grid(dummy_img, normalize=True, scale_each=True)
        writer.add_image('Image', x, n_iter)
        dummy_audio = torch.zeros(sample_rate * 2)
        for i in range(x.size(0)):
            # amplitude of sound should be in [-1, 1]
            dummy_audio[i] = np.cos(freqs[n_iter // 10] * np.pi * float(i) / float(sample_rate))
        writer.add_audio('myAudio', dummy_audio, n_iter, sample_rate=sample_rate)
        writer.add_text('Text', 'text logged at step:' + str(n_iter), n_iter)
        for name, param in resnet18.named_parameters():
            writer.add_histogram(name, param.clone().cpu().data.numpy(), n_iter)
        # needs tensorboard 0.4RC or later
        writer.add_pr_curve('xoxo', np.random.randint(2, size=100), np.random.rand(100), n_iter)
dataset = datasets.MNIST('mnist', train=False, download=True)
images = dataset.test_data[:100].float()
label = dataset.test_labels[:100]
features = images.view(100, 784)
writer.add_embedding(features, metadata=label, label_img=images.unsqueeze(1))
# export scalar data to JSON for external processing
writer.export_scalars_to_json("./all_scalars.json")
writer.close()
A simpler version also works:
from tensorboardX import SummaryWriter
writer = SummaryWriter('log')
#new ynh
# log one point every 10 batches for the loss curve
if batch_idx % 10 == 0:
    niter = epoch * len(train_loader) + batch_idx
    writer.add_scalar('Train/Loss', loss.data, niter)
# new ynh
writer.add_scalar('Test/Accu', test_loss, epoch)
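The x-axis value niter = epoch * len(train_loader) + batch_idx turns the (epoch, batch) pair into one running step count. The sketch below lists which steps would be logged, assuming epochs and batch indices both start at 0; the function names and the figures of 2 epochs and 25 batches are made up for illustration.

```python
def global_step(epoch, batches_per_epoch, batch_idx):
    """Running batch count across epochs, used as the x-axis of the loss curve."""
    return epoch * batches_per_epoch + batch_idx


def logged_steps(num_epochs, batches_per_epoch, every=10):
    """Steps at which a point is written when logging every `every` batches."""
    steps = []
    for epoch in range(num_epochs):
        for batch_idx in range(batches_per_epoch):
            if batch_idx % every == 0:
                steps.append(global_step(epoch, batches_per_epoch, batch_idx))
    return steps


print(logged_steps(2, 25))  # [0, 10, 20, 25, 35, 45]
```

Because the condition tests batch_idx rather than the running step, the spacing restarts at the beginning of each epoch (20 is followed by 25 here).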
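The log folder now contains files. Run the following on the command line to load the data that was just written (write out the full path in place of ./log):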
tensorboard --logdir=./log
Then open this address in a browser:
http://0.0.0.0:6006/
The two plots we logged are now visible.
Documentation:
Chinese documentation:
https://tensorboard-pytorch.readthedocs.io/en/latest/tutorial_zh.html
https://github.com/lanpa/tensorboardX/blob/master/tensorboardX/writer.py
https://github.com/sksq96/pytorch-summary
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchsummary import summary
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu") # PyTorch v0.4.0
model = Net().to(device)
summary(model, (1, 28, 28))
>>>>>:
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 10, 24, 24]             260
            Conv2d-2             [-1, 20, 8, 8]           5,020
         Dropout2d-3             [-1, 20, 8, 8]               0
            Linear-4                   [-1, 50]          16,050
            Linear-5                   [-1, 10]             510
================================================================
Total params: 21,840
Trainable params: 21,840
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.06
Params size (MB): 0.08
Estimated Total Size (MB): 0.15
----------------------------------------------------------------
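The Param # column for this small network can be reproduced by hand. The sketch below uses the standard counts for layers with bias (weights plus one bias per output); the helper names are made up for illustration.

```python
def conv2d_params(in_ch, out_ch, kernel):
    """Weights (in_ch * k * k per output channel) plus one bias per output channel."""
    return out_ch * (in_ch * kernel * kernel + 1)


def linear_params(in_features, out_features):
    """Weight matrix plus one bias per output feature."""
    return in_features * out_features + out_features


counts = {
    "conv1": conv2d_params(1, 10, 5),    # 260
    "conv2": conv2d_params(10, 20, 5),   # 5020
    "fc1": linear_params(320, 50),       # 16050
    "fc2": linear_params(50, 10),        # 510
}
total = sum(counts.values())             # 21840

# Why fc1 takes 320 inputs: 28 -> conv 5x5 -> 24 -> pool 2 -> 12 -> conv 5x5 -> 8 -> pool 2 -> 4
flattened = 20 * 4 * 4                   # 320

print(counts, total, flattened)
```

Dropout2d has no parameters, which is why it shows 0 in the table.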
import torch
from torchvision import models
from torchsummary import summary
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
vgg = models.vgg16().to(device)
summary(vgg, (3, 224, 224))
>>>>>:
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 224, 224]           1,792
              ReLU-2         [-1, 64, 224, 224]               0
            Conv2d-3         [-1, 64, 224, 224]          36,928
              ReLU-4         [-1, 64, 224, 224]               0
         MaxPool2d-5         [-1, 64, 112, 112]               0
            Conv2d-6        [-1, 128, 112, 112]          73,856
              ReLU-7        [-1, 128, 112, 112]               0
            Conv2d-8        [-1, 128, 112, 112]         147,584
              ReLU-9        [-1, 128, 112, 112]               0
        MaxPool2d-10          [-1, 128, 56, 56]               0
           Conv2d-11          [-1, 256, 56, 56]         295,168
             ReLU-12          [-1, 256, 56, 56]               0
           Conv2d-13          [-1, 256, 56, 56]         590,080
             ReLU-14          [-1, 256, 56, 56]               0
           Conv2d-15          [-1, 256, 56, 56]         590,080
             ReLU-16          [-1, 256, 56, 56]               0
        MaxPool2d-17          [-1, 256, 28, 28]               0
           Conv2d-18          [-1, 512, 28, 28]       1,180,160
             ReLU-19          [-1, 512, 28, 28]               0
           Conv2d-20          [-1, 512, 28, 28]       2,359,808
             ReLU-21          [-1, 512, 28, 28]               0
           Conv2d-22          [-1, 512, 28, 28]       2,359,808
             ReLU-23          [-1, 512, 28, 28]               0
        MaxPool2d-24          [-1, 512, 14, 14]               0
           Conv2d-25          [-1, 512, 14, 14]       2,359,808
             ReLU-26          [-1, 512, 14, 14]               0
           Conv2d-27          [-1, 512, 14, 14]       2,359,808
             ReLU-28          [-1, 512, 14, 14]               0
           Conv2d-29          [-1, 512, 14, 14]       2,359,808
             ReLU-30          [-1, 512, 14, 14]               0
        MaxPool2d-31            [-1, 512, 7, 7]               0
           Linear-32                 [-1, 4096]     102,764,544
             ReLU-33                 [-1, 4096]               0
          Dropout-34                 [-1, 4096]               0
           Linear-35                 [-1, 4096]      16,781,312
             ReLU-36                 [-1, 4096]               0
          Dropout-37                 [-1, 4096]               0
           Linear-38                 [-1, 1000]       4,097,000
================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 218.59
Params size (MB): 527.79
Estimated Total Size (MB): 746.96
----------------------------------------------------------------
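The size figures in the summary can be cross-checked with simple arithmetic. The sketch below assumes every value is a 4-byte float32 and that 1 MB means 1024 * 1024 bytes; those two assumptions reproduce the printed "Params size" and "Input size" for VGG16. The helper name is made up for illustration.

```python
BYTES_PER_FLOAT32 = 4
MB = 1024 ** 2


def tensor_mb(num_elements):
    """Size in MB of num_elements float32 values (assumes 1 MB = 1024**2 bytes)."""
    return num_elements * BYTES_PER_FLOAT32 / MB


params_mb = tensor_mb(138357544)     # total parameter count from the table
input_mb = tensor_mb(3 * 224 * 224)  # one 3 x 224 x 224 input

print(round(params_mb, 2), round(input_mb, 2))  # 527.79 0.57
```

The same arithmetic gives about 0.08 MB for the 21,840 parameters of the small network shown earlier.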
https://github.com/Swall0w/torchstat
$ torchstat -f example.py -m Net
[MAdd]: Dropout2d is not supported!
[Flops]: Dropout2d is not supported!
[Memory]: Dropout2d is not supported!
       module name  input shape  output shape     params  memory(MB)           MAdd         Flops  MemRead(B)  MemWrite(B)  duration[%]   MemR+W(B)
0            conv1    3 224 224    10 220 220      760.0        1.85   72,600,000.0  36,784,000.0    605152.0    1936000.0       57.49%   2541152.0
1            conv2   10 110 110    20 106 106     5020.0        0.86  112,360,000.0  56,404,720.0    504080.0     898880.0       26.62%   1402960.0
2       conv2_drop   20 106 106    20 106 106        0.0        0.86            0.0           0.0         0.0          0.0        4.09%         0.0
3              fc1        56180            50  2809050.0        0.00    5,617,950.0   2,809,000.0  11460920.0        200.0       11.58%  11461120.0
4              fc2           50            10      510.0        0.00          990.0         500.0      2240.0         40.0        0.22%      2280.0
total                                          2815340.0        3.56  190,578,940.0  95,998,220.0      2240.0         40.0      100.00%  15407512.0
===============================================================================================================================================
Total params: 2,815,340
-----------------------------------------------------------------------------------------------------------------------------------------------
Total memory: 3.56MB
Total MAdd: 190.58MMAdd
Total Flops: 96.0MFlops
Total MemR+W: 14.69MB
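The conv rows of this table can be reproduced with a few lines of arithmetic. The formulas below are inferred by matching the printed numbers, not taken from torchstat's documentation, and the 5x5 kernel is an assumption based on the output sizes (224 to 220, 110 to 106); the function name is made up.

```python
def conv2d_stats(in_ch, out_ch, kernel, out_h, out_w):
    """Return (params, MAdd, Flops) for a conv layer with bias.

    Inferred from the table: each output value costs in_ch*k*k multiplications;
    MAdd counts 2 operations per multiplication, Flops counts the
    multiplications plus one bias addition.
    """
    outputs = out_ch * out_h * out_w
    mults_per_output = in_ch * kernel * kernel
    params = out_ch * (mults_per_output + 1)
    madd = outputs * 2 * mults_per_output
    flops = outputs * (mults_per_output + 1)
    return params, madd, flops


print(conv2d_stats(3, 10, 5, 220, 220))   # (760, 72600000, 36784000)
print(conv2d_stats(10, 20, 5, 106, 106))  # (5020, 112360000, 56404720)
```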
from torchstat import stat
import torchvision.models as models
model = models.resnet18()
stat(model, (3, 224, 224))
https://github.com/sovrasov/flops-counter.pytorch
Flops counter for convolutional networks in pytorch framework
This script is designed to compute the theoretical number of multiply-add operations in convolutional neural networks. It can also compute the number of parameters and print the per-layer computational cost of a given network.
Supported layers:
Conv1d/2d/3d (including grouping)
ConvTranspose2d (including grouping)
BatchNorm1d/2d/3d
Activations (ReLU, PReLU, ELU, ReLU6, LeakyReLU)
Linear
Upsample
Poolings (AvgPool1d/2d/3d, MaxPool1d/2d/3d and adaptive ones)
Requirements: Pytorch >= 0.4.1, torchvision >= 0.2.1
Thanks to @warmspringwinds for the initial version of script.
Usage tips
This script doesn't take into account torch.nn.functional.* operations. For instance, if one has a semantic segmentation model and uses torch.nn.functional.interpolate to upscale features, these operations won't contribute to the overall number of flops. To avoid that, one can use torch.nn.Upsample instead of torch.nn.functional.interpolate.
ptflops launches a given model on a random tensor and estimates the amount of computation during inference. Complicated models can have several inputs, some of which may be optional. To construct a non-trivial input, one can use the input_constructor argument of get_model_complexity_info. input_constructor is a function that takes the input spatial resolution as a tuple and returns a dict with named input arguments of the model. This dict is then passed to the model as keyword arguments.
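The calling convention described above (a dict unpacked into keyword arguments) can be illustrated without PyTorch. In this sketch the input names image and mask, the toy_forward function, and the use of shape tuples in place of tensors are all made up for illustration; a real constructor would return tensors.

```python
def input_constructor(input_res):
    """Hypothetical constructor: turn the resolution tuple into named model inputs."""
    channels, height, width = input_res
    # With PyTorch these values would be tensors, e.g. torch.randn(1, *input_res);
    # plain shape tuples stand in for them so the sketch runs on its own.
    return {
        "image": (1, channels, height, width),
        "mask": (1, 1, height, width),
    }


def toy_forward(image, mask=None):
    """Stand-in for a model's forward(): reports which named inputs it received."""
    return {"image": image, "mask": mask}


inputs = input_constructor((3, 224, 224))
received = toy_forward(**inputs)  # the dict is unpacked into keyword arguments
print(received)
```

The keys of the returned dict therefore have to match the parameter names of the model's forward method.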
Install the latest version
pip install --upgrade git+https://github.com/sovrasov/flops-counter.pytorch.git
Example
import torchvision.models as models
import torch
from ptflops import get_model_complexity_info
with torch.cuda.device(0):
    net = models.densenet161()
    flops, params = get_model_complexity_info(net, (3, 224, 224), as_strings=True, print_per_layer_stat=True)
    print('Flops: ' + flops)
    print('Params: ' + params)
Benchmark
torchvision
Model          Input Resolution  Params(M)  MACs(G)  Top-1 error  Top-5 error
alexnet        224x224           61.1       0.72     43.45        20.91
vgg11          224x224           132.86     7.63     30.98        11.37
vgg13          224x224           133.05     11.34    30.07        10.75
vgg16          224x224           138.36     15.5     28.41        9.62
vgg19          224x224           143.67     19.67    27.62        9.12
vgg11_bn       224x224           132.87     7.64     29.62        10.19
vgg13_bn       224x224           133.05     11.36    28.45        9.63
vgg16_bn       224x224           138.37     15.53    26.63        8.50
vgg19_bn       224x224           143.68     19.7     25.76        8.15
resnet18       224x224           11.69      1.82     30.24        10.92
resnet34       224x224           21.8       3.68     26.70        8.58
resnet50       224x224           25.56      4.12     23.85        7.13
resnet101      224x224           44.55      7.85     22.63        6.44
resnet152      224x224           60.19      11.58    21.69        5.94
squeezenet1_0  224x224           1.25       0.83     41.90        19.58
squeezenet1_1  224x224           1.24       0.36     41.81        19.38
densenet121    224x224           7.98       2.88     25.35        7.83
densenet169    224x224           14.15      3.42     24.00        7.00
densenet201    224x224           20.01      4.37     22.80        6.43
densenet161    224x224           28.68      7.82     22.35        6.20
inception_v3   224x224           27.16      2.85     22.55        6.44
Top-1 error - ImageNet single-crop top-1 error (224x224)
Top-5 error - ImageNet single-crop top-5 error (224x224)
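PyTorch Study Notes, Lecture 4: Loading Pretrained Models
1. Loading a pretrained model directly
Training sometimes has to be interrupted and resumed later, which simply means loading the parameter weights from a saved model: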
net = SNet()
net.load_state_dict(torch.load("model_1599.pkl"))
This works when the model was previously saved as parameters only (its state_dict):
torch.save(net.state_dict(), "model/model_1599.pkl")
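The official PyTorch documentation recommends this way of saving a model, and it is reportedly a little faster than the next one.
The second way of saving and loading a model is as follows: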
net = SNet()
torch.save(net, "model_1599.pkl")
snet = torch.load("model_1599.pkl")
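This approach saves the entire network, so the file is larger, saving and loading take longer, and memory usage is higher.
2. Loading part of a pretrained model
The model may be a classic architecture with part of it modified. For example, the feature-extraction network in many algorithms directly reuses the feature extraction part of vgg16, so training can start from parameters already pretrained on ImageNet. This is implemented as follows: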
net = SNet()
model_dict = net.state_dict()
vgg16 = models.vgg16(pretrained=True)
pretrained_dict = vgg16.state_dict()
pretrained_dict = {k: v for k, v in pretrained_dict.items() if k in model_dict}
model_dict.update(pretrained_dict)
net.load_state_dict(model_dict)
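In other words, the entries of the network's state_dict that also belong to vgg16 are replaced with the parameters from the pretrained vgg16 model (the {k: v for k, v in pretrained_dict.items() if k in model_dict} expression in the code), and everything else stays unchanged.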
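The filtering step can be tried on plain dictionaries. In this sketch the string values stand in for weight tensors, and the key head.weight is a made-up name for a layer that exists only in the new network.

```python
# Stand-ins for net.state_dict() and vgg16.state_dict(); strings replace tensors.
model_dict = {
    "features.0.weight": "random init",
    "features.0.bias": "random init",
    "head.weight": "random init",       # hypothetical layer only the new net has
}
pretrained_dict = {
    "features.0.weight": "pretrained",
    "features.0.bias": "pretrained",
    "classifier.0.weight": "pretrained",  # not present in the new net
}

# Keep only the pretrained entries whose keys also exist in the new network.
pretrained_dict = {k: v for k, v in pretrained_dict.items() if k in model_dict}

# Overwrite the matching entries; the rest keep their own values.
model_dict.update(pretrained_dict)

print(model_dict)
```

Note that the filter matches on key names only. With real tensors, a key that matches by name but differs in shape would still fail when load_state_dict is called.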
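3. Fine-tuning a classic network
torchvision in PyTorch provides many classic, commonly used models together with pretrained weights. Making good use of these pretrained base networks can speed up your own training considerably.
First, load vgg16 with its pretrained parameters, for example: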
import torchvision.models as models
vgg16 = models.vgg16(pretrained=True)
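For example, if the first layer is originally Conv2d(3, 64, 3, 1, 1) and you want to change it to Conv2d(4, 64, 3, 1, 1), simply assign the new layer (the replacement layer is newly initialized, so it does not carry the pretrained weights of the original first layer):
import torch.nn as nn
vgg16.features[0] = nn.Conv2d(4, 64, 3, 1, 1)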
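4. Modifying a classic network
This involves more changes than the fine-tuning above, but the approach is worth introducing.
First, a brief description of what I needed to change: starting from the vgg16 base model, add a dropout layer after every convolution, replace the ReLU activations with PReLU, and change the stride of the last two pooling layers to 1. Here is the code: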
def feature_layer():
    layers = []
    pool1 = ['4', '9', '16']
    pool2 = ['23', '30']
    vgg16 = models.vgg16(pretrained=True).features
    for name, layer in vgg16._modules.items():
        if isinstance(layer, nn.Conv2d):
            layers += [layer, nn.Dropout2d(0.5), nn.PReLU()]
        elif name in pool1:
            layers += [layer]
        elif name == pool2[0]:
            layers += [nn.MaxPool2d(2, 1, 1)]
        elif name == pool2[1]:
            layers += [nn.MaxPool2d(2, 1, 0)]
        else:
            continue
    features = nn.Sequential(*layers)
    #feat3 = features[0:24]
    return features
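The general idea is to build a new network (the layers list) by iterating over every layer of vgg16. When a convolution layer is encountered (if isinstance(layer, nn.Conv2d)), the layer (Conv2d) is added unchanged, followed by a dropout layer and then a PReLU layer. When one of the last two pooling layers is encountered, it is added with the modified parameters; the other pooling layers are added as they are. Finally, the layers list is converted into an nn.Sequential and returned as features. Your new network can then load it as follows: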
class SNet(nn.Module):
    def __init__(self):
        super(SNet, self).__init__()
        self.features = feature_layer()

    def forward(self, x):
        x = self.features(x)
        return x
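The layer traversal in feature_layer can be dry-run without PyTorch to confirm the pooling indices it relies on. The sketch below rebuilds the order of layers in vgg16.features from the standard VGG16 configuration and applies the same branching to layer-type names; the strings stand in for real modules and the function names are made up for illustration.

```python
# Standard VGG16 layout: numbers are conv output channels, "M" is a max-pool.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def vgg16_feature_kinds():
    """Layer types in the order they appear in vgg16.features (conv is followed by ReLU)."""
    kinds = []
    for item in VGG16_CFG:
        if item == "M":
            kinds.append("MaxPool2d")
        else:
            kinds += ["Conv2d", "ReLU"]
    return kinds


def rebuild(kinds):
    """Apply the same branching as feature_layer, on type names instead of modules."""
    pool1 = ["4", "9", "16"]
    pool2 = ["23", "30"]
    layers = []
    for index, kind in enumerate(kinds):
        name = str(index)  # nn.Sequential names its children "0", "1", ...
        if kind == "Conv2d":
            layers += ["Conv2d", "Dropout2d", "PReLU"]
        elif name in pool1:
            layers.append("MaxPool2d (original)")
        elif name == pool2[0]:
            layers.append("MaxPool2d(2, 1, 1)")
        elif name == pool2[1]:
            layers.append("MaxPool2d(2, 1, 0)")
        # the original ReLU layers match no branch and are dropped
    return layers


kinds = vgg16_feature_kinds()
pool_positions = [i for i, kind in enumerate(kinds) if kind == "MaxPool2d"]
new_layers = rebuild(kinds)
print(pool_positions)   # [4, 9, 16, 23, 30]
print(len(new_layers))  # 44
```

The pooling positions agree with the names hard-coded in pool1 and pool2, and the rebuilt stack has 13 x 3 + 5 = 44 layers.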