(PROTOTYPE) FX GRAPH MODE POST TRAINING DYNAMIC QUANTIZATION
Author: Jerry Zhang
May 24, 2022
tag: translation study
topic: PyTorch quantization
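- This tutorial walks through the steps for post training dynamic quantization in graph mode, based on torch.fx.
- There is a separate tutorial for FX Graph Mode Post Training Static Quantization.
- A comparison between FX Graph Mode quantization and Eager Mode quantization can be found in the quantization docs.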
tldr; The FX Graph Mode API for dynamic quantization looks like the following:
import torch
from torch.quantization import default_dynamic_qconfig
# Note that this is temporary, we'll expose these functions to torch.quantization after the official release
from torch.quantization.quantize_fx import prepare_fx, convert_fx
float_model.eval()
qconfig = default_dynamic_qconfig  # use the dynamic qconfig imported above
qconfig_dict = {"": qconfig}
prepared_model = prepare_fx(float_model, qconfig_dict) # fuse modules and insert observers
# no calibration is required for dynamic quantization
quantized_model = convert_fx(prepared_model) # convert the model to a dynamically quantized model
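In this tutorial, we apply dynamic quantization to an LSTM-based next-word prediction model, closely following the word language model from the PyTorch examples. We reuse the code from Dynamic Quantization on an LSTM Word Language Model and omit the descriptions.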
1. Define the Model, Download Data and Model
Download the data and unzip it into the data folder:
mkdir data
cd data
wget https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-v1.zip
unzip wikitext-2-v1.zip
Download the model into the data folder:
wget https://s3.amazonaws.com/pytorch-tutorial-assets/word_language_model_quantize.pth
Define the model:
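On systems without wget or unzip, the same download steps can be approximated with the Python standard library. The sketch below is only an illustration: the helper name `fetch` and its `dest_dir` parameter are hypothetical and not part of the tutorial. It extracts only files whose names end in `.zip`, so the `.pth` checkpoint is left untouched.

```python
import os
import urllib.request
import zipfile


def fetch(url, dest_dir="data"):
    """Hypothetical helper: download url into dest_dir, unzipping .zip archives."""
    os.makedirs(dest_dir, exist_ok=True)
    filename = os.path.join(dest_dir, os.path.basename(url))
    # Skip the download when the file is already present.
    if not os.path.exists(filename):
        urllib.request.urlretrieve(url, filename)
    # Only extract files explicitly named *.zip; checkpoints are left as-is.
    if filename.endswith(".zip"):
        with zipfile.ZipFile(filename) as archive:
            archive.extractall(dest_dir)
    return filename


# Usage with the URLs from this tutorial (not executed here, requires network):
# fetch("https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-v1.zip")
# fetch("https://s3.amazonaws.com/pytorch-tutorial-assets/word_language_model_quantize.pth")
```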
# imports
import os
from io import open
import time
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F

# Model Definition
class LSTMModel(nn.Module):
    """Container module with an encoder, a recurrent module, and a decoder."""

    def __init__(self, ntoken, ninp, nhid, nlayers, dropout=0.5):
        super(LSTMModel, self).__init__()
        self.drop = nn.Dropout(dropout)
        self.encoder = nn.Embedding(ntoken, ninp)
        self.rnn = nn.LSTM(ninp, nhid, nlayers, dropout=dropout)
        self.decoder = nn.Linear(nhid, ntoken)

        self.init_weights()

        self.nhid = nhid
        self.nlayers = nlayers

    def init_weights(self):
        initrange = 0.1
        self.encoder.weight.data.uniform_(-initrange, initrange)
        self.decoder.bias.data.zero_()
        self.decoder.weight.data.uniform_(-initrange, initrange)

    def forward(self, input, hidden):
        emb = self.drop(self.encoder(input))
        output, hidden = self.rnn(emb, hidden)
        output = self.drop(output)
        decoded = self.decoder(output)
        return decoded, hidden


def init_hidden(lstm_model, bsz):
    # get the weight tensor and create hidden layer in the same device
    weight = lstm_model.encoder.weight
    # get weight from quantized model
    if not isinstance(weight, torch.Tensor):
        weight = weight()
    device = weight.device
    nlayers = lstm_model.rnn.num_layers
    nhid = lstm_model.rnn.hidden_size
    return (torch.zeros(nlayers, bsz, nhid, device=device),
            torch.zeros(nlayers, bsz, nhid, device=device))


# Load Text Data
class Dictionary(object):
    def __init__(self):
        self.word2idx = {}
        self.idx2word = []

    def add_word(self, word):
        if word not in self.word2idx:
            self.idx2word.append(word)
            self.word2idx[word] = len(self.idx2word) - 1
        return self.word2idx[word]

    def __len__(self):
        return len(self.idx2word)


class Corpus(object):
    def __init__(self, path):
        self.dictionary = Dictionary()
        self.train = self.tokenize(os.path.join(path, 'wiki.train.tokens'))
        self.valid = self.tokenize(os.path.join(path, 'wiki.valid.tokens'))
        self.test = self.tokenize(os.path.join(path, 'wiki.test.tokens'))

    def tokenize(self, path):
        """Tokenizes a text file."""
        assert os.path.exists(path)
        # Add words to the dictionary
        with open(path, 'r', encoding="utf8") as f:
            for line in f:
                words = line.split() + ['<eos>']
                for word in words:
                    self.dictionary.add_word(word)

        # Tokenize file content
        with open(path, 'r', encoding="utf8") as f:
            idss = []
            for line in f:
                words = line.split() + ['<eos>']
                ids = []
                for word in words:
                    ids.append(self.dictionary.word2idx[word])
                idss.append(torch.tensor(ids).type(torch.int64))
            ids = torch.cat(idss)

        return ids


model_data_filepath = 'data/'

corpus = Corpus(model_data_filepath + 'wikitext-2')

ntokens = len(corpus.dictionary)

# Load Pretrained Model
model = LSTMModel(
    ntoken = ntokens,
    ninp = 512,
    nhid = 256,
    nlayers = 5,
)

model.load_state_dict(
    torch.load(
        model_data_filepath + 'word_language_model_quantize.pth',
        map_location=torch.device('cpu')
    )
)

model.eval()
print(model)

bptt = 25
criterion = nn.CrossEntropyLoss()
eval_batch_size = 1

# create test data set
def batchify(data, bsz):
    # Work out how cleanly we can divide the dataset into bsz parts.
    nbatch = data.size(0) // bsz
    # Trim off any extra elements that wouldn't cleanly fit (remainders).
    data = data.narrow(0, 0, nbatch * bsz)
    # Evenly divide the data across the bsz batches.
    return data.view(bsz, -1).t().contiguous()

test_data = batchify(corpus.test, eval_batch_size)

# Evaluation functions
def get_batch(source, i):
    seq_len = min(bptt, len(source) - 1 - i)
    data = source[i:i+seq_len]
    target = source[i+1:i+1+seq_len].reshape(-1)
    return data, target

def repackage_hidden(h):
    """Wraps hidden states in new Tensors, to detach them from their history."""
    if isinstance(h, torch.Tensor):
        return h.detach()
    else:
        return tuple(repackage_hidden(v) for v in h)

def evaluate(model_, data_source):
    # Turn on evaluation mode which disables dropout.
    model_.eval()
    total_loss = 0.
    hidden = init_hidden(model_, eval_batch_size)
    with torch.no_grad():
        for i in range(0, data_source.size(0) - 1, bptt):
            data, targets = get_batch(data_source, i)
            output, hidden = model_(data, hidden)
            hidden = repackage_hidden(hidden)
            output_flat = output.view(-1, ntokens)
            total_loss += len(data) * criterion(output_flat, targets).item()
    return total_loss / (len(data_source) - 1)
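Because the descriptions are omitted here, it may help to see what batchify and get_batch do to the token stream. The sketch below mirrors their logic on plain Python lists instead of tensors. It is an illustration only, and the extra `bptt` argument on get_batch is an assumption made so the example stays self-contained.

```python
def batchify(data, bsz):
    """List-based mirror of the tensor batchify: returns nbatch rows of bsz tokens."""
    nbatch = len(data) // bsz
    data = data[:nbatch * bsz]  # trim the remainder that does not fit
    # Split into bsz contiguous streams, then transpose so that each row
    # holds one time step across all streams.
    streams = [data[r * nbatch:(r + 1) * nbatch] for r in range(bsz)]
    return [list(step) for step in zip(*streams)]


def get_batch(source, i, bptt):
    """Inputs are rows i..i+seq_len; targets are the same rows shifted by one."""
    seq_len = min(bptt, len(source) - 1 - i)
    data = source[i:i + seq_len]
    # Flatten the shifted rows, like reshape(-1) on the tensor version.
    target = [tok for step in source[i + 1:i + 1 + seq_len] for tok in step]
    return data, target


tokens = list(range(11))           # stand-in for a tokenized corpus
batches = batchify(tokens, bsz=2)  # token 10 is trimmed off
data, target = get_batch(batches, 0, bptt=3)
print(batches)  # [[0, 5], [1, 6], [2, 7], [3, 8], [4, 9]]
print(data)     # [[0, 5], [1, 6], [2, 7]]
print(target)   # [1, 6, 2, 7, 3, 8]
```

Each target is simply the next token in its stream, which is the quantity the next-word prediction loss in evaluate is computed against.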
2. Post Training Dynamic Quantization
To dynamically quantize a model, we can use the same functions as in post training static quantization, but with a dynamic qconfig.
from torch.quantization.quantize_fx import prepare_fx, convert_fx
from torch.quantization import default_dynamic_qconfig, float_qparams_weight_only_qconfig
# Full docs for supported qconfig for floating point modules/ops can be found in docs for quantization (TODO: link)
# Full docs for qconfig_dict can be found in the documents of prepare_fx (TODO: link)
qconfig_dict = {
    "object_type": [
        (nn.Embedding, float_qparams_weight_only_qconfig),
        (nn.LSTM, default_dynamic_qconfig),
        (nn.Linear, default_dynamic_qconfig)
    ]
}
# Deepcopying the original model because quantization api changes the model inplace and we want
# to keep the original model for future comparison
model_to_quantize = copy.deepcopy(model)
prepared_model = prepare_fx(model_to_quantize, qconfig_dict)
print("prepared model:", prepared_model)
quantized_model = convert_fx(prepared_model)
print("quantized model", quantized_model)
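For dynamically quantized objects, prepare_fx does nothing for modules, but it inserts observers for the weights of dynamically quantizable functionals and torch ops. It also fuses modules such as Conv + Bn and Linear + ReLU.
In convert_fx, float modules are converted to dynamically quantized modules and float ops are converted to dynamically quantized ops. In the example model, nn.Embedding, nn.Linear and nn.LSTM are dynamically quantized.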
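To make the word "dynamic" concrete: weights are quantized once ahead of time, while the quantization range of the activations is computed from the actual input on every call, which is why no calibration pass is needed. The pure-Python sketch below illustrates that arithmetic for a single dot product. It is a simplified model under stated assumptions (symmetric int8 weights, affine uint8 activations, hypothetical function names), not the kernels PyTorch actually runs.

```python
def quantize_weights(weights, num_bits=8):
    """Ahead-of-time symmetric quantization of weights to signed integers."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def quantize_activations(xs, num_bits=8):
    """Runtime affine quantization: the range comes from this input only."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(min(xs), 0.0), max(max(xs), 0.0)  # range must include zero
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(x / scale) + zero_point)) for x in xs]
    return q, scale, zero_point


def dynamic_dot(xs, q_w, w_scale):
    """Integer dot product, rescaled back to float at the end."""
    q_x, x_scale, x_zp = quantize_activations(xs)  # computed per call
    acc = sum((qx - x_zp) * qw for qx, qw in zip(q_x, q_w))
    return acc * x_scale * w_scale


weights = [0.5, -1.0, 0.25]
q_w, w_scale = quantize_weights(weights)  # done once, at conversion time
xs = [1.0, 2.0, -3.0]
approx = dynamic_dot(xs, q_w, w_scale)
exact = sum(x * w for x, w in zip(xs, weights))
print(exact, approx)  # the two values should agree to within a small error
```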
Now we can compare the size and runtime of the quantized model.
def print_size_of_model(model):
    torch.save(model.state_dict(), "temp.p")
    print('Size (MB):', os.path.getsize("temp.p")/1e6)
    os.remove('temp.p')
print_size_of_model(model)
print_size_of_model(quantized_model)
There is a 4x size reduction because we quantized all the weights in the model (nn.Embedding, nn.Linear and nn.LSTM) from float (4 bytes) to quantized int (1 byte).
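The 4x figure can be sanity-checked with back-of-the-envelope arithmetic on the parameter counts. The sketch below is an estimate under stated assumptions: the vocabulary size of 33278 is assumed (the real value is len(corpus.dictionary)), biases are assumed to stay in float32, and the small overhead of stored scales and zero points is ignored. The helper name is hypothetical.

```python
BYTES_FP32 = 4
BYTES_INT8 = 1


def lstm_model_param_counts(ntoken, ninp, nhid, nlayers):
    """Count weight and bias elements for Embedding + LSTM + Linear decoder."""
    weights = ntoken * ninp  # nn.Embedding table
    biases = 0
    for layer in range(nlayers):
        in_features = ninp if layer == 0 else nhid
        # Four gates, each with input-to-hidden and hidden-to-hidden weights.
        weights += 4 * nhid * (in_features + nhid)
        # nn.LSTM keeps two bias vectors (b_ih and b_hh) per layer.
        biases += 2 * 4 * nhid
    weights += nhid * ntoken  # nn.Linear decoder weight
    biases += ntoken          # nn.Linear decoder bias
    return weights, biases


# ntoken=33278 is an assumed vocabulary size, used only for illustration.
weights, biases = lstm_model_param_counts(ntoken=33278, ninp=512, nhid=256, nlayers=5)
float_bytes = (weights + biases) * BYTES_FP32
quant_bytes = weights * BYTES_INT8 + biases * BYTES_FP32  # biases stay float
ratio = float_bytes / quant_bytes
print(f"float: {float_bytes / 1e6:.1f} MB, quantized: {quant_bytes / 1e6:.1f} MB, "
      f"ratio: {ratio:.2f}x")
```

The ratio comes out slightly below 4 because the biases are not quantized. The file sizes printed by print_size_of_model also include serialization overhead, so they will not match this estimate exactly.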
torch.set_num_threads(1)
def time_model_evaluation(model, test_data):
    s = time.time()
    loss = evaluate(model, test_data)
    elapsed = time.time() - s
    print('''loss: {0:.3f}\nelapsed time (seconds): {1:.1f}'''.format(loss, elapsed))
time_model_evaluation(model, test_data)
time_model_evaluation(quantized_model, test_data)
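There is a roughly 2x speedup for this model. Note that the speedup may vary depending on the model, device, build, input batch size, threading and so on.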
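Since a single timed run can be noisy, one option is to repeat the measurement and report the median. The helper below is a minimal sketch using only the standard library. The name `benchmark` and its `warmup` and `repeats` parameters are hypothetical and not part of the tutorial.

```python
import statistics
import time


def benchmark(fn, *args, warmup=1, repeats=5):
    """Return the median wall-clock time in seconds of fn(*args)."""
    for _ in range(warmup):
        fn(*args)  # warm-up runs are not timed
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Usage with the names defined in this tutorial (not executed here):
# float_time = benchmark(evaluate, model, test_data)
# quant_time = benchmark(evaluate, quantized_model, test_data)
# print("speedup: {:.2f}x".format(float_time / quant_time))
```

The median is less sensitive than the mean to an occasional slow run caused by other processes on the machine.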
3. Conclusion
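This tutorial introduced the API for post training dynamic quantization in FX Graph Mode, which dynamically quantizes the same modules as Eager Mode quantization.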