Implementing Transformer for Language Modeling

Intro

Recent trends in Natural Language Processing have been building upon one of the biggest breakthroughs in the history of the field: the Transformer. The Transformer is a model architecture researched mainly by Google Brain and Google Research. It was initially shown to achieve state-of-the-art results on the translation task, and was later shown to be effective on just about any NLP task as it became widely adopted. The Transformer architecture consists of a stack of encoders and decoders with self-attention layers that help the model attend to the relevant parts of its input. You can learn more about Transformers in the original paper, "Attention Is All You Need."

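To make the self-attention idea a bit more concrete, here is a minimal sketch of scaled dot-product self-attention, the core operation inside each attention layer. This is only an illustration with random toy weights, not Fairseq's actual implementation:

import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence of token vectors."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v             # project tokens to queries/keys/values
    scores = q @ k.T / np.sqrt(k.shape[-1])          # similarity of every token to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: attention weights per token
    return weights @ v                               # each output is a weighted mix of all values

# Toy example: 4 tokens with 8-dimensional embeddings
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)        # (4, 8)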

In this post, we will show you how to implement the Transformer for the language modeling task. Language modeling is the task of assigning probability to sentences in a language. The goal is for the model to assign high probability to the real sentences in our dataset, so that it can generate fluent, close-to-human sentences through a decoding scheme. We will be using the Fairseq library for implementing the Transformer.

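As a toy illustration of what "assigning probability to a sentence" means, an autoregressive language model scores a sentence as the product of per-token conditional probabilities. The numbers below are made up purely for illustration:

import math

# p(w_t | w_1 ... w_{t-1}) for each token of a short sentence -- made-up values
conditional_probs = [0.20, 0.05, 0.10, 0.30]

log_prob = sum(math.log2(p) for p in conditional_probs)   # log p(sentence), base 2
perplexity = 2 ** (-log_prob / len(conditional_probs))    # per-token perplexity
print(f"log2 p(sentence) = {log_prob:.2f}, perplexity = {perplexity:.1f}")

The lower the per-token perplexity, the better the model fits the data; we will see this same quantity reported by Fairseq during training and evaluation.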

Step 1: Prepare Dataset (from my previous blog post)

In this article, we will again use the CMU Book Summary Dataset to train the Transformer model. You can refer to Step 1 of that post to acquire and prepare the dataset. After preparing the dataset, you should have train.txt, valid.txt, and test.txt ready, corresponding to the three partitions of the dataset.

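If you only need the three files and do not want to revisit that post, a simple random split along the lines below is enough. The input filename and the 90/5/5 ratio here are just illustrative assumptions, not what the earlier post uses:

import random

# Assumes one plain-text summary per line; "summaries.txt" is a placeholder name.
with open("summaries.txt", encoding="utf-8") as f:
    lines = [line.strip() for line in f if line.strip()]

random.seed(42)
random.shuffle(lines)

n = len(lines)
splits = {
    "train.txt": lines[: int(0.9 * n)],
    "valid.txt": lines[int(0.9 * n): int(0.95 * n)],
    "test.txt": lines[int(0.95 * n):],
}
for name, subset in splits.items():
    with open(name, "w", encoding="utf-8") as f:
        f.write("\n".join(subset) + "\n")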

Step 2: Download and Install Fairseq

If you haven’t heard of Fairseq, it is a popular NLP library developed by Facebook AI for implementing custom models for translation, summarization, language modeling, and other generation tasks. You can check out my comments on Fairseq here.


Now, in order to download and install Fairseq, run the following commands:


git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable ./

You can also choose to install NVIDIA’s apex library to enable faster training if your GPU allows:


git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \
--global-option="--deprecated_fused_adam" --global-option="--xentropy" \
--global-option="--fast_multihead_attn" ./

Now you have successfully installed Fairseq, and we are finally good to go!

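Before moving on, you can do a quick sanity check from Python that the editable install worked; fairseq exposes a version string, so something like this should print it:

# Quick sanity check of the installation
import fairseq
print(fairseq.__version__)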

Step 3: Preprocess the Dataset

To preprocess the dataset, we can use the fairseq command-line tool, which makes it easy for developers and researchers to directly run operations from the terminal. To preprocess our data, we can use fairseq-preprocess to build our vocabulary and also binarize the training data.


cd fairseq/
DATASET=/path/to/dataset
fairseq-preprocess \
--only-source \
--trainpref $DATASET/train.txt \
--validpref $DATASET/valid.txt \
--testpref $DATASET/test.txt \
--destdir data-bin/summary \
--workers 20
[Image: Command Output for Preprocessing]

After executing the above commands, the preprocessed data will be saved in the directory specified by --destdir.

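If you are curious about what was built, the vocabulary can be loaded back from Python. A small sketch (the path assumes the --destdir used above):

# Inspect the vocabulary produced by fairseq-preprocess
from fairseq.data import Dictionary

vocab = Dictionary.load("data-bin/summary/dict.txt")
print(f"vocabulary size (incl. special symbols): {len(vocab)}")
print(vocab.symbols[:10])   # special symbols first, then tokens by frequency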

Step 4: Train the Transformer

Finally, we can start training the transformer! To train a model, we can use the fairseq-train command:


CUDA_VISIBLE_DEVICES=0 fairseq-train --task language_modeling \
data-bin/summary \
--save-dir checkpoints/transformer_summary \
--arch transformer_lm --share-decoder-input-output-embed \
--dropout 0.1 \
--optimizer adam --adam-betas '(0.9, 0.98)' --weight-decay 0.01 --clip-norm 0.0 \
--lr 0.0005 --lr-scheduler inverse_sqrt --warmup-updates 4000 --warmup-init-lr 1e-07 \
--tokens-per-sample 512 --sample-break-mode none \
--max-tokens 2048 --update-freq 16 \
--fp16 \
--max-update 50000 \
--max-epoch 12

In our case, we specify GPU 0 as the device to use (CUDA_VISIBLE_DEVICES=0), language modeling as the task (--task), data-bin/summary as the data directory, a transformer language model as the architecture (--arch), 12 as the number of epochs to train (--max-epoch), and other hyperparameters. After training, the best checkpoint of the model will be saved in the directory specified by --save-dir.


12 epochs will take a while, so sit back while your model trains! Of course, you can also reduce the number of epochs to train according to your needs. The following output is shown when the training is complete:


[Image: Command Output when Training is Finished]

Note that in each epoch, the relevant numbers are shown, such as loss and perplexity. These could be helpful for evaluating the model during the training process.

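Once training finishes, the file in --save-dir is an ordinary PyTorch checkpoint, so you can peek at it directly if you want a quick look at the model size. A rough sketch, assuming torch.load can unpickle the checkpoint on your PyTorch version:

# Inspect the saved checkpoint (a regular torch checkpoint dictionary)
import torch

ckpt = torch.load("checkpoints/transformer_summary/checkpoint_best.pt", map_location="cpu")
n_params = sum(p.numel() for p in ckpt["model"].values())
print(f"parameters: {n_params / 1e6:.1f}M")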

Step 5: Evaluate the Language Model

After your model finishes training, you can evaluate the resulting language model using fairseq-eval-lm:


fairseq-eval-lm data-bin/summary \
--path checkpoints/transformer_summary/checkpoint_best.pt \
--max-sentences 2 \
--tokens-per-sample 512 \
--context-window 400

Here the test data will be evaluated to score the language model (the train and validation data are used during the training phase to fit the model and tune its hyperparameters). The following shows the command output after evaluation:


[Image: Command Output for Evaluation]

As you can see, the loss of our model is 9.8415 (in base 2) and the perplexity is 917.48.

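These two numbers are consistent with each other, since the loss is reported in base 2 and perplexity is just the loss exponentiated:

# perplexity = 2 ** loss when the loss is measured in bits per token
print(2 ** 9.8415)   # ~917.5, matching the reported perplexity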

Step 6: Finally! Let's Generate Some Text :D

After training the model, we can try to generate some samples using our language model. To generate text, we can use the fairseq-interactive command to create an interactive session:


fairseq-interactive data-bin/summary \
--task language_modeling \
--path checkpoints/transformer_summary/checkpoint_best.pt \
--beam 5

During the interactive session, the program will prompt you to enter an input text. Once the input text is entered, the model will generate a continuation of it. Here is a generation sample given "The book takes place" as input:


The book takes place in the story of the story of the story of the story of the story of the story of the story of the story of the story of the story of the characters…


The generation is repetitive, which means the model needs to be trained with better hyperparameters. The above command uses beam search with a beam size of 5. We can also use sampling techniques like top-k sampling:


fairseq-interactive data-bin/summary \
--task language_modeling \
--path checkpoints/transformer_summary/checkpoint_best.pt \
--sampling --beam 1 --sampling-topk 10

and top-p sampling:


fairseq-interactive data-bin/summary \
--task language_modeling \
--path checkpoints/transformer_summary/checkpoint_best.pt \
--sampling --beam 1 --sampling-topp 0.8

Note that when using top-k or top-p sampling, we have to add --beam 1 to suppress the error that arises when --beam is not equal to --nbest. This seems to be a bug.

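If you would rather generate from a Python script than from the interactive prompt, Fairseq also exposes a hub-style interface for trained language models. A rough sketch along these lines should work, with paths taken from the steps above and generation arguments mirroring the commands (treat it as a sketch, not a tested recipe):

# Generate programmatically from the trained checkpoint
from fairseq.models.transformer_lm import TransformerLanguageModel

lm = TransformerLanguageModel.from_pretrained(
    "checkpoints/transformer_summary",      # directory containing the checkpoint
    "checkpoint_best.pt",
    data_name_or_path="data-bin/summary",   # where the dictionary lives
)
lm.eval()

# Beam search, as in the first fairseq-interactive command
print(lm.sample("The book takes place", beam=5))

# Top-k sampling, as in the second command
print(lm.sample("The book takes place", sampling=True, sampling_topk=10, beam=1))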

Conclusion

In this blog post, we have trained a classic Transformer model on book summaries using the popular Fairseq library! Although the generation sample is repetitive, this article serves as a guide that walks you through running a Transformer for language modeling. Take a look at my other posts if interested :D


Translated from: https://towardsdatascience.com/implementing-transformer-for-language-modeling-ba5dd60389a2
