Notes on Running POINTER

Stella_ting

Tags: natural language processing, deep learning

Article link: https://blog.csdn.net/weixin_42137840/article/details/120533791

I have recently been running the code for the EMNLP 2020 paper "POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training": https://github.com/dreasysnail/POINTER

The problems I ran into are recorded below (updated as I go):

nltk.download('stopwords') fails:

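Download stopwords.zip from GitHub, unzip it, and place it in the NLTK data directory.
The GitHub location is https://github.com/nltk/nltk_data/tree/gh-pages/packages/corpora
As for which directory to use, running nltk.download('stopwords') ends with a hint like this one (I wrote these notes afterwards, so the screenshot is borrowed from someone else):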

[Image: screenshot of the NLTK prompt]

In the end I placed it at /opt/conda/envs/SpareNet/nltk_data/corpora/stopwords.zip. Note that any folders that do not exist yet have to be created by hand.
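The manual install can be sketched with the standard library alone. The helper name `install_corpus` is an assumption made for illustration, and the target directory should be one of the locations listed in `nltk.data.path` for your environment.

```python
import os
import zipfile


def install_corpus(zip_path, nltk_data_dir):
    """Unpack a downloaded corpus archive into <nltk_data_dir>/corpora."""
    corpora_dir = os.path.join(nltk_data_dir, "corpora")
    # Create any missing folders first; a manual install does not create them.
    os.makedirs(corpora_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(corpora_dir)
    return corpora_dir
```

For the setup above, the call would look like `install_corpus("stopwords.zip", "/opt/conda/envs/SpareNet/nltk_data")`, assuming the archive unpacks into a top-level `stopwords/` folder.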

Converting Traditional Chinese to Simplified Chinese on Ubuntu:

ctrl+shift+c+f

BertTokenizer usage in detail:

See "Learn to Use the PyTorch Version of BERT in One Article" (一文学会Pytorch版本BERT使用) on ccbrid's CSDN blog: https://blog.csdn.net/ccbrid/article/details/104355299/

Adding your own vocabulary to BertTokenizer:

Add the following code at line 247 of training.py:

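    # Requires `import json` at the top of training.py.
    # Load the custom tokens; iu_vocab.json is expected to hold {"vocab": [...]}.
    with open('iu_vocab.json', 'r') as f:
        tokens_list = json.load(f)['vocab']

    tokenizer = BertTokenizer.from_pretrained(args.bert_model, do_lower_case=args.do_lower_case)
    # Tokens already in the vocabulary are skipped; new ones get ids appended at the end.
    tokenizer.add_tokens(tokens_list)
    # Assumption: args.len_tokens (used when the model is loaded) is the enlarged tokenizer size.
    args.len_tokens = len(tokenizer)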

Then, where the model is loaded, add the following code at line 282 of training.py:

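    else:
        model = BertForMaskedLM.from_pretrained(args.bert_model)
        # Grow the embedding matrix to match the tokenizer after add_tokens;
        # args.len_tokens is assumed to hold len(tokenizer).
        model.resize_token_embeddings(args.len_tokens)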
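To see why the embedding matrix has to be resized, here is a simplified pure-Python model of how added tokens receive ids. It is a sketch of the behaviour, not the transformers implementation; `add_tokens_sketch` and the toy vocabulary are made up for illustration.

```python
def add_tokens_sketch(vocab, new_tokens):
    """Append unseen tokens to vocab (token -> id); return how many were added."""
    added = 0
    for token in new_tokens:
        if token not in vocab:
            # Next free id, which lies past the original vocabulary.
            vocab[token] = len(vocab)
            added += 1
    return added


# Toy vocabulary; "lung" is already present and is therefore skipped.
vocab = {"[PAD]": 0, "[UNK]": 1, "lung": 2}
num_added = add_tokens_sketch(vocab, ["lung", "pneumothorax", "cardiomegaly"])
new_size = len(vocab)  # the size the model's embeddings must be resized to
```

The new ids fall beyond the rows of the pretrained embedding matrix, which is why the model is resized to the enlarged vocabulary size before training.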

Problem encountered: the decoder's vocab_size was not resized:

Add the following code at line 299 of modeling_utils.py:

    def _tie_or_clone_weights(self, first_module, second_module):
        """ Tie or clone module weights depending on whether we are using TorchScript or not
        """
        # Relies on `import torch` and `from torch import nn` at the top of modeling_utils.py.
        # If the model has an MLM head (self.cls), zero-pad its output bias so that
        # its length matches the resized vocabulary.
        if hasattr(self, "cls"):
            self.cls.predictions.bias.data = torch.nn.functional.pad(
                self.cls.predictions.bias.data,
                (0, self.config.vocab_size - self.cls.predictions.bias.shape[0]),
                "constant",
                0,
            )
        if self.config.torchscript:
            # TorchScript cannot share parameters, so clone the weights instead.
            first_module.weight = nn.Parameter(second_module.weight.clone())
        else:
            first_module.weight = second_module.weight
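The bias fix amounts to zero-padding a vector up to the new vocabulary size. Below is a plain-Python sketch of that step, with lists standing in for tensors; `pad_bias` is a hypothetical helper and not part of transformers.

```python
def pad_bias(bias, vocab_size):
    """Right-pad bias with zeros until it has vocab_size entries."""
    missing = vocab_size - len(bias)
    if missing <= 0:
        # Nothing to add; this sketch only covers the growing case.
        return list(bias)
    return list(bias) + [0.0] * missing


old_bias = [0.1, 0.2, 0.3]        # one entry per original vocabulary item
new_bias = pad_bias(old_bias, 5)  # two tokens were added to the vocabulary
```

The tied decoder weight already has one row per token after the resize, so without the padding the bias would be shorter than the decoder output.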

This fix follows a GitHub issue: https://github.com/huggingface/transformers/issues/2480
