Paper
Notes on outstanding papers
还卿一钵无情泪
Were the realm of space to end, were the realm of beings, their karma, and their afflictions to end, this vow of mine would still be inexhaustible; thought after thought it continues without interruption, and in body, speech, and mind it knows no weariness.
Dice Loss for Data-imbalanced NLP Tasks (Dice loss as a replacement for cross entropy (CE) on imbalanced data)
https://github.com/ShannonAI/dice_loss_for_NLP — defines a Dice loss to replace cross entropy (CE) for data-imbalance problems. With imbalanced samples there are two difficulties that are hard to overcome. The training-test discrepancy: if the label distribution is not well balanced, training usually converges toward the classes with more labels. The overwhelming effect of easy-negative examples: if easily classified…
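As a quick illustration of the idea (a minimal sketch only, not the ShannonAI repository's implementation, which also provides a self-adjusting variant), a soft Dice loss for binary labeling can be written as follows; the `smooth` constant is a common smoothing term, not necessarily the paper's hyperparameter:

```python
import torch

def dice_loss(logits: torch.Tensor, targets: torch.Tensor, smooth: float = 1.0) -> torch.Tensor:
    """Soft Dice loss for binary classification / token labeling.

    logits:  raw scores of shape (N,)
    targets: gold labels in {0, 1} of shape (N,)
    """
    probs = torch.sigmoid(logits)
    intersection = (probs * targets).sum()
    denom = probs.sum() + targets.sum()
    # 1 - Dice coefficient; easy negatives with prob ~ 0 contribute little,
    # which is the motivation for preferring it over CE on imbalanced data
    return 1.0 - (2.0 * intersection + smooth) / (denom + smooth)
```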
Parameter-Efficient Fine-Tuning (lightweight fine-tuning)
Overview: in recent years, large-scale pre-trained models have achieved great success on natural language processing tasks. Fine-tuning a pre-trained language model is the prevailing paradigm in NLP and delivers excellent performance on many downstream tasks. Full fine-tuning, i.e. training all of the model's parameters, is currently the most common way to apply a pre-trained model to a downstream task. However, a major drawback of full fine-tuning is that a separate full-size copy of the parameters must be kept for every task, which becomes quite expensive when there are many downstream tasks. As pre-trained models keep growing toward hundreds of billions or even trillions of parameters, this problem is amplified enormously. Parameter-Efficient F…
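A minimal sketch of the idea behind one popular PEFT approach (a LoRA-style low-rank adapter; the class and parameter names are illustrative, not taken from any particular library): the pre-trained weights are frozen, and only a small low-rank update is trained and stored per task instead of a full copy of the model.

```python
import torch
import torch.nn as nn

class LowRankAdapter(nn.Module):
    """Wraps a frozen pre-trained linear layer with a trainable low-rank update.

    Only r * (d_in + d_out) parameters are trained per task, instead of
    storing a full fine-tuned copy of the backbone.
    """
    def __init__(self, base_linear: nn.Linear, r: int = 8):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():       # freeze the pre-trained weights
            p.requires_grad = False
        d_in, d_out = base_linear.in_features, base_linear.out_features
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, r))  # zero init: starts as the base model

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # base output plus the low-rank update x @ A^T @ B^T
        return self.base(x) + x @ self.A.t() @ self.B.t()
```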
Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning
Problem: fine-tuning a pre-trained language model yields SOTA results on many downstream tasks, but the mechanism behind this process is not well understood. In particular, in low-data settings, why can vanilla gradient descent tune a model with hundreds of millions of parameters on datasets of only a few hundred or a few thousand labeled examples? Concept: the intrinsic dimension is the minimum number of parameter dimensions needed to reach a satisfactory solution of a high-dimensional optimization problem; see the paper for the exact formulation. Experiments: evaluated with GLUE metrics on the MRPC and QQP fine-tuning data…
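A toy sketch of the measurement idea, assuming the usual random-subspace reparameterization θ = θ₀ + P·z, where only the d-dimensional z is trained (illustrative only; not the paper's code, and shown on a toy linear classifier rather than a language model):

```python
import torch

# Intrinsic-dimension probe on a toy linear classifier.
# The D-dimensional weights are never trained directly; optimization happens
# in a fixed random d-dimensional subspace: theta = theta0 + P @ z.
D, d, n = 10_000, 50, 256                  # full dim, subspace dim, batch size
theta0 = torch.zeros(D)                    # "pre-trained" starting point
P = torch.randn(D, d) / d ** 0.5           # fixed random projection
z = torch.zeros(d, requires_grad=True)     # the only trainable parameters

x = torch.randn(n, D)                      # toy inputs
y = (x[:, 0] > 0).float()                  # toy labels

opt = torch.optim.Adam([z], lr=1e-2)
for _ in range(200):
    theta = theta0 + P @ z                 # re-express full weights from z
    loss = torch.nn.functional.binary_cross_entropy_with_logits(x @ theta, y)
    opt.zero_grad()
    loss.backward()
    opt.step()

# The smallest d at which training in the subspace reaches (roughly) 90% of
# full fine-tuning performance is taken as the task's intrinsic dimension.
```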
SBERT-WK: A Sentence Embedding Method byDissecting BERT-based Word Models
https://arxiv.org/pdf/2002.06652.pdf — I. Introduction: one limitation of BERT is that, due to the large model size, it is time consuming to perform sentence-pair regression tasks such as clustering and semantic search. One effective way to solve this problem i…
A Survey of Transformers (summary notes)
https://arxiv.org/abs/2106.04554 — Introduction: the Transformer was originally proposed as a Seq2Seq model for machine translation. Later work showed that Transformer-based pre-trained models (PTMs) can achieve SOTA on a wide range of tasks. As a result, the Transformer, and PTMs in particular, have become the architecture of choice in NLP. Beyond language-related applications, Transformers have also been adopted in CV, audio processing, and even other disciplines. Over the past few years a variety of Transformer variants (a.k.a. X-formers) have been proposed; these X-fo…
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Transformers have a potential of learning longer-term dependency, but are limited by a fixed-length context in the setting of language modeling. We propose a novel neural architecture Transformer-XL that enables learning dependency beyond a fixed length wi…
XLNet: Generalized Autoregressive Pretraining for Language Understanding
XLNet is a generalized AR pretraining method that uses a permutation language modeling objective to combine the advantages of AR and AE methods. The neural architecture of XLNet is developed to work seamlessly with the AR objective, including integrating T…
How to Fine-Tune BERT for Text Classification?
Investigates different fine-tuning methods of BERT on the text classification task and provides a general solution for BERT fine-tuning; examines the different approaches to fine-tuning BERT for text classification. There are some experimental fin…
Taming Pretrained Transformers for Extreme Multi-label Text Classification
In this paper, we propose X-Transformer, the first scalable approach to fine-tuning deep transformer models for the XMC problem. The proposed method achieves new state-of-the-art results on four XMC benchmark datasets. In particular, on a Wiki dataset with…
Highlights from Meituan's NLP and knowledge graph articles
1. Positioning: as one of the most important forms of knowledge representation in the AI era, knowledge graphs can break down data silos across scenarios and provide foundational support for applications such as search, recommendation, question answering, explanation, and decision making. Meituan Brain builds an ultra-large-scale knowledge graph for the lifestyle and entertainment domain around scenarios such as dining and leisure, linking users and merchants in all dimensions, with the goal of understanding user preferences and merchant positioning more deeply and thereby providing better intelligent services to the public. 2. Scenarios: when a user posts a review, the machine should be able to read it and fully understand the user's feelings. When a user opens a merchant page on Dianping and faces thousands of user reviews, we hope the machine…
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Original paper: https://arxiv.org/abs/1908.10084. Abstract: STS = semantic textual similarity. The plain BERT architecture is not well suited to semantic similarity search or to unsupervised tasks such as clustering. SBERT (Sentence-BERT) reduces finding the most similar pair from 65 hours with BERT / RoBERTa to about 5 seconds with SBERT, while maintaining the accuracy.
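A minimal sketch of the SBERT-style inference pipeline, assuming the Hugging Face transformers API and a generic bert-base-uncased checkpoint (an assumption for illustration; the actual SBERT models are additionally fine-tuned with a siamese/triplet objective, so this only shows the mean-pooling and cosine-similarity steps):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Mean-pool token embeddings from a BERT-like encoder, then compare
# sentences with cosine similarity (illustrative sketch, not the authors' code).
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased")

def embed(sentences):
    batch = tok(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = enc(**batch).last_hidden_state             # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)         # ignore padding tokens
    return (out * mask).sum(1) / mask.sum(1)             # mean pooling -> (B, H)

a, b = embed(["A man is playing guitar.", "Someone plays an instrument."])
print(torch.nn.functional.cosine_similarity(a, b, dim=0).item())
```

Because each sentence is encoded independently and only fixed-size vectors are compared, similarity search over a large collection reduces to fast vector operations instead of running a full BERT cross-encoder on every sentence pair, which is where the 65-hours-to-5-seconds speedup comes from.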