
Original [Paper Reading] Learning Light-Weight Translation Models from Deep Transformer

Contents: Preface; Abstract; 1. Introduction & Motivation; 2. How to do?; 3. Experiments Analysis (main); Summary. Preface. Paper: Learning Light-Weight Translation Models from Deep Transformer. Authors: Bei Li et al. Affiliation: NLP Lab, School of Computer Science and Engineering, Northeastern University, S…

2021-01-27 12:55:51 495

Original [Paper Reading] TinyBERT: Distilling BERT for Natural Language Understanding

Contents: Preface; Abstract; 1. Introduction & Motivation; 2. How to do? (2.1 Transformer Distillation, 2.2 TinyBERT Learning); 3. Experiments Analysis (main); Summary. Preface. Paper: TinyBERT: Distilling BERT for Natural Language Understanding. Authors: Xiaoqi Jiao et al. Affiliation: 1) Key Laboratory of Inform…

2021-01-25 21:24:13 715

Original [Paper Reading] An Efficient Transformer Decoder with Compressed Sub-layers

Contents: Preface; Abstract; 1. Introduction & Motivation (1.1 Introduction, 1.2 Motivation); 2. How to do?; 3. Experiments Analysis (main); Summary. Preface. Paper: An Efficient Transformer Decoder with Compressed Sub-layers. Authors: Zihang Dai et al. Affiliation: NLP Lab, School of Computer Science and Engineer…

2021-01-23 22:19:13 660

Original [Paper Reading] Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Contents: Preface; Abstract; 1. Introduction & Motivation; 2. How to do? (2.1 Segment-Level Recurrence Mechanism, 2.2 Relative Positional Encoding); 3. Experiments Analysis (main); Summary. Preface. Paper: Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context. Authors: Zihang Dai et al.…

2021-01-21 12:59:35 387 2

Original [Paper Reading] Self-Attention with Relative Position Representations

Contents: Preface; Abstract; 1. Introduction & Motivation; 2. How to do?; 3. Experiments Analysis; Summary. Preface. Paper: Self-Attention with Relative Position Representations. Authors: Peter Shaw et al. Affiliation: Google Brain & Google. Venue: NAACL 2018. Post author: XMU_MIAO. Date: 2021/1/18. Abstract: 1. Introduc…

2021-01-19 22:43:26 618

Original [Paper Reading] Character-Level Language Modeling with Deeper Self-Attention

Contents: Preface; Abstract; 1. Problem background and the problem this paper addresses (1.1 Character-level language models, 1.2 How RNNs and their variants handle character-level language modeling, 1.3 The problem this paper addresses); 2. How is the problem solved? (2.1 Transformer Encoder with Causal Attention, 2.2 Auxiliary Losses: 2.2.1 Multiple Positions, 2.2.2 Intermediate Layer Losses, 2.2.3 Multiple Targets; 2.3 Positional Embeddings); 3. Experiments analysis (ma…

2021-01-16 17:36:03 700

Original [Paper Reading] Neural Machine Translation without Embeddings

Contents: Preface; Abstract; 1. Introduction; 2. Embeddingless Model; 3. Experiments; Summary. Preface. Paper: Neural Machine Translation without Embeddings. Authors: Uri Shaham et al. Affiliation: School of Computer Science, Tel Aviv University; Facebook AI Research. Venue: arXiv 20…

2021-01-13 15:25:40 188

Original [Paper Reading] Parameter Efficient Training of Deep CNN by Dynamic Sparse Reparameterization

Contents: Preface; Abstract; 1. Introduction; 2. Experiments; 3. Results; 4. Conclusion; Summary. Preface. Paper: Cross-Channel Intragroup Sparsity Neural Network. Authors: Zilin Yu et al. Affiliation: Peking University; Hangzhou Dianzi University; Cerebras Systems. Venue: arXiv 2019. Post author: XMU_MIAO. Date: 2020/12/3. Abs…

2020-12-19 18:38:42 705

Original [Paper Reading] Cross-Channel Intragroup Sparsity Neural Network

Contents: Preface; Abstract; 1. Introduction (1.1 CCI-Sparsity vs. Balanced-Sparsity, 1.2 Constraint imposed by CCI/Balanced-Sparsity, 1.3 Algorithm Used to Train Networks with CCI-Sparsity); 4. Conclusion; Summary. Preface. Paper: Dynamic Network Surgery for Efficient DNNs. Authors: Zilin Yu et al. Affiliation: Hang…

2020-12-09 11:22:01 279

Original [Paper Reading] Dynamic Network Surgery for Efficient DNNs

Contents: Preface; Abstract; 1. Introduction; 2. Dynamic Network Surgery (2.1 Notations, 2.2 Pruning & Splicing, 2.3 Parameter Importance, 2.4 Convergence Acceleration); 3. Experiments & Results; 4. Conclusion; Summary. Preface. Paper: Dynamic Network Surgery for Efficient DNNs. Authors: Yiwen Guo et al. Affiliation: In…

2020-12-07 11:16:24 835

Original [Paper Reading] Balanced Sparsity for Efficient DNN Inference on GPU

Contents: Preface; Abstract; 1. Introduction; 2. Experiments & Results (Part 1: benchmarking the GPU implementation of sparse matrix operations for the proposed Balanced Sparsity; Part 2: applying the proposed sparsification method to various deep learning domains, including CV, NLP, and speech; Part 3: studying the properties of the proposed sparse structure through hyperparameter tuning (block size, etc.) and weight visualization: 1. weight visualization, 2. sensitivity); 3. Conclusion; Summary. Preface. Paper: Balanced Sp…

2020-12-05 20:28:10 469

Original [Paper Reading] Targeted Dropout

Contents: Preface; Abstract; Core idea; Summary. Preface. Paper: Targeted Dropout. Authors: Aidan N. Gomez et al. Affiliation: Google Brain; FOR.ai; University of Oxford. Venue: NIPS 2018. Post author: XMU_MIAO. Date: 2020/12/4. Abstract: Neural networks have a very large number of parameters, which helps learning but also makes them highly redundant. This makes it possible to compress neural networks without a large impact on performance. In this paper, we introduce targeted dr…

2020-12-04 20:19:45 425

Original [Paper Reading] Deep Encoder, Shallow Decoder: Reevaluating the Speed-Quality Tradeoff in Machine Translation

Contents: Preface; Abstract; 1. Introduction; 2. Experiments; 3. Results; 4. Conclusion; Summary. Preface. Paper: Deep Encoder, Shallow Decoder: Reevaluating the Speed-Quality Tradeoff in Machine Translation. Authors: Jungo Kasai et al. Affiliation: Paul G. Allen School of Computer Science & Engineering, University…

2020-12-01 18:07:22 767

Original [Paper Reading] Sequence-Level Knowledge Distillation

Contents: Preface; Abstract; 1. Introduction; 2. Distillation (2.1 Knowledge Distillation, 2.2 Knowledge Distillation for NMT: 2.2.1 Word-Level Knowledge Distillation, 2.2.2 Sequence-Level Knowledge Distillation, 2.2.3 Sequence-Level Interpolation); 3. Experiments; 5. Discussion; Summary. Preface. Paper: Sequence…

2020-11-29 16:09:54 1764

Original [Paper Reading] Knowledge Distillation (Distilling the Knowledge in a Neural Network)

Contents: Preface; Abstract; 1. Introduction; 2. Distillation; 3. Experiments (3.1 MNIST, 3.2 Speech Recognition, 3.3 Specialist Models on a Very Big Dataset: 3.3.1 Specialist Models, 3.3.2 Assigning Classes to Specialists, 3.3.3 Performing Inference with Ensembles of Specialists, 3.3.4 Results); 4. Us…

2020-11-27 09:33:22 5320 1

Original [Paper Reading] Comparing Rewinding and Fine-tuning In Neural Network Pruning

Contents: Preface; Abstract; 1. Introduction; 2. Contribution; 2. Methodology (2.1 How Do We Train?, 2.2 How Do We Prune?: 2.2.1 What structure do we prune?, 2.2.2 What heuristic do we use to prune?, 2.3 How Do We Retrain?, 2.4 Do We Prune Iteratively?, 2.5 Metrics); 3. Experiments; 4. Results; Summary. Preface. Paper: C…

2020-11-23 19:49:51 1089

Original [Paper Reading] Successfully Applying the Stabilized Lottery Ticket Hypothesis to the Transformer Architecture

Contents: Preface; Abstract; 1. Introduction; 2. Contribution; 3. Methodology; 4. Experiments; 5. Analysis (5.1 Head Distribution); Summary. Preface. Paper: Losing Heads in the Lottery: Pruning Transformer Attention in Neural Machine Translation. Authors: Abigail See et al. Venue: EMNLP 2020. Post author: XMU_MIAO. Date: 2020/…

2020-11-21 23:43:43 242

Original [Paper Reading] The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks

Contents: Preface; Abstract; 1. Introduction; 2. Contributions; 3. Result Analysis (1. Winning Tickets in Fully-Connected Networks, 2. Winning Tickets in Convolutional Networks, 3. VGG and ResNet for CIFAR-10); 4. Discussion; Summary. Preface. Paper: The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural…

2020-11-20 16:14:12 960

Original [Paper Reading] Losing Heads in the Lottery: Pruning Transformer Attention in Neural Machine Translation

Contents: Preface; Abstract; 1. Introduction; 2. Contribution; 3. Methodology (3.1 The lottery ticket approach, 3.2 Attention confidence, 3.3 Basic workflow of the proposed method); 4. Experiments; 5. Analysis (5.1 Head Distribution, 5.2 Architecture or Initialisation?); Summary. Preface. Paper: Losing Heads in the Lottery: Pruning Transformer Attention in Neural Machine Translation…

2020-11-20 15:36:57 537

Original [Algorithm Understanding] k-means algorithm

Contents: Preface; 1. Introduction to the k-means algorithm; 2. Algorithm flow (following the watermelon book); 3. Algorithm implementation (Python); Summary. Preface. Algorithm: k-means algorithm. Post author: XMU_MIAO. Date: 2020/11/12. 1. Introduction to the k-means algorithm: k-means is a widely used unsupervised clustering algorithm. Given the desired number of clusters k, it first picks k samples at random as mean vectors (each mean vector represents one cluster), then computes the distance from every sample point to these mean vectors, assigns each point to its nearest mean vector, updates the mean vectors, and repeats these steps until the mean vectors no longer ch… (a minimal Python sketch of this loop follows this entry)

2020-11-13 21:03:19 603
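The loop described in the preview above (pick k samples as initial mean vectors, assign each point to its nearest mean, recompute the means, repeat until they stop moving) can be sketched in a few lines of Python. This is a minimal illustrative sketch, not the code from the post itself; the function name `kmeans` and the random-initialization and Euclidean-distance choices are assumptions.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Plain k-means: random init from the data, assign to nearest mean, update means."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # k samples as the initial mean vectors
    for _ in range(n_iters):
        # Assign each sample to the closest mean vector (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Recompute each mean vector from its assigned samples (keep it if its cluster is empty).
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):  # stop once the mean vectors no longer move
            break
        centers = new_centers
    return centers, labels

# Usage: cluster 200 random 2-D points into 3 groups.
X = np.random.default_rng(1).normal(size=(200, 2))
centers, labels = kmeans(X, k=3)
```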

Original [Paper Reading] Compression of Neural Machine Translation Models via Pruning

Contents: Preface; Abstract; 1. Introduction; 2. Contribution; 3. Approach (3.1 Model architecture, 3.2 Weights in NMT, 3.3 Pruning schemes, 3.4); 4. Experiments; Summary. Preface. Paper: Compression of Neural Machine Translation Models via Pruning. Authors: Abigail See et al. Venue: CoNLL 2016. Post author: XMU_MIAO. Date: 2020/11/12. Abstract: As in other areas of deep learning, neural machine translation (NM…

2020-11-13 19:09:17 340

Original [Paper Reading] On the Sparsity of Neural Machine Translation Models

Contents: Preface; Abstract; 1. Introduction; 2. Contribution; 3. Approach ((1) Standard NMT, (2) Pruning, (3) Rejuvenation); 4. Experiments (4.1 Pruning results, 4.2 "Rejuvenation" results, 4.3 Analysis: (1) avoiding local optima, (2) linguistic observations, (3) translation adequacy and fluency); Summary. Preface. Paper: On the Sparsity of Neural Machine Translation Models. Authors: Yong Wang et al. Venue: EMNLP 2…

2020-11-12 18:44:40 283

Original [Technical Notes] A simple guide to PyTorch's padding functions for variable-length RNN sequences

Technical notes. Contents: 1. The variable-length sequence problem for RNNs in PyTorch; 2. A brief introduction to the padding functions; 3. A PyTorch code example; Summary. 1. The variable-length sequence problem for RNNs in PyTorch: RNNs have an advantage in handling variable-length sequences. When processing variable-length sequences in batches, the sequences rarely have exactly the same length, so some sequences in a batch must be padded (PAD) so that all sequences within the batch share the same length. PyTorch's pack_padded_sequence and pad_packed_sequence handle this; the following example demonstrates the simple usage of these two fun… (a minimal usage sketch follows this entry)

2020-11-01 14:54:38 1222
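Since the preview above is cut off, here is a minimal usage sketch of the two functions it names, pack_padded_sequence and pad_packed_sequence, on a toy batch. The toy tensors and the GRU layer are illustrative assumptions, not the example from the original post.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# Three variable-length sequences, zero-padded to the max length (batch_first layout).
padded = torch.tensor([[1, 2, 3, 4],
                       [5, 6, 0, 0],
                       [7, 0, 0, 0]], dtype=torch.float32).unsqueeze(-1)  # (batch, seq, feature)
lengths = torch.tensor([4, 2, 1])  # true length of each sequence

rnn = torch.nn.GRU(input_size=1, hidden_size=8, batch_first=True)

# Pack so the RNN skips the padded positions; enforce_sorted=False allows unsorted lengths.
packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=False)
packed_out, h_n = rnn(packed)

# Unpack back to a padded tensor of shape (batch, max_seq, hidden), plus the lengths.
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
print(out.shape, out_lengths)  # torch.Size([3, 4, 8]) tensor([4, 2, 1])
```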

Original [Paper Reading] Neural Machine Translation By Jointly Learning to Align and Translate

Contents: Preface; Abstract; 1. Neural machine translation (1. Machine translation, 2. RNN-based encoder-decoder architecture); 2. Contributions; 3. Model architecture (1. Decoder: overview, 2. Encoder: a bidirectional RNN for annotating the sequence); 4. Experimental setup; 5. Code implementation; 6. Result analysis; Summary. Preface. Paper: Neural Machine Translation By Jointly Learning to Align and Translate. Authors: Dzmitry Bahdanau et al. Venue: ICLR 2015. Post author: XMU_MIAO. Abstract: Neural machine transl…

2020-10-31 21:13:17 1319

Original [Paper Reading] Using the Output Embedding to Improve Language Models

Paper summary (the weight sharing mentioned in the Embedding part of the Transformer): 1) Abstract, 2) Contributions, 3) Experimental design, 4) Experimental results. Paper: Using the Output Embedding to Improve Language Models. Authors: Ofir Press and Lior Wolf. Venue: EACL 2017. Post author: XMU_MIAO. 1) Abstract: We study the topmost weight matrices of neural network language models (NNLMs), namely the input embedding matrix and the output embedding matrix (the pre-softmax projection matrix), and we show that this matrix constitutes… (a minimal weight-tying sketch follows this entry)

2020-10-31 10:48:38 1127
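For reference, the weight tying discussed in this post (reusing the input embedding matrix as the pre-softmax output projection) can be expressed in a few lines of PyTorch. This is a generic sketch of the idea under assumed dimensions and an assumed LSTM body; it is not the paper's experimental setup.

```python
import torch
import torch.nn as nn

class TiedLM(nn.Module):
    """Minimal language-model skeleton illustrating input/output embedding tying."""
    def __init__(self, vocab_size=10000, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.rnn = nn.LSTM(d_model, d_model, batch_first=True)
        self.out = nn.Linear(d_model, vocab_size, bias=False)
        # Weight tying: the pre-softmax projection shares storage with the input embedding.
        self.out.weight = self.embed.weight

    def forward(self, tokens):               # tokens: (batch, time) integer ids
        h, _ = self.rnn(self.embed(tokens))  # (batch, time, d_model)
        return self.out(h)                   # logits over the vocabulary

logits = TiedLM()(torch.randint(0, 10000, (2, 16)))
print(logits.shape)  # torch.Size([2, 16, 10000])
```

Because both matrices have shape (vocab_size, d_model), the assignment is valid and the model stores a single vocabulary-sized matrix instead of two.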

Original [Paper Reading] Attention Is All You Need

Paper summary: Abstract; Contributions; Method; Result analysis. Paper: Attention Is All You Need. Authors: Ashish Vaswani et al. Venue: NIPS 2017. Post author: XMU_MIAO. Abstract: The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best-performing models also connect the encoder and decoder through an attention mechanism. Contributions; Method; Result analysis...

2020-10-21 20:14:39 775 1

Original [Paper Reading] Language Modeling with Gated Convolutional Networks

The predominant approach to language modeling to date is based on recurrent neural networks (RNNs), whose success on the task is often tied to their ability to capture unbounded context. **In this paper, we develop a finite-context approach through stacked convolutions, which is more efficient because it can be parallelized over the sequence of tokens. We propose a novel, simple gating mechanism that outperforms Oord et al. (2016b) and investigate the impact of key architectural choices.** The proposed approach achieves state-of-the-art results on the WikiText-103 benchmark and, likewise, competitive results on the Google Billion Words (GBW) benchmark. Compared to… (a minimal sketch of the gating mechanism follows this entry)

2020-10-19 19:45:09 1397
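As a companion to the abstract above, here is a minimal sketch of the gating idea behind gated convolutional layers, h = (X*W + b) * sigmoid(X*V + c), with left-only padding so a position cannot look at future tokens. The module name and the toy usage are assumptions for illustration, not the paper's reference implementation.

```python
import torch
import torch.nn as nn

class GatedConv1d(nn.Module):
    """One gated convolutional layer: (X*W + b) * sigmoid(X*V + c)."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.pad = kernel_size - 1                                  # left padding only (causal)
        self.conv = nn.Conv1d(channels, 2 * channels, kernel_size)  # linear part and gate in one pass

    def forward(self, x):                        # x: (batch, channels, time)
        x = nn.functional.pad(x, (self.pad, 0))  # pad on the left of the time axis
        a, b = self.conv(x).chunk(2, dim=1)      # split into linear part A and gate B
        return a * torch.sigmoid(b)              # gated output, same time length

# Usage: embed a toy token sequence and run it through two stacked gated layers.
vocab, d = 100, 16
emb = nn.Embedding(vocab, d)
tokens = torch.randint(0, vocab, (2, 10))        # (batch, time)
h = emb(tokens).transpose(1, 2)                  # -> (batch, channels, time)
h = GatedConv1d(d)(GatedConv1d(d)(h))
print(h.shape)                                   # torch.Size([2, 16, 10])
```

PyTorch also ships torch.nn.functional.glu, which fuses the channel split and the sigmoid gate into a single call.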
