1. Improving the time and space efficiency of self-attention
Linformer: Self-Attention with Linear Complexity (a minimal sketch of its low-rank projection idea is given after the list below)
The related-work section of the paper mentions common techniques for improving Transformer efficiency:
- Mixed precision
Mixed precision training. 2017
fairseq: A fast, extensible toolkit for sequence modeling. 2019
Quantization and training of neural networks for efficient integer-arithmetic-only inference. 2018
Training with quantization noise for extreme fixed-point compression. 2020
- Knowledge distillation
Distilling the knowledge in a neural network. 2015
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. 2019
- Sparse Attention
Generating long sequences with sparse transformers. 2019
Blockwise self-attention for long document understanding. 2019
- LSH Attention
Reformer: The efficient transformer. 2020
- Improving Optimizer Efficiency
Gpipe: Efficient training of giant neural networks using pipeline parallelism. 2019
Training deep nets with sublinear memory cost. 2016
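To make the headline item concrete: Linformer makes self-attention linear in the sequence length by projecting the keys and values along the sequence axis with learned matrices E and F, so the attention map is n × k instead of n × n. Below is a minimal single-head sketch of that idea, not the authors' released code; the class name `LinformerSelfAttention`, the argument names (`seq_len`, `proj_dim`), and the random test input are all illustrative.

```python
import torch
import torch.nn as nn

class LinformerSelfAttention(nn.Module):
    """Single-head sketch: K and V are projected along the sequence axis
    from length seq_len down to proj_dim, so attention costs O(n * proj_dim)."""
    def __init__(self, d_model, seq_len, proj_dim):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        # Learned length-wise projections E, F of shape (proj_dim, seq_len).
        self.E = nn.Parameter(torch.randn(proj_dim, seq_len) / seq_len ** 0.5)
        self.F = nn.Parameter(torch.randn(proj_dim, seq_len) / seq_len ** 0.5)
        self.scale = d_model ** -0.5

    def forward(self, x):                                   # x: (batch, n, d)
        Q, K, V = self.q(x), self.k(x), self.v(x)
        K = torch.einsum('kn,bnd->bkd', self.E, K)          # (batch, proj_dim, d)
        V = torch.einsum('kn,bnd->bkd', self.F, V)          # (batch, proj_dim, d)
        attn = torch.softmax(Q @ K.transpose(1, 2) * self.scale, dim=-1)  # (batch, n, proj_dim)
        return attn @ V                                     # (batch, n, d)

x = torch.randn(2, 128, 64)                                 # batch=2, n=128, d=64
print(LinformerSelfAttention(d_model=64, seq_len=128, proj_dim=32)(x).shape)
# torch.Size([2, 128, 64])
```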
2. Data augmentation
Data Augmentation by Concatenation for Low-Resource Translation: A Mystery and a Solution
3. Inference acceleration (notes from the mt book)
- Vocabulary selection at the output layer
On Using Very Large Target Vocabulary for Neural Machine Translation. 2015
- Eliminating redundant computation
(1) Share attention weights across layers
Sharing Attention Weights for Fast Transformer. 2019
(2) Share parameters across layers
Recurrent Stacking of Layers for Compact Neural Machine Translation Models. 2019 [code: tf]
- Lightweight decoder and small models
(1) Make the decoder network "shallower" and "narrower"
Consider using knowledge distillation (see Section 7.5.3) or a deep encoder (see Section 7.3.1) together with a decoder network based on a small model
(2) Simplify the Transformer decoder network
① Replace the original Transformer self-attention with an average attention mechanism (see the cumulative-average sketch after this list)
Accelerating Neural Transformer via an Average Attention Network. 2018 [code: tf]
② Replace the attention modules with computationally lighter convolution operations
Pay Less Attention with Lightweight and Dynamic Convolutions. 2019 [paper notes] [code: fairseq] !!!
③ Models based on shared attention weights are another typical kind of lightweight model
Sharing Attention Weights for Fast Transformer. 2019
④ Using heterogeneous networks is another effective way to balance accuracy and speed
The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation. 2018 [code 1: fairseq !!!] [code 2: tensor2tensor]
- Batched inference
- Low-precision computation
- Non-autoregressive translation
- Other
Evolved Transformers
Linear Transformers Are Secretly Fast Weight Memory Systems. 2021 [code: pytorch, fairseq] !!!
DeFINE: Deep Factorized Input Token Embeddings for Neural Sequence Modeling. ICLR 2020
DeLighT: Deep and Light-weight Transformer. ICLR 2021 [code: fairseq]
Rethinking Attention with Performers. ICLR 2021 [code: tf; includes a partial PyTorch implementation where the data is randomly initialized]
Efficient transformer for mobile applications. ICLR 2020
Learning Light-Weight Translation Models from Deep Transformer. 2020
Reformer: The efficient transformer. ICLR 2020 [code: trax]
Universal Transformers. ICLR 2019 [code: trax, tensor2tensor]
Depth-adaptive transformer. ICLR 2020
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention. 2020 (see the linear-attention sketch below)
Are Pre-trained Convolutions Better than Pre-trained Transformers? 2021
Measuring and Increasing Context Usage in Context-Aware Machine Translation. 2021
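Two of the papers above (Transformers are RNNs; Linear Transformers Are Secretly Fast Weight Memory Systems) linearize attention by replacing the softmax with a kernel feature map, so the term φ(K)ᵀV can be computed once and reused for every query. A minimal non-causal sketch, assuming the φ(x) = elu(x) + 1 feature map used in "Transformers are RNNs"; the function name and shapes are illustrative:

```python
import torch
import torch.nn.functional as F

def linear_attention(Q, K, V, eps=1e-6):
    """O(n) attention: softmax(Q K^T) V is replaced by
    phi(Q) (phi(K)^T V) / (phi(Q) . sum_n phi(K)), with phi(x) = elu(x) + 1."""
    Q, K = F.elu(Q) + 1, F.elu(K) + 1                              # positive feature maps
    KV = torch.einsum('bnd,bne->bde', K, V)                        # phi(K)^T V, computed once
    Z = 1.0 / (torch.einsum('bnd,bd->bn', Q, K.sum(dim=1)) + eps)  # per-query normalizer
    return torch.einsum('bnd,bde,bn->bne', Q, KV, Z)

Q = K = V = torch.randn(2, 128, 64)
print(linear_attention(Q, K, V).shape)                             # torch.Size([2, 128, 64])
```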
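Likewise, for item ① in Section 3 (Accelerating Neural Transformer via an Average Attention Network), the core trick is to replace decoder self-attention with a cumulative average over earlier positions, which can be maintained incrementally at inference so each decoding step costs O(1) in the target length. A minimal sketch that omits the paper's gating and feed-forward sublayers; the names are illustrative:

```python
import torch

def average_attention(x):
    """Cumulative-average replacement for decoder self-attention:
    position t attends uniformly to positions 0..t.  x: (batch, t, d)."""
    counts = torch.arange(1, x.size(1) + 1, device=x.device).view(1, -1, 1)
    return torch.cumsum(x, dim=1) / counts

# At inference only a running sum and a step counter are kept,
# so each new target position is O(d) instead of O(t * d):
def aan_step(running_sum, step_idx, x_t):
    running_sum = running_sum + x_t
    return running_sum, running_sum / (step_idx + 1)

x = torch.randn(2, 10, 64)
print(average_attention(x).shape)                      # torch.Size([2, 10, 64])
```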
Basics
Machine learning: the meaning of low rank in low-rank matrix factorization, matrix completion, cross-validation
What does it mean for a matrix to be low-rank? (see the numeric sketch below)
Depthwise convolution vs. pointwise convolution (see the PyTorch sketch below)
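On the "meaning of low rank" item: a rank-k matrix factors into an n×k times a k×m product, so a truncated SVD both reveals the rank and gives the best rank-k approximation (Eckart–Young), while storing n·k + k + k·m numbers instead of n·m. A small numpy illustration with arbitrarily chosen sizes:

```python
import numpy as np

n, m, k = 100, 80, 5
A = np.random.randn(n, k) @ np.random.randn(k, m)    # a matrix of rank exactly k

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.sum(s > 1e-8))                               # 5: only k singular values are (numerically) non-zero

# Best rank-k approximation: keep the top-k singular triplets.
A_k = U[:, :k] * s[:k] @ Vt[:k, :]
print(np.allclose(A, A_k))                            # True: the rank-k factors recover A exactly
# Storage: n*k + k + k*m numbers instead of n*m.
```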
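On the depthwise vs. pointwise convolution item: a depthwise convolution applies one spatial filter per input channel (groups = in_channels), a pointwise convolution is a 1×1 convolution that mixes channels, and chaining them gives a depthwise-separable convolution with far fewer weights than a standard convolution. A PyTorch sketch with arbitrary channel sizes:

```python
import torch
import torch.nn as nn

c_in, c_out, k = 32, 64, 3

standard  = nn.Conv2d(c_in, c_out, kernel_size=k, padding=1)               # c_in*c_out*k*k weights
depthwise = nn.Conv2d(c_in, c_in,  kernel_size=k, padding=1, groups=c_in)  # one k*k filter per channel
pointwise = nn.Conv2d(c_in, c_out, kernel_size=1)                          # 1x1 conv mixes channels

x = torch.randn(1, c_in, 28, 28)
y = pointwise(depthwise(x))                  # depthwise-separable convolution
print(y.shape)                               # torch.Size([1, 64, 28, 28])

def n_weights(m):
    return sum(p.numel() for p in m.parameters() if p.dim() > 1)  # ignore biases
print(n_weights(standard), n_weights(depthwise) + n_weights(pointwise))    # 18432 vs 2336
```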