2021 ACL Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks


The purpose of reading this paper is to understand hypernetworks. Related code: https://github.com/rabeehk/hyperformer

Parameter-efficient fine-tuning methods rely on introducing an adapter module. This paper uses a shared hypernetwork to generate the adapter for every task and every layer, conditioned on the task, the adapter position, and the layer id in the transformer model.


Introduction

Pretrained large language models yield strong results: Transfer learning from pretrained large-scale language models yields state-of-the-art results in a variety of tasks (Devlin et al., 2019; Radford et al., 2018; Liu et al., 2019b).

Why does the method in this paper work?

The hypernetwork is jointly learned between all tasks and is thus able to share information across them, while negative interference is minimized by generating separate adapter layers for each task. For each new task, our model only requires learning an additional task embedding, reducing the number of trained parameters.

What are the main contributions of this paper?

  1. It proposes this framework (in my view, essentially taking hypernetworks and applying them here).
  2. The proposed method outperforms previous methods.
  3. The method is validated on GLUE.
  4. The method is further analyzed on unseen in-domain tasks.

Method

Task Conditional Adapter Layers

Section 2.1 essentially adopts a hypernetwork-like structure. In the original hypernetworks paper, an embedding vector is used to describe the entire weights of a given layer; in this part, however, the authors use an embedding to describe the input task.

In this work, we propose conditional adapter modules, in which we generate the adapters weights based on input task embeddings using shared hypernetworks (Ha et al., 2017), which capture information across tasks that can be used to positively transfer to other relevant tasks.
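As a minimal sketch of this idea (not the authors' actual implementation; the module name, generator layout, and dimensions below are illustrative assumptions), a task-conditional adapter can be written as a shared hypernetwork that maps a task embedding to the adapter's down- and up-projection weights:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TaskConditionedAdapter(nn.Module):
    """Sketch of a task-conditional adapter: a shared hypernetwork maps a
    task embedding to the adapter's down/up projection weights and biases.
    Names and dimensions are illustrative, not the paper's exact code."""

    def __init__(self, d_model=512, bottleneck=64, task_emb_dim=64):
        super().__init__()
        self.d_model, self.bottleneck = d_model, bottleneck
        # Shared hypernetwork: one linear generator per produced tensor.
        self.gen_down_w = nn.Linear(task_emb_dim, bottleneck * d_model)
        self.gen_down_b = nn.Linear(task_emb_dim, bottleneck)
        self.gen_up_w = nn.Linear(task_emb_dim, d_model * bottleneck)
        self.gen_up_b = nn.Linear(task_emb_dim, d_model)

    def forward(self, x, task_emb):
        # Generate this task's adapter parameters from its task embedding.
        down_w = self.gen_down_w(task_emb).view(self.bottleneck, self.d_model)
        down_b = self.gen_down_b(task_emb)
        up_w = self.gen_up_w(task_emb).view(self.d_model, self.bottleneck)
        up_b = self.gen_up_b(task_emb)
        # Standard adapter computation: down-project, nonlinearity, up-project, residual.
        h = F.relu(F.linear(x, down_w, down_b))
        return F.linear(h, up_w, up_b) + x

# Usage: only the per-task embedding differs between tasks.
adapter = TaskConditionedAdapter()
x = torch.randn(8, 16, 512)      # (batch, seq_len, d_model)
task_emb = torch.randn(64)       # learned embedding of one task
out = adapter(x, task_emb)       # same shape as x
```

This matches the point quoted above from the introduction: for a new task only an additional task embedding has to be learned, while the hypernetwork's own parameters are shared across all tasks.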

Task Conditional Layer Normalization

This part likewise acts as a function that maps the task embedding to two parameters (the layer normalization gain and bias):

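Roughly (a sketch in standard layer-norm notation rather than a copy of the paper's equation), the task conditional layer normalization applies a task-specific gain and bias to the normalized hidden state:

```latex
\mathrm{LN}^{\tau}(x_i) \;=\; \gamma^{\tau} \odot \frac{x_i - \mu}{\sigma} \;+\; \beta^{\tau}
```

where \(\mu\) and \(\sigma\) are the mean and standard deviation of \(x_i\), and \((\gamma^{\tau}, \beta^{\tau})\) are the two parameters generated by the hypernetwork from the task embedding.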

Task Conditioned Hypernetworks

This part defines the hypernetwork itself and explains how hypernetworks are used in this paper.

What is the difference between Hyperformer++ and Hyperformer?

Essentially, it builds on Hyperformer by additionally feeding the hypernetwork information about each task, each adapter's position, and each layer id in the Transformer.

This way, the hypernetwork is able to produce distinct weights for each task, adapter position, and layer of a transformer. Furthermore, layer id and adapter position embeddings are parameters that are learned via back-propagation, allowing us to train the whole model end-to-end conveniently.
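A sketch of how that conditioning could look in code (the projector MLP, names, and dimensions below are my own assumptions, not taken from the repo): the task, layer-id, and adapter-position embeddings are combined into a single conditioning embedding that is then fed to the shared hypernetwork, such as the generator sketched above.

```python
import torch
import torch.nn as nn

class HypernetInputEmbedding(nn.Module):
    """Sketch: combine task, layer-id, and adapter-position embeddings into
    one conditioning vector for the shared hypernetwork (Hyperformer++-style).
    All three embedding tables are ordinary parameters learned by back-propagation."""

    def __init__(self, num_tasks, num_layers, num_positions=2, emb_dim=64):
        super().__init__()
        self.task_emb = nn.Embedding(num_tasks, emb_dim)
        self.layer_emb = nn.Embedding(num_layers, emb_dim)   # layer id in the transformer
        self.pos_emb = nn.Embedding(num_positions, emb_dim)  # e.g. after self-attention vs. after FFN
        # Small projector mapping the concatenation to the final conditioning embedding.
        self.projector = nn.Sequential(
            nn.Linear(3 * emb_dim, emb_dim), nn.ReLU(), nn.Linear(emb_dim, emb_dim)
        )

    def forward(self, task_id, layer_id, position_id):
        z = torch.cat(
            [self.task_emb(task_id), self.layer_emb(layer_id), self.pos_emb(position_id)],
            dim=-1,
        )
        return self.projector(z)

# One shared hypernetwork can emit distinct adapter weights per
# (task, layer, position) by varying only this conditioning embedding.
embed = HypernetInputEmbedding(num_tasks=8, num_layers=12)
cond = embed(torch.tensor([3]), torch.tensor([5]), torch.tensor([1]))  # shape: (1, 64)
```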
