Designing and Interpreting Probes with Control Tasks [[Hewitt & Liang 2019](https://arxiv.org/abs/1909.03368)]
tl;dr: A good probe should be selective, achieving high linguistic task acc and low control task acc.
This is probably the most typical paper arguing for simple probes; another paper mentioned in the text takes the same view: [Alain and Bengio: Understanding intermediate layers using linear classifier probes]: the task of a deep neural network classifier is to come up with a representation at the final layer that can be easily fed to a linear classifier (i.e., the most elementary form of useful classifier).
Motivation for control tasks
- Favor ‘ease of extraction’
- Discourage probes that learn the task by themselves
As long as a representation is a lossless encoding, a sufficiently expressive probe with enough training data can learn any task on top of it. Such expressive probes lack the discriminative power to measure the goodness of a representation. This contrasts with [Pimentel 2020], which argues there is no difference between the probe learning the task and the representation encoding the information.
Desiderata for control tasks
- Control tasks have the same input and output space as a linguistic task (e.g., POS tagging) but can only be learned if the probe memorizes the mapping, since the inputs are no longer predictive of their labels. This corresponds to the label-shuffled setting in [Pareto Probing], which has been criticized for only corrupting the input-output mapping without corrupting the input structure. The structured input might yield representations from which info is easier to extract.
- The more a probe is able to make task output decisions independently of the linguistic properties of a representation, the less its acc on a linguistic task necessarily reflects the properties of the representation.
- Control tasks must have two properties at a high level. a) Structure: the output for a word token is a deterministic function of the word type. [Pimentel 2020] criticizes this as well, and it is indeed quite unrealistic. b) Randomness: the output for each word type is sampled independently at random.
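The two properties above can be sketched in a few lines. A minimal, hypothetical sketch (toy tag set and function names are my own, not from the paper): each word type gets one label drawn independently at random, every token of that type inherits it, and selectivity is the gap between linguistic-task and control-task accuracy.

```python
import random

# Toy POS tag set for illustration only.
POS_TAGS = ["NOUN", "VERB", "ADJ", "DET", "ADP"]

def make_control_mapping(vocab, labels, seed=0):
    # Randomness: each word TYPE is assigned a label sampled
    # independently at random from the linguistic task's label set.
    rng = random.Random(seed)
    return {word: rng.choice(labels) for word in vocab}

def control_labels(tokens, mapping):
    # Structure: every occurrence of a word type gets the same
    # (randomly chosen) label, so the task is pure memorization.
    return [mapping[tok] for tok in tokens]

def selectivity(linguistic_acc, control_acc):
    # Selectivity: how much better the probe does on the real
    # linguistic task than on the memorization-only control task.
    return linguistic_acc - control_acc
```

A probe with high linguistic accuracy but also high control accuracy is doing the work itself; a selective probe keeps the control accuracy low.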