[Paper Summary] Information-theoretic probing for linguistic structure [Pimentel 2020]

This paper applies information theory to probing for linguistic structure. It argues that searching for syntax in the contextualized embeddings of a sentence is nonsensical, because under Assumption 1 we already know the answer a priori: the contextualized word embeddings of a sentence contain exactly the same amount of information as the sentence itself. The paper calls for more complex probes and criticizes control tasks, arguing that the goal of probing should not be to reveal whether a representation encodes linguistic structure, but to ask how hard that information is to extract.



Teaser
… under our operationalization, the endeavour of finding syntax in contextualized embeddings is nonsensical. This is because, under Assumption 1, we know the answer a priori—the contextualized word embeddings of a sentence contain exactly the same amount of information about syntax as does the sentence itself.


Keypoints
  • Call for complex probes
    One should always select the highest-performing probe one can, without resorting to artificial constraints, even if it is more complex, since it will result in a tighter estimate and thus reveal more of the linguistic information inherent in the representation (see the derivation after this list).
  • Call for harder probing tasks, or a formal definition of ease of extraction, since the current operationalization does not reveal much of an advantage of contextual embeddings over non-contextual ones.
    Most of the information needed to tag POS is encoded at the lexical level and does not require sentential context. Put simply, words are not very ambiguous with respect to POS (a quick corpus check is sketched after this list). The gain of BERT over a control for dependency labeling is also modest. So the main point in favor of BERT may be that ‘BERT makes info more readily accessible’, rather than ‘BERT provides extra info that wasn’t there in other representations’.
  • Contextual word embeddings contain the same amount of info about the linguistic property of interest as the original sentence. This follows from the data-processing inequality under a mild assumption.
  • Probing for linguistic properties in representations may not be a well-grounded enterprise at all.
    Linguistic properties are, in some sense, always there; we know this a priori. It might make more sense to pursue ease of extraction instead. The famous question raised in [Hewitt & Liang 2019], whether the representation encodes linguistic structure vs. the probe just learns the task, is a false dichotomy, since there is no difference between learning the task and the representation encoding the linguistic structure. Probing provides no further insight into the linguistic features in the representation, because we know they are there ahead of time. Rather, we want the best probe, so that we get the tightest bound on the actual distribution p(t | r), where t is the linguistic property-valued random variable and r is the representation-valued random variable (see the derivation after this list). We estimate this distribution to learn about the input sentence itself, not about the representation.
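
To make the “tighter estimate / tightest bound” point concrete, here is the standard variational argument (a sketch consistent with the summary above; $q_\theta$ denotes an arbitrary probe and $H_{q_\theta}$ its cross-entropy):

$$I(T;R) = H(T) - H(T \mid R) \;\geq\; H(T) - H_{q_\theta}(T \mid R),$$

since the probe’s cross-entropy $H_{q_\theta}(T \mid R) = -\mathbb{E}_{p(t,r)}[\log q_\theta(t \mid r)]$ upper-bounds the true conditional entropy $H(T \mid R)$, with equality iff $q_\theta(t \mid r) = p(t \mid r)$. A better probe lowers the cross-entropy and thus tightens the lower bound on $I(T;R)$; artificially restricting probe capacity only loosens it.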
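And a minimal sketch of the “words are not very ambiguous with respect to POS” claim: given any tagged corpus as (word, tag) pairs, one can measure how concentrated each word type’s tag distribution is. All names here are hypothetical, not from the paper:

```python
from collections import Counter, defaultdict
import math

def pos_ambiguity(tagged_corpus):
    """Estimate H(T | W): the frequency-weighted average entropy (in bits)
    of the POS-tag distribution per word type. Near 0 means words are
    rarely ambiguous, i.e. most POS info is lexical."""
    tag_counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        tag_counts[word][tag] += 1

    total = sum(sum(c.values()) for c in tag_counts.values())
    h = 0.0
    for counts in tag_counts.values():
        n = sum(counts.values())
        ent = -sum((c / n) * math.log2(c / n) for c in counts.values())
        h += (n / total) * ent  # weight each word type by its corpus frequency
    return h

# Toy usage: "run" is ambiguous (NOUN/VERB), the rest are not.
corpus = [("the", "DET"), ("run", "NOUN"), ("run", "VERB"),
          ("dog", "NOUN"), ("barks", "VERB"), ("the", "DET")]
print(pos_ambiguity(corpus))  # small value -> low lexical POS ambiguity
```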

Assumptions
  1. Every contextualized embedding is unique.

Note that this requires that words with the same identity, occurring in different sentences, have different embeddings.

  2. There exists a function id() that maps a contextualized embedding to its word type. Note that id() is not a bijection, since multiple embeddings will map to the same type.

Any non-contextualized word embedding contains no more information than a contextualized word embedding: non-contextualized ones only tell you the word identity, while contextualized ones tell you the word identity plus something else (e.g., about its neighbors). More formally, this follows from the data-processing inequality: $I(T;R) \geq I(T;\mathrm{id}(R)) = I(T;W) \geq I(T;e(W))$, where $W$ is the word identity and $e(\cdot)$ is a look-up function that maps a word identity to a non-contextualized (i.e., word type-level) embedding. The data-processing inequality states that applying a function to an input can only compress information, never create new information.
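
A small numerical illustration of the data-processing inequality (not from the paper; toy discrete variables, with an arbitrary collapsing function standing in for id()):

```python
import numpy as np
from sklearn.metrics import mutual_info_score  # MI between discrete label arrays, in nats

rng = np.random.default_rng(0)

# T plays the linguistic property; R is a "representation" that refines T.
t = rng.integers(0, 4, size=10_000)            # 4 classes
r = 2 * t + rng.integers(0, 2, size=10_000)    # 8 values; R fully determines T

# Any deterministic function of R can only lose information about T, never add it.
coarse = r // 4                                # collapses the 8 values down to 2

print(mutual_info_score(t, r))       # I(T; R)       ~ H(T) ~ 1.386 nats
print(mutual_info_score(t, coarse))  # I(T; f(R))    ~ 0.693 nats <= I(T; R), as the DPI guarantees
```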


Gain

How much info did we gain from contextualized embeddings over a type-level control?

  • The gain of a representation $R$ over a control function $c(\cdot)$ is defined as $\mathcal{G}(T, R, c) = I(T;R) - I(T;c(R))$. For the type-level control, $c = e \circ \mathrm{id}$, and the gain is non-negative by the data-processing inequality, since $c(R)$ is a function of $R$.
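
Since the true mutual informations are not directly computable, a natural way to estimate the gain (a sketch following the variational bound above, not necessarily the paper’s exact estimator) is with two probes, one on $R$ and one on $c(R)$; the $H(T)$ terms cancel:

$$\mathcal{G}(T,R,c) = H(T \mid c(R)) - H(T \mid R) \;\approx\; H_{q_2}(T \mid c(R)) - H_{q_1}(T \mid R),$$

i.e., the control probe’s cross-entropy minus the contextual probe’s cross-entropy.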
