I first heard about probing as a research direction from Voita's NLP with Friends talk, where I vaguely understood it as a kind of "neural network interpretation". That understanding isn't wrong, it just captures only half of the picture. This week I read several papers along this stream and found that probing serves a second purpose as well: acting as an evaluation metric for representation learning. That, however, is even more of a dark art than NLG evaluation. At least NLG outputs can be graded by hand the way one grades essays, whereas nobody knows whether a representation has been learned well or not. This leads directly to the most uncomfortable question in probing: when a probe achieves high accuracy on a linguistic task using a representation, can we conclude that the representation encodes linguistic structure, or has the probe simply learned the task itself?
Anyway, I still want to quote the definition of probing from [Pimentel 2020, Pareto Probing], so that whenever probing comes up I no longer have only a rough sense of it without being able to articulate it:
We define probing in this work as training a supervised classifier (known as a probe) on top of a pretrained model's frozen representations. By analyzing the classifier's performance, one can assess how much linguistic information is encoded in the representations.
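To make the definition concrete, here is a minimal sketch of a probe. Everything here is illustrative: the "frozen representations" are synthetic random vectors whose first coordinate encodes a made-up binary label, standing in for real encoder outputs, and the probe is a plain logistic-regression classifier trained by gradient descent while the representations stay fixed.

```python
import numpy as np

# Hypothetical setup: pretend X holds frozen representations from a
# pretrained encoder. Here they are random vectors whose first
# coordinate linearly encodes a binary "linguistic" property y.
rng = np.random.default_rng(0)
n, d = 200, 16
X = rng.normal(size=(n, d))      # frozen representations (never updated)
y = (X[:, 0] > 0).astype(float)  # the property we probe for

# The probe: a single linear layer + sigmoid. Only the probe's
# parameters (w, b) are trained; X is left untouched throughout.
w = np.zeros(d)
b = 0.0
lr = 0.5
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
    grad_w = X.T @ (p - y) / n              # logistic-loss gradient w.r.t. w
    grad_b = (p - y).mean()                 # gradient w.r.t. b
    w -= lr * grad_w
    b -= lr * grad_b

acc = (((X @ w + b) > 0) == (y > 0.5)).mean()
print(f"probe accuracy: {acc:.2f}")
```

High probe accuracy here shows the property is linearly decodable from the vectors, which is exactly where the question above bites: the label was planted in the representation by construction, but in general a high score alone cannot tell us whether the encoder "knew" the property or the probe extracted it the hard way.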