Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals [Elazar 2020]
tl;dr
Probing results alone do not support behavioral conclusions: probing provides no evidence for or against the model’s actual use of the probed information.
We focus on how information is used, rather than on what information is encoded. Put another way, we focus on the information’s influence on the model’s behavior, rather than on the ability to extract it from the representation.
Keypoints
- We study POS and dependency labels. Contrary to common belief, high probing performance does not mean that the probed information is used for predicting the main task.
- We study what BERT actually relies on at inference time by removing specific information and measuring how performance changes. More formally, we are interested in the change in the prediction of a classifier $c$ caused by the removal of property $Z$ from the representation $h(x_i)$
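The counterfactual question in the last bullet can be illustrated with a small synthetic sketch. It is not the paper's exact procedure: it assumes a binary property $Z$, a least-squares linear probe, a single nullspace-projection step as the removal method, and two fixed linear classifiers standing in for $c$. All function names (`fit_linear_probe`, `nullspace_projection`, `prediction_change`, `amnesic_demo`) and the toy data are hypothetical, chosen only for illustration.

```python
import numpy as np

def fit_linear_probe(H, z):
    """Least-squares linear probe (no bias term) for a +/-1 property z."""
    w, *_ = np.linalg.lstsq(H, z, rcond=None)
    return w

def probe_accuracy(H, z, w):
    """In-sample accuracy of the probe sign(H @ w) against z."""
    return float(np.mean(np.sign(H @ w) == z))

def nullspace_projection(w):
    """Projection onto the subspace orthogonal to the probe direction w."""
    u = w / np.linalg.norm(w)
    return np.eye(len(u)) - np.outer(u, u)

def prediction_change(H, H_amnesic, v):
    """Fraction of examples whose prediction under the fixed linear
    classifier v flips once the property has been removed."""
    before = (H @ v) > 0
    after = (H_amnesic @ v) > 0
    return float(np.mean(before != after))

def amnesic_demo(seed=0, n=20000, d=8):
    rng = np.random.default_rng(seed)
    # Toy data (assumption): Z is encoded along dimension 0 of h(x_i).
    z = rng.choice([-1.0, 1.0], size=n)
    H = rng.normal(size=(n, d))
    H[:, 0] += 3.0 * z

    # Remove Z with one projection step (a simplification; removal
    # methods in practice typically iterate until the probe fails).
    w = fit_linear_probe(H, z)
    H_amnesic = H @ nullspace_projection(w)

    # Two stand-ins for the classifier c: one reads the dimension that
    # encodes Z, the other reads an unrelated dimension.
    c_uses_z = np.eye(d)[0]
    c_ignores_z = np.eye(d)[1]

    w_after = fit_linear_probe(H_amnesic, z)
    return {
        "probe_acc_before": probe_accuracy(H, z, w),
        "probe_acc_after": probe_accuracy(H_amnesic, z, w_after),
        "change_c_uses_z": prediction_change(H, H_amnesic, c_uses_z),
        "change_c_ignores_z": prediction_change(H, H_amnesic, c_ignores_z),
    }

if __name__ == "__main__":
    for name, value in amnesic_demo().items():
        print(f"{name}: {value:.3f}")
```

In this toy setting the probe recovers $Z$ almost perfectly from the original representation, regardless of which classifier is applied on top. Only the classifier that actually reads the $Z$ direction changes its predictions after removal, which is the gap between "extractable" and "used" that the notes describe.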