2019.9.5 note
A Structural Probe for Finding Syntax in Word Representations
- The probe identifies a linear transformation under which squared L2 distance encodes the distance between words in the parse tree, and one in which squared L2 norm encodes depth in the parse tree. Using this probe, we show that such transformations exist, providing evidence that entire syntax trees are embedded implicitly in deep models’ vector geometry.
This defines d(x, y) = (f(x) - f(y))^T (f(x) - f(y)) with f(x) = A v_x for BERT embedding v_x, i.e., squared L2 distance under a learned linear map A. The paper finds that this d can learn the distances between words on the parse tree.
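A minimal numpy sketch of this probe distance: `A` would be learned by regressing d(x, y) onto gold parse-tree distances, but here it is random just to illustrate the computation (the embeddings `V` are stand-ins, not real BERT vectors).

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_probe, n_words = 8, 4, 5

V = rng.normal(size=(n_words, d_model))  # stand-ins for BERT embeddings v_x
A = rng.normal(size=(d_probe, d_model))  # probe matrix (learned in the paper)

def probe_dist(vx, vy, A):
    """Squared L2 distance after the linear map: (A(vx - vy))^T (A(vx - vy))."""
    diff = A @ (vx - vy)
    return float(diff @ diff)

# Pairwise predicted tree distances for a sentence of n_words tokens.
D = np.array([[probe_dist(V[i], V[j], A) for j in range(n_words)]
              for i in range(n_words)])
```

Since d is a squared distance, D is symmetric, non-negative, and zero on the diagonal; training fits A so that D approximates the parse-tree distance matrix.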