Notation:
$$
X_i = \sum_{j=1}^N{X_{i,j}}\\ P_{i,k} = \frac{X_{i,k}}{X_i}\\ ratio_{i,j,k} = \frac{P_{i,k}}{P_{j,k}}
$$
Value of $ratio_{i,j,k}$ | Words j, k related | Words j, k unrelated |
---|---|---|
Words i, k related | close to 1 | very large |
Words i, k unrelated | very small | close to 1 |
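The three regimes in the table can be checked numerically. The following is a minimal sketch in plain Python, assuming $X$ is a word-by-word co-occurrence count matrix stored as a list of rows. The function names `cooccurrence_prob` and `cooccurrence_ratio`, the vocabulary, and all counts are made up for illustration and do not come from a real corpus.

```python
def cooccurrence_prob(X, i, k):
    """P_{i,k} = X_{i,k} / X_i, where X_i is the sum of row i."""
    row_total = sum(X[i])
    return X[i][k] / row_total


def cooccurrence_ratio(X, i, j, k):
    """ratio_{i,j,k} = P_{i,k} / P_{j,k}.

    Raises ZeroDivisionError if word j never co-occurs with word k.
    """
    return cooccurrence_prob(X, i, k) / cooccurrence_prob(X, j, k)


# Hypothetical toy vocabulary and counts, chosen only to illustrate the table.
words = ["ice", "steam", "solid", "gas", "water"]
X = [
    [0, 2, 20, 1, 30],   # ice
    [2, 0, 1, 20, 30],   # steam
    [20, 1, 0, 3, 5],    # solid
    [1, 20, 3, 0, 5],    # gas
    [30, 30, 5, 5, 0],   # water
]

ICE, STEAM, SOLID, GAS, WATER = range(5)

# i = ice, j = steam in all three cases; only the probe word k changes.
print(cooccurrence_ratio(X, ICE, STEAM, SOLID))  # k related to i only: large
print(cooccurrence_ratio(X, ICE, STEAM, GAS))    # k related to j only: small
print(cooccurrence_ratio(X, ICE, STEAM, WATER))  # k related to both: near 1
```

With these counts the ratio is roughly 20 for "solid", roughly 0.05 for "gas", and 1 for "water", matching the "very large", "very small", and "close to 1" cells of the table.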
Derivation:
Suppose the word vectors have already been obtained. They should then be highly consistent with the co-occurrence matrix. Let the function that computes $ratio_{i,j,k}$ from the word vectors $w_i, w_j, w_k$ be $g(w_i, w_j, w_k)$. Then:
$$
\frac{P_{i,k}}{P_{j,k}} = ratio_{i,j,k} = g(w_{i},w_{j},w_{k})
$$