0. estimate p(sent1, sent2) = ? for every sentence pair? WRONG!
1. the question is whether p(sent1, sent2) > p(sent3, sent4)
=> ranking problem; pairwise partial orders may conflict
=> so we need a total order
1.1 => use a distance measure, e.g. cosine or Euclidean
1.2 => can we solve it with probability?
=> yes
1.2.1 p(sent2 | sent1) = ?, p(sent1 | sent2) = ?, DIST IS p(sent2 | sent1) * p(sent1 | sent2). p(sent2 | sent1) REPRESENTS THE PROBABILITY THAT SENT2 CAN BE INFERRED FROM SENT1.
1.2.2 p(word2_1, word2_2, ... | word1_1, word1_2, ...)
1.2.3 p(word2_1, word2_2, ... | word1_1) * p(word2_1, word2_2, ..., word1_2 | word1_1) WORD ORDER IS OF NO USE, so this form is dropped
1.2.4 p(sent2 | sent1) = p(sent2 | word1_1) * p(word2_1 | word1_1) * p(sent2 | word2_1); the problem is that p(y|x) here should express similarity, not word order
// known: Sim(word1_1, word2_1) = ..., Sim(word1_1, word2_2) = ... =>
// p(word2_1 | word1_1) = p(word2_1, word1_1) / sum over word of p(word, word1_1)
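
A minimal sketch of the normalization in the comment above. It assumes the known Sim(word1_1, word) values are non-negative and can stand in for the unnormalized joint p(word, word1_1); the function name `conditional_from_sim` and the toy numbers are made up for illustration.

```python
def conditional_from_sim(sim_row):
    """Turn similarity scores Sim(word1_1, word) into p(word | word1_1).

    sim_row maps each candidate word to its similarity with one fixed
    word of sent1. Assumption: scores are non-negative and are used as
    unnormalized joint weights p(word, word1_1).
    """
    total = sum(sim_row.values())
    if total <= 0:
        # No usable evidence: return zeros instead of dividing by zero.
        return {word: 0.0 for word in sim_row}
    return {word: score / total for word, score in sim_row.items()}


# Toy similarities of two candidate words to word1_1.
probs = conditional_from_sim({"word2_1": 3.0, "word2_2": 1.0})
print(probs)
```

After normalization the values sum to 1 over the candidate words, so they can be compared across different words of sent1.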
/***************************************************************************/
1.2.1 P(SENT2 | SENT1) = P(WORD2_1 | WORD1_1, WORD1_2, WORD1_3, ...) * P(WORD2_2 | WORD1_1, WORD1_2, WORD1_3, ...) * ...
// => P(WORD2_1, WORD1_1, ...) / P(WORD1_1, WORD1_2, WORD1_3, ...) ...
1.2.2 P(SENT2 | SENT1) = P(SENT2_1 | WORD1_1, WORD1_2, WORD1_3, ...) * P(WORD2_2 | WORD1_1, WORD1_2, WORD1_3, ...)
=> P(SENT2_1, WORD1_1, ...) / P(WORD1_1, WORD1_2, WORD1_3, ...) ...
=> P(SENT2_1) / P(WORD1_1, WORD1_2, WORD1_3, ...) ...
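
A rough sketch of the per-word factorization of P(SENT2 | SENT1) and the symmetric score p(sent2 | sent1) * p(sent1 | sent2). The notes do not say how each per-word conditional is estimated, so it is passed in as a caller-supplied function; `word_prob` and the toy estimator in the usage lines are hypothetical.

```python
import math


def sentence_conditional(target, context, word_prob):
    """P(target | context) as a product of per-word conditionals.

    Assumes the words of `target` are independent given all words of
    `context`. `word_prob(word, context)` is a hypothetical estimator
    of P(word | context words).
    """
    log_p = 0.0
    for word in target:
        p = word_prob(word, context)
        if p <= 0.0:
            return 0.0  # one impossible word makes the whole product zero
        log_p += math.log(p)  # sum logs to avoid underflow on long sentences
    return math.exp(log_p)


def symmetric_score(sent1, sent2, word_prob):
    """The score p(sent2 | sent1) * p(sent1 | sent2)."""
    return (sentence_conditional(sent2, sent1, word_prob)
            * sentence_conditional(sent1, sent2, word_prob))


# Toy estimator: a word is likely if it literally occurs in the context.
def toy_word_prob(word, context):
    return 0.5 if word in context else 0.1


print(symmetric_score(["a", "b"], ["a", "c"], toy_word_prob))
```

Because the score multiplies both directions, it is symmetric in sent1 and sent2, which is what allows it to be used for ranking sentence pairs.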
Algorithm:
step1: use the top 1000 to represent sent1
step2: get P = (number of hits in the top 1000) / len(sent1) = P(SENT2_1) / P(WORD1_1, WORD1_2, WORD1_3, ...)
step3: get R, same as step2
step4: over the top 1000, F1 = 2.0 * P * R / (P + R)
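
The steps above can be sketched as follows, under one possible reading. The notes do not say what the top 1000 contains or how it is built, so each sentence's top-k collection is taken as a precomputed input. This sketch assumes P counts the tokens of sent1 found in the collection for sent2, and R is the same computation with the roles swapped; all function names are made up for illustration.

```python
def hit_ratio(tokens, top_k):
    """Fraction of `tokens` that occur in the `top_k` collection."""
    if not tokens:
        return 0.0
    top = set(top_k)
    hits = sum(1 for token in tokens if token in top)
    return hits / len(tokens)


def f1(p, r):
    """Harmonic mean of P and R, defined as 0 when both are 0."""
    if p + r == 0:
        return 0.0
    return 2.0 * p * r / (p + r)


def sentence_f1(sent1, sent2, top_k1, top_k2):
    """F1 between two sentences given their top-k representations.

    Assumption: top_k1 and top_k2 are precomputed elsewhere.
    """
    p = hit_ratio(sent1, top_k2)  # step2: hits / len(sent1)
    r = hit_ratio(sent2, top_k1)  # step3: same computation, roles swapped
    return f1(p, r)               # step4


sent1 = ["a", "b", "c", "d"]
sent2 = ["a", "x"]
print(sentence_f1(sent1, sent2, top_k1=["a", "b", "c", "d", "x"], top_k2=["a", "b", "x"]))
```

F1 is a single number per sentence pair, so sorting pairs by it gives the total order asked for in item 1.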