Python text similarity functions: Python Gensim: How do I calculate document similarity using an LDA model?

Latest recommended article published on 2023-06-19 11:38:14

weixin_39622710

Article tags: Python text similarity functions

I've got a trained LDA model and I want to calculate the similarity score between two documents from the corpus I trained my model on.

After studying all the Gensim tutorials and functions, I still can't get my head around it. Can somebody give me a hint? Thanks!

Solution

I don't know if this will help, but I got good results for document matching and similarity by using the actual document as the query.

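from gensim import corpora, models, similarities

# Load the dictionary, corpus, and trained model saved during training
dictionary = corpora.Dictionary.load('dictionary.dict')
corpus = corpora.MmCorpus("corpus.mm")
lda = models.LdaModel.load("model.lda")  # result from running online LDA (training)

# Index every corpus document by its LDA topic vector
index = similarities.MatrixSimilarity(lda[corpus], num_features=lda.num_topics)
index.save("simIndex.index")

# Convert the query document to bag-of-words, then to LDA topic space
docname = "docs/the_doc.txt"
with open(docname, 'r') as f:
    doc = f.read()
vec_bow = dictionary.doc2bow(doc.lower().split())
vec_lda = lda[vec_bow]

# Score the query against every indexed document, highest similarity first
sims = index[vec_lda]
sims = sorted(enumerate(sims), key=lambda item: -item[1])
print(sims)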

Each entry in sims is a (document index, score) pair. The second element of each pair is the cosine similarity between the query document and that corpus document, so the list ranks every document in the corpus against the query.
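To compare just two documents rather than ranking the whole corpus, the same cosine measure can be applied directly to their two topic vectors. The following is a minimal pure-Python sketch of that computation, not Gensim's own implementation. It assumes the vectors are in the sparse (topic_id, weight) form that lda[vec_bow] returns; the function name cosine_similarity and the three sample vectors are made up for illustration. Gensim also provides gensim.matutils.cossim for sparse vectors of this kind.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity of two sparse vectors given as (topic_id, weight) pairs."""
    a, b = dict(vec_a), dict(vec_b)
    # Only topics present in both vectors contribute to the dot product
    dot = sum(weight * b.get(topic, 0.0) for topic, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical topic vectors, standing in for lda[corpus[i]] and lda[corpus[j]]
doc_a = [(0, 0.7), (2, 0.3)]
doc_b = [(0, 0.6), (1, 0.1), (2, 0.3)]
doc_c = [(1, 0.9), (3, 0.1)]

print(cosine_similarity(doc_a, doc_b))  # close to 1: mostly the same topics
print(cosine_similarity(doc_a, doc_c))  # 0.0: no topics in common
```

With a real model, passing lda[corpus[i]] and lda[corpus[j]] in place of the sample vectors should give the score for documents i and j of the training corpus.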
