Natural Language Processing: Measuring the Overlap Between Texts with the Dot Product

Latest recommended article published on 2022-06-09 16:20:01

糯米君_

Category: Natural Language Processing. Tags: python, machine learning algorithms, nlp

Article link: https://blog.csdn.net/fgg1234567890/article/details/111463762

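If we can measure how much two texts overlap, we get a good estimate of how similar their word usage is, and that in turn is a good estimate of how much their meanings overlap. The approach here represents each sentence as a binary bag-of-words vector (1 if a word occurs in the sentence, 0 otherwise) and takes the dot product of two such vectors, which counts the words the two sentences share.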
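Before the full example, here is a minimal sketch of why the dot product counts shared words. The two sentences are made up for illustration and are not part of this article's corpus, the helper name `binary_vector` is hypothetical, and tokenization is assumed to be a plain whitespace split.

```python
def binary_vector(tokens, vocab):
    """Return a 0/1 list marking which vocabulary words occur in tokens."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocab]


# Two illustrative sentences (made up for this sketch)
tokens_a = "the cat sat on the mat".split()
tokens_b = "the dog sat on the rug".split()

# Shared vocabulary, sorted so that vector positions are stable
vocab = sorted(set(tokens_a) | set(tokens_b))

vec_a = binary_vector(tokens_a, vocab)
vec_b = binary_vector(tokens_b, vocab)

# Dot product: sum of the element-wise products
overlap = sum(a * b for a, b in zip(vec_a, vec_b))

# For binary vectors this equals the size of the set intersection
shared = set(tokens_a) & set(tokens_b)
print(overlap, sorted(shared))
```

Each product `a * b` is 1 only when both sentences contain the word at that position, so the sum is the number of distinct words they have in common.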

import numpy as np
import pandas as pd

sentences = """Thomas Jefferson began building Monticello at the age of 26.\n"""
sentences += """Construction was done mostly by local masons and carpenters.\n"""
sentences += "He moved into the South Pavilion in 1770.\n"
sentences += """Turning Monticello into a neoclassical masterpiece was Jefferson's obsession."""
corpus = {}
for i, sent in enumerate(sentences.split('\n')):
    corpus['sent{}'.format(i)] = dict((tok, 1) for tok in sent.split())
# pd.DataFrame.from_records() builds a DataFrame from a structured array or a sequence of tuples or dicts
df = pd.DataFrame.from_records(corpus).fillna(0).astype(int).T
print(df)

df = df.T
print(df.sent0.dot(df.sent1))
print(df.sent0.dot(df.sent2))
print(df.sent0.dot(df.sent3))
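The three dot products above compare only sent0 with the other sentences. As a possible extension, the sketch below computes every pairwise overlap in one matrix product and lists the tokens that two sentences share. It is an illustration under assumptions, not the author's method: it builds the table with the plain `pd.DataFrame` constructor instead of `from_records`, and the names `overlap_matrix` and `shared` are introduced here.

```python
import pandas as pd

sentences = [
    "Thomas Jefferson began building Monticello at the age of 26.",
    "Construction was done mostly by local masons and carpenters.",
    "He moved into the South Pavilion in 1770.",
    "Turning Monticello into a neoclassical masterpiece was Jefferson's obsession.",
]

# One column per sentence, one row per token; 1 marks that the token occurs
corpus = {
    "sent{}".format(i): {tok: 1 for tok in sent.split()}
    for i, sent in enumerate(sentences)
}
df = pd.DataFrame(corpus).fillna(0).astype(int)

# Entry (i, j) is the dot product of sentence columns i and j;
# the diagonal holds the number of distinct tokens in each sentence
overlap_matrix = df.T.dot(df)
print(overlap_matrix)

# Tokens that occur in both sent0 and sent3
shared = df.index[(df["sent0"] == 1) & (df["sent3"] == 1)].tolist()
print(shared)
```

Matching here is on exact strings, so "Jefferson" and "Jefferson's" count as different tokens and trailing punctuation stays attached to words such as "26.". A real tokenizer would change the counts.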
