Computing Semantic Similarity


筱楠girl


Categories: NLP, python. Tags: NLP, python

Article link: https://blog.csdn.net/l1339704271/article/details/103088398


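Summary (auto-generated by CSDN): This article discusses how to compute the semantic similarity of two sentences, mainly by converting the sentences into vectors and comparing them with cosine similarity.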

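Semantic similarity here means measuring how similar two sentences are. One way to do this is to vectorize both sentences and then compute the cosine of the angle between the two vectors (the cosine similarity; the cosine distance is one minus this value).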
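To make the idea concrete before the full script, here is a minimal sketch of cosine similarity, cos(a, b) = a·b / (‖a‖ ‖b‖), over plain term-count vectors. It is not the author's implementation: the function name `cosine_similarity` is hypothetical, it uses only the standard library rather than `CountVectorizer`, and it assumes the input has already been segmented into space-separated tokens. Returning 0.0 for empty input is a convention chosen for this sketch.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity of two space-tokenized texts using raw term counts.

    Hypothetical helper for illustration; assumes tokens are separated by spaces.
    """
    counts_a = Counter(text_a.split())
    counts_b = Counter(text_b.split())

    # Only terms present in both texts contribute to the dot product.
    shared_terms = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared_terms)

    # Euclidean norms of the two count vectors.
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))

    # Assumption of this sketch: an empty text is treated as similarity 0.0.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_similarity("我 喜欢 苹果", "我 喜欢 香蕉"))
```

Because raw counts ignore word order and word meaning, a score computed this way reflects vocabulary overlap more than deeper semantics; sentences that share no tokens score 0.0 even if they mean the same thing.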

# -*- coding: utf-8 -*-
"""
Created on Thu Feb 21 20:18:38 2019

@author: lcl
"""
from sklearn.feature_extraction.text import CountVectorizer
import math
import jieba
from setting import logger
# Build the stop word list from a file with one word per line
def stop_word_list(path):
    with open(path, 'r', encoding='utf-8') as f:
        stopwords = [line.strip() for line in f]
    return stopwords

# Preprocess text: segment with jieba, drop stop words, join tokens with spaces
def preprocess(text):
    if not isinstance(text, str):
        raise TypeError('text should be str')
    # Use a set so that membership tests are fast
    stopwords = set(stop_word_list("data/stop_words.txt"))
    text_with_spaces = ""
    for word in jieba.cut(text.strip()):
        if word not in stopwords and word != '\t':
            text_with_spaces += word + " "
    return text_with_spaces

def norm_vector_nonzero(ori_vec):
    ori