Use case
Similarity comparison for text that mixes Chinese and English
Usage
Install with pip
pip install sentence-transformers
Import packages
import sys
from sentence_transformers.util import cos_sim
from sentence_transformers import SentenceTransformer as SBert
Using the model
Download
The models are hosted at: https://public.ukp.informatik.tu-darmstadt.de/reimers/sentence-transformers/v0.2/
Find the model named paraphrase-multilingual-MiniLM-L12-v2 on that page and click it to download.
Load
# Raw string, so the backslashes in the Windows path are not read as escape sequences
model = SBert(r"C:\Users\xxxx\Downloads\paraphrase-multilingual-MiniLM-L12-v2")
Compute similarity
sentences1 = "xxxxx1"
sentences2 = "xxxxxx2"
# Compute the embedding for each sentence
embeddings1 = model.encode(sentences1)
embeddings2 = model.encode(sentences2)
# Compute the cosine similarity
cosine_scores = cos_sim(embeddings1, embeddings2)
cosine_scores
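For two single sentences, cos_sim returns a 1x1 tensor whose value lies between -1 and 1, with values near 1 meaning the two embeddings point in almost the same direction. As a rough illustration of what that number means, here is a minimal pure-Python sketch of cosine similarity. The helper name cosine_similarity and the toy vectors are made up for this example and are not part of sentence-transformers; real embeddings have several hundred dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors given as sequences of numbers."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Design choice of this sketch: refuse zero vectors instead of returning a value
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors standing in for real embeddings (illustrative only)
v1 = [1.0, 2.0, 3.0]
v2 = [2.0, 4.0, 6.0]     # same direction as v1
v3 = [-1.0, -2.0, -3.0]  # opposite direction to v1

print(cosine_similarity(v1, v2))
print(cosine_similarity(v1, v3))
```

Only the direction of the vectors matters here, not their length, which is why v1 and v2 count as maximally similar even though v2 is twice as long.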
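Limitations
Each sentence is limited to 512 tokens; anything beyond the limit is truncated before encoding. The limit actually in effect can be checked with model.max_seq_length, which a model's configuration may set lower than 512.
Reference: https://blog.csdn.net/yuanzhoulvpi/article/details/121755062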
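One possible way to work within the token limit is to split long text into shorter chunks and encode each chunk separately. The sketch below is only an illustration under stated assumptions: the helper split_into_chunks and its max_chars parameter are hypothetical, the split on sentence-ending punctuation is naive (it also splits on the dot in a decimal number), and the character count is only a crude stand-in for the token count, which the model's tokenizer determines.

```python
import re

# A run of non-terminator characters followed by any sentence-ending punctuation
# (Chinese or English). Naive on purpose: "3.14" would also be split at the dot.
_SENTENCE = re.compile(r"[^。！？!?.]+[。！？!?.]*")

def split_into_chunks(text, max_chars=200):
    """Group whole sentences into chunks of at most max_chars characters.

    Assumption: character count is used as a rough proxy for token count.
    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    chunks = []
    current = ""
    for piece in _SENTENCE.findall(text):
        # Start a new chunk when adding this sentence would exceed the budget
        if current.strip() and len(current) + len(piece) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += piece
    if current.strip():
        chunks.append(current.strip())
    return chunks

example = "你好。Hello world! 今天天气不错。"
print(split_into_chunks(example, max_chars=12))
print(split_into_chunks(example, max_chars=200))
```

model.encode also accepts a list of strings, so the resulting chunks could be passed to it in one call. How to combine the per-chunk embeddings afterwards (for example by averaging them) is a separate design decision that the steps above do not cover.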