Introduction to the SentenceTransformers Library

Latest recommended article published on 2024-12-11 14:03:22

qq_27390023



Tags: deep learning, neural networks, artificial intelligence

Article link: https://blog.csdn.net/qq_27390023/article/details/131348737



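SentenceTransformers is a Python library for computing sentence, text, and image embeddings. It can compute text embeddings for more than 100 languages, and those embeddings can then be used for common tasks such as semantic textual similarity, semantic search, and paraphrase mining. The framework is built on PyTorch and Transformers and provides a large collection of pre-trained models tuned for various tasks. It is also easy to fine-tune your own models.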

### 1. Installation

pip install -U sentence-transformers
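
To confirm that the installation succeeded, one option is to query the installed package version. The following is a minimal sketch that uses only the Python standard library; the helper name `installed_version` is my own and is not part of SentenceTransformers.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str = "sentence-transformers"):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("sentence-transformers is not installed")
    else:
        print("sentence-transformers", found)
```

If the script reports that the package is missing, the `pip install` command was probably run in a different Python environment than the one executing the script.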

### 2. Computing Sentence Embeddings

from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')

# Sentences we want to encode
sentences = ['This framework generates embeddings for each input sentence',
    'Sentences are passed as a list of string.', 
    'The quick brown fox jumps over the lazy dog.']

# Sentences are encoded by calling model.encode()
embeddings = model.encode(sentences)

# Print the embeddings
for sentence, embedding in zip(sentences, embeddings):
    print("Sentence:", sentence)
    print("Embedding:", embedding)
    print("")
# Note: the model is downloaded on first use, so a network connection problem will cause the model download to fail!
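
Because the model is fetched over the network the first time it is used, one possible workaround for the download failure noted above is to keep a copy on disk and load it from there. The sketch below relies on two library behaviors: `SentenceTransformer(...)` accepts a local directory path in place of a model name, and `model.save(path)` writes the model to disk. The helper names `resolve_model_source` and `load_model`, and the directory `./models/all-MiniLM-L6-v2`, are hypothetical choices for illustration.

```python
from pathlib import Path


def resolve_model_source(model_name: str, local_dir: str) -> str:
    """Prefer a local copy of the model if the directory exists and is non-empty."""
    path = Path(local_dir)
    if path.is_dir() and any(path.iterdir()):
        return str(path)
    return model_name


def load_model(model_name: str = "all-MiniLM-L6-v2",
               local_dir: str = "./models/all-MiniLM-L6-v2"):
    # Imported lazily so the path logic above works without the library installed.
    from sentence_transformers import SentenceTransformer

    source = resolve_model_source(model_name, local_dir)
    model = SentenceTransformer(source)
    if source == model_name:
        # The model came from the network: keep a copy for later offline use.
        model.save(local_dir)
    return model
```

With this in place, `model = load_model()` needs network access only on the first run; later runs read from the hypothetical local directory. The directory can also be populated on another machine and copied over.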

### 3. Semantic Textual Similarity

from sentence_transformers import SentenceTransformer, util
model = SentenceTransformer('all-MiniLM-L6-v2')

# Two lists of sentences
sentences1 = ['The cat sits outside',
             'A man is playing guitar',
             'The new movie is awesome']

sentences2 = ['The dog plays in the garden',
              'A woman watches TV',
              'The new movie is so great']

# Compute embeddings for both lists
embeddings1 = model.encode(sentences1, convert_to_tensor=True)
embeddings2 = model.encode(sentences2, convert_to_tensor=True)

# Compute cosine similarities
cosine_scores = util.cos_sim(embeddings1, embeddings2)

# Output the pairs with their scores
for i in range(len(sentences1)):
    print("{} \t\t {} \t\t Score: {:.4f}".format(sentences1[i], sentences2[i], cosine_scores[i][i]))
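
`util.cos_sim` returns a matrix with a score for every pair of sentences, while the loop above prints only the diagonal (sentence `i` of the first list against sentence `i` of the second). To illustrate what such a matrix contains, here is a pure-Python sketch on toy 2-D vectors rather than real embeddings. The helper names `cosine_similarity` and `cosine_matrix` and the vector values are made up for this example and are not library functions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_matrix(rows_a, rows_b):
    """Entry [i][j] is the similarity between rows_a[i] and rows_b[j]."""
    return [[cosine_similarity(a, b) for b in rows_b] for a in rows_a]


# Toy vectors standing in for embeddings (made up for illustration).
vectors1 = [[1.0, 0.0], [0.0, 2.0]]
vectors2 = [[3.0, 0.0], [1.0, 1.0]]

scores = cosine_matrix(vectors1, vectors2)
for i, row in enumerate(scores):
    for j, score in enumerate(row):
        print(f"vectors1[{i}] vs vectors2[{j}]: {score:.4f}")
```

Vectors pointing in the same direction score 1 regardless of their length, and perpendicular vectors score 0. In the same way, iterating over every `[i][j]` entry of `cosine_scores` instead of only `[i][i]` would compare each sentence in `sentences1` with each sentence in `sentences2`.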