Sentence Embeddings with the sentence-transformers Library

I came across the easy-to-use sentence-transformers library while implementing semantic search functionality. As part of that work, I had to index a dense vector representation of each document into Elasticsearch for the semantic search to work. With this library, I was able to implement this quickly and effectively. I hope you find this article helpful.

This article assumes some knowledge of embeddings (word embeddings or sentence embeddings). You can refer to this article for a quick refresher. If you already know about embeddings, you can continue reading.

Installation

pip install -U sentence-transformers

Usage

1. Sentence Embedding

from sentence_transformers import SentenceTransformer
model = SentenceTransformer('model_name_or_path')

Here, 'model_name_or_path' stands for the name of a pre-trained model, such as distilbert-base-nli-stsb-mean-tokens, which SentenceTransformer loads for computing sentence embeddings. The full list of pre-trained models is found here. Note that no single embedding model works best for every task, so we should try some of these models and select the one that works best for ours.
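The step described above can be sketched as follows. This is a minimal sketch rather than the original example: the helper names load_model and embed_sentences are my own, and the library import is kept inside load_model so that the second helper works with any object that has an encode method.

```python
from typing import Dict, List, Sequence


def load_model(model_name: str = "distilbert-base-nli-stsb-mean-tokens"):
    """Load a pre-trained model (hypothetical helper; needs sentence-transformers installed)."""
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_name)


def embed_sentences(model, sentences: Sequence[str]) -> Dict[str, List[float]]:
    """Encode the sentences and map each sentence to its embedding vector."""
    embeddings = model.encode(list(sentences))  # one vector per input sentence
    return {
        sentence: [float(x) for x in vector]
        for sentence, vector in zip(sentences, embeddings)
    }
```

With the library installed, embed_sentences(load_model(), ["A man is eating food.", "A man is riding a horse."]) should return one vector per sentence; the two sentences are only illustrative.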

Note: sentence-transformers models are also hosted on the Hugging Face repository, so we can use Hugging Face's Transformers library directly to generate sentence embeddings without installing the sentence-transformers library. The sample code is given here.
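When the Transformers library is used directly, the pooling step that turns token vectors into one sentence vector has to be written by hand. For the "mean-tokens" models this is mean pooling over the non-padding tokens. The sketch below shows only that pooling step on plain Python lists so that it stays self-contained; the function name mean_pool is my own, and in practice token_embeddings would come from the model's last hidden state and attention_mask from the tokenizer.

```python
from typing import List, Sequence


def mean_pool(
    token_embeddings: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
) -> List[float]:
    """Average the token vectors of one sentence, skipping padding (mask == 0)."""
    kept = [vec for vec, mask in zip(token_embeddings, attention_mask) if mask == 1]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    dim = len(kept[0])
    # Average each dimension over the tokens that were kept.
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]
```

Ignoring the padded positions matters because sentences in a batch are padded to the same length, and averaging over padding would distort the vectors of short sentences.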

2. Semantic Textual Similarity

Now that we have understood how to generate the sentence embedding, the next step is to compare the sentences for semantic textual similarity and rank them based on the cosine similarity.

The recommended models for sentence similarity are listed below. These models are trained on NLI and STS data and evaluated on the STSbenchmark dataset. The authors recommend the model distilbert-base-nli-stsb-mean-tokens because it gives a good balance between speed and performance.

roberta-large-nli-stsb-mean-tokens: STSb performance 86.39
roberta-base-nli-stsb-mean-tokens: STSb performance 85.44
bert-large-nli-stsb-mean-tokens: STSb performance 85.29
distilbert-base-nli-stsb-mean-tokens: STSb performance 85.16

Let’s look at an example of cosine similarity between the sentences we have used in the previous example:
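The comparison can be sketched in plain Python as below. This is an illustration of the idea, not the original example: the names cosine_similarity and rank_pairs are my own, and the embeddings are assumed to be equal-length lists of floats (for instance the output of model.encode converted to lists). The library also ships its own helper, util.pytorch_cos_sim, for the same computation.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_pairs(
    sentences: Sequence[str],
    embeddings: Sequence[Sequence[float]],
) -> List[Tuple[float, str, str]]:
    """Score every sentence pair (brute force) and sort by decreasing similarity."""
    scored = [
        (cosine_similarity(embeddings[i], embeddings[j]), sentences[i], sentences[j])
        for i, j in combinations(range(len(sentences)), 2)
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)
```

Because rank_pairs visits every pair, n sentences cost n * (n - 1) / 2 comparisons.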

This method uses a brute-force approach to find the highest-scoring pairs, which has quadratic complexity in the number of sentences. For large collections of sentences, it is not feasible; Paraphrase Mining, which is discussed next, is better suited to that case.

3. Paraphrase Mining

Paraphrase Mining is used when we need to deal with a large collection of sentences (10,000 and more). A more detailed explanation of Paraphrase Mining is found here.

Let’s look at an example using Paraphrase Mining:
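A minimal sketch of this step, assuming the library's util.paraphrase_mining(model, sentences) function, which returns triplets of the form [score, i, j] where i and j index into the sentence list. The helper names mine_paraphrases and format_pairs are my own, and the import sits inside the function so that the formatting helper can be used on its own.

```python
from typing import List, Sequence, Tuple


def format_pairs(
    pairs: Sequence[Sequence[float]],
    sentences: Sequence[str],
    top_n: int = 10,
) -> List[Tuple[str, str, float]]:
    """Turn [score, i, j] triplets into (sentence_i, sentence_j, score), best first."""
    best = sorted(pairs, key=lambda pair: pair[0], reverse=True)[:top_n]
    return [(sentences[int(i)], sentences[int(j)], float(score)) for score, i, j in best]


def mine_paraphrases(model, sentences: Sequence[str], top_n: int = 10):
    """Run paraphrase mining and return the top_n most similar sentence pairs."""
    from sentence_transformers import util  # needs sentence-transformers installed

    pairs = util.paraphrase_mining(model, list(sentences))
    return format_pairs(pairs, sentences, top_n)
```

Here model is a loaded SentenceTransformer instance, and top_n only limits how many pairs are reported.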

4. Semantic Search

Traditional search engines were designed for lexical (keyword-based) search, whereas semantic search can also find documents that use synonyms or otherwise express the same meaning in different words. Using the techniques we learned above, we can implement semantic search functionality. Semantic search seeks to improve search accuracy by understanding the content of the search query.
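Put together, semantic search amounts to embedding the query, scoring it against every document embedding, and keeping the best matches. The sketch below shows that ranking step in plain Python; the name top_k_search is my own, and the vectors are assumed to be lists of floats produced by the same model for both the query and the corpus. The library provides util.semantic_search for the same purpose.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_search(
    query_vector: Sequence[float],
    corpus_vectors: Sequence[Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (corpus index, score) for the top_k documents closest to the query."""
    scored = ((_cosine(query_vector, vec), idx) for idx, vec in enumerate(corpus_vectors))
    return [(idx, score) for score, idx in heapq.nlargest(top_k, scored)]
```

The returned indices point back into the list of documents, so the matching texts can be looked up directly.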

Semantic search is commonly used with search engines such as Elasticsearch. If you have a basic understanding of Elasticsearch, go through this link to understand how semantic search can be implemented in Elasticsearch.
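As a rough sketch of what this looks like on the Elasticsearch side, the helpers below build an index mapping with a dense_vector field and a script_score query that ranks documents by cosine similarity. Several things here are assumptions: the field names text and embedding are hypothetical, 768 is the vector size I would expect from a distilbert-base model and should be checked against your model, and the script syntax follows Elasticsearch 7.x and may differ in other versions.

```python
from typing import Any, Dict, List


def build_index_mapping(dims: int = 768) -> Dict[str, Any]:
    """Index body with a text field and a dense_vector field (hypothetical field names)."""
    return {
        "mappings": {
            "properties": {
                "text": {"type": "text"},
                "embedding": {"type": "dense_vector", "dims": dims},
            }
        }
    }


def build_semantic_query(query_vector: List[float], size: int = 5) -> Dict[str, Any]:
    """Search body that scores every document by cosine similarity to the query vector."""
    return {
        "size": size,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    # Adding 1.0 keeps scores non-negative, which Elasticsearch requires.
                    "source": "cosineSimilarity(params.query_vector, 'embedding') + 1.0",
                    "params": {"query_vector": query_vector},
                },
            }
        },
    }
```

The query vector would be the embedding of the search text, computed with the same model that produced the indexed document vectors.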

Conclusion

I hope you now understand how to use the sentence-transformers library to compute sentence embeddings, how to measure the similarity between sentences, and finally how to make use of sentence embeddings to implement semantic search.

Thank you for reading this article. You can reach me at https://www.linkedin.com/in/chetanambi/

Translated from: https://medium.com/towards-artificial-intelligence/sentence-embeddings-with-sentence-transformers-library-7420fc6e3815
