Integrating Apache Pluto with the Lucene Search Engine: Example Tutorial

cunchi4221

Published 2020-07-09 05:27:06

Tags: search engine, big data, python, java, database

Original article: https://www.journaldev.com/5105/integration-apache-pluto-with-lucene-search-engine-example-tutorial

Information retrieval is not a luxury that an application may or may not provide. The best applications offer cross-site search, which minimizes the effort users spend locating a piece of information.

Typically, you choose a portal to build your application because of the many advantages it provides. One of the most important is the ability to integrate with modern search engines, giving users one central place to search all content, including static content such as HTML pages and files on the file system.

This tutorial walks you through the steps for integrating the Apache Pluto portal with Apache Lucene. The listings target the Lucene 4.x API, as the Version.LUCENE_40 constant they use indicates.

Lucene Concept

Lucene is a search engine library made up of many components that work together to produce the results you want. It is worth going through these components first, because that will help you get the most out of this tutorial.

Lucene provides two key functions: creating an index and executing a user's query. Your application is responsible for setting up each of these, but the two operations are performed separately.

The figure below shows the first step you go through to make sure that your documents (content) are indexed.

Querying the index is depicted in the figure below:

The sections below give more detail about each of the components involved in creating and querying the index.
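
As a rough illustration of these two separate phases, the sketch below builds a toy inverted index in plain Java and then queries it. ToyIndex and its methods are hypothetical names invented for this example; it is not how Lucene is implemented, only the general idea of indexing first and searching afterwards.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Toy model of the two phases: build an index, then query it.
// ToyIndex is a hypothetical class for illustration, not a Lucene API.
final class ToyIndex {
    // Stored titles, addressed by document id (position in the list).
    private final List<String> titles = new ArrayList<>();
    // Inverted index: token -> sorted ids of the documents containing it.
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    // Indexing phase: store the title and record each of its tokens.
    int add(String title) {
        int docId = titles.size();
        titles.add(title);
        for (String token : title.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    // Query phase: look up a single term and return the matching titles.
    List<String> search(String term) {
        List<String> result = new ArrayList<>();
        TreeSet<Integer> ids = postings.get(term.toLowerCase(Locale.ROOT));
        if (ids != null) {
            for (int id : ids) {
                result.add(titles.get(id));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("Apache Pluto Tutorial");
        index.add("Hibernate Tutorial");
        index.add("Spring Tutorial");
        System.out.println(index.search("pluto"));
        System.out.println(index.search("tutorial"));
    }
}
```

The point of the split is that add can run once, ahead of time, while search runs on every user request against the already built structure.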

Documents

A Lucene index consists of documents, and each Lucene document represents one indexed object. That object could be a database record, a web page, a Java object, and so on.

Each document consists of a set of fields, and each field is a name/value pair that represents one piece of content. Examples of such fields are title, summary, and content.

To work with a Lucene document you need an instance of the org.apache.lucene.document.Document class.

Analyzer

The Analyzer is the heart of Lucene: you use an Analyzer both when creating the index and when querying it afterwards. An Analyzer turns free text into tokens that can be searched later.

Lucene provides many types of Analyzer so that you can choose the one that best fits your application. When you add a document to the index, Lucene uses the analyzer to process the text of the fields in that document.

You can find the different Analyzer implementations under the org.apache.lucene.analysis package.
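
To make "turning free text into tokens" concrete, here is a minimal sketch of the idea in plain Java. ToyAnalyzer, its stop-word list, and its splitting rule are assumptions made up for this example; Lucene's real analyzers (StandardAnalyzer included) use their own, more elaborate tokenization rules.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for an analyzer: lowercase, split, drop stop words.
final class ToyAnalyzer {
    // A tiny, made-up stop-word list; real analyzers ship their own.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "the", "of"));

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        // Split on every run of characters that are neither letters nor digits.
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!raw.isEmpty() && !STOP_WORDS.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("JSP & Servlet Tutorial"));
        System.out.println(tokenize("The Art of Search"));
    }
}
```

Because the same transformation is applied to the indexed text and to the query text, a query for "SERVLET" can match a title that contains "Servlet".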

Query

A Query object is what you use to search the Lucene index. There are several ways to build a Query against your index. Refer to the Lucene API documentation to learn more about:

TermQuery
BooleanQuery
WildcardQuery
PhraseQuery
PrefixQuery
MultiPhraseQuery
FuzzyQuery
RegexpQuery
TermRangeQuery
NumericRangeQuery
ConstantScoreQuery
DisjunctionMaxQuery
MatchAllDocsQuery
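
The names in this list describe different matching rules. As a loose sketch of how three of them differ, the plain-Java example below models a document as a set of tokens and each query as a predicate over that set. ToyQueries and its methods are hypothetical; they only mimic the intent of a term, prefix, and boolean query and are not the Lucene classes themselves.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical predicates that mimic the intent of three query types.
final class ToyQueries {
    // Matches when the exact token is present (the idea behind a term query).
    static Predicate<Set<String>> term(String token) {
        return tokens -> tokens.contains(token);
    }

    // Matches when any token starts with the prefix (the idea behind a prefix query).
    static Predicate<Set<String>> prefix(String prefix) {
        return tokens -> tokens.stream().anyMatch(t -> t.startsWith(prefix));
    }

    // Combines clauses: the "must" clause has to match and the "mustNot" clause may not.
    static Predicate<Set<String>> bool(Predicate<Set<String>> must,
                                       Predicate<Set<String>> mustNot) {
        return must.and(mustNot.negate());
    }

    // Helper that builds a token set for the examples.
    static Set<String> tokens(String... values) {
        return new HashSet<>(Arrays.asList(values));
    }

    public static void main(String[] args) {
        Set<String> spring = tokens("spring", "tutorial");
        Set<String> hibernate = tokens("hibernate", "tutorial");
        Predicate<Set<String>> tutorialButNotSpring = bool(term("tutorial"), term("spring"));
        System.out.println(term("spring").test(spring));
        System.out.println(prefix("hib").test(hibernate));
        System.out.println(tutorialButNotSpring.test(spring));
        System.out.println(tutorialButNotSpring.test(hibernate));
    }
}
```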

Field

As stated earlier, a field is a name/value pair that represents one piece of metadata or content for a Lucene document. Each field may be indexed, stored, and/or tokenized. Indexed fields are searchable, and Lucene processes them when the indexer adds the document to the index.

Turning a document's fields into sets of individual tokens is the job of the Lucene Analyzer. The Field class lives in the org.apache.lucene.document package.

TopScoreDocCollector

TopScoreDocCollector is a collector implementation that collects the top-scoring hits and returns them as a TopDocs. IndexSearcher uses it to implement TopDocs-based search. Hits are sorted by score in descending order and then, when scores are tied, by docID in ascending order.
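
The ordering rule described above (score descending, ties broken by docID ascending) can be sketched in plain Java as follows. ToyTopDocs and Hit are hypothetical names for this example, and the sketch simply sorts a full list and truncates it; it illustrates the ordering only and is not how the real collector gathers hits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of the "top N by score, then by docID" ordering.
final class ToyTopDocs {
    static final class Hit {
        final int docId;
        final float score;

        Hit(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }

        @Override
        public String toString() {
            return "doc=" + docId + " score=" + score;
        }
    }

    // Score descending first, then docID ascending when scores are tied.
    static final Comparator<Hit> ORDER =
            Comparator.comparingDouble((Hit h) -> h.score)
                    .reversed()
                    .thenComparingInt(h -> h.docId);

    // Returns at most n hits in the order defined above.
    static List<Hit> top(List<Hit> hits, int n) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(ORDER);
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit(3, 0.5f), new Hit(1, 0.9f), new Hit(2, 0.5f), new Hit(0, 0.1f));
        for (Hit hit : top(hits, 3)) {
            System.out.println(hit);
        }
    }
}
```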

IndexSearcher

As you will see in the sample below, we use IndexSearcher, located in the org.apache.lucene.search package, to search our index with the provided Query.

To get an IndexSearcher you pass an IndexReader to its constructor. After you invoke search on the IndexSearcher with a collector, the collector is populated with the search results, and you can call topDocs().scoreDocs on it to obtain the hits, one ScoreDoc per matched document.

Hits

In Lucene releases before 3.0, the search method on IndexSearcher returned an org.apache.lucene.search.Hits object containing the matched documents, so that you could access, process, and display them in whatever form you wanted. The Hits class was removed in Lucene 3.0; the code in this tutorial works with a ScoreDoc[] obtained from TopDocs instead, reading each document with searcher.doc(hit.doc) and each score from hit.score.

Hits was more than a simple collection: because a result set can be very large, its accessor methods mattered. It provided three main methods:

public final Document doc(int n) throws IOException returns the n-th Document, containing all of the fields that were stored when the document was indexed.
public final int length() returns the number of search results that matched the query.
public final float score(int n) throws IOException returns the calculated score of the n-th hit in the search results.

Index Building – Indexer

The sample below shows how you can use the Lucene API to index a set of sample JournalDev tutorials. The index lets you search for any tutorial that the JournalDev site provides.

The index assumes that you look for tutorials by their title.

Indexer.java

package com.journaldev.portlet;

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.util.Version;

public class Indexer {
	static {
			try {
				System.out.println("Initialize of Indexer ::");
				// Create an analyzer
				StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);
				// Create a Lucene directory
				Directory dir = new SimpleFSDirectory(new File("D:\\LuceneSearch\\store"));
				System.out.println("Clean Index ::");
				for (String fileName : dir.listAll()){
					dir.deleteFile(fileName);
				}
				// Create index configuration writer
				IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_40, analyzer);
				// Create writer
				IndexWriter writer = new IndexWriter(dir, config);
				// Tutorial Topics
				String [] topics = {"Apache Pluto Tutorial","Hibernate Tutorial","Spring Tutorial","JSP & Servlet Tutorial","Primefaces Tutorial","LuceneSearch Tutorial"};

				for(String topic : topics){
					// Create document
					Document doc = new Document();
					// Add field
					doc.add(new TextField("title",topic,Field.Store.YES));
					// write document
					writer.addDocument(doc);
				}
				// Commit changes
				writer.commit();
				// Close the stream, so that you can open a read stream
				writer.close();
				System.out.println("All Tutorials Are Indexed ::");
			}
			catch(Exception e){
				e.printStackTrace();
			}
	}
}

Some additional notes on the code above:

Several Directory implementations are available: you can use RAMDirectory, or any other implementation that suits you, instead of a location on disk. The indexer above uses SimpleFSDirectory (a simple file-system directory) to hold the index's segments.
The indexing code runs as soon as the ClassLoader loads and initializes the Indexer class, because it lives in a static initializer.
To avoid indexing the same documents again each time the Indexer is loaded, the code first deletes every file in the index directory.
StandardAnalyzer is used to generate the tokens.
The String[] topics array stands in for the tutorials that the JournalDev site provides.
For each tutorial, a document with a single title field is created and added to the index.
All changes to the index are committed.
The IndexWriter is closed so that another writer or reader can use the created index.
If you forget to close the writer when its work is done, it keeps holding the index's write lock, and a later attempt to open another IndexWriter on the same directory throws an exception (LockObtainFailedException).
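
The notes above rely on a static initializer running when the class is first loaded with Class.forName, and only then. The plain-Java sketch below demonstrates that timing with a counter in place of the indexing work. StaticInitDemo and LazyIndexer are hypothetical names for this example; it says nothing about whether a static initializer is the best place for indexing, only when it runs.

```java
// Hypothetical demo: a static initializer runs once, on first initialization.
final class StaticInitDemo {
    // Counter lives in the outer class so reading it does not initialize LazyIndexer.
    static int runs = 0;

    static final class LazyIndexer {
        static {
            // Stand-in for the indexing work done in Indexer's static block.
            StaticInitDemo.runs++;
        }
    }

    // Loads the nested class twice and reports how often its initializer ran.
    static int loadTwice() {
        try {
            // A class literal alone does not initialize the class; Class.forName does.
            String name = LazyIndexer.class.getName();
            Class.forName(name);
            Class.forName(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println("initializer runs: " + loadTwice());
    }
}
```

Within one class loader the initializer runs a single time, so repeated Class.forName calls do not rebuild the index; a redeployed web application gets a new class loader and therefore a fresh run.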

Simple Lucene Search Portlet

Below is a simple Lucene search portlet built on the same index.

Remember that doView renders the portlet's view, while processAction handles actions initiated against the portlet.

LuceneSearch.java

package com.journaldev.portlet;

import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;

import javax.portlet.ActionRequest;
import javax.portlet.ActionResponse;
import javax.portlet.GenericPortlet;
import javax.portlet.PortletException;
import javax.portlet.RenderRequest;
import javax.portlet.RenderResponse;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.util.QueryBuilder;
import org.apache.lucene.util.Version;

public class LuceneSearch extends GenericPortlet{
	static {
		try {
			// Load the Indexer
			Class.forName("com.journaldev.portlet.Indexer");
		} catch (ClassNotFoundException e) {
			e.printStackTrace();
		}
	}

	private ScoreDoc [] hits = new ScoreDoc[0];
	private IndexSearcher searcher = null;

	public void doView(RenderRequest request, RenderResponse response) throws PortletException, IOException {
		synchronized(hits){
			// Get the writer
			PrintWriter out = response.getWriter();
			if(request.getParameter("status") == null || request.getParameter("status").equals("initial")){
				// Print out the form Tag
				out.print("<form method=\"GET\" action=\""+response.createActionURL()+"\">");
				// Print out the search input
				out.print("<p>Search about your favor Tutorial That JournalDev has presented : <input type=\"text\" "
						+ "name=\"query\" id=\"query\"/></p>");
				// Print out the search command
				out.print("<br/> "
						+ "<input type=\"submit\" value=\"Search\"/>");
				// close form
				out.print("</form>");
			}
			else {
				// Print out the form Tag
				out.print("<form method=\"GET\">");
				// Print out the result
				for(ScoreDoc hit : hits){
				    int docId = hit.doc;
				    Document d = searcher.doc(docId);
					out.print("<p>Tutorial Is <span style='font-style: oblique;font-weight: bolder;'>"+d.get("title")+"</span> <span style='color:red'>With Score :"+hit.score+"</span></p>");
				}
				// Print out the render link; a render URL without a status parameter shows the initial form
				out.print("<br/>"
						+ "<a href=\""+response.createRenderURL()+"\">Search Again</a>");
				// Print out the form Tag
				out.print("</form>");
				// Check whether the searcher is not null to close it
				if(searcher != null){
					// Close the reader for future modifications on the indexer
					searcher.getIndexReader().close();
				}
			}
		}
	}

	public void processAction(ActionRequest request, ActionResponse response) throws PortletException, IOException {
		// Fetch the hits
		synchronized (hits){
			// Reset the hits object
			hits = new ScoreDoc[0];
			// Create an analyzer
			StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);
			// Create a Lucene directory
			Directory dir = new SimpleFSDirectory(new File("D:\\LuceneSearch\\store"));
			// Open index reader
			IndexReader reader = IndexReader.open(dir);
			// Create index searcher
			searcher = new IndexSearcher(reader);

			// Build a phrase query on the title field using QueryBuilder
			Query query = new QueryBuilder(analyzer).createPhraseQuery("title", request.getParameter("query"));
			// Create collector
			TopScoreDocCollector collector = TopScoreDocCollector.create(10, true);
			// Search with the query and let the collector gather the matching documents
			searcher.search(query, collector);
			// Acquire the hits
			this.hits = collector.topDocs().scoreDocs;
			response.setRenderParameter("status", "searched");
		}

	}

}

Here is a detailed explanation of the code listed above:

Following good portlet design, doView displays both the search form and the search results, while processAction handles the user's query and performs the actual search.
The LuceneSearch portlet loads the Indexer class so that the index is created before any search runs.
Two instance variables are defined and used: hits and searcher.
If the request parameter status is null or equal to initial, a search form is rendered so the user can enter the tutorial title they are looking for.
Otherwise (status is neither null nor initial), the user has submitted a search, and the results are ready to be displayed.
To keep concurrent requests from producing inconsistent results, both doView and processAction wrap their work in a synchronized block on hits. Note that processAction reassigns hits inside that block, so later callers may lock a different array object; a dedicated final lock object would be more robust.
Once the user submits the search, processAction runs and performs the search.
The hits array is populated with the matching documents, and the status render parameter is set to searched.
IndexWriter and IndexReader are used for writing to and reading from the index, respectively.
doView runs once processAction has finished.
The search results are displayed and the IndexReader is closed, which releases the index files it holds open. This matters here because the Indexer deletes every file in the index directory when it runs, and on some platforms (Windows, for example) deleting a file that is still open fails with an exception.
Each result is displayed together with its score.
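
The listing synchronizes on the hits field, which processAction replaces with a new array. One common alternative, sketched here in plain Java under the assumption that a single shared result holder is all that is needed, is to lock on a dedicated final object that never changes. SearchState, publish, and snapshot are hypothetical names and are not part of the original portlet.

```java
// Hypothetical holder that guards a replaceable array with a fixed lock object.
final class SearchState {
    // Dedicated monitor that is never reassigned, so every thread locks the same object.
    private final Object lock = new Object();
    private String[] hits = new String[0];

    // Replaces the current results; the copy keeps callers from mutating our array.
    void publish(String[] newHits) {
        synchronized (lock) {
            hits = newHits.clone();
        }
    }

    // Returns a copy of the current results, taken under the same lock.
    String[] snapshot() {
        synchronized (lock) {
            return hits.clone();
        }
    }

    public static void main(String[] args) {
        SearchState state = new SearchState();
        state.publish(new String[] {"Apache Pluto Tutorial", "Spring Tutorial"});
        System.out.println(state.snapshot().length);
    }
}
```

Keep in mind that a portlet instance is typically shared by all users, so results kept in instance fields are shared as well; the lock only prevents torn updates, it does not make the results per-user.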

Simple Lucene Search Portlet Demo

Below is the normal flow you will see once the portlet is deployed into Apache Pluto. This tutorial assumes that you are already familiar with Apache Pluto and know how to create a portal page and deploy your portlet in it.

If you have not done this before, it is best to go back to the Introduction Into Apache Pluto tutorial first.

Summary

Search is a key feature of most modern sites. Most applications today do not keep their data in a single location; they typically need to search across database records, HTML pages, Word documents, and many other sources. The best solution is a single search engine that works against all of these types of data through a uniform interface.

This tutorial helps you get started with the Lucene search engine and shows you how to create a search portlet. Contribute by commenting below, and download the source code below to experiment with it.

Download Apache Pluto Lucene Search Integration Project

Translated from: https://www.journaldev.com/5105/integration-apache-pluto-with-lucene-search-engine-example-tutorial
