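Recently I needed to build Chinese full-text search with Lucene, using IKAnalyzer for Chinese word segmentation. The latest versions of both libraries are written in Java, and using the latest Lucene from C# would require extensive changes that are hard to pull off. So I switched to calling them over Thrift RPC instead: Java acts as the server and C# as the client. In my testing the performance was also good. The main steps are as follows.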
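1. Required libraries and helper tools:
1.1 Required libraries: jdk1.8, thrift-0.13.0, lucene-8.5.2, IKAnalyzer2017_6_6_0, .net-4.0, apache-maven-3.6.3
Download the Lucene and IKAnalyzer sources and package them into jar files. Thrift comes in two parts: the thrift-0.13.0.exe compiler, which generates the code files, and the Thrift Java runtime library, which is also packaged into a jar for the server to use. The C# client likewise needs the Thrift C# runtime library (lib/csharp in the Thrift source tree).
1.2 Windows service helper tool: JavaService.exe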
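2. Java server implementation
2.1 Define the Thrift service file. Here we define a search index struct, FuzzyIndex, and a service, SearchEngine.
Create the file SearchEngine.thrift with the following content: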
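// Generate the code into the Java package and C# namespace used by the classes below
namespace java luceneTest
namespace csharp client
struct FuzzyIndex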
{
1:string KeyWord = "";
2:string KeyCode = "";
3:i16 IndexType = 0;
}
service SearchEngine {
i32 CreateFuzzyIndex(1:list<FuzzyIndex> dataList)
list<FuzzyIndex> QueryFuzzyIndex(1:string keyWord)
}
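The IDL fixes more than names and types: with TBinaryProtocol it also fixes the byte layout, which is why client and server must use the same protocol. As a rough illustration (my own sketch, not Thrift library code), the class below hand-encodes a FuzzyIndex the way I understand TBinaryProtocol writes a struct: for each field a one-byte type id (11 = string, 6 = i16), a two-byte field id and the value (strings as a four-byte length followed by UTF-8 bytes), with a single 0 byte marking the end of the struct. The class name FuzzyIndexWire is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: encodes a FuzzyIndex by hand in TBinaryProtocol's struct layout.
class FuzzyIndexWire {
    // Thrift type ids used in TBinaryProtocol field headers
    static final byte T_STOP = 0;
    static final byte T_I16 = 6;
    static final byte T_STRING = 11;

    static byte[] encode(String keyWord, String keyCode, short indexType) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes); // big-endian, as TBinaryProtocol uses
        writeString(out, (short) 1, keyWord);  // 1:string KeyWord
        writeString(out, (short) 2, keyCode);  // 2:string KeyCode
        out.writeByte(T_I16);                  // field header: type
        out.writeShort(3);                     // field header: id (3:i16 IndexType)
        out.writeShort(indexType);             // field value
        out.writeByte(T_STOP);                 // end of struct
        out.flush();
        return bytes.toByteArray();
    }

    private static void writeString(DataOutputStream out, short id, String value) throws IOException {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        out.writeByte(T_STRING);    // field type
        out.writeShort(id);         // field id
        out.writeInt(utf8.length);  // i32 byte length
        out.write(utf8);            // UTF-8 payload
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = encode("万达广场", "1000000000", (short) 3);
        StringBuilder hex = new StringBuilder();
        for (byte b : wire) {
            hex.append(String.format("%02x ", b));
        }
        System.out.println(wire.length + " bytes: " + hex.toString().trim());
    }
}
```

For the sample entry the struct takes 42 bytes, since each of the four Chinese characters needs three bytes in UTF-8. The sketch also shows why field ids should not be renumbered once clients are deployed: the ids, not the field names, travel over the wire.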
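Run the following command to generate the Java classes:
thrift-0.13.0 -gen java SearchEngine.thrift
This generates FuzzyIndex.java and SearchEngine.java under the gen-java directory.
Run the following command to generate the C# classes:
thrift-0.13.0 -gen csharp SearchEngine.thrift
This generates FuzzyIndex.cs and SearchEngine.cs under the gen-csharp directory.
You may see this warning:
[WARNING:generation:1] The 'csharp' target is deprecated. Consider using 'netstd' instead.
It can be ignored; it does not affect the generated code. The netstd target is aimed at .NET Standard, which .NET 4.0 does not support, so csharp is the right target for this client.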
2.2 Create the implementation class SearchClient with two methods, one to create the index and one to search it:
package luceneTest;

import java.io.IOException;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.wltea.analyzer.lucene.IKAnalyzer;

// FuzzyIndex is the Thrift-generated class in the same package
public class SearchClient {
    // Where the index files are stored
    private static final String INDEX_PATH = "D:/IndexData";
    // Maximum number of hits returned by one query
    private static final int MAX_HITS = 2000;
    // Serializes index creation and searching
    private final Object mutexLock = new Object();

    public void CreateFuzzyIndex(List<FuzzyIndex> dataList) throws IOException {
        if (dataList == null || dataList.isEmpty()) {
            return;
        }
        synchronized (mutexLock) {
            // IKAnalyzer performs the Chinese word segmentation
            Analyzer analyzer = new IKAnalyzer();
            IndexWriterConfig config = new IndexWriterConfig(analyzer);
            // Create the index if it does not exist, otherwise append to it
            config.setOpenMode(OpenMode.CREATE_OR_APPEND);
            // try-with-resources closes the writer and the directory even on failure
            try (Directory directory = FSDirectory.open(Paths.get(INDEX_PATH));
                 IndexWriter indexWriter = new IndexWriter(directory, config)) {
                for (FuzzyIndex item : dataList) {
                    Document document = new Document();
                    // KeyWord is analyzed (tokenized) so it can be searched by words
                    document.add(new TextField("KeyWord", item.KeyWord, Field.Store.YES));
                    // KeyCode and IndexType are stored verbatim as single terms
                    document.add(new StringField("KeyCode", item.KeyCode, Field.Store.YES));
                    document.add(new StringField("IndexType", Short.toString(item.IndexType), Field.Store.YES));
                    indexWriter.addDocument(document);
                }
                indexWriter.commit();
            }
        }
    }

    public List<FuzzyIndex> SearchIndex(String keyWord) throws IOException, ParseException {
        List<FuzzyIndex> dataList = new ArrayList<FuzzyIndex>();
        synchronized (mutexLock) {
            // DirectoryReader.open throws IndexNotFoundException if no index exists yet
            try (Directory directory = FSDirectory.open(Paths.get(INDEX_PATH));
                 IndexReader ireader = DirectoryReader.open(directory)) {
                IndexSearcher isearcher = new IndexSearcher(ireader);
                // Parse against the KeyWord field with the same analyzer used for indexing
                QueryParser parser = new QueryParser("KeyWord", new IKAnalyzer());
                // Escape Lucene query syntax characters contained in the user input
                Query query = parser.parse(QueryParser.escape(keyWord));
                // Fetch at most MAX_HITS top-scoring documents
                TopDocs results = isearcher.search(query, MAX_HITS);
                System.out.println("total matching documents: " + results.totalHits);
                for (ScoreDoc hit : results.scoreDocs) {
                    Document doc = isearcher.doc(hit.doc);
                    FuzzyIndex data = new FuzzyIndex();
                    data.KeyWord = doc.get("KeyWord");
                    data.KeyCode = doc.get("KeyCode");
                    data.IndexType = Short.parseShort(doc.get("IndexType"));
                    dataList.add(data);
                }
            }
        }
        return dataList;
    }
}
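To make the field choices in SearchClient concrete: KeyWord is a TextField, so it is run through the analyzer and split into terms, while KeyCode is a StringField, indexed as one untokenized term that only the exact value matches. The stdlib-only sketch below imitates this with a toy inverted index. It is not Lucene: as an assumption it tokenizes with overlapping character bigrams (the approach Lucene's CJKAnalyzer takes) instead of IKAnalyzer's dictionary-based segmentation, it does no scoring, and the class ToyFuzzyIndex is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical toy inverted index contrasting a tokenized field with an exact-match field.
class ToyFuzzyIndex {
    // Tokenized KeyWord field: term -> ids of documents containing it
    private final Map<String, Set<Integer>> keyWordTerms = new HashMap<>();
    // KeyCode indexed as one untokenized term, like a StringField
    private final Map<String, Set<Integer>> keyCodeTerms = new HashMap<>();
    private final List<String> keyWords = new ArrayList<>();

    // Stand-in tokenizer: overlapping character bigrams, not IKAnalyzer's segmentation
    static List<String> bigrams(String text) {
        List<String> terms = new ArrayList<>();
        if (text.length() == 1) {
            terms.add(text);
        }
        for (int i = 0; i + 1 < text.length(); i++) {
            terms.add(text.substring(i, i + 2));
        }
        return terms;
    }

    int add(String keyWord, String keyCode) {
        int id = keyWords.size();
        keyWords.add(keyWord);
        for (String term : bigrams(keyWord)) {
            keyWordTerms.computeIfAbsent(term, t -> new HashSet<>()).add(id);
        }
        keyCodeTerms.computeIfAbsent(keyCode, t -> new HashSet<>()).add(id);
        return id;
    }

    // OR over the query's terms, like QueryParser's default operator
    SortedSet<Integer> searchKeyWord(String query) {
        SortedSet<Integer> hits = new TreeSet<>();
        for (String term : bigrams(query)) {
            hits.addAll(keyWordTerms.getOrDefault(term, Collections.emptySet()));
        }
        return hits;
    }

    // Exact match only, like a term query on a StringField
    SortedSet<Integer> searchKeyCode(String code) {
        return new TreeSet<>(keyCodeTerms.getOrDefault(code, Collections.emptySet()));
    }

    String keyWord(int id) {
        return keyWords.get(id);
    }

    public static void main(String[] args) {
        ToyFuzzyIndex index = new ToyFuzzyIndex();
        index.add("万达广场", "1000000000");
        index.add("锦绣新村", "2000000000");
        for (int id : index.searchKeyWord("万达")) {
            System.out.println("KeyWord hit: " + index.keyWord(id));
        }
        System.out.println("KeyCode 1000000000 -> " + index.searchKeyCode("1000000000"));
        System.out.println("KeyCode 100 -> " + index.searchKeyCode("100"));
    }
}
```

Searching 万达 finds 万达广场 through the shared term 万达, and the KeyCode lookup for 1000000000 matches while 100 does not, which is the behaviour to expect from a StringField.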
2.3 Create the implementation class SearchEngineImpl, which is the interface exposed over RPC:
package luceneTest;

import java.util.ArrayList;
import java.util.List;

import org.apache.thrift.TException;

public class SearchEngineImpl implements SearchEngine.Iface {
    private final SearchClient mySearch = new SearchClient();

    @Override
    public int CreateFuzzyIndex(List<FuzzyIndex> dataList) throws TException {
        try {
            mySearch.CreateFuzzyIndex(dataList);
            return 0;   // success
        } catch (Exception ex) {
            ex.printStackTrace();
            return -1;  // failure is reported to the client as -1
        }
    }

    @Override
    public List<FuzzyIndex> QueryFuzzyIndex(String keyWord) throws TException {
        try {
            return mySearch.SearchIndex(keyWord);
        } catch (Exception ex) {
            ex.printStackTrace();
            // Thrift cannot transmit a null return value (the client would get a
            // "missing result" error), so return an empty list instead
            return new ArrayList<FuzzyIndex>();
        }
    }
}
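TThreadPoolServer serves each client connection on its own worker thread, so SearchEngineImpl can be called concurrently. SearchClient funnels every call through a single mutexLock, which also makes searches wait for one another. One possible alternative, sketched below with the JDK's ReentrantReadWriteLock, lets searches share a read lock while index creation takes the exclusive write lock. This is my own suggestion rather than part of the original design; GuardedIndex is a hypothetical class, and its list of strings only stands in for the Lucene index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical alternative to one mutex: shared read lock for searches,
// exclusive write lock for index updates.
class GuardedIndex {
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Stand-in for the Lucene index
    private final List<String> entries = new ArrayList<>();

    void createIndex(List<String> batch) {
        lock.writeLock().lock(); // waits until no search or other update holds the lock
        try {
            entries.addAll(batch);
        } finally {
            lock.writeLock().unlock();
        }
    }

    List<String> search(String keyWord) {
        lock.readLock().lock(); // any number of searches may hold this at once
        try {
            List<String> hits = new ArrayList<>();
            for (String entry : entries) {
                if (entry.contains(keyWord)) {
                    hits.add(entry);
                }
            }
            return hits;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        GuardedIndex index = new GuardedIndex();
        index.createIndex(Arrays.asList("万达广场", "锦绣新村"));
        index.lock.readLock().lock(); // simulate a search in progress on this thread
        boolean[] result = new boolean[2];
        Thread other = new Thread(() -> {
            result[0] = index.lock.readLock().tryLock();  // another search can proceed
            if (result[0]) index.lock.readLock().unlock();
            result[1] = index.lock.writeLock().tryLock(); // an update cannot
            if (result[1]) index.lock.writeLock().unlock();
        });
        other.start();
        other.join();
        index.lock.readLock().unlock();
        System.out.println(index.search("万达") + " read shared=" + result[0] + " write blocked=" + !result[1]);
    }
}
```

While one thread holds the read lock, a second thread can still take the read lock, but tryLock on the write lock fails until the search finishes.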
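2.4 Implement the Java service class. StartService and StopService are the entry points used when running as a Windows service (step 4); main runs the server directly from the console: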
package luceneTest;

import org.apache.thrift.TProcessor;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.server.TServer;
import org.apache.thrift.server.TThreadPoolServer;
import org.apache.thrift.transport.TServerSocket;
import org.apache.thrift.transport.TTransportException;

public class searchTest {
    private static Thread thread = null;
    private static Service service = null;

    // Called when the Windows service starts
    public static void StartService(String[] args) {
        service = new Service();
        thread = new Thread(service);
        // Make it a user (non-daemon) thread so it keeps running after StartService returns
        thread.setDaemon(false);
        thread.start();
    }

    // Called when the Windows service stops
    public static void StopService(String[] args) {
        if (service != null) {
            service.stop();
        }
    }

    // Console entry point: runs the server on the current thread
    public static void main(String[] args) {
        new Service().run();
        // Local test without going through Thrift:
        // SearchEngineImpl obj = new SearchEngineImpl();
        // List<FuzzyIndex> datalist = obj.QueryFuzzyIndex("锦绣新村");
    }
}

class Service implements Runnable {
    private static final int PORT = 8090;
    private volatile TServer server = null;

    /** Stops the Thrift server; serve() then returns and the service thread exits. */
    public void stop() {
        TServer s = server;
        if (s != null) {
            s.stop();
        }
    }

    @Override
    public void run() {
        try {
            // Listen on port 8090
            TServerSocket serverTransport = new TServerSocket(PORT);
            // Binary protocol; the client must use the same protocol
            TBinaryProtocol.Factory proFactory = new TBinaryProtocol.Factory();
            // Bind the processor to the SearchEngine service implementation
            TProcessor processor = new SearchEngine.Processor<SearchEngine.Iface>(new SearchEngineImpl());
            TThreadPoolServer.Args args = new TThreadPoolServer.Args(serverTransport);
            args.processor(processor);
            args.protocolFactory(proFactory);
            server = new TThreadPoolServer(args);
            System.out.println("Start server on port " + PORT + "...");
            server.serve(); // blocks until stop() is called
        } catch (TTransportException e) {
            e.printStackTrace();
        }
    }
}
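A Windows service wrapper such as JavaService starts the JVM through one method and later calls another to shut it down, so the stop hook must actually unblock the thread that is serving; merely setting a flag that nothing reads leaves serve() blocked. The stdlib-only sketch below shows the pattern with a plain ServerSocket in place of the Thrift server: stop() sets a flag and closes the socket, which makes the blocked accept() throw so that run() returns. StoppableServer is a hypothetical name; in the Thrift server, TServer.stop() roughly plays the role of closing the socket.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for the Thrift server: a blocking accept loop that can be stopped.
class StoppableServer implements Runnable {
    private final int port;
    private volatile ServerSocket socket;
    private volatile boolean stopped = false;
    // Released once the socket is bound (or binding failed)
    final CountDownLatch ready = new CountDownLatch(1);

    StoppableServer(int port) {
        this.port = port;
    }

    @Override
    public void run() {
        try (ServerSocket s = new ServerSocket(port)) {
            socket = s;
            ready.countDown();
            while (!stopped) {
                // accept() blocks; stop() closes the socket, which makes it throw
                try (Socket client = s.accept()) {
                    // a real server would hand the connection to a worker thread here
                }
            }
        } catch (IOException e) {
            if (!stopped) {
                e.printStackTrace(); // only unexpected failures are reported
            }
        } finally {
            ready.countDown();
        }
    }

    // What a StopService hook needs to do: unblock the serving thread
    void stop() throws IOException {
        stopped = true;
        ServerSocket s = socket;
        if (s != null) {
            s.close();
        }
    }

    public static void main(String[] args) throws Exception {
        StoppableServer server = new StoppableServer(0); // 0 = any free port
        Thread thread = new Thread(server);
        thread.setDaemon(false);
        thread.start();
        server.ready.await();
        server.stop();
        thread.join(5000);
        System.out.println("server thread alive after stop: " + thread.isAlive());
    }
}
```

The volatile flag covers the case where stop() runs before the socket has been bound: the loop condition is checked before the first accept(), so the thread still exits.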
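3. Calling from the C# client:
3.1 Create an implementation class SearchEngineImpl whose method bodies are simply left empty. Strictly speaking, a Thrift client only needs the generated SearchEngine.Client to make calls, so this stub is optional.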
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace client
{
public class SearchEngineImpl : SearchEngine.Iface
{
public int CreateFuzzyIndex(List<FuzzyIndex> dataList)
{
return 0;
}
public List<FuzzyIndex> QueryFuzzyIndex(string keyWord)
{
return null;
}
}
}
3.2 The calling class ServiceClient:
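using System;
using System.Collections.Generic;
using Thrift.Protocol;
using Thrift.Transport;

namespace client
{
    // Not thread-safe: use one ServiceClient per thread
    public class ServiceClient : IDisposable
    {
        public const string SERVERIP = "127.0.0.1";
        public const int SERVERPORT = 8090;
        public const int TIMEOUT = 3000; // socket timeout in milliseconds
        private TTransport transport = null;
        private SearchEngine.Client client = null;

        public ServiceClient()
        {
            transport = new TSocket(SERVERIP, SERVERPORT, TIMEOUT);
            // The protocol must match the server's (TBinaryProtocol)
            TProtocol protocol = new TBinaryProtocol(transport);
            client = new SearchEngine.Client(protocol);
            transport.Open();
        }

        public int CreateFuzzyIndex(List<FuzzyIndex> dataList)
        {
            return client.CreateFuzzyIndex(dataList);
        }

        public List<FuzzyIndex> QueryFuzzyIndex(string keyWord)
        {
            return client.QueryFuzzyIndex(keyWord);
        }

        // Closes the connection to the server
        public void Dispose()
        {
            if (transport != null)
            {
                transport.Close();
                transport = null;
            }
        }
    }
}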
3.3 Test calls
Create an index entry:
ServiceClient client = new ServiceClient();
List<FuzzyIndex> dataList = new List<FuzzyIndex>();
FuzzyIndex data = new FuzzyIndex();
data.KeyWord = "万达广场";
data.KeyCode = "1000000000";
data.IndexType = 3;
dataList.Add(data);
int retv = client.CreateFuzzyIndex(dataList);
if (retv == 0)
{
MessageBox.Show("OK");
}
Query the index:
List<FuzzyIndex> dataList = client.QueryFuzzyIndex("万达");
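4. Install it as a Windows service. Here I named the service SearchService:
JavaService.exe -install SearchService "%JAVA_HOME%/jre/bin/server/jvm.dll" -Djava.ext.dirs="%JAVA_HOME%/jre/lib/ext" -Xms128m -Xmx512m -Djava.class.path="%JAVA_HOME%/lib/tools.jar;c:/SearchService/ec.jar" -start luceneTest.searchTest -method StartService -stop luceneTest.searchTest -method StopService -out "%CD%/out.log" -err "%CD%/err.log" -current "%CD%" -auto
Here ec.jar is the packaged server jar; the Lucene, IKAnalyzer and Thrift jars must also be on the class path (or bundled into ec.jar). For the full list of JavaService.exe parameters, see its help output.
Start the service (for example with net start SearchService), and the client can then call it.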