Learning LuceneSail (paper + code examples)

Author: 葛琪琪

Categories: Papers, RDF keyword search, LuceneSail. Tags: luceneSail, RDF, SPARQL

Article link: https://blog.csdn.net/u010707315/article/details/80752643


I. The paper

1. Source

The Sesame LuceneSail: RDF queries with full-text search (published in 2008)

2. Notes on the paper

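Put simply, LuceneSail integrates Lucene into Sesame. It combines structured RDF querying with full-text search by pairing Sesame (an RDF store) with Lucene (a text search library).

Motivation: on its own, SPARQL can match text only by exact string comparison or by filtering, and filtering is slow.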
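To make the motivation concrete, the sketch below contrasts the two query styles as plain query strings. The first uses a standard SPARQL FILTER with regex, which has to test every candidate literal. The second uses the search:matches pattern from the LuceneSail query vocabulary (namespace http://www.openrdf.org/contrib/lucenesail#), which is answered from the Lucene index. The class name, the choice of rdfs:label, and the keyword are assumptions made for illustration, and the strings would still have to be passed to Sesame's query API to be executed.

```java
/** Contrasts a plain SPARQL regex filter with a LuceneSail full-text pattern (illustrative only). */
class KeywordQueryStrings {

    // Namespace of the LuceneSail query vocabulary.
    static final String SEARCH_NS = "http://www.openrdf.org/contrib/lucenesail#";

    /** Plain SPARQL: every rdfs:label literal is tested against the regular expression. */
    static String regexQuery(String keyword) {
        return "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
             + "SELECT ?subj WHERE {\n"
             + "  ?subj rdfs:label ?label .\n"
             + "  FILTER regex(str(?label), \"" + keyword + "\", \"i\")\n"
             + "}";
    }

    /** LuceneSail: the search:matches pattern is resolved through the Lucene index. */
    static String luceneSailQuery(String keyword) {
        return "PREFIX search: <" + SEARCH_NS + ">\n"
             + "SELECT ?subj ?score WHERE {\n"
             + "  ?subj search:matches [\n"
             + "    search:query \"" + keyword + "\" ;\n"
             + "    search:score ?score ] .\n"
             + "}";
    }

    public static void main(String[] args) {
        // "person" is a hypothetical keyword; no escaping is done, so pass trusted input only.
        System.out.println(regexQuery("person"));
        System.out.println();
        System.out.println(luceneSailQuery("person"));
    }
}
```

The second query also binds a relevance score for each hit, which the regex filter cannot provide.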

LuceneSail documentation: here

Sesame and Lucene are each introduced briefly below.

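Sesame

Sesame is an RDF store: it stores RDF data and supports queries and other operations on it. It can use different back ends, such as a relational database or native RDF files. Much like JDBC connections in Java, Sesame is accessed through a connection, here a SailConnection obtained from a Sail object (Sail stands for Storage And Inference Layer). Through the connection you can add statements, remove statements, run queries, and manage transactions.

The connection centralizes two concerns: clean handling of transactions and concurrent access to the RDF store.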

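Lucene

Lucene is a full-text search engine library. It indexes documents made up of fields, and each field has a string name and a string value. Every retrieved document comes with a score (relevance measured by TF-IDF). A query must name the field it searches, so it is not possible to search all fields at once. It is therefore best to store all the text a second time in one additional indexed field, so that a single query can search across all fields quickly.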
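The catch-all field idea can be illustrated without Lucene itself. What follows is a toy in-memory model, not Lucene's API: documents are maps from field name to value, the index is keyed by field and term, and a hypothetical extra field named "text" receives a copy of every field's tokens at indexing time. All class, method, and field names are assumptions made for this sketch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy fielded index with a catch-all field; a model of the idea, not of Lucene's API. */
class CatchAllFieldIndex {

    // Hypothetical name of the extra field that holds the text of all other fields.
    static final String ALL_FIELD = "text";

    // field name -> term -> ids of the documents containing that term in that field
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();

    /** Indexes every field of the document and copies each token into ALL_FIELD. */
    void add(int docId, Map<String, String> fields) {
        for (Map.Entry<String, String> field : fields.entrySet()) {
            for (String token : field.getValue().toLowerCase().split("\\W+")) {
                if (token.isEmpty()) continue;
                post(field.getKey(), token, docId);
                post(ALL_FIELD, token, docId);
            }
        }
    }

    private void post(String field, String term, int docId) {
        Map<String, Set<Integer>> terms = postings.get(field);
        if (terms == null) { terms = new HashMap<>(); postings.put(field, terms); }
        Set<Integer> docs = terms.get(term);
        if (docs == null) { docs = new TreeSet<>(); terms.put(term, docs); }
        docs.add(docId);
    }

    /** A query always names one field; searching ALL_FIELD covers every field in one lookup. */
    Set<Integer> search(String field, String term) {
        Map<String, Set<Integer>> terms = postings.get(field);
        if (terms == null) return Collections.emptySet();
        Set<Integer> docs = terms.get(term.toLowerCase());
        if (docs == null) return Collections.emptySet();
        return docs;
    }

    public static void main(String[] args) {
        CatchAllFieldIndex index = new CatchAllFieldIndex();
        Map<String, String> doc1 = new HashMap<>();
        doc1.put("label", "Semantic Web");
        doc1.put("comment", "An extension of the web");
        Map<String, String> doc2 = new HashMap<>();
        doc2.put("label", "Lucene");
        doc2.put("comment", "A search library used on the web");
        index.add(1, doc1);
        index.add(2, doc2);

        System.out.println(index.search("label", "web"));   // only doc 1 has "web" in its label
        System.out.println(index.search(ALL_FIELD, "web")); // both docs mention "web" somewhere
    }
}
```

The cost of this design is that every token is stored twice; the benefit is that a keyword query needs only one field lookup instead of one per field.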

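Figure 1: The full-text search workflow in Lucene

  1. Green marks the indexing process, which builds an index over the original content to be searched. Its steps are:

    Identify the original content to be searched → acquire documents → create documents → analyze documents → index documents

  2. Red marks the search process, which retrieves content from the index. Its steps are:

    The user enters a query in the search interface → create the query → execute the search against the index → render the search results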
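Both phases of the workflow above can be mimicked in a few lines of plain Java. This is a toy sketch, not Lucene: the analyze step only lowercases and splits on non-word characters, the index is an in-memory map, and the score is a textbook TF-IDF sum (term frequency times log(N / document frequency)), which is simpler than Lucene's actual scoring formula. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy indexing and search pipeline with textbook TF-IDF ranking (not Lucene's API). */
class TinySearchPipeline {

    // term -> (document id -> number of occurrences of the term in that document)
    private final Map<String, Map<Integer, Integer>> index = new HashMap<>();
    private final List<String> documents = new ArrayList<>();

    /** Analysis step: lowercase the text and split it into tokens. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) tokens.add(token);
        }
        return tokens;
    }

    /** Indexing phase: create a document, analyze it, record its terms. Returns the doc id. */
    int addDocument(String text) {
        int docId = documents.size();
        documents.add(text);
        for (String term : analyze(text)) {
            Map<Integer, Integer> counts = index.get(term);
            if (counts == null) { counts = new HashMap<>(); index.put(term, counts); }
            Integer old = counts.get(docId);
            counts.put(docId, old == null ? 1 : old + 1);
        }
        return docId;
    }

    /** Search phase: analyze the query, score matching documents, return ids best-first. */
    List<Integer> search(String query) {
        final Map<Integer, Double> scores = new HashMap<>();
        for (String term : analyze(query)) {
            Map<Integer, Integer> counts = index.get(term);
            if (counts == null) continue;
            // A term found in every document gets idf = 0 and so adds nothing to the score.
            double idf = Math.log((double) documents.size() / counts.size());
            for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
                Double old = scores.get(e.getKey());
                double add = e.getValue() * idf;
                scores.put(e.getKey(), old == null ? add : old + add);
            }
        }
        List<Integer> hits = new ArrayList<>(scores.keySet());
        Collections.sort(hits, new Comparator<Integer>() {
            @Override
            public int compare(Integer a, Integer b) {
                int byScore = Double.compare(scores.get(b), scores.get(a));
                return byScore != 0 ? byScore : a.compareTo(b); // ties: lower id first
            }
        });
        return hits;
    }

    /** Rendering step: one line per hit, showing the id and the original text. */
    String render(List<Integer> hits) {
        StringBuilder out = new StringBuilder();
        for (int docId : hits) {
            out.append(docId).append(": ").append(documents.get(docId)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        TinySearchPipeline pipeline = new TinySearchPipeline();
        pipeline.addDocument("RDF stores hold RDF triples");
        pipeline.addDocument("Lucene is a full-text search library");
        pipeline.addDocument("LuceneSail adds full-text search to an RDF store");
        // Document 0 mentions "rdf" twice and document 2 once, so 0 is ranked first.
        System.out.print(pipeline.render(pipeline.search("rdf")));
    }
}
```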

II. Code

The full code is on GitHub: https://github.com/kathy775/LuceneSail-Demo

Environment:

Java 1.7

The Sesame JAR files are required

Code:

Building the index on a NativeStore

import java.io.File;

import org.openrdf.repository.sail.SailRepository;
import org.openrdf.repository.sail.SailRepositoryConnection;
import org.openrdf.rio.RDFFormat;
import org.openrdf.sail.lucene.LuceneSail;
import org.openrdf.sail.nativerdf.NativeStore;

/**
 * Example code showing how to use the LuceneSail on top of a NativeStore
 */
public class DBPLuceneSailIndex {
	public static void main(String[] args) throws Exception {
		// directory in which the NativeStore keeps its data files
		String index_path = "/home/LuceneSailMemo/dbpedia";
		createSimple(index_path);
	}

	/**
	 * Create a LuceneSail on top of a NativeStore and load an N-Triples file into it.
	 */
	public static void createSimple(String index_path) throws Exception {
		// create a Sesame native store that persists its data under index_path
		NativeStore myStore = new NativeStore();
		File dataDir = new File(index_path);
		myStore.setDataDir(dataDir);
		// create a LuceneSail to wrap the native store
		LuceneSail lucenesail = new LuceneSail();
		// LUCENE_DIR_KEY expects a directory path: the Lucene index is stored there on disk
		lucenesail.setParameter(LuceneSail.LUCENE_DIR_KEY, "./data/mydirectory");
		// or set this parameter instead to keep the Lucene index in RAM (lost at shutdown)
		// lucenesail.setParameter(LuceneSail.LUCENE_RAMDIR_KEY, "true");

		// wrap the native store in the LuceneSail
		lucenesail.setBaseSail(myStore);
		// create a Repository to access the sails
		SailRepository repository = new SailRepository(lucenesail);
		repository.initialize();
		SailRepositoryConnection connection = repository.getConnection();
		try {
			// placeholder: replace with the path to the N-Triples file to load
			String file_path = "file path";
			File file = new File(file_path);
			System.out.println("input file exists: " + file.exists());
			// parse the file and add its triples; the LuceneSail indexes the literals as they are added
			connection.add(file, "", RDFFormat.NTRIPLES);

			connection.commit();
			System.out.println("------ index files generated -----");
		} finally {
			// always release the connection and shut the repository down cleanly
			connection.close();
			repository.shutDown();
		}
	}
}

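More to come when I return to this...

Next time: the LuceneSail query code, the MemoryStore code, and further details.

Updated on 2018-12-14...

The code has been uploaded to GitHub: https://github.com/kathy775/LuceneSail-Demo

It includes:

Building the index on a NativeStore

Building the index on a MemoryStore

Keyword queries with LuceneSail

Next time: more details on querying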